Difference between revisions of "Genome masking"

From CoGepedia

Revision as of 10:46, 7 October 2013

CoGe masks genomes using NCBI's WindowMasker (the windowmasker program): http://www.ncbi.nlm.nih.gov/pubmed/16287941

Currently, our pipeline for masking is:

  • Generate the frequency counts that windowmasker uses as its masking database for the genome
    • windowmasker -in <genome> -mk_counts -out <counts>
  • Mask genome
    • windowmasker -in <genome> -ustat <counts> -outfmt fasta -dust T -out <masked genome>
  • Convert soft-masked sequence (lowercase sequence data) to "X".
    • CoGe's visualizations often use purple to represent genome sequence "X", which makes masked regions easy to identify
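
The pipeline above gives no command for the last step. A minimal sketch of how the conversion could be done in Python, assuming the masked genome is a FASTA file in which soft-masked bases are lowercase. The function name soft_to_x is hypothetical, and this is not necessarily how CoGe performs the step.

```python
import sys


def soft_to_x(fasta_lines):
    """Yield FASTA lines with lowercase (soft-masked) bases replaced by 'X'.

    Assumption: any lowercase letter on a sequence line is soft masking.
    """
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Header lines are passed through unchanged.
            yield line
        else:
            # Uppercase bases are kept; lowercase bases become "X".
            yield "".join("X" if base.islower() else base for base in line)


if __name__ == "__main__":
    # Read FASTA from standard input, write the converted FASTA to standard output.
    for converted in soft_to_x(sys.stdin):
        print(converted)
```

Saved under a hypothetical name such as soft_to_x.py, it could be run as: python soft_to_x.py < masked_genome.fasta > masked_genome_X.fasta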

Note: CoGe will eventually have an option that allows people to mask their own genomes using this pipeline. Until that is available, please email the CoGe team (coge.genome@genome.com) with the genome id of the genome you would like masked.