Genome masking

From CoGepedia

Revision as of 10:44, 7 October 2013

CoGe masks genomes using NCBI's WindowMasker: http://www.ncbi.nlm.nih.gov/pubmed/16287941

Our current masking pipeline is:

  • Generate the frequency counts (the masking database) for the genome
    • windowmasker -in <genome> -mk_counts -out <counts>
  • Mask the genome
    • windowmasker -in <genome> -ustat <counts> -outfmt fasta -dust T -out <masked genome>
  • Convert soft masking (lower-case sequence data) to "X"
    • CoGe's visualizations often draw "X" genome sequence in purple, which makes masked regions easy to identify
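
The three steps above can be sketched in Python as follows. This is only an illustration, not CoGe's actual implementation: the function names and file-path arguments are hypothetical, the windowmasker flags are copied from the commands listed above, and the page does not say how CoGe itself converts lower-case sequence to "X".

```python
import subprocess


def windowmasker_commands(genome, counts, soft_masked):
    """Build the two windowmasker invocations from the list above.

    The three arguments are placeholder file paths (hypothetical names).
    """
    return [
        ["windowmasker", "-in", genome, "-mk_counts", "-out", counts],
        ["windowmasker", "-in", genome, "-ustat", counts,
         "-outfmt", "fasta", "-dust", "T", "-out", soft_masked],
    ]


def hard_mask_line(line):
    """Replace lower-case (soft-masked) characters with "X".

    FASTA header lines (starting with ">") are returned unchanged.
    """
    if line.startswith(">"):
        return line
    return "".join("X" if ch.islower() else ch for ch in line)


def hard_mask_fasta(in_path, out_path):
    """Stream a soft-masked FASTA file and write the "X"-masked version."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(hard_mask_line(line))


def mask_genome(genome, counts, soft_masked, hard_masked):
    """Run all three steps; assumes windowmasker is installed and on PATH."""
    for cmd in windowmasker_commands(genome, counts, soft_masked):
        subprocess.run(cmd, check=True)
    hard_mask_fasta(soft_masked, hard_masked)
```

For example, hard_mask_line("ACGTacgtNN\n") returns "ACGTXXXXNN\n", while a header such as ">chr1 my genome\n" passes through untouched.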

Note: CoGe will eventually offer an option that lets users mask their own genomes with this pipeline. Until that is available, please email the CoGe team (coge.genome@genome.com) with the genome ID of the genome you would like masked.