Genome masking

From CoGepedia
Revision as of 10:46, 7 October 2013 by Elyons


CoGe masks genomes using NCBI's windowmasker (http://www.ncbi.nlm.nih.gov/pubmed/16287941).

Currently, our pipeline for masking is:

  • Generate the masking database (word frequency counts) for the genome
    • windowmasker -in <genome> -mk_counts -out <counts>
  • Mask the genome, producing soft-masked FASTA
    • windowmasker -in <genome> -ustat <counts> -outfmt fasta -dust T -out <masked genome>
  • Convert the soft masking (lowercase sequence) to "X" (hard masking).
    • CoGe's visualizations often display "X" in genome sequence as purple, which makes masked regions easy to identify
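The three steps above can be chained in a small shell script. This is a minimal sketch, not CoGe's actual pipeline code: it assumes `windowmasker` is on your `PATH` and that the step 2 FASTA output marks masked bases in lowercase (as described above). The function names `soft_to_hard_mask` and `mask_genome` and the output file names are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch of the masking pipeline described above.
# Assumptions: windowmasker is on PATH; function and file names are
# hypothetical and not taken from CoGe's own code.

# Step 3: hard-mask a soft-masked FASTA read on stdin, writing to stdout.
# Header lines (starting with ">") are left untouched; on sequence lines,
# every lowercase letter (a soft-masked base) becomes "X".
# LC_ALL=C keeps [[:lower:]] limited to ASCII a-z.
soft_to_hard_mask() {
  LC_ALL=C sed '/^>/!s/[[:lower:]]/X/g'
}

# Steps 1-3 chained together for one genome FASTA file.
# Usage: mask_genome genome.fa
# Writes genome.fa.counts, genome.fa.soft.fa and genome.fa.masked.
mask_genome() {
  local genome=$1
  local counts="$genome.counts"
  local soft="$genome.soft.fa"
  local hard="$genome.masked"

  # Step 1: build the frequency-count masking database for this genome
  windowmasker -in "$genome" -mk_counts -out "$counts" || return 1
  # Step 2: soft-mask the genome (masked bases in lowercase), with DUST enabled
  windowmasker -in "$genome" -ustat "$counts" -outfmt fasta -dust T -out "$soft" || return 1
  # Step 3: convert lowercase (soft-masked) bases to "X"
  soft_to_hard_mask < "$soft" > "$hard"
}
```

For example, `mask_genome genome.fa` would leave the hard-masked sequence in `genome.fa.masked`. Using `sed` with a `/^>/!` address keeps FASTA header lines intact, whereas a plain `tr` over the whole file would also rewrite lowercase characters in the `>` header lines.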

Note: We will eventually add an option to CoGe that lets users mask their own genomes with this pipeline. Until then, please email the CoGe team with the ID of the genome you would like masked.