Genome masking

From CoGepedia
Revision as of 10:22, 1 February 2016 by Elyons

CoGe masks repetitive and low-complexity sequence in genomes using NCBI's windowmasker: http://www.ncbi.nlm.nih.gov/pubmed/16287941

Currently, our pipeline for masking is:

  • Generate word (n-mer) frequency counts for the genome; these counts serve as the masking database
    • windowmasker -in <genome> -mk_counts -out <counts>
  • Mask the genome
    • windowmasker -in <genome> -ustat <counts> -outfmt fasta -dust T -out <masked genome>
  • Convert soft-masked (lower-case) bases to "X" (hard masking)
    • CoGe's visualizations often display "X" sequence in purple, which makes masked regions easy to identify
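
The three steps above can be sketched as a small Python driver. This is a minimal sketch, not CoGe's actual pipeline code: it assumes windowmasker is installed and on the PATH, and that the soft-masked output is plain FASTA with masked bases in lower case. The function names and intermediate file names (genome.counts, genome.softmasked.fa, genome.masked.fa) are hypothetical; only the windowmasker flags are copied from the commands listed above.

```python
import os
import subprocess
from typing import Iterable, Iterator, List


def windowmasker_commands(genome: str, counts: str, masked: str) -> List[List[str]]:
    """Build the two windowmasker invocations (flags copied from the pipeline)."""
    # Step 1: generate word frequency counts (the masking database).
    make_counts = ["windowmasker", "-in", genome, "-mk_counts", "-out", counts]
    # Step 2: soft-mask the genome using those counts, with DUST enabled.
    mask = [
        "windowmasker", "-in", genome, "-ustat", counts,
        "-outfmt", "fasta", "-dust", "T", "-out", masked,
    ]
    return [make_counts, mask]


def soft_to_hard_mask(lines: Iterable[str]) -> Iterator[str]:
    """Step 3: replace lower-case (soft-masked) bases with 'X'."""
    for line in lines:
        if line.startswith(">"):
            # FASTA header: pass through so sequence IDs are not altered.
            yield line
        else:
            # Every lower-case letter on a sequence line is treated as masked;
            # upper-case bases and the trailing newline are kept as-is.
            yield "".join("X" if c.islower() else c for c in line)


def mask_genome(genome: str, workdir: str = ".") -> str:
    """Run both windowmasker steps, then hard-mask (hypothetical file layout)."""
    counts = os.path.join(workdir, "genome.counts")
    soft = os.path.join(workdir, "genome.softmasked.fa")
    hard = os.path.join(workdir, "genome.masked.fa")
    for cmd in windowmasker_commands(genome, counts, soft):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    with open(soft) as src, open(hard, "w") as dst:
        dst.writelines(soft_to_hard_mask(src))
    return hard
```

The conversion streams the file line by line, so large genomes do not need to fit in memory. For example, the sequence line "ACGTacgtNN" becomes "ACGTXXXXNN", while a header such as ">chr1" is left untouched.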

Logged-in users can mask a genome from its GenomeInfo page.