Difference between revisions of "Genome masking"
From CoGepedia
Revision as of 10:45, 7 October 2013
CoGe masks genomes by using NCBI's windowmasker: http://www.ncbi.nlm.nih.gov/pubmed/16287941
Currently, our pipeline for masking is:
- Generate the frequency counts used as the masking database for the genome
  - windowmasker -in <genome> -mk_counts -out <counts>
- Mask the genome
  - windowmasker -in <genome> -ustat <counts> -outfmt fasta -dust T -out <masked genome>
- Convert soft masking (lower-case sequence data) to "X"
  - CoGe's visualizations often use purple to represent genome sequence "X", which makes masked regions easy to identify
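
The last step (converting soft masking to "X") can be sketched in Python as below. This is a minimal illustration and not CoGe's actual conversion code: the function names hard_mask_fasta and hard_mask_text are hypothetical, and the sketch assumes the windowmasker output is plain FASTA in which only lower-case letters mark masked sequence.

```python
import io


def hard_mask_fasta(src, dst, mask_char="X"):
    """Copy FASTA text from src to dst, replacing lower-case bases with mask_char.

    src and dst are text-mode file-like objects. Hypothetical helper,
    not part of CoGe or windowmasker.
    """
    for line in src:
        if line.startswith(">"):
            # Header lines pass through unchanged, even if they contain lower case.
            dst.write(line)
        else:
            # Soft-masked (lower-case) characters become the hard-mask character;
            # upper-case bases and the trailing newline are kept as they are.
            dst.write("".join(mask_char if c.islower() else c for c in line))


def hard_mask_text(fasta_text, mask_char="X"):
    """Convenience wrapper that hard-masks a FASTA string and returns the result."""
    out = io.StringIO()
    hard_mask_fasta(io.StringIO(fasta_text), out, mask_char)
    return out.getvalue()
```

For files on disk, open the masked genome for reading and a new file for writing, then pass both handles to hard_mask_fasta. Processing line by line keeps memory use low for large genomes.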
Note: CoGe will eventually offer an option that lets users mask their own genomes with this pipeline. Until that is available, please email the CoGe team (coge.genome@genome.com) with the genome ID of the genome you would like masked.