Masked: Difference between revisions

From CoGepedia

Revision as of 21:58, 19 November 2011

Masked genomes/sequences are genomic sequences that have been scanned for some type of internal sequence, with the matching regions then converted to "X" (hard masking) or to lower case (soft masking). Usually, repeat sequences are identified and masked, because they cause sequence comparison algorithms to spend a lot of time identifying and matching them. It is recommended to use repeat-masked genomes in CoGe whenever the option is offered for whole-genome comparisons (e.g. in SynMap).

Masked sequences come in two general flavors:

Hard mask: Masked sequence is converted to "X"

Soft mask: Masked sequence is converted to lower-case letters (a, t, c, g)
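The difference between the two flavors can be illustrated with a minimal Python sketch. It is not CoGe's or RepeatMasker's code: the function names, the 0-based half-open `(start, end)` interval convention, and the example sequence are all assumptions made for illustration.

```python
def hard_mask(seq, intervals, mask_char="X"):
    """Replace every base inside the given intervals with mask_char.

    intervals: iterable of (start, end) pairs, 0-based, end exclusive
    (a hypothetical convention chosen for this sketch).
    """
    chars = list(seq)
    for start, end in intervals:
        for i in range(start, min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


def soft_mask(seq, intervals):
    """Lower-case every base inside the given intervals.

    Bases outside the intervals are left exactly as they were.
    """
    chars = list(seq)
    for start, end in intervals:
        for i in range(start, min(end, len(chars))):
            chars[i] = chars[i].lower()
    return "".join(chars)


def soft_to_hard(seq, mask_char="X"):
    """Turn a soft-masked sequence into a hard-masked one."""
    return "".join(mask_char if c.islower() else c for c in seq)


# Made-up example: positions 4..7 are treated as a repeat.
sequence = "ACGTACGTACGT"
repeats = [(4, 8)]

print(hard_mask(sequence, repeats))   # ACGTXXXXACGT
print(soft_mask(sequence, repeats))   # ACGTacgtACGT
```

A soft mask keeps the original bases, so it can always be turned into a hard mask later (`soft_to_hard`), whereas a hard mask discards the masked bases and cannot be reversed.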

For a popular repeat sequence identification program, see RepeatMasker (http://www.repeatmasker.org/). CoGe will mask a genome using RepeatMasker if a masked version is not already present but one is requested.