Extra Annotations

From CoGepedia
Revision as of 12:23, 15 December 2011 by Jschnable (Talk | contribs) (Conserved Noncoding Sequence Data)

Most genomes within CoGe are annotated only with publicly available gene models. However, the Freeling Lab has also annotated a number of genomes with additional information. For reference, the specific genome versions that carry extra annotations are listed here.

Conserved Noncoding Sequence Data

Conserved noncoding sequences are identified by comparing the non-exon regions surrounding orthologous genes in two species or homeologous genes within a single species. Most CNS datasets in CoGe were generated using the CNS Discovery Pipeline.
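
As a rough illustration of the first step described above, the sketch below computes the non-exon intervals in a window around a gene. These are the regions that would then be compared between an orthologous or homeologous gene pair. This is not the CNS Discovery Pipeline's code: the function name, the half-open coordinate convention, and the example coordinates are all hypothetical, and the sequence comparison itself is not shown.

```python
from typing import List, Tuple

# Assumption: intervals are half-open [start, end) in genomic coordinates.
Interval = Tuple[int, int]


def non_exon_regions(window: Interval, exons: List[Interval]) -> List[Interval]:
    """Return the parts of `window` that are not covered by any exon.

    `window` is a hypothetical search region around a gene (the gene plus
    flanking sequence); `exons` are the exon intervals of all genes that
    may overlap that region.
    """
    start, end = window
    regions: List[Interval] = []
    cursor = start  # left edge of the region not yet accounted for
    for ex_start, ex_end in sorted(exons):
        # Clip the exon to the window.
        ex_start = max(ex_start, start)
        ex_end = min(ex_end, end)
        if ex_start >= ex_end:
            continue  # exon lies entirely outside the window
        if ex_start > cursor:
            regions.append((cursor, ex_start))
        cursor = max(cursor, ex_end)
    if cursor < end:
        regions.append((cursor, end))
    return regions


# Made-up example: a 100 bp window containing two exons.
example = non_exon_regions((0, 100), [(10, 20), (40, 60)])
```

With the made-up coordinates above, `example` holds the three intervals flanking and separating the two exons. Overlapping exons are merged implicitly, because the cursor only ever moves rightward.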


| Species | Data set group ID | Data set ID | Comparison | Method | Full Name |
|---|---|---|---|---|---|
| Arabidopsis | 3 | 39598 | Arabidopsis (homeologs) | Manual Annotations and the CNS Discovery Pipeline | Arabidopsis thaliana Col-0 (thale cress) (with CNS) masked repeats 50 |
| Peach | 8400 | 42478 | Chocolate (orthologs) | CNS Discovery Pipeline | Prunus persica (peach) (with CNS) unmasked |
| Chocolate | 10997 | 46486 | Peach (orthologs) | CNS Discovery Pipeline | Theobroma cacao (chocolate) Belizian Criollo genotype (B97-61/B2) (with CNS) |
| Rice | 11822 | 47668 | Sorghum (orthologs) | CNS Discovery Pipeline | Oryza sativa japonica (Rice) (with CNS) masked repeats 50x |
| Sorghum | 11821 | 47667 | Rice (orthologs) | CNS Discovery Pipeline | Sorghum bicolor (with CNS) masked repeats 50x |