Syntenic gene sets
Syntenic gene sets are sets of genes located in syntenic regions (genomic regions derived from the same ancestral genomic region) within and across genomes. These genes usually occur in collinear order, and that collinearity is used as evidence that the regions are syntenic. The data are available for download and may be used in any publication; we ask only that you cite CoGe or the publication from which the data were derived.
- 1 Non-published Syntenic Gene Sets
- 2 Published Syntenic Gene Sets
- 3 How to generate syntenic gene sets with links to GEvo
- 4 Want a syntelog gene set?
Non-published Syntenic Gene Sets
Sorghum and Maize
Since maize and sorghum diverged ~10-15 million years ago, maize has undergone an additional tetraploidy event. For more information see this page on their genomes' evolutionary history.
This dataset contains:
- sorghum-maize1-maize2 syntelogs (or sets missing one of the maize homeologs due to post-tetraploidy genome fractionation).
- tandem gene duplication data.
- Links to GEvo for multi-genomic region comparisons
- Where one of the maize homeologs is missing, a place-holder is added so the region is still shown in GEvo.
- Links to FeatList to extract information about the set of syntelogs including getting their sequences and sending them to other tools in CoGe for further analysis.
Uses these genome versions:
Among the four sequenced grass genomes of maize, sorghum, rice, and brachypodium, there are three large-scale genomic evolution events:
- a pre-grass whole genome duplication event shared among all grass genomes
- the radiation of the grasses
- a maize lineage specific whole genome duplication event that happened after the divergence of maize and sorghum
This dataset is organized into eleven columns. The first five list syntenic orthologs in sorghum, rice and brachypodium as well as the two potential co-orthologous genes present in the maize genome as a result of the subsequent whole genome duplication in that lineage. These columns are ordered: Maize1, Maize2, Sorghum, Rice, Brachypodium.
The second five columns list, when applicable, information on homeologous genes from the pre-grass tetraploidy from sorghum, rice, brachypodium and the two subgenomes of maize in the same order used for the first five columns.
The final column provides a GEvo link to visualize the genomic contexts of orthologous and homeologous genes, as well as the predicted locations for deleted genes, in that row.
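The column layout described above can be sketched in Python as follows. This is only an illustration: the source does not state the file's delimiter, whether it has a header row, or how empty cells are marked, so the tab delimiter, the `GrassRow` class, the `parse_row` helper, and the example gene names and URL are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict

# Column order given in the text for both the ortholog and homeolog blocks.
SPECIES = ["Maize1", "Maize2", "Sorghum", "Rice", "Brachypodium"]


@dataclass
class GrassRow:
    """One row of the eleven-column dataset (hypothetical container)."""
    orthologs: Dict[str, str]   # columns 1-5: syntenic orthologs
    homeologs: Dict[str, str]   # columns 6-10: pre-grass tetraploidy homeologs
    gevo_link: str              # column 11: GEvo link for the row


def parse_row(line: str, delimiter: str = "\t") -> GrassRow:
    """Split one line into the three groups of columns.

    The delimiter is an assumption; adjust it to match the real file.
    """
    fields = [f.strip() for f in line.rstrip("\n").split(delimiter)]
    if len(fields) != 11:
        raise ValueError(f"expected 11 columns, got {len(fields)}")
    return GrassRow(
        orthologs=dict(zip(SPECIES, fields[0:5])),
        homeologs=dict(zip(SPECIES, fields[5:10])),
        gevo_link=fields[10],
    )


# Made-up row: gene names and the URL are placeholders, not real data.
example = "\t".join([
    "m1a", "m2a", "sb1", "os1", "bd1",   # orthologs
    "", "", "sb2", "os2", "bd2",         # homeologs (none listed for maize here)
    "http://example.org/gevo",
])
row = parse_row(example)
print(row.orthologs["Sorghum"], row.homeologs["Rice"])
```

Keeping the two five-column blocks as separate dictionaries keyed by the same names makes it easy to look up the ortholog and the homeolog for a given genome in one row.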
Grape-Poplar-Papaya-Arabidopsis Syntenic gene sets
This pan-rosid dataset was generated by Haibao Tang using new algorithms he has been developing (quota align). They identify inferred syntenic regions even when no homologous gene is present, and enforce a fixed syntenic relationship (e.g. 4 Arabidopsis regions to 1 grape region) based on the whole genome duplication history of each genome.
His comments on this dataset are:
The file is comma-delimited, each row contains syntelogs, and a coge link (anchoring syntelogs and proxies). Look at the header first. Aside from the syntelogs, "-" represents inferred proxy. Since this is a post-hexaploidy data set, the quota tries to match 1:1:2:4 (grape, papaya, poplar, [Arabidopsis]). Each syntelog group is referenced by one panel, to make things easier to see and go faster. The reference panel takes the priority of (grape, papaya, poplar, [Arabidopsis]), in that order. Having spot-checked about 30 of the links, I think this is close to the best I can do. Recognizing real genomic evolution takes all forms, this gross simplification to a "pan-rosid" data set will surely contain errors. But hopefully I have kept them at a low percentage.
Many thanks for his hard work!
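As a rough illustration of working with the file format described in the comments above (comma-delimited, header row, "-" for an inferred proxy), the following Python sketch counts how many proxies appear in each gene column. The column names, the `link` column name, the `count_proxies` helper, and the sample rows are all hypothetical; check the real header before adapting it.

```python
import csv
import io
from typing import Dict, TextIO

PROXY = "-"  # marker for an inferred proxy, per the dataset comments


def count_proxies(handle: TextIO, link_column: str = "link") -> Dict[str, int]:
    """Count proxy entries per gene column in a comma-delimited file.

    `link_column` is an assumed name for the CoGe link column.
    """
    reader = csv.DictReader(handle)
    counts = {name: 0 for name in (reader.fieldnames or []) if name != link_column}
    for row in reader:
        for name in counts:
            if (row.get(name) or "").strip() == PROXY:
                counts[name] += 1
    return counts


# Made-up sample following the 1:1:2:4 quota; not real data.
sample = io.StringIO(
    "grape,papaya,poplar_1,poplar_2,ath_1,ath_2,ath_3,ath_4,link\n"
    "Vv1,Cp1,Pt1,-,At1,-,-,At4,http://example.org/1\n"
    "Vv2,-,Pt3,Pt4,-,-,At7,-,http://example.org/2\n"
)
counts = count_proxies(sample)
print(counts)
```

For a file on disk, pass an open handle instead of the `io.StringIO` object, for example `open(path, newline="")`.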
Published Syntenic Gene Sets
These lists usually come from publications, are in MS Excel format, and provide links that populate the sequence submission form in GEvo, CoGe's tool for analyzing multiple genomic regions. These links allow you to quickly start comparing syntenic regions of interest.
Arabidopsis, Papaya, Poplar, and Grape: CoGe with Rosids.
Arabidopsis thaliana - Carica papaya: Since their divergence, Arabidopsis thaliana has had two tetraploidies while papaya has had none. Prepared by the Paterson lab at the Plant Center at the University of Georgia and modified by the Freeling lab to include GEvo links. From: Finding and Comparing Syntenic Regions among Arabidopsis and the Outgroups.
Arabidopsis thaliana - Carica papaya - Vitis vinifera - Populus trichocarpa (2x): This list contains both syntenic regions for poplar from its most recent genome duplication event, but only a single region for Arabidopsis. From: Finding and Comparing Syntenic Regions among Arabidopsis and the Outgroups.
Want a syntelog gene set?
We are happy to help. Just e-mail Eric Lyons.