Syntenic gene sets

From CoGepedia
Syntenic comparison of two regions from the genome of Arabidopsis thaliana using GEvo. This genome underwent a whole genome duplication event which created a copy of every genomic region. Over evolutionary time, many of the duplicated genes were lost by a process known as fractionation. However, many duplicated genes have been retained in duplicate and their collinear arrangement in the genome is evidence for synteny. Results can be regenerated at:

Syntenic gene sets are sets of genes located in syntenic regions (genomic regions derived from the same ancestral genomic region) within and across genomes. These genes are usually in a collinear gene order, and that collinearity is used as evidence that the regions are syntenic. These data are made available for download for use in any publication. However, we ask that you reference CoGe or the publication from which the data were derived.

Non-published Syntenic Gene Sets

Sorghum and Maize

Since maize and sorghum diverged ~10-15 million years ago, maize has undergone an additional tetraploidy event. For more information see this page on their genomes' evolutionary history.

Here is the dataset!

This dataset contains:

  1. Sorghum-maize1-maize2 syntelogs (or sets missing one of the maize homeologs due to post-tetraploidy genome fractionation).
  2. Tandem gene duplication data.
  3. Links to GEvo for multi-genomic region comparisons.
    1. Where one of the maize homeologs is missing, a place-holder is added so the region is still shown in GEvo.
  4. Links to FeatList to extract information about the set of syntelogs, including getting their sequences and sending them to other tools in CoGe for further analysis.
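As a rough illustration of how the fractionation pattern in such triplets could be tallied, the Python sketch below classifies each sorghum gene by which maize homeologs are still present. The input shape (sorghum, maize1, maize2) and the markers used for a missing homeolog are assumptions, not the dataset's documented format, and the names `retention_class` and `tally` are hypothetical.

```python
from collections import Counter

# Assumed markers for a missing maize homeolog; check the actual file.
MISSING = ("", "-", None)

def retention_class(maize1, maize2, missing=MISSING):
    """Label a triplet by which maize homeologs were retained."""
    has1 = maize1 not in missing
    has2 = maize2 not in missing
    if has1 and has2:
        return "both retained"
    if has1:
        return "maize1 only"
    if has2:
        return "maize2 only"
    return "both lost"

def tally(triplets):
    """Count retention classes over (sorghum, maize1, maize2) tuples."""
    return Counter(retention_class(m1, m2) for _sorghum, m1, m2 in triplets)
```

Called on a list of triplets, `tally` returns a `Counter` keyed by retention class, which gives a quick view of how uneven gene loss is between the two maize subgenomes.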

Uses these genome versions:

  1. maize
  2. sorghum

Pan-grass synteny

From: Genome-wide analysis of syntenic gene deletion in the grasses. James C. Schnable, Michael Freeling and Eric Lyons (2012) Genome Biology and Evolution

Among the four sequenced grass genomes of maize, sorghum, rice, and brachypodium, there are three large-scale genomic evolution events:

  1. a pre-grass whole genome duplication event shared among all grass genomes
  2. the radiation of the grasses
  3. a maize lineage specific whole genome duplication event that happened after the divergence of maize and sorghum

This dataset is organized into eleven columns. The first five list syntenic orthologs in sorghum, rice and brachypodium as well as the two potential co-orthologous genes present in the maize genome as a result of the subsequent whole genome duplication in that lineage. These columns are ordered: Maize1, Maize2, Sorghum, Rice, Brachypodium.

The second five columns list, when applicable, information on homeologous genes from the pre-grass tetraploidy from sorghum, rice, brachypodium and the two subgenomes of maize in the same order used for the first five columns.

The final column provides a GEvo link to visualize the genomic contexts of orthologous and homeologous genes, as well as the predicted locations for deleted genes, in that row.
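The eleven-column layout described above can be unpacked with a few lines of Python. This is only a sketch: the delimiter and the marker used for a missing gene are not specified on this page, so both are parameters here, and the names `PanGrassRow` and `parse_row` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Column order as described above; the homeolog block reuses the same order.
SPECIES = ["Maize1", "Maize2", "Sorghum", "Rice", "Brachypodium"]

@dataclass
class PanGrassRow:
    orthologs: Dict[str, Optional[str]]   # columns 1-5
    homeologs: Dict[str, Optional[str]]   # columns 6-10
    gevo_link: str                        # column 11

def parse_row(line, delimiter="\t", missing=("", "-")):
    """Split one data row into ortholog, homeolog and GEvo-link parts.

    The delimiter and the missing-value markers are assumptions.
    """
    fields = [f.strip() for f in line.rstrip("\n").split(delimiter)]
    if len(fields) != 11:
        raise ValueError(f"expected 11 columns, got {len(fields)}")

    def block(cells):
        # Map each species to its gene, or None when the cell is empty.
        return {sp: (None if c in missing else c) for sp, c in zip(SPECIES, cells)}

    return PanGrassRow(block(fields[0:5]), block(fields[5:10]), fields[10])
```

Keeping the two five-column blocks as separate dictionaries keyed by the same species names makes it easy to look up a gene's ortholog and its pre-grass homeolog side by side.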

Grape-Poplar-Papaya-Arabidopsis Syntenic gene sets

This pan-rosid dataset was generated by Haibao Tang using new algorithms he has been developing (quota align). They both identify inferred syntenic regions when no homologous gene is present and enforce a set syntenic relationship (e.g. 4 Arabidopsis to 1 grape) based on the whole genome duplication history of each genome.

His comments on this dataset are:

The file is comma-delimited, each row contains syntelogs, and a coge link (anchoring syntelogs and proxies). Look at the header first.
Aside from the syntelogs, "-" represents an inferred proxy. Since this is a post-hexaploidy data set, the quota tries to match 1:1:2:4 (grape, papaya, poplar, Arabidopsis).
Each syntelog group is referenced by one panel, to make things easier to see and go faster. The reference panel takes the priority of (grape, papaya, poplar, Arabidopsis), in that order.
Having spot-checked about 30 of the links, I think this is close to the best I can do. Recognizing real genomic evolution takes all forms, this gross simplification to a "pan-rosid" data set will surely contain errors. But hopefully I have kept them at a low percentage.

Many thanks for his hard work!
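For readers who want to check how many entries in each row are real syntelogs rather than inferred proxies, a minimal Python sketch follows. It assumes that a proxy cell contains exactly "-" and that the CoGe link sits in a column whose header name you pass in. The default name `link` is a placeholder, so look at the file's header first, as advised above. The function name `proxy_counts` is hypothetical.

```python
import csv
import io

PROXY = "-"  # marker for an inferred proxy, per the comments above

def proxy_counts(csv_text, link_column="link"):
    """Return (n_syntelogs, n_proxies) for every data row.

    `link_column` is an assumed header name for the CoGe link column.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Every column except the link column is treated as a gene column.
    gene_idx = [i for i, name in enumerate(header) if name.strip() != link_column]
    counts = []
    for cells in reader:
        genes = [cells[i].strip() for i in gene_idx]
        n_proxy = sum(1 for g in genes if g == PROXY)
        counts.append((len(genes) - n_proxy, n_proxy))
    return counts
```

With a 1:1:2:4 quota, a fully populated row would have eight gene columns, so the two numbers in each tuple should sum to eight if the file follows that layout.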

Published Syntenic Gene Sets

These lists are usually from publications in MS Excel format and provide links to populate the sequence submission form in GEvo, CoGe's tool for analyzing multiple genomic regions. These links allow you to quickly start comparing syntenic regions of interest.

Arabidopsis, Papaya, Poplar, and Grape: CoGe with Rosids.

Arabidopsis thaliana - Carica papaya: Since their divergence, Arabidopsis thaliana has had two tetraploidies while papaya has had none. Prepared by the Paterson lab at the Plant Center at the University of Georgia and modified by the Freeling lab to include GEvo links. From: Finding and Comparing Syntenic Regions among Arabidopsis and the Outgroups.

Arabidopsis thaliana - Carica papaya - Vitis vinifera - Populus trichocarpa(2x): This list contains both syntenic regions for poplar from its most recent genome duplication event, but only a single region for Arabidopsis. From: Finding and Comparing Syntenic Regions among Arabidopsis and the Outgroups.

How to generate syntenic gene sets with links to GEvo

SynMap allows you to compare any two genomes. Its output includes a text file of all identified syntenic gene pairs and links to GEvo.

Want a syntelog gene set?

We are happy to help. Just e-mail Eric Lyons.