Difference between revisions of "Syntenic gene sets"

From CoGepedia
(Pan-grass synteny)
 
(11 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
[[Image:Gevo-synteny.png|thumb|600px|right|Syntenic comparison of two regions from the genome of Arabidopsis thaliana using [[GEvo]].  This genome underwent a whole genome duplication event which created a copy of every genomic region.  Over evolutionary time, many of the duplicated genes were lost by a process known as [[fractionation]].  However, many duplicated genes have been retained in duplicate and their collinear arrangement in the genome is evidence for synteny. Results can be regenerated at: http://tinyurl.com/noul6b]]
 
[[Image:Gevo-synteny.png|thumb|600px|right|Syntenic comparison of two regions from the genome of Arabidopsis thaliana using [[GEvo]].  This genome underwent a whole genome duplication event which created a copy of every genomic region.  Over evolutionary time, many of the duplicated genes were lost by a process known as [[fractionation]].  However, many duplicated genes have been retained in duplicate and their collinear arrangement in the genome is evidence for synteny. Results can be regenerated at: http://tinyurl.com/noul6b]]
  
Syntenic gene sets are sets of genes located in [[syntenic regions]] (genomic regions derived from the same ancestral genomic region) within and across genomes.  These genes are usually in a [[collinear gene order]] and are used as evidence that the regions are syntenic.
+
Syntenic gene sets are sets of genes located in [[syntenic regions]] (genomic regions derived from the same ancestral genomic region) within and across genomes.  These genes are usually in a [[collinear gene order]] and are used as evidence that the regions are syntenic.  These data are made available for download for use in any publication.  However, we ask that you reference CoGe or the publication from which the data was derived.
  
  
Line 9: Line 9:
 
Since maize and sorghum diverged ~10-15 million years ago, maize underwent an additional tetraploidy event.  For more information see [[Maize_Sorghum_Syntenic_dotplot | this page on their genomes' evolutionary history]].
 
Since maize and sorghum diverged ~10-15 million years ago, maize underwent an additional tetraploidy event.  For more information see [[Maize_Sorghum_Syntenic_dotplot | this page on their genomes' evolutionary history]].
  
[http://synteny.cnr.berkeley.edu/CoGe/data/distrib/maize-sorghum-syntelogs_v4_tab.txt Here is the dataset!]
+
[http://genomevolution.org/CoGe/data/distrib/maize-sorghum-syntelogs_v4_tab.txt Here is the dataset!]
  
 
This dataset contains:
 
This dataset contains:
Line 19: Line 19:
  
 
Uses these genome versions:
 
Uses these genome versions:
#[http://synteny.cnr.berkeley.edu/CoGe/OrganismView.pl?dsgid=8062 maize]
+
#[http://genomevolution.org/CoGe/OrganismView.pl?dsgid=8062 maize]
#[http://synteny.cnr.berkeley.edu/CoGe/OrganismView.pl?dsgid=93 sorghum]
+
#[http://genomevolution.org/CoGe/OrganismView.pl?dsgid=93 sorghum]
  
 
===[[Pan-grass synteny]]===
 
===[[Pan-grass synteny]]===
 +
 +
From: [http://gbe.oxfordjournals.org/content/early/2012/01/23/gbe.evs009.short '''Genome-wide analysis of syntenic gene deletion in the grasses.  James C. Schnable, Michael Freeling and Eric Lyons''' (2012) Genome Biology and Evolution]
  
 
Among the four sequenced grass genomes of maize, sorghum, rice, and brachypodium, there are three large-scale genomic evolution events:
 
Among the four sequenced grass genomes of maize, sorghum, rice, and brachypodium, there are three large-scale genomic evolution events:
 
# a pre-grass [[whole genome duplication event]] shared among all grass genomes
 
# a pre-grass [[whole genome duplication event]] shared among all grass genomes
 
# the radiation of the grasses
 
# the radiation of the grasses
# a [[Maize_Sorghum_Syntenic_dotplot | maize lineage specific]] [[whole genomic duplication event]] that happened after the divergence of maize and sorghum
+
# a [[Maize_Sorghum_Syntenic_dotplot | maize lineage specific]] [[whole genome duplication event]] that happened after the divergence of maize and sorghum
 +
 
 +
This dataset is organized into eleven columns. The first five list syntenic orthologs in sorghum, rice and brachypodium as well as the two potential co-orthologous genes present in the maize genome as a result of the subsequent whole genome duplication in that lineage. These columns are ordered: Maize1, Maize2, Sorghum, Rice, Brachypodium.
 +
 
 +
The second five columns list, when applicable, information on homeologous genes from the pre-grass tetraploidy from sorghum, rice, brachypodium and the two subgenomes of maize in the same order used for the first five columns.
 +
 
 +
The final column provides a [[GEvo]] link to visualize the genomic contexts of orthologous and homeologous genes, as well as the predicted locations for deleted genes, in that row.
 +
 
 +
*[http://genomevolution.org/CoGe/data/distrib/Supplemental_dataset_S1_pluslinks.csv Dataset as a comma separated list]
 +
*[http://genomevolution.org/CoGe/data/distrib/Supplemental_dataset_S1_pluslinks.xls Dataset as microsoft excel file]
 +
 
 +
===Grape-Poplar-Papaya-Arabidopsis  Syntenic gene sets===
 +
This [http://genomevolution.org/CoGe/data/distrib/pan_rosid.csv pan-rosid dataset] was generated by Haibao Tang using some new algorithms ([[quota align]]) he's been creating to both identify inferred syntenic regions when no homologous gene is present, and enforce a set syntenic relationship (i.e. 4 Arabidopsis to 1 grape) based on the whole genome duplication evolution history of each genome.
 +
 
 +
His comments on this dataset are:
 +
The file is comma-delimited, each row contains syntelogs, and a coge link (anchoring syntelogs and proxies). Look at the header first.
 +
Aside from the syntelogs, "-" represents inferred proxy. Since this is a post-hexaploidy data set, the quota tries to match 1:1:2:4 (grape, papaya, poplar, Arabidopsis).
 +
Each syntelog group is referenced by one panel, to make things easier to see and go faster. The reference panel takes the priority of (grape, papaya, poplar, Arabidopsis), in that order.
 +
Having spot-checked about 30 of the links, I think this is close to the best I can do. Recognizing real genomic evolution takes all forms, this gross simplification to a "pan-rosid" data set will surely contain errors. But hopefully I have kept them at a low percentage.
  
These data are organized into two sets:
+
Many thanks for his hard work!
*[http://synteny.cnr.berkeley.edu/CoGe/data/distrib/pan_grass/pan_grass_ortholog_panels_v1.csv Set 1:]  Syntenic sets for the radiation of the grasses (#2) and the maize specific whole genome duplication event (#3). This would result in up to 5 syntenic regions (2x maize, 1x sorghum, 1x rice, 1x brachypodium).
+
*[http://synteny.cnr.berkeley.edu/CoGe/data/distrib/pan_grass/pan_grass_rho_panels_v0.1.csv Set 2:]Syntenic sets derived from the pre-grass whole genome duplication event (#1).  This would result in up to 10 syntenic regions (4x maize, 1x sorghum, 1x rice, 1x brachypodium).
+
  
 
==Published Syntenic Gene Sets==
 
==Published Syntenic Gene Sets==
Line 38: Line 56:
 
===Arabidopsis Papaya, Poplar, and Grape: CoGe with Rosids.===
 
===Arabidopsis Papaya, Poplar, and Grape: CoGe with Rosids.===
  
[http://synteny.cnr.berkeley.edu/CoGe/data/distrib/SI2_At-Cp4.xls Arabidopsis thaliana - Carica papaya]: Since their divergence, Arabidopsis thaliana has had two tetraploidies while papaya has had none. Prepared by the Paterson lab at the Plant Center at the University of Georgia and modified by the Freeling lab to include GEvo links. From: [http://www.plantphysiol.org/cgi/content/abstract/148/4/1772 Finding and Comparing Syntenic Regions among Arabidopsis and the Outgroups].
+
[http://genomevoution.org/CoGe/data/distrib/SI2_At-Cp4.xls Arabidopsis thaliana - Carica papaya]: Since their divergence, Arabidopsis thaliana has had two tetraploidies while papaya has had none. Prepared by the Paterson lab at the Plant Center at the University of Georgia and modified by the Freeling lab to include GEvo links. From: [http://www.plantphysiol.org/cgi/content/abstract/148/4/1772 Finding and Comparing Syntenic Regions among Arabidopsis and the Outgroups].
  
[http://synteny.cnr.berkeley.edu/CoGe/data/distrib/SI1_At-Cp-Vv-2Pt_GEvo_links.xls Arabidopsis thaliana - Carica papaya - Vitis vinifera - Populus trichocarpa(2x)]: This list contains both syntenic regions for poplar from its most recent genome duplication event, but only a single for Arabidopsis.  From: [http://www.plantphysiol.org/cgi/content/abstract/148/4/1772 Finding and Comparing Syntenic Regions among Arabidopsis and the Outgroups.]
+
[http://genomevoution.org/CoGe/data/distrib/SI1_At-Cp-Vv-2Pt_GEvo_links.xls Arabidopsis thaliana - Carica papaya - Vitis vinifera - Populus trichocarpa(2x)]: This list contains both syntenic regions for poplar from its most recent genome duplication event, but only a single for Arabidopsis.  From: [http://www.plantphysiol.org/cgi/content/abstract/148/4/1772 Finding and Comparing Syntenic Regions among Arabidopsis and the Outgroups.]
  
 
==How to generate syntenic gene sets with links to [[GEvo]]==
 
==How to generate syntenic gene sets with links to [[GEvo]]==

Latest revision as of 12:15, 16 September 2014

Syntenic comparison of two regions from the genome of Arabidopsis thaliana using GEvo. This genome underwent a whole genome duplication event which created a copy of every genomic region. Over evolutionary time, many of the duplicated genes were lost by a process known as fractionation. However, many duplicated genes have been retained in duplicate and their collinear arrangement in the genome is evidence for synteny. Results can be regenerated at: http://tinyurl.com/noul6b

Syntenic gene sets are sets of genes located in syntenic regions (genomic regions derived from the same ancestral genomic region) within and across genomes. These genes are usually in a collinear gene order and are used as evidence that the regions are syntenic. These data are available for download and may be used in any publication. However, we ask that you reference CoGe or the publication from which the data were derived.


Non-published Syntenic Gene Sets

Sorghum and Maize

Since maize and sorghum diverged ~10-15 million years ago, maize underwent an additional tetraploidy event. For more information see this page on their genomes' evolutionary history.

Here is the dataset!

This dataset contains:

  1. Sorghum-maize1-maize2 syntelogs (one of the maize homeologs may be missing due to post-tetraploidy genome fractionation).
  2. Tandem gene duplication data.
  3. Links to GEvo for multi-genomic region comparisons.
    1. Where one of the maize homeologs is missing, a placeholder is added so the region is still shown in GEvo.
  4. Links to FeatList for extracting information about the set of syntelogs, including retrieving their sequences and sending them to other tools in CoGe for further analysis.

Uses these genome versions:

  1. maize
  2. sorghum

Pan-grass synteny

From: Genome-wide analysis of syntenic gene deletion in the grasses. James C. Schnable, Michael Freeling and Eric Lyons (2012) Genome Biology and Evolution

Among the four sequenced grass genomes of maize, sorghum, rice, and brachypodium, there are three large-scale genomic evolution events:

  1. a pre-grass whole genome duplication event shared among all grass genomes
  2. the radiation of the grasses
  3. a maize lineage specific whole genome duplication event that happened after the divergence of maize and sorghum

This dataset is organized into eleven columns. The first five list syntenic orthologs in sorghum, rice and brachypodium as well as the two potential co-orthologous genes present in the maize genome as a result of the subsequent whole genome duplication in that lineage. These columns are ordered: Maize1, Maize2, Sorghum, Rice, Brachypodium.

The second five columns list, when applicable, information on homeologous genes from the pre-grass tetraploidy from sorghum, rice, brachypodium and the two subgenomes of maize in the same order used for the first five columns.

The final column provides a GEvo link to visualize the genomic contexts of orthologous and homeologous genes, as well as the predicted locations for deleted genes, in that row.
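
The eleven-column layout described above can be read with a short script. The following is a minimal Python sketch, not part of the original dataset documentation. It assumes the comma-separated file has a single header row and exactly eleven fields per row; the function names and the sample row are hypothetical and used only for illustration.

```python
import csv
import io

# Column order as described in the text above.
SPECIES = ["Maize1", "Maize2", "Sorghum", "Rice", "Brachypodium"]


def split_row(row):
    """Split one 11-field row into orthologs, homeologs, and the GEvo link."""
    if len(row) != 11:
        raise ValueError(f"expected 11 columns, got {len(row)}")
    orthologs = dict(zip(SPECIES, row[0:5]))    # first five columns
    homeologs = dict(zip(SPECIES, row[5:10]))   # second five columns, same order
    gevo_link = row[10]                         # final column
    return orthologs, homeologs, gevo_link


def read_pan_grass(handle, has_header=True):
    """Read all rows from an open CSV handle (the header row is an assumption)."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)
    return [split_row(row) for row in reader]


# Made-up sample data; real gene identifiers and links will differ.
sample = io.StringIO(
    "m1,m2,sb,os,bd,m1h,m2h,sbh,osh,bdh,link\n"
    "geneA,geneB,geneC,geneD,geneE,geneF,geneG,geneH,geneI,geneJ,http://example.org/gevo\n"
)
records = read_pan_grass(sample)
orthologs, homeologs, gevo_link = records[0]
print(orthologs["Sorghum"], homeologs["Rice"], gevo_link)
```

To use it on the downloaded file, pass an open file handle instead of the in-memory sample. How absent genes are encoded in the real file is not stated here, so the sketch returns cell values unchanged.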

Grape-Poplar-Papaya-Arabidopsis Syntenic gene sets

This pan-rosid dataset was generated by Haibao Tang using new algorithms (quota align) that he has been developing. These algorithms both identify inferred syntenic regions when no homologous gene is present and enforce a set syntenic relationship (e.g. 4 Arabidopsis to 1 grape) based on the whole genome duplication history of each genome.

His comments on this dataset are:

The file is comma-delimited, each row contains syntelogs, and a coge link (anchoring syntelogs and proxies). Look at the header first.
Aside from the syntelogs, "-" represents inferred proxy. Since this is a post-hexaploidy data set, the quota tries to match 1:1:2:4 (grape, papaya, poplar, Arabidopsis).
Each syntelog group is referenced by one panel, to make things easier to see and go faster. The reference panel takes the priority of (grape, papaya, poplar, Arabidopsis), in that order.
Having spot-checked about 30 of the links, I think this is close to the best I can do. Recognizing real genomic evolution takes all forms, this gross simplification to a "pan-rosid" data set will surely contain errors. But hopefully I have kept them at a low percentage.

Many thanks for his hard work!
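
The file layout described in the comments above (comma-delimited, header row, "-" for inferred proxies) can be summarized with a few lines of Python. This is a minimal sketch under assumptions: it treats a cell that is exactly "-" as an inferred proxy, and the column names in the sample are made up, so check the real header first as the comments advise.

```python
import csv
import io
from collections import Counter


def count_proxies(handle, proxy_mark="-"):
    """Count, per column, how many cells hold the inferred-proxy marker."""
    reader = csv.DictReader(handle)  # uses the header row for column names
    counts = Counter()
    for row in reader:
        for column, value in row.items():
            # Skip anything that is not a plain string cell (e.g. overflow fields).
            if isinstance(value, str) and value.strip() == proxy_mark:
                counts[column] += 1
    return counts


# Made-up sample; the real header and gene names will differ.
sample = io.StringIO(
    "grape,papaya,poplar1,poplar2,link\n"
    "g1,-,p1,-,http://example.org/a\n"
    "g2,c2,-,-,http://example.org/b\n"
)
proxy_counts = count_proxies(sample)
print(dict(proxy_counts))
```

A per-column count like this gives a quick view of how often a syntenic position was inferred rather than anchored by an actual gene.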

Published Syntenic Gene Sets

These lists are usually from publications in MS Excel format and provide links to populate the sequence submission form in GEvo, CoGe's tool for analyzing multiple genomic regions. These links allow you to quickly start comparing syntenic regions of interest.

Arabidopsis, Papaya, Poplar, and Grape: CoGe with Rosids.

Arabidopsis thaliana - Carica papaya: Since their divergence, Arabidopsis thaliana has had two tetraploidies while papaya has had none. Prepared by the Paterson lab at the Plant Center at the University of Georgia and modified by the Freeling lab to include GEvo links. From: Finding and Comparing Syntenic Regions among Arabidopsis and the Outgroups.

Arabidopsis thaliana - Carica papaya - Vitis vinifera - Populus trichocarpa(2x): This list contains both syntenic regions for poplar from its most recent genome duplication event, but only a single region for Arabidopsis. From: Finding and Comparing Syntenic Regions among Arabidopsis and the Outgroups.

How to generate syntenic gene sets with links to GEvo

SynMap allows you to compare any two genomes. Its output includes a text file of all identified syntenic gene pairs and links to GEvo.


Want a syntelog gene set?

We are happy to help. Just e-mail Eric Lyons.