Identifying and characterizing plant paleopolyploidies is an area of ongoing research. Paleopolyploidies are identified through whole genome comparisons that combine data on genomic structure (e.g. syntenic dotplots) with evolutionary distances (e.g. synonymous substitution rates, Ks). As a result, the list of detected events, and our understanding of which lineages share which subset of them, is continually changing. The images presented here represent our current views, but are subject to change. Events that were previously undetected or missed can suddenly become visible with an improved build of a genome or the sequencing of a fortuitously placed outgroup.
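On the evolutionary-distance side, a whole genome duplication typically shows up as a peak in the distribution of Ks values across syntenic gene pairs. The Python sketch below locates that peak by simple histogram binning. It assumes Ks values for syntenic pairs have already been computed elsewhere (for example with CoGe's SynMap); the function name `modal_ks`, the bin width and the cutoff are illustrative choices, not part of any published pipeline.

```python
from collections import Counter
from typing import Iterable


def modal_ks(ks_values: Iterable[float], bin_width: float = 0.05, max_ks: float = 2.0) -> float:
    """Return the centre of the most populated Ks bin (hypothetical helper).

    Values outside [0, max_ks] are dropped. Ks values above about 2 are commonly
    treated as unreliable because synonymous sites approach saturation; the exact
    cutoff used here is an assumption.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    # Count how many gene pairs fall into each Ks bin.
    counts = Counter(int(ks // bin_width) for ks in ks_values if 0.0 <= ks <= max_ks)
    if not counts:
        raise ValueError("no Ks values in range")
    best_bin, _ = counts.most_common(1)[0]
    return (best_bin + 0.5) * bin_width


# Made-up Ks values for syntenic gene pairs: a recent duplication near 0.13
# plus a scattering of older pairs.
example_ks = [0.11, 0.12, 0.13, 0.13, 0.14, 0.58, 0.62, 0.9, 1.4]
print(f"modal Ks ~ {modal_ks(example_ks):.3f}")
```

The bin width trades resolution against noise: narrow bins separate nearby peaks but become ragged when few gene pairs survive, which is common for old, heavily fractionated duplications.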
This webpage is maintained by James Schnable, a member of the Freeling Lab and the CoGe development team. As new genomes become available and existing genome assemblies are updated, I will continue to improve these figures. If you know of a whole genome duplication not listed here, a paper that should be credited but isn't, or interesting information about any specific whole genome duplication that belongs in the summaries, please don't hesitate to contact me so I can get the information up here.
Synonyms: Arabidopsis Gamma
This hexaploidy (genome tripling) is shared by the core eudicots (the rosids and asterids), and may also be present in more basal eudicots, although it will not be possible to reach this conclusion until the genomes of species from basal lineages are sequenced (AND WELL ASSEMBLED!).
This whole genome duplication was first identified as the most ancient of the three whole genome duplications present in the genome of Arabidopsis thaliana, and was named "gamma" by Bowers et al. in 2003 [1].
In this paper, hampered by the fact that only two plant genomes had yet been sequenced and by the ridiculously accelerated rate of base pair substitution in arabidopsis, the authors concluded that the gamma event was likely shared by both monocots and eudicots and could potentially be as old as the split between gymnosperms and flowering plants, 300 million years ago.
With the publication in 2007 of the grape genome [2], which has not experienced any duplications since the eudicot hexaploidy and doesn't show the same acceleration of nucleotide substitutions, it became possible to conclude that the eudicot hexaploidy was NOT shared with monocots and was shared by all rosids.
More recent work in the asterids [3] indicates that this highly successful eudicot clade shares the same ancient hexaploidy.
The alpha tetraploidy of arabidopsis was first given that name in Bowers et al. 2003 [1]. It is shared by most or all of the crucifers (family Brassicaceae).
As yet, not a single lineage has been identified in which the beta tetraploidy (naming convention from Bowers et al. 2003 [1]) is the most recent whole genome duplication. Despite what the image above might indicate, this duplication is significantly older than arabidopsis alpha; however, precise dating is difficult given the accelerated synonymous substitution rate in the arabidopsis lineage.
The hexaploidy shared by all species in the genus Brassica is well known. So well known, in fact, that I don't know the proper citation for the discovery of this whole genome duplication. If you know, please get in touch so the scientists responsible for this discovery, however long ago it was made, can get proper credit.
When the genome of poplar was released back in 2006, researchers announced that they had identified a new ancient whole genome duplication[4]. Poplar retains around 8000 pairs of duplicated genes.
The apple genome paper also discussed an ancient whole genome duplication identified in that lineage, which the authors estimated to be more than 50 million years old [5].
The relatively recent whole genome duplication in soybean was long suspected based on several different forms of analysis, ranging from Ks peaks [6] and phylogenies of individual gene families [7] to the fractionation of individual sequenced regions [8]. As expected, when the soybean genome was published in 2010, researchers did identify a recent whole genome duplication (peak Ks = 0.13, estimated to be 13 million years old).
The whole genome duplication shared by all legumes was initially inferred by comparing the medicago and lotus genome assemblies to data from peanut (Arachis hypogaea), a basal lineage of that clade [9].
An analysis of conserved orthologous gene groups (COGs) and huge numbers of ESTs found evidence of two ancient whole genome duplications shared by both monocots and eudicots [10]. The more recent of the two, placed at 192 million years ago, occurred after the gymnosperms (non-flowering seed plants) split off, but is shared by all extant flowering plant species, including Amborella trichopoda.
An analysis of conserved orthologous gene groups (COGs) and huge numbers of ESTs found evidence of two ancient whole genome duplications shared by both monocots and eudicots [10]. The more ancient of the two events is shared by all flowering plants as well as gymnosperms, but occurred after the seed plant lineage diverged from that of Selaginella, a basal vascular plant.
The suspicion that maize is an ancient polyploid can be traced back through at least a generation of maize geneticists, and has earlier roots in the large number of duplicate mutant loci found in maize, sometimes arranged in parallel orders along different chromosomes. Perhaps the most famous example is the pair of duplicate regulators of anthocyanin biosynthesis, colored aleurone1 and purple plant1.
Brandon Gaut and John Doebley concluded that maize was an allopolyploid back in 1997 [11]. Although whether maize is an allo- or autopolyploid has been argued back and forth over the years, the question of whether maize is a polyploid at all was settled more than a decade before the publication of the first draft of the maize genome.
The two subgenomes of maize are estimated to have diverged ~12 million years ago [12]. If maize is an autopolyploid, the two genomes also merged into a single genome 12 million years ago; if maize is an allopolyploid, the two genomes could have evolved as separate species for several million years before the wide cross that created the polyploid ancestor of modern maize. In either case, the two ancestral genomes of maize have been contained in the same nucleus for at least five million years [12]. By comparing the organization of the maize genome to that of other grass species, it is possible to reconstruct the ten pairs of homeologous chromosomes present in that first polyploid maize ancestor [13]. By measuring biased gene loss (fractionation) and biased expression of duplicate pairs, it is possible to assign one copy of each ancestral chromosome pair in maize to a parental subgenome, either maize1 or maize2 [14].
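A simplified Python sketch of the gene-retention half of that assignment is shown below. The data model (`HomeologPair`, with hypothetical `retained_a`/`retained_b` counts of ancestral genes still present on each homeologous copy) is invented for illustration, and the rule assumes the usual convention that maize1 is the subgenome that has lost fewer genes. The cited approach [14] also weighed biased expression, which this sketch ignores.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HomeologPair:
    """A pair of homeologous maize chromosome segments (hypothetical data model)."""
    name: str
    retained_a: int  # ancestral genes still present on copy A
    retained_b: int  # ancestral genes still present on copy B


def assign_subgenomes(pair: HomeologPair) -> dict[str, str]:
    """Assign each copy to maize1 or maize2 using gene retention only.

    Assumes maize1 is the less fractionated subgenome (retains more genes).
    Expression bias, also used in the published assignment, is ignored here.
    """
    if pair.retained_a == pair.retained_b:
        # No fractionation bias: retention alone cannot separate the copies.
        raise ValueError(f"{pair.name}: equal retention, cannot assign")
    if pair.retained_a > pair.retained_b:
        return {"A": "maize1", "B": "maize2"}
    return {"A": "maize2", "B": "maize1"}


# Made-up counts for illustration only.
example = HomeologPair("pair_01", retained_a=120, retained_b=80)
print(assign_subgenomes(example))
```

A real analysis would also ask whether the difference in retention is statistically significant before making a call, rather than assigning on any difference at all as this sketch does.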
Synonyms: Rho
All grass species sequenced to date share a common whole genome duplication. This duplication is estimated to have occurred between 70 and 90 million years ago, and homeologs from this duplication have a modal synonymous substitution distance (Ks) of ~0.9 [15].
While it has been known for many years that a significant portion of the rice genome is covered by duplicated syntenic regions, the cause of this duplication was disputed, with proposals ranging from multiple segmental duplications to one or more whole genome duplications.
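Ages like these are typically derived from Ks with a simple molecular clock, T = Ks / (2λ), where λ is the synonymous substitution rate per site per year and the factor of 2 reflects that both duplicate copies diverge independently. The Python sketch below assumes λ = 6.5 × 10⁻⁹, a value often used for grasses; this is an assumption for illustration, and the cited studies may have used different rates or methods.

```python
def ks_to_age_years(ks: float, rate_per_site_per_year: float) -> float:
    """Convert synonymous divergence (Ks) between duplicate genes into an age.

    Both copies accumulate substitutions independently after the duplication,
    so divergence grows at twice the per-lineage rate: T = Ks / (2 * rate).
    """
    if ks < 0:
        raise ValueError("Ks must be non-negative")
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return ks / (2.0 * rate_per_site_per_year)


# Assumed clock: 6.5e-9 synonymous substitutions per site per year, a value
# often used for grasses. Not necessarily the rate used in the cited work.
GRASS_RATE = 6.5e-9

for label, ks in [("pan-grass", 0.9), ("sigma", 1.7)]:
    age_mya = ks_to_age_years(ks, GRASS_RATE) / 1e6
    print(f"{label}: Ks = {ks} -> ~{age_mya:.0f} million years")
```

With this rate, Ks ≈ 0.9 gives roughly 69 million years (the low end of the 70-90 million year range) and Ks ≈ 1.7, the value reported for the older sigma duplication, gives roughly 131 million years. The same Ks paired with a faster clock yields a younger age, which is why the accelerated substitution rate in arabidopsis makes its duplications hard to date.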
There were two reasons for this confusion.
Synonyms: Sigma
By using the genomes of both rice and sorghum to reconstruct the gene order present in duplicate segments prior to the tetraploidy shared by all grasses, Tang and coworkers were able to identify an even more ancient whole genome duplication in the monocot lineage [18]. They estimated an age of 130 million years and a median synonymous substitution distance (Ks) between gene pairs of ~1.7 for this duplication. However, the authors caution that these estimates could be substantially off, as synonymous sites in these gene pairs are close to saturation and there are potentially confounding effects from variation in mutation rate among lineages.
By comparing sigma duplicate regions in the grasses to the grape genome, the same research group [18] found that, at least in some cases, eight sigma regions showed detectable synteny to a single region of the grape genome. This suggests there were in fact two whole genome duplications in the monocot lineage after the monocot-eudicot split but before the pre-grass duplication shared by all grasses.
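Why saturation makes these estimates fragile can be seen with the Jukes-Cantor correction, one common way of turning the observed fraction p of differing synonymous sites into a distance, d = -(3/4) ln(1 - 4p/3). The Python sketch below is a simplified illustration; production Ks estimators (Nei-Gojobori, maximum likelihood codon models) are more involved, and this is not necessarily what the cited study used. It maps Ks values of 0.9 and 1.7 back to the p that would be observed, then shows how far the estimate moves if p is off by one percentage point.

```python
import math


def jc_distance(p: float) -> float:
    """Jukes-Cantor corrected distance from the observed fraction p of differing sites."""
    if not 0.0 <= p < 0.75:
        # At p = 0.75 the sequences look random relative to each other: saturated.
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def observed_fraction(d: float) -> float:
    """Inverse of jc_distance: expected fraction of differing sites at distance d."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


for ks in (0.9, 1.7):  # pan-grass and sigma values quoted on this page
    p = observed_fraction(ks)
    # How far the estimate moves if p is overestimated by one percentage point.
    shift = jc_distance(p + 0.01) - ks
    print(f"Ks = {ks}: observed p = {p:.3f}, +0.01 in p shifts Ks by {shift:.2f}")
```

At Ks ≈ 1.7 the same small error in p moves the estimate roughly three times as far as at Ks ≈ 0.9 (about 0.10 versus 0.03), and the corrected distance diverges as p approaches 0.75.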
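The inference rests on simple copy-number arithmetic. Assuming every duplication in the monocot lineage was a doubling, and that the eudicot hexaploidy postdates the monocot-eudicot split (so it does not change how many grass regions can match one grape region), a single grape region can match up to 2^k grass regions, where k is the number of monocot doublings:

```latex
N_{\text{grass regions per grape region}} = 2^{k} = 8
\quad\Longrightarrow\quad k = \log_2 8 = 3

k = \underbrace{1}_{\text{pan-grass duplication}}
  + \underbrace{2}_{\text{earlier monocot duplications (incl.\ sigma)}}
```

Because fractionation removes many duplicate copies over time, most grape regions would be expected to match fewer than eight grass regions, which is consistent with the pattern being seen only in some cases.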