Plant paleopolyploidy

From CoGepedia

Identifying and characterizing plant paleopolyploidies is ongoing research. They are identified through whole genome comparisons that combine data on genomic structure (e.g. syntenic dotplots) with evolutionary distances (e.g. synonymous substitution rates). As a result, both the set of detected events and our understanding of which lineages share which events keep changing. The images presented here represent our current views and are subject to change. Events that were previously undetected or missed can suddenly be seen with an improved build of a genome or the sequencing of a fortuitously placed outgroup.

This webpage is maintained by James Schnable, a member of the CoGe development team. As new genomes become available, and previous genomes are updated, I will continue to improve these figures. If you know of a new whole genome duplication not listed here, a paper that should be credited but isn't, or interesting information about any specific whole genome duplication that you think should be included in the summaries, please don't hesitate to contact me so I can get the information up here.

If you would like a citable reference for the data collected here and aren't comfortable citing a website directly, here's the citable version (complete with DOI): http://figshare.com/articles/Plant_Paleopolyploidy/1538627


Plant Phylogenetic Tree With WGD Marked

Phylogenetic tree of plant species with sequenced genomes, with ancient whole genome duplications marked. Branch lengths are not proportional to anything. *Please read the section on hexaploidies.

For a version of this tree with all common names replaced by scientific ones, see the bottom of this page.

Hexaploidies

Hexaploidies can form in either one step (instant triplication of the genome) or two steps: (1) a tetraploidy, then (2) a gamete from the tetraploid fuses with a gamete from a diploid, creating a sterile triploid which then regains fertility by doubling its genome again, creating a hexaploid.
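The ploidy arithmetic behind the two routes can be sketched in a few lines. This is only an illustration of the genome-copy bookkeeping described above; the function names are made up for this example.

```python
def gamete(ploidy):
    """Genome copies in a reduced gamete (half the parental ploidy)."""
    if ploidy % 2:
        raise ValueError("odd ploidy cannot be halved evenly at meiosis")
    return ploidy // 2


def one_step():
    """Ploidy levels when a diploid genome triples instantly."""
    diploid = 2
    return [diploid, diploid * 3]


def two_step():
    """Ploidy levels along the two-step route to a hexaploid."""
    diploid = 2
    tetraploid = diploid * 2                          # step 1: tetraploidy
    triploid = gamete(tetraploid) + gamete(diploid)   # 2x gamete + 1x gamete
    hexaploid = triploid * 2                          # doubling restores fertility
    return [diploid, tetraploid, triploid, hexaploid]


print(one_step())
print(two_step())
```

Calling `gamete(3)` raises an error, which mirrors why the triploid intermediate is sterile until its genome doubles.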


Hexaploidy.png

The two step process certainly occurs. Bread wheat is an excellent example of this process where most of the intermediate species still exist. Einkorn wheat is a diploid containing only the A genome. At some point an A species (Einkorn or a relative) crossed with a species carrying the B genome, creating an allotetraploid with both the A and B genomes. AB wheat species are still grown today (including the durum wheat used for pasta). Finally the AB wheat crossed with goatgrass, a wild diploid species which carries the D genome, creating the bread wheat grown today around the world which carries three genomes: A, B, and D. The wheat hexaploidy is close to an ideal case with almost all the species involved still alive today.

For ancient hexaploidies it is hard to prove whether an event happened in one step or two, although we think, based on patterns of gene loss, that most ancient hexaploidies formed in two-step events.[1][2]

For two-step hexaploidies, a related lineage could show evidence of a tetraploidy at the exact same time, indicating that this sister lineage is descended from the first step of a two-step hexaploidy (i.e. the durum AB wheat in the example given above).

Discussion of individual tetraploidies

1 Eudicot Hexaploidy

Synonyms: Arabidopsis Gamma

This hexaploidy (genome tripling) is shared by the core eudicots (the rosids and asterids), and may be present in additional, basal eudicots, although it will not be possible to reach this conclusion until the genomes of species from basal lineages are sequenced -- AND WELL ASSEMBLED!

This whole genome duplication was first identified as the most ancient of three whole genome duplications present in the genome of Arabidopsis thaliana, and was assigned the name "gamma" in 2003 by Bowers et al.[3]

In this paper, hampered by the fact that only two plant genomes had yet been sequenced and by the ridiculously accelerated rate of base pair substitution in arabidopsis, the authors concluded the gamma event was likely shared by both monocots and eudicots and could potentially be as old as the split between gymnosperms and flowering plants 300 million years ago.

With the publication in 2007 of the grape genome[4], which has not experienced any duplications since the eudicot hexaploidy and doesn't show the same acceleration of nucleotide substitutions, it became possible to conclude that the eudicot hexaploidy was NOT shared with monocots and was shared by all rosids.

More recent work in the asterids[5] indicates that this evolutionarily successful eudicot clade shares the same ancient whole genome duplication.

Based on patterns of fractionation the eudicot hexaploidy is believed to have been a two-step event.[1] It is possible -- although not confirmed -- that the tetraploidy seen in columbine (WGD #10) comes from the tetraploid intermediate of this process.

2 Arabidopsis alpha

The alpha tetraploidy of arabidopsis was first given that name in Bowers et al 2003[3]. It is shared by most or all of the crucifers (family Brassicaceae).

3 Arabidopsis beta

As of yet not a single lineage has been identified in which the beta tetraploidy (naming conventions from Bowers et al 2003[3]) is the most recent whole genome duplication. Despite what the image above might indicate, this duplication is significantly older than arabidopsis alpha. However, precise dating is difficult given the acceleration of the synonymous substitution rate in the arabidopsis lineage.

4 Brassica hexaploidy

The first proposal that the base Brassica genomes (B. rapa, B. oleracea, and B. nigra) may be composed of three ancestral genomes that were Arabidopsis-like in structure came from mapping studies[6], but it remained controversial[7]. Studies based on comparative chromosome painting[8] and BAC sequencing[9][10] further established the Brassica triplication hypothesis, which was confirmed by the genome sequence of Brassica rapa[11]. Ongoing studies are attempting to determine whether the Brassica hexaploidy occurred in one or two steps, and whether the event(s) were allopolyploid in origin. An analysis of fractionation patterns in Brassica rapa supports the two-step model for this hexaploidy.[2]

5 Poplar tetraploidy

When the genome of poplar was released back in 2006, researchers announced that they had identified a new ancient whole genome duplication[12]. Poplar retains around 8000 pairs of duplicated genes.

6 Flax tetraploidy

This event occurred within the genus Linum. Domesticated flax (L. usitatissimum) and its close relative Linum bienne both share it.

7 Apple tetraploidy

The apple genome paper also discussed an ancient whole genome duplication identified in that lineage, which the authors estimated to be >50 million years old.[13]

8 Soybean tetraploidy

The relatively recent whole genome duplication in soybean was long suspected based on a number of different forms of analysis: chromosome number and early linkage mapping studies,[14] analysis of Ks peaks,[15] phylogenies of individual gene families,[16] and analysis of the fractionation of individual sequenced regions.[17] As expected, when the genome of soybean was published in 2010, researchers did identify a recent whole genome duplication (peak Ks = 0.13, estimated age of divergence between whole genome duplicates 13 million years). The minimum age of the event has been fixed at five million years based on the divergence of Glycine species carrying the duplication.[18]
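As a worked illustration of how a Ks peak relates to an age, here is a minimal sketch assuming a simple molecular clock, T = Ks / (2r), where r is the synonymous substitution rate per site per year and the factor of two reflects that both duplicate copies accumulate substitutions. The source does not state which rate or method was used for soybean; the code only back-calculates the rate implied by the two numbers quoted above.

```python
def divergence_time_years(ks, rate_per_site_per_year):
    """Age of a duplicate pair under a simple clock: T = Ks / (2 * r)."""
    return ks / (2.0 * rate_per_site_per_year)


def implied_rate(ks, age_years):
    """Rate r that makes a given Ks consistent with a given age."""
    return ks / (2.0 * age_years)


# Numbers quoted in the text for soybean: peak Ks = 0.13, age = 13 million years.
rate = implied_rate(0.13, 13e6)
print(f"implied rate: {rate:.2e} substitutions per synonymous site per year")
print(f"age recovered from that rate: {divergence_time_years(0.13, rate) / 1e6:.1f} My")
```

Because substitution rates differ between lineages (arabidopsis is a notable example on this page), a rate calibrated in one lineage should not be assumed to transfer to another.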

9 Papilionoid tetraploidy

Synonym: Legume tetraploidy <-- note that this name is misleading as this duplication is not shared by many clades within the legumes

Linkage mapping studies in soybean led to the hypothesis of a second, older polyploidy event in that lineage.[14] This hypothesis was corroborated by evidence from Ks studies[19][20] for an ancient polyploidy event in the Medicago truncatula genome. Phylogenomic studies (Pfeil et al. 2005) provided evidence that the soybean and Medicago Ks peaks were due to a single event that occurred in their common ancestor (also shared by Lotus). Subsequently the event was also shown to be shared by peanut (Arachis hypogaea), a member of the clade sister to the Glycine-Medicago-Lotus clade.[21] Unpublished information suggests that this WGD is also found in lupin (Lupinus), in the clade sister to the Arachis-Glycine et al. clade. Phylogenomic studies in a caesalpinioid legume, Chamaecrista fasciculata, showed that this species (and thus all caesalpinioid and mimosoid legumes) lacks this polyploidy event, indicating the duplication occurred at the base of (or within) the papilionoid subfamily.[22] It is unknown whether the “papilionoid WGD” occurred in the ancestor of all papilionoid legumes, because there are not yet any data for early diverging lineages within the subfamily. The “papilionoid” WGD is estimated to have occurred between 50 and 60 million years ago, early in the evolution of the legume family.

10 Columbine tetraploidy

11 Flowering plant tetraploidy

An analysis of conserved orthologous gene groups (COGs) and huge numbers of ESTs identified evidence of two ancient whole genome duplications shared by both monocots and eudicots.[23] The more recent of the two, placed at 192 million years ago, occurred after the split of gymnosperms (non-flowering seed plants) but is shared by all extant flowering plant species including Amborella trichopoda.

12 Seed Plant Tetraploidy

An analysis of conserved orthologous gene groups (COGs) and huge numbers of ESTs identified evidence of two ancient whole genome duplications shared by both monocots and eudicots.[23] The more ancient of the two events is shared by all flowering plants as well as gymnosperms, but after the divergence from Selaginella, a basal vascular plant.

13 Maize Tetraploidy

The suspicion that maize is an ancient polyploid can be traced back through at least a generation of maize geneticists, and finds its earlier roots in the large number of duplicate mutant loci found in maize, sometimes found in parallel orders along different chromosomes. Perhaps the most famous of these are the pairs of duplicate regulators of anthocyanin biosynthesis: aleurone1 and Purple plant1.

Brandon Gaut and John Doebley concluded maize was an allopolyploid back in 1997.[24] While whether maize is an allo- or auto- polyploid has been argued back and forth over the years, the polyploidy question was settled more than a decade before the publication of the first draft of the maize genome.

The two subgenomes of maize are estimated to have diverged ~12 million years ago.[25] If maize is an autopolyploid, the two genomes also merged into a single genome 12 million years ago, but if maize is an allopolyploid the two genomes could have evolved as separate species for several million years before the wide cross that created the polyploid ancestor of modern maize. In either case, the two ancestral genomes of maize have been contained in the same nucleus for at least five million years.[25] By comparing the organization of the maize genome to that of other grass species, it is possible to reconstruct the ten pairs of homeologous chromosomes present in that first polyploid maize ancestor.[26] By measuring biased gene loss (fractionation) and biased expression of duplicate pairs, it is possible to assign one copy of each ancestral chromosome pair in maize to a parental subgenome, either maize1 or maize2.[27]
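The assignment by biased gene loss can be sketched as follows. This is a toy illustration, not the published method: it uses gene retention only (no expression data), the counts are invented for the example, and it assumes the convention that the copy retaining more genes is the one labeled maize1.

```python
def assign_subgenomes(retained_counts):
    """Label the two homeologous copies of each ancestral chromosome.

    retained_counts maps an ancestral chromosome name to a pair
    (genes retained on copy A, genes retained on copy B).
    Assumption: the less fractionated copy is called maize1.
    """
    labels = {}
    for chromosome, (copy_a, copy_b) in retained_counts.items():
        if copy_a > copy_b:
            labels[chromosome] = ("maize1", "maize2")
        elif copy_b > copy_a:
            labels[chromosome] = ("maize2", "maize1")
        else:
            # No bias in gene loss, so retention alone cannot decide.
            labels[chromosome] = ("unassigned", "unassigned")
    return labels


# Hypothetical counts for three reconstructed ancestral chromosomes.
example = {"anc1": (820, 510), "anc2": (300, 475), "anc3": (400, 400)}
for chromosome, pair in assign_subgenomes(example).items():
    print(chromosome, pair)
```

A real analysis would also test whether the bias is statistically significant, and would check that it is consistent along the length of each chromosome.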

14 Grass Tetraploidy

Synonyms: Rho

All grass species sequenced to date share a common whole genome duplication. This duplication is estimated to have occurred between 70-90 million years ago, and homeologs from this duplication have a modal synonymous substitution rate ~0.9.[28]

While it has been known for many years that a significant portion of the rice genome is covered by duplicate syntenic regions, the reason for this duplication was disputed, with explanations ranging from multiple segmental duplications to one -- or more -- whole genome duplications.

There were two reasons for this confusion.

  • Only 65.7% of the rice genome is covered by syntenic duplicate regions.[29]
  • The duplicate homeologous regions located on chromosomes 11 and 12 of rice and 5 and 7 of sorghum have continued to experience gene conversion, so this pair of duplicate regions appears much younger than the rest of the duplicate regions in grass genomes.[30]

15 Grass tetraploidy B

Synonyms: Sigma

By using the genomes of both rice and sorghum to reconstruct the gene order present in duplicate segments prior to the tetraploidy shared by all grasses, Tang and coworkers were able to identify an even more ancient whole genome duplication in the monocot lineage.[31] They estimated an age of 130 million years and a median synonymous substitution rate between gene pairs of ~1.7 for this duplication. However, the authors caution that these estimates could be far off, as the synonymous substitution rates of these gene pairs are close to saturation and there are potentially confounding effects from mutation rate variation between lineages. Comparisons to the banana genome placed sigma in the lineage leading to grasses after the split between Poales (contains grasses) and Zingiberales (contains banana).[32]

16 Monocot Tetraploidy

By comparing sigma duplicate regions in the grasses to the grape genome, the same research group as above[31] found that at least in some cases eight sigma regions showed detectable synteny to a region of the grape genome. This suggests there were in fact two whole genome duplications in the monocot lineage following the monocot-eudicot split but before the pre-grass duplication shared by all grasses.
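The reasoning from "eight regions" to a count of duplications is a multiplication: each tetraploidy doubles, and each hexaploidy triples, the number of regions descended from one ancestral region. A small sketch of that bookkeeping follows; the function and dictionary names are invented for this example.

```python
from math import prod

# Copies produced from one ancestral region by each kind of event.
MULTIPLICITY = {"tetraploidy": 2, "hexaploidy": 3}


def expected_depth(events):
    """Maximum number of syntenic regions descended from one ancestral region."""
    return prod(MULTIPLICITY[event] for event in events)


# Two older monocot duplications plus the one shared by all grasses.
print(expected_depth(["tetraploidy", "tetraploidy", "tetraploidy"]))

# Arabidopsis as described on this page: gamma hexaploidy, then beta and alpha.
print(expected_depth(["hexaploidy", "tetraploidy", "tetraploidy"]))
```

These are upper bounds. Gene loss and rearrangement usually erase some copies, which is consistent with the eight-to-one pattern being detectable only in some cases.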

17 Banana Beta

The publication of the banana genome reported two whole genome duplications with roughly similar rates of divergence indicating that the two events occurred at roughly the same time[32]. This is the older of the two events and is dated to approximately 65 million years ago.

18 Banana Alpha

This is the more recent of the two tetraploidies identified in the initial analysis of the banana genome.[32] Evidence for at least one whole genome duplication in the banana lineage was reported prior to the publication of the banana genome,[33] where the duplication was estimated to have occurred roughly 61 million years ago.

19 Cotton WGD

The cotton lineage whole genome duplication has been estimated to have occurred between 13 and 20 million years ago.[34]

20 Solanum hexaploidy

The sequencing of the tomato genome revealed a hexaploidy shared by both tomato and potato and estimated to have occurred between 52-92 million years ago.[35] At the more recent end of that range the whole genome duplication would be as shown in the tree above. At the more ancient end of the range this triplication would also be shared by Monkey Flower and a large proportion of all the asterids.

21 Date Palm WGD

Date palm is believed to contain a whole genome duplication which occurred sometime after its split from the lineage leading to banana and the grasses. This duplication was inferred based on a peak in the synonymous substitution rate between duplicate date palm genes around 0.25 synonymous substitutions per site. (See supplementary figure 19 of this reference.[32]) However the Date Palm genome is very poorly assembled so this duplication cannot yet be verified by syntenic analysis.
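Without synteny, a duplication like this one is inferred from where the Ks values of duplicate gene pairs pile up. Below is a deliberately crude sketch that takes the fullest histogram bin as the peak; published analyses typically use smoothing or mixture models instead, and the Ks values here are invented for illustration.

```python
from collections import Counter


def modal_ks(ks_values, bin_width=0.1, max_ks=2.0):
    """Return the center of the most populated Ks histogram bin."""
    counts = Counter(
        int(ks / bin_width) for ks in ks_values if 0 <= ks < max_ks
    )
    if not counts:
        raise ValueError("no Ks values within the allowed range")
    # Highest count wins; ties go to the lower (younger) bin.
    best_bin, _ = max(counts.items(), key=lambda item: (item[1], -item[0]))
    return (best_bin + 0.5) * bin_width


# Hypothetical Ks values for duplicate pairs, clustered between 0.2 and 0.3.
ks_values = [0.21, 0.22, 0.23, 0.24, 0.26, 0.27, 0.28, 0.62, 0.95, 1.41]
print(f"modal Ks: {modal_ks(ks_values):.2f}")
```

Values above `max_ks` are dropped because, as noted for sigma above, Ks estimates approach saturation and become unreliable at high divergence.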

22 Switchgrass tetraploidy

Switchgrass (Panicum virgatum) is a relatively recent tetraploid species; the tetraploidy is estimated to have occurred within the last 0.5-1 million years.

23 Cleome WGD

Cleome belongs to a family of plants that is sister to the crucifers (the Brassicaceae), and at least some species in the genus (including C. spinosa and C. gynandra) carry a hexaploidy which occurred after that lineage split from the ancestor of the crucifers. This event has been identified by both synteny within individual sequenced BACs[36] and synonymous substitution rate analysis of thousands of sequenced Cleome transcripts.[37]

24 Banana Gamma

The publication of the banana genome[32] reported a third, more ancient whole genome duplication estimated to have occurred around 100 million years ago. By comparing the sigma duplicate regions of Poales (WGD 15) with the banana beta ancestral blocks, D'Hont and coworkers placed this gamma WGD after the split between Poales and Zingiberales.

Plant Phylogenetic Tree With WGD Marked (Scientific names)

Tree-WGD-scientific.png

References

  1. Lyons et al (2008) "The value of nonmodel genomes and an example using SynMap within CoGe to dissect the hexaploidy that predates the rosids." Tropical Plant Biology doi: 10.1007/s12042-008-9017-y
  2. Tang et al (2012) "Altered Patterns of Fractionation and Exon Deletions in Brassica rapa Support a Two-Step Model of Paleohexaploidy." Genetics doi: 10.1534/genetics.111.137349
  3. Bowers JE et al (2003) "Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events." Nature DOI: 10.1038/nature01521
  4. Jaillon O et al (2007) "The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla." Nature DOI: 10.1038/nature06148
  5. Cenci A et al (2010) "Comparative sequence analyses indicate that Coffea (Asterids) and Vitis (Rosids) derive from the same paleo-hexaploid ancestral genome." Molecular Genetics and Genomics DOI: 10.1007/s00438-010-0534-7
  6. Lagercrantz U. (1998). “Comparative mapping between Arabidopsis thaliana and Brassica nigra indicates that Brassica genomes have evolved through extensive genome replication accompanied by chromosome fusions and frequent rearrangements.” Genetics 150: 1217–1228.
  7. Lukens LN et al. (2004). “Genome redundancy and plasticity within ancient and recent Brassica crop species.” Biological Journal of the Linnean Society 82: 665-674.
  8. Lysak MA et al (2005). “Chromosome triplication found across the tribe Brassiceae.” Genome Res. 15: 516–525.
  9. Yang TJ et al. (2006). “Sequence-level analysis of the diploidization process in the triplicated FLC region of Brassica rapa.” Plant Cell 18: 1339–1347.
  10. Cheung F et al. (2009). “Comparative Analysis between homoeologous genome segments of Brassica napus and its progenitor species reveals extensive sequence-level divergence.” The Plant Cell 21: 1912-1928.
  11. Wang X. et al. 2011. “The genome of the mesoploid crop species Brassica rapa.” Nature Genetics 43: 1035-1039.
  12. Tuskan GA et al (2006) "The Genome of Black Cottonwood, Populus trichocarpa (Torr. & Gray)" Science DOI: 10.1126/science.1128691
  13. Velasco R et al (2010) "The genome of the domesticated apple (Malus × domestica Borkh.)" Nature Genetics DOI: 10.1038/ng.654
  14. Shoemaker RC et al (1996) "Genome duplication in soybean (Glycine subgenus soja)." Genetics 144(1): 329–338
  15. Schlueter JA et al (2004) "Mining EST databases to resolve evolutionary events in major crop species." Genome DOI: 10.1139/g04-047
  16. Pfeil BE et al (2005) "Placing paleopolyploidy in relation to taxon divergence: a phylogenetic analysis in legumes using 39 gene families." Systematic Biology DOI: 10.1080/10635150590945359
  17. Schlueter JA et al (2008) "Fractionation of synteny in a genomic region containing tandemly duplicated genes across Glycine max, Medicago truncatula, and Arabidopsis thaliana." Journal of Heredity DOI: 10.1093/jhered/esn010
  18. Doyle & Egan (2010) "Dating the origins of polyploidy events." New Phytologist doi: 10.1111/j.1469-8137.2009.03118.x
  19. Blanc & Wolfe (2004) "Widespread Paleopolyploidy in Model Plant Species Inferred from Age Distributions of Duplicate Genes" The Plant Cell doi: 10.1105/tpc.021345
  20. Schlueter JA et al (2008) "Fractionation of synteny in a genomic region containing tandemly duplicated genes across glycine max, Medicago truncatula, and Arabidopsis thaliana." Journal of Heredity doi: 10.1093/jhered/esn010
  21. Bertioli DJ et al (2009) "An analysis of synteny of Arachis with Lotus and Medicago sheds new light on the structure, stability and evolution of legume genomes" BMC Genomics DOI:10.1186/1471-2164-10-45.
  22. Cannon SB et al (2010) "Polyploidy did not predate the evolution of nodulation in all legumes." PLoS One doi: 10.1371/journal.pone.0011630
  23. Jiao Y et al (2011) "Ancestral polyploidy in seed plants and angiosperms." Nature DOI: 10.1038/nature09916
  24. Gaut BS (1997) "DNA sequence evidence for the segmental allotetraploid origin of maize." Proceedings of the National Academy of Sciences DOI: NA
  25. Swigonova Z et al (2004) "Close Split of Sorghum and Maize Genome Progenitors." Genome Research DOI: 10.1101/gr.2332504
  26. Wei F et al (2007) "Physical and Genetic Structure of the Maize Genome Reflects Its Complex Evolutionary History." PLoS Genetics DOI: 10.1371/journal.pgen.0030123
  27. Schnable JC et al (2011) "Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss." Proceedings of the National Academy of Sciences DOI: 10.1073/pnas.1101368108
  28. Paterson AH et al (2004) "Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics." Proceedings of the National Academy of Sciences DOI: 10.1073/pnas.0307901101
  29. Yu J et al (2005) "The Genomes of Oryza sativa: A History of Duplications." PLoS Biology DOI: 10.1371/journal.pbio.0030038
  30. Wang X et al (2011) "Seventy Million Years of Concerted Evolution of a Homoeologous Chromosome Pair, in Parallel, in Major Poaceae Lineages." The Plant Cell DOI: 10.1105/tpc.110.080622
  31. Tang H et al (2010) "Angiosperm genome comparisons reveal early polyploidy in the monocot lineage." Proceedings of the National Academy of Sciences DOI: 10.1073/pnas.0908007107
  32. D'Hont A et al (2012) "The banana (Musa acuminata) genome and the evolution of monocotyledonous plants." Nature DOI: 10.1038/nature11241
  33. Lescot M et al (2008) "Insights into the Musa genome: Syntenic relationships to rice and between Musa species." BMC Genomics doi: 10.1186/1471-2164-9-58
  34. Wang K et al (2012) "The draft genome of a diploid cotton Gossypium raimondii." Nature Genetics doi: 10.1038/ng.2371
  35. The Tomato Genome Consortium (2012) "The tomato genome sequence provides insights into fleshy fruit evolution" Nature doi: 10.1038/nature11119
  36. Schranz & Mitchell-Olds (2006) "Independent Ancient Polyploidy Events in the Sister Families Brassicaceae and Cleomaceae" Plant Cell doi: 10.1105/tpc.106.041111
  37. Barker et al (2009) "Paleopolyploidy in the Brassicales: Analyses of the Cleome Transcriptome Elucidate the History of Genome Duplications in Arabidopsis and Other Brassicales" Genome Biol Evol doi: 10.1093/gbe/evp040