This site attempts to track all plant genomes with published sequences, and at least some of the genomes currently in the process of being sequenced. Genomes are divided into four states:
Questions? Comments? Have we missed a published genome sequence? Get in touch and let me know!
For a table of sequenced plant genomes with additional statistics and information: Plant Genome Statistics
Tree up to date as of January 26th 2012
Graph up to date as of September 19th 2011
A detailed table of plant genome publications.
Amborella (Amborella trichopoda) is believed to represent the earliest diverging lineage of flowering plants (angiosperms) still alive today. While that doesn't mean it represents the ancestral state of flowering plants, comparing amborella to the major flowering plant lineages -- the eudicots, monocots, and magnoliids (the last of which still doesn't have even a single published genome, someone please get on that!) -- can tell us a lot about what that common ancestor looked like. As the species is found only in New Caledonia and isn't exactly common even there, we are very fortunate that this sole representative species has survived to the present day.
The first draft of the Amborella trichopoda genome was released at the twentieth Plant and Animal Genome Conference in January of 2012. It is composed of 5,745 scaffolds covering 706 megabases out of an estimated total genome size of ~870 megabases. The genome is currently covered by Fort Lauderdale restrictions, but is available for download from the Amborella Sequencing project's website.
The eudicots are the largest group of flowering plants on the planet.
(genome available but unpublished)
Columbine (Aquilegia sp.) comes from a group of eudicots, the Ranunculales, whose ancestors split from the ancestors of the major eudicot groups (like rosids and asterids) a long, long, time ago (somewhere in the neighborhood of 115-130 million years ago). Comparing the columbine genome sequence with other eudicot genomes should be very interesting for several groups of plant biologists (comparative genomicists and evolutionary biologists in particular).
The columbine genome was sequenced to 8-fold coverage by JGI and a pre-publication release of the genome is available for download from phytozome. The current assembly is only to the scaffold level (no pseudomolecules) and consists of 302 megabases of sequence spread over 971 scaffolds. Current gene annotations identify 25,784 genes, predicted using a mixture of EST sequencing and homology to other sequenced genomes. You can view the genome in CoGe with GenomeView.
As with all sequenced angiosperm genomes, columbine has an ancient whole genome duplication. However, is this the paleohexaploidy event shared among the rosids and asterids? Columbine's whole genome duplication
The sugar beet -- a cultivar of the common beet (Beta vulgaris) -- accounts for ~20% of sugar production worldwide and is a favored crop in countries too cold to support a local sugar cane industry including Russia, much of the EU, and most of America. Sugar beets are a relatively recent agricultural innovation with selective breeding of beets for high sugar content only starting in 1784 and production not being adopted on a wide scale until the Napoleonic wars, during which large parts of Europe were essentially cut off from trade with the Caribbean, until then Europe's primary source of sugar from sugar cane.
Beets belong to the Caryophyllales, an order of flowering plants which also includes the true cacti and many carnivorous plants. The Caryophyllales are currently believed to be more closely related to the Asterids than the Rosids, but are not included within either group.
The beet genome is currently at version 0.9 and encompasses 590 MB of sequence data split across 82,305 scaffolds and contigs. The genome is based upon a doubled haploid line called KWS2320. Version 1.0 is expected to include improvements from correcting homopolymer errors introduced by next generation sequencing, as well as from integrating contigs using a genetic map.
For more information and to download the genome, visit the sugar beet genome sequencing group's website.
The asterids are a group of plants within the eudicots that include species like the solanaceous vegetables (tobacco, tomato, potato, peppers, and eggplant) and the sunflowers. The asterids are currently represented by only one published genome sequence (potato), but several more unpublished or partial genome sequences are also available.
(genome incomplete)
The tomato (Solanum lycopersicum) genome project is not yet complete. The version of the genome currently loaded into CoGe is assembled into pseudomolecules[1] but does not contain gene models[2]. The most recent assembly is 1.03, which was assembled from 22x coverage sequencing using 454 technology. Read more about the tomato genome project here or see it in GenomeView here.
Potatoes are arguably the second most important non-grass crop grown around the world. Both breeding and genomic analysis of the potato have been hampered by the fact that most cultivated potatoes are recent tetraploids. The genome of potato was published in 2011 by an international consortium with corresponding authors hailing from the United States, China, and the Netherlands. It is the first publicly available genome from within the asterid clade. To avoid the complexities introduced by tetraploidy, the genome consortium focused on a diploid potato variety and used doubled-monoploid technology to create an "instantly inbred line." This assembled genome was used as a base to analyze further data generated from a heterozygous line, in which a great deal of presence/absence variation was detected. The potato lineage has experienced one additional tetraploidy since the ancient hexaploidy shared by the asterids and rosids.
The current genome assembly contains an estimated 86% of the total potato genome, and 74% of the total potato genome has been assembled into 12 pseudomolecules using genetic and physical maps. A total of 39,031 protein coding genes were annotated in the current assembly.
The Genome Paper: The Potato Genome Sequencing Consortium (2011). Genome sequence and analysis of the tuber crop potato. Nature, 475: 189–195 DOI 10.1038/nature10158
(genome incomplete)
The monkey flower (Mimulus guttatus) genome is not yet complete. The version of the genome currently loaded into CoGe is not assembled into pseudomolecules[1] but does contain gene models[2]. Read more about the monkey flower genome on phytozome or see the current assembly in GenomeView here.
Grape diverged early from the two main groups of species in the rosids (eurosids I and eurosids II) and has not experienced any whole genome duplications since that divergence, making it an important outgroup for comparisons to other rosid species as well as a great resource for studying the ancient hexaploidy that preceded the radiation of rosid species (and possibly the radiation of eudicot species).
The version of the grape genome in CoGe contains ~500 megabases of sequence and 26346 annotated genes spread across 19 chromosomes.
The genome paper:
Jaillon, O., Aury, J., Noel, B., Policriti, A., Clepet, C., Casagrande, A., Choisne, N., Aubourg, S., Vitulo, N., Jubin, C., Vezzi, A., Legeai, F., Hugueney, P., Dasilva, C., Horner, D., Mica, E., Jublot, D., Poulain, J., Bruyère, C., Billault, A., Segurens, B., Gouyvenoux, M., Ugarte, E., Cattonaro, F., Anthouard, V., Vico, V., Del Fabbro, C., Alaux, M., Di Gaspero, G., Dumas, V., Felice, N., Paillard, S., Juman, I., Moroldo, M., Scalabrin, S., Canaguier, A., Le Clainche, I., Malacrida, G., Durand, E., Pesole, G., Laucou, V., Chatelet, P., Merdinoglu, D., Delledonne, M., Pezzotti, M., Lecharny, A., Scarpelli, C., Artiguenave, F., Pè, M., Valle, G., Morgante, M., Caboche, M., Adam-Blondon, A., Weissenbach, J., Quétier, F., & Wincker, P. (2007). The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla Nature, 449 (7161), 463-467 DOI: 10.1038/nature06148
(genome unpublished)
One of several species of tree referred to by the common name "Eucalyptus", the rose gum tree (Eucalyptus grandis) is native to Australia, but is considered a candidate for biofuel production in the US. The rose gum tree is a basal rosid, like grape, so in addition to the value of this genome sequence for biofuel breeding purposes, this genome serves as a valuable outgroup for the core rosids (listed as Eurosids 1 and Eurosids 2 on this site).
The rose gum genome was sequenced to 8x coverage by the Joint Genome Institute and is assembled into 11 linkage/chromosome groups. The initial release of the genome includes 691 MB of sequence data, and 41,204 protein coding genes located on the putative chromosome assemblies. Read more/download the sequence from Phytozome.
The publication also included a genetic map with a total length of 581 cM based on 1,885 markers. Resources from this version of the cucumber genome are available at this site.
Independently, a group of researchers in the US have released a draft of a cucumber genome sequence of the inbred line Gy14. This version of the genome was assembled from 454 sequencing reads, and the current release consists of 203 megabases of sequence and a predicted 21,491 protein coding genes spread over 4,219 scaffolds. This version is available at Phytozome.
A third version of the cucumber genome, this one of the cultivar Borszczagowski line B10, was produced by a Polish research group and published in 2011. The resulting data is available here.
The genome paper: Huang, S., Li, R., Zhang, Z., Li, L., Gu, X., Fan, W., Lucas, W., Wang, X., Xie, B., Ni, P., Ren, Y., Zhu, H., Li, J., Lin, K., Jin, W., Fei, Z., Li, G., Staub, J., Kilian, A., van der Vossen, E., Wu, Y., Guo, J., He, J., Jia, Z., Ren, Y., Tian, G., Lu, Y., Ruan, J., Qian, W., Wang, M., Huang, Q., Li, B., Xuan, Z., Cao, J., Asan, Wu, Z., Zhang, J., Cai, Q., Bai, Y., Zhao, B., Han, Y., Li, Y., Li, X., Wang, S., Shi, Q., Liu, S., Cho, W., Kim, J., Xu, Y., Heller-Uszynska, K., Miao, H., Cheng, Z., Zhang, S., Wu, J., Yang, Y., Kang, H., Li, M., Liang, H., Ren, X., Shi, Z., Wen, M., Jian, M., Yang, H., Zhang, G., Yang, Z., Chen, R., Liu, S., Li, J., Ma, L., Liu, H., Zhou, Y., Zhao, J., Fang, X., Li, G., Fang, L., Li, Y., Liu, D., Zheng, H., Zhang, Y., Qin, N., Li, Z., Yang, G., Yang, S., Bolund, L., Kristiansen, K., Zheng, H., Li, S., Zhang, X., Yang, H., Wang, J., Sun, R., Zhang, B., Jiang, S., Wang, J., Du, Y., & Li, S. (2009). The genome of the cucumber, Cucumis sativus L. Nature Genetics, 41(12), 1275-1281 DOI: 10.1038/ng.475
The genome paper: Tuskan, G., DiFazio, S., Jansson, S., Bohlmann, J., Grigoriev, I., Hellsten, U., Putnam, N., Ralph, S., Rombauts, S., Salamov, A., Schein, J., Sterck, L., Aerts, A., Bhalerao, R., Bhalerao, R., Blaudez, D., Boerjan, W., Brun, A., Brunner, A., Busov, V., Campbell, M., Carlson, J., Chalot, M., Chapman, J., Chen, G., Cooper, D., Coutinho, P., Couturier, J., Covert, S., Cronk, Q., Cunningham, R., Davis, J., Degroeve, S., Dejardin, A., dePamphilis, C., Detter, J., Dirks, B., Dubchak, I., Duplessis, S., Ehlting, J., Ellis, B., Gendler, K., Goodstein, D., Gribskov, M., Grimwood, J., Groover, A., Gunter, L., Hamberger, B., Heinze, B., Helariutta, Y., Henrissat, B., Holligan, D., Holt, R., Huang, W., Islam-Faridi, N., Jones, S., Jones-Rhoades, M., Jorgensen, R., Joshi, C., Kangasjarvi, J., Karlsson, J., Kelleher, C., Kirkpatrick, R., Kirst, M., Kohler, A., Kalluri, U., Larimer, F., Leebens-Mack, J., Leple, J., Locascio, P., Lou, Y., Lucas, S., Martin, F., Montanini, B., Napoli, C., Nelson, D., Nelson, C., Nieminen, K., Nilsson, O., Pereda, V., Peter, G., Philippe, R., Pilate, G., Poliakov, A., Razumovskaya, J., Richardson, P., Rinaldi, C., Ritland, K., Rouze, P., Ryaboy, D., Schmutz, J., Schrader, J., Segerman, B., Shin, H., Siddiqui, A., Sterky, F., Terry, A., Tsai, C., Uberbacher, E., Unneberg, P., Vahala, J., Wall, K., Wessler, S., Yang, G., Yin, T., Douglas, C., Marra, M., Sandberg, G., Van de Peer, Y., & Rokhsar, D. (2006). The Genome of Black Cottonwood, Populus trichocarpa (Torr. & Gray) Science, 313 (5793), 1596-1604 DOI: 10.1126/science.1128691
Flax (Linum usitatissimum) is an ancient fiber crop grown to produce linen and is also used as an oilseed crop to produce linseed oil (also called, you guessed it, "flaxseed oil"). Flax has a small total genome size (estimated to be ~350 megabases) and the current assembly, v1.0, was produced entirely by Illumina sequencing. This early assembly consists of a huge number of scaffolds (>88,000); however, 290 megabases of the flax genome is present in only 664 scaffolds, a far more manageable number. The flax genome project is a collaboration between BGI and a group of Canadian researchers. The flax genome is not yet published, but is available for download through phytozome.
The woodland strawberry (Fragaria vesca) is not the species that produces most of the strawberries you see on grocery store shelves today. Those are generally from the garden strawberry. However garden strawberries are octoploid, making sequencing their genome relatively difficult, while the woodland strawberry possesses a much more manageable diploid genome. For more on the story of how the woodland strawberry came to be sequenced, check out this fascinating story from one of the scientists behind the genome paper.
The published strawberry genome consists of 7 chromosomes/pseudomolecules.
The Genome Paper: Vladimir Shulaev et al., "The genome of woodland strawberry (Fragaria vesca)," Nature Genetics 43: 109-116. DOI: 10.1038/ng.740
The castor bean (Ricinus communis) is an oilseed plant that is the source of castor oil and the deadly poison ricin. The castor bean should not be confused with the common bean (Phaseolus vulgaris), which is in the Joint Genome Institute sequencing pipeline.
The published castor bean genome is based on 4.6x coverage of the genome using Solexa sequencing.
The current release consists of 31,237 gene models spread across 25,800 scaffolds.
The entire genome is estimated to be ~320 megabases in size and contains 10 chromosomes.
Genome Paper
Agnes P Chan et al., “Draft genome sequence of the oilseed species Ricinus communis,” Nature Biotechnology, DOI 10.1038/nbt.1674
The website of the castor bean sequencing group.
(genome incomplete)
Cassava (Manihot esculenta) is the most important crop that most people in America and Europe have never heard of (except perhaps in the form of tapioca). Originally domesticated in South America, cassava is now an important food source in Southeast Asia and Africa. The current draft genome is made available through phytozome and consists of 416 megabases of sequence spread over 11,243 contigs. This is only a little over 50% of the estimated total size of the cassava genome, but the people involved in the sequencing and assembly believe it represents the majority of the non-repetitive genome. The current release also includes 47,164 predicted genes.
The apple (Malus x domestica) genome was published in late August of 2010. The total genome is estimated to be 742.3 MB in size, spread over 17 chromosomes. The published genome includes 600 megabases of sequence assembled into 17 pseudomolecules and a number of smaller unanchored contigs. The apple genome contains 57,386 putative genes, a high number attributable, at least in part, to a whole genome duplication in the apple lineage which is dated to 30-65 million years ago. The apple genome is not yet loaded into CoGe and does not yet appear to be available for download; however, there is an available genome browser.
Genome Paper
Riccardo Velasco et al., “The genome of the domesticated apple (Malus × domestica Borkh.),” Nature Genetics, DOI: 10.1038/ng.654
The genome of cannabis (Cannabis sativa) was published in Genome Biology in October 2011. The genome sequence was completed using a mixture of 454 and Illumina sequencing, with mate pairs used to bridge gaps in the assembled regions. As a result, while only 534 megabases of the genome were assembled, the genome spans >786 Mb of sequence (the extra ~250 Mb are NNNNNN's representing unassembled repeat sequences -- transposons -- of known length between sequenced regions of the genome). In addition to the genome itself, the same research group generated a great deal of tissue specific RNA-seq data from multiple cannabis cultivars.
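The arithmetic behind those gap figures is easy to check. The snippet below is only a back-of-the-envelope sketch using the two numbers quoted above; the function name gap_summary is made up for illustration, and because the span is reported as ">786 Mb" the results are lower bounds.

```python
def gap_summary(assembled_mb, span_mb):
    """Return (gap size in Mb, gap fraction of the total scaffold span)."""
    gap_mb = span_mb - assembled_mb
    return gap_mb, gap_mb / span_mb


# Figures quoted above: 534 Mb of assembled sequence, >786 Mb total span.
gap_mb, gap_fraction = gap_summary(534, 786)
print(f"{gap_mb} Mb of gaps, {gap_fraction:.0%} of the scaffold span")
```

So roughly a third of the scaffold span is placeholder N's rather than sequenced bases.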
Genome Paper:
Harm van Bakel et al., "The draft genome and transcriptome of Cannabis sativa," Genome Biology, DOI: 10.1186/gb-2011-12-10-r102
Links
(genome published)
Peaches (Prunus persica) are stone fruits, meaning they're closely related to fruits such as plums, apricots, and cherries and nuts like almonds. The 1.0 version of the peach genome assembly was released by the International Peach Genome Initiative on April 1st, 2010. This version of the genome is already assembled into eight pseudomolecules covering the eight chromosomes of peach, as well as ~200 smaller unplaced contigs. The total released sequence is 227 megabases and includes 27,852 annotated genes. The genome was sequenced to 7.7x coverage using Sanger sequencing.
Legumes (the plant family Fabaceae) are contained within the eurosid I clade. The family is perhaps best known for the fact that many of the species it contains form symbiotic relationships with nitrogen fixing bacteria. The bacteria are sheltered and feed within special nodules in the roots of these plants, and in return the plant benefits from the bacteria's ability to convert the nitrogen in our atmosphere into bioavailable forms (bioavailable nitrogen is often a limiting nutrient for other plant species).
Medicago (Medicago truncatula) is a small legume used as a model species for nodule formation and nitrogen fixation -- as is Lotus. The latest release of the medicago genome is Mt3.0, which includes 240 megabases of sequence associated with Medicago's eight chromosomes, plus 16.6 megabases of unanchored sequence. Read more at the International Medicago Genome Annotation Group's webpage.
Genome paper: Young ND et al (2011) The Medicago genome provides insight into the evolution of rhizobial symbioses. Nature DOI: 10.1038/nature10625
Lotus japonicus is a small legume used as a model for nodule formation and nitrogen fixation -- as is Medicago. The current release of the Lotus genome is v2.5, which includes 315 megabases of assembled sequence (an estimated 67% of the genome). In v2.5, 201 megabases of sequence have been assembled into six pseudomolecules corresponding to the six chromosomes of Lotus. Additional statistics and links to download the genome are provided by the Kazusa DNA Research Institute.
Genome paper: Sato S et al (2008) Genome Structure of the Legume, Lotus japonicus. DNA Research DOI: 10.1093/dnares/dsn008
The soybean genome was published in early 2010 and contained 950 megabases of sequence as well as a predicted 46,430 protein coding genes distributed over twenty chromosomes. The ancestors of soybean went through two whole genome duplications since the ancient hexaploidy at the base of the eudicot lineage, with the older estimated to have occurred 59 million years ago and the more recent estimated to have occurred 13 million years ago.
The Genome Paper: Schmutz, J., Cannon, S., Schlueter, J., Ma, J., Mitros, T., Nelson, W., Hyten, D., Song, Q., Thelen, J., Cheng, J., Xu, D., Hellsten, U., May, G., Yu, Y., Sakurai, T., Umezawa, T., Bhattacharyya, M., Sandhu, D., Valliyodan, B., Lindquist, E., Peto, M., Grant, D., Shu, S., Goodstein, D., Barry, K., Futrell-Griggs, M., Abernathy, B., Du, J., Tian, Z., Zhu, L., Gill, N., Joshi, T., Libault, M., Sethuraman, A., Zhang, X., Shinozaki, K., Nguyen, H., Wing, R., Cregan, P., Specht, J., Grimwood, J., Rokhsar, D., Stacey, G., Shoemaker, R., & Jackson, S. (2010). Genome sequence of the palaeopolyploid soybean Nature, 463 (7278), 178-183 DOI: 10.1038/nature08670
Pigeon peas (Cajanus cajan) are grown in areas with low rainfall as an important source of protein for farmers and an important source of fixed nitrogen in the soil for whichever crop is grown the following year. They are considered an orphan crop (a species of great importance to feeding people around the world -- the main source of protein for 1 BILLION PEOPLE according to the genome paper -- but grown primarily by small farmers in developing countries, which means the species hasn't benefitted from the yield increases that can be produced by modern breeding practices).
The pigeon pea genome was published in Nature Biotechnology in November 2011. The genome was sequenced primarily with Illumina short reads, although assembly was assisted by a number of BAC end sequences produced using traditional Sanger-sequencing long reads. The assembly contains 606 megabases of sequence, a little under three quarters of the estimated total genome size of 833 megabases, and includes an estimated 48,680 genes. While the pigeon pea genome is made up of 11 chromosomes, the current assembly consists of ~7,000 super scaffolds.
The Genome Paper: Varshney RK et al (2011) Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers. Nature Biotechnology DOI: 10.1038/nbt.2022
The current release of the common bean genome is 0.9 and was assembled from ~20x coverage of the genome using 454 reads (and a smaller number of paired end reads). This assembly includes 430 megabases of assembled sequence and has been assembled to the scaffold level (although a solid chromosome-level assembly is promised for version 1.0). Read more and download the genome at phytozome.
The first (potentially of several) cotton species to have its genome sequenced is Gossypium raimondii. G. raimondii contributes the "D" genome to the allotetraploid cotton species (A + D genomes) G. hirsutum, which provides the majority of worldwide cotton production. The genome of G. raimondii was sequenced by JGI and is available from phytozome but has not yet been published.
The current genome assembly represents ~750 megabases of sequence and 98% of it is incorporated into 13 pseudomolecules and another 22 large unplaced scaffolds (> 50 kb).
The genome of the tree that gives us chocolate, Theobroma cacao, has been independently sequenced by two groups. One genome assembly, of the variety called Criollo from Belize, has been published in Nature Genetics. A second assembly, of a breed called Matina 1-6, has been available from the Cacao genome database since before the publication of the Criollo genome sequence, but has not yet been published. Both assemblies are complete to the level of pseudomolecules.
Chocolate has not experienced any whole genome duplications since the ancient hexaploidy shared by all sequenced rosids.
Genome Paper (Criollo version):
Xavier Argout et al., "The genome of Theobroma cacao," Nature Genetics 43 (2): 101-108. DOI: 10.1038/ng.736
Citrus fruits, from lemons to oranges, grapefruits, and pomelos, belong to a single genus. Many fruits we think of as separate species can breed with each other, making it difficult to properly define species barriers.
(genome unpublished and not fully assembled)
The sweet orange (Citrus sinensis) was sequenced using a combination of Sanger (old-fashioned, expensive, but long and easy to assemble) and 454 (much cheaper, faster, and somewhat shorter) sequencing technology. The current release is only version 0.1 and the genome is still split into 12,574 scaffolds that cover a combined 319 megabases of the sweet orange genome. Unlike the clementine genome described below, the sweet orange genome project used DNA from a diploid individual, making the assembly of the genome somewhat more difficult, as inconsistencies between aligned sequences might simply be the result of variation between the two genome copies of that diploid individual. This version of the genome release includes 25,376 annotated protein coding genes. You can read more or download data here.
(genome unpublished and not fully assembled)
The genome of a haploid Clementine orange (Citrus clementina) was sequenced by the International Citrus Genome Consortium to a coverage of 6.5-fold. The genome is not yet assembled into pseudomolecules but consists of 1,128 scaffolds containing a total of 296 megabases of sequence data. Genes were predicted using both sequencing of ESTs and homology to the genes of other sequenced plant species, resulting in a total of 25,385 protein coding genes. Download clementine sequence data and annotations from phytozome here.
The papaya genome is estimated to have a size of 372 megabases, spread across nine chromosomes, and contain 28,629 genes. The version of papaya within CoGe is organized into super contigs, but does contain a number of gaps.
The genome paper:
Ming R et al. (2008) The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus) Nature, 452 (7190), 991-996 DOI: 10.1038/nature06856
Expect this category to grow substantially over the next year. The planned, in progress, and private genomes category below includes 7 more arabidopsis species and relatives.
The TAIR10 release contains 27,416 protein coding genes, 4827 pseudogenes or transposable element genes and 1359 ncRNAs (33,602 genes in all, 41,671 gene models). A total of 126 new loci and 2099 new gene models were added.
The Arabidopsis genome is ~120 megabases of sequence spread across five chromosomes.
The Genome Paper: The Arabidopsis Genome Initiative (2000). Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature, 408 (6814), 796-815 DOI: 10.1038/35048692
Genome resources:
Arabidopsis lyrata is a close relative of Arabidopsis thaliana. The ancestors of the two species split apart an estimated ten million years ago, making them somewhat closer than maize and sorghum among the grasses. A. lyrata is self-incompatible, while A. thaliana reproduces primarily through self-fertilization. The lyrata genome is also substantially larger than that of thaliana, weighing in at 207 MB, spread across seven chromosomes (compared to thaliana's 5 chromosomes and 125 megabase genome).
The lyrata genome is available within CoGe, or you can download it from JGI.
Genome Paper: Tina T. Hu et al. (2011) "The Arabidopsis lyrata genome sequence and the basis of rapid genome size change." Nature Genetics 43:476–481 DOI: 10.1038/ng.807
The genome of Brassica rapa was published in Nature Genetics in September 2011 by a consortium of researchers led by the Beijing Genomics Institute (BGI). While the variety of Brassica rapa sequenced (Chiifu-401-42) is a breed of Chinese cabbage, turnips are actually another cultivar of the same species. Brassica rapa is also one of the two parental species of Brassica napus, an allotetraploid species which gives us both the vegetable rutabaga and the oil seed crop canola (also known as rapeseed, but seriously, who wants to buy a bottle named "Rape oil"?). Brassica rapa is the first corner of the Triangle of U to be sequenced; however, it is likely an assembly of Brassica oleracea -- a species that includes kale, broccoli, cauliflower, Brussels sprouts and more -- will not be far behind.
In addition to the ancient hexaploidy shared by rosids and asterids and the two additional tetraploidies found in the shared Arabidopsis/Brassica lineage, the Brassica lineage experienced an additional ancient hexaploidy for a total duplication of 36 fold (3*2*2*3) relative to the pre-triplication common ancestor of the asterids and rosids.
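The 36-fold figure is just the product of the fold-increases contributed by each polyploidy event. As a quick sanity check, here is a minimal sketch; the event labels are paraphrased from the paragraph above, and the assumption that a hexaploidy triples and a tetraploidy doubles the ancestral genome is the same one behind the (3*2*2*3) shorthand.

```python
from math import prod

# Polyploidy events on the lineage leading to Brassica rapa, oldest first.
# Each number is the fold-increase in genome copies from that event
# (hexaploidy = 3x, tetraploidy = 2x).
BRASSICA_EVENTS = [
    ("ancient hexaploidy shared by rosids and asterids", 3),
    ("first tetraploidy in the Arabidopsis/Brassica lineage", 2),
    ("second tetraploidy in the Arabidopsis/Brassica lineage", 2),
    ("Brassica-specific hexaploidy", 3),
]


def cumulative_fold(events):
    """Multiply the per-event fold changes to get the total duplication depth."""
    return prod(fold for _, fold in events)


print(cumulative_fold(BRASSICA_EVENTS))  # 3 * 2 * 2 * 3
```

Keep in mind this is the maximum number of copies a single ancestral gene could have. Most duplicated genes are lost again after each event, so the real gene count is nowhere near 36 times the ancestral one.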
Genome paper: The Brassica rapa Genome Sequencing Project Consortium. (2011) "The genome of the mesopolyploid crop species Brassica rapa." Nature Genetics DOI: 10.1038/ng.919
Capsella is "the closest well-characterized genus" to arabidopsis, and in fact the plants look rather similar to the eye of someone who doesn't study arabidopsis for a living. The best known Capsella species (and the only one with its own Wikipedia page as I write this) is Capsella bursa-pastoris, which goes by the common name "shepherd's purse." However, C. bursa-pastoris is a tetraploid, with all the challenges to genome assembly and genetic analysis that entails. Rather than tangle with the tetraploid genetics of bursa-pastoris, JGI instead aimed its sequencers at the closely related sister species Capsella rubella, which has a much better behaved diploid genome.
The current assembly of Capsella rubella was generated from 22x sequencing with 454 reads, includes 134 megabases of assembled sequence, and has been assembled to the level of scaffolds. Genes were annotated using alignment of RNA-seq data from Capsella and homology to the genes of other sequenced eudicots.
The current assembly of the Capsella rubella genome is available from phytozome.
Most people hadn't heard of this relative of arabidopsis prior to the publication of its genome in August 2011. Thellungiella, which goes by the common name "salt cress" is of interest because of its greatly increased tolerance for abiotic stresses (salt, cold, etc) relative to its much better studied relative Arabidopsis thaliana. The genome paper reported a genome approximately 140 megabases in size, assembled into seven pseudomolecules and emphasized the role of tandem duplicates in driving the remarkable stress tolerance of this species.
Thellungiella resources:
Genome paper: Maheshi Dassanayake et al (2011) "The genome of the extremophile crucifer Thellungiella parvula." Nature Genetics DOI: 10.1038/ng.889
In addition to being the first non-grass monocot genome to be published (in May of 2011 in the journal Nature Biotechnology, eight years after the first grass genome, rice), the paper describing the date palm (Phoenix dactylifera) genome is also the first scientific paper to pop up when you search for "Phoenix Genome." The current genome assembly includes 380 megabases of sequence, which is only an estimated 60% of the total date palm genome, although it may include 90% of the gene space. Date palms have two sexes, with separate male and female trees. Only the females produce dates, so one of the key goals of the genome project was to identify genetic tests to distinguish male and female seedlings, rather than having to wait 5-8 years for the plants to flower -- at which point it becomes obvious which plants are male and which are female.
Download link (at Weill Cornell Medical College).
Genome paper: Eman K Al-Dous et al., (2011) "De novo genome sequencing and comparative genomics of date palm (Phoenix dactylifera)." Nature Biotechnology 29:521–527 DOI: 10.1038/nbt.1860
(genome unreleased)
The argument for sequencing banana isn't to make the lives of comparative genomicists easier, but because of the key role many banana species play in tropical food production. Bananas are also a target of genetic engineering, since the fact that most cultivated breeds are triploid and unable to reproduce sexually (the reason bananas aren't full of seeds) makes conventional breeding impossible, and bananas suffer from a number of nasty plant pathogens.
As of the 18th Plant and Animal Genome Conference (January 2010) the banana genome project was planning to wrap up sequencing in early 2010, and spend the spring doing gene annotation. The variety of banana they chose for sequencing was "Pahang DH" a breed of Musa acuminata malaccensis with an estimated genome size of 600-700 megabases.
For a long time banana looked like the front runner to be the first non-grass monocot genome published, a title recently claimed by the date palm genome. However, that doesn't mean we're not still excitedly looking forward to the day the genome of banana is released into the wild!
Trivia: The banana is the most consumed fruit in America, with the average American eating ~25 pounds of bananas per year, a full quarter of the average total fruit consumed per person. Yet almost NO bananas are produced domestically.
The grasses, a family of plants known as the Poaceae, can trace their lineages back to a common ancestor that probably lived between 50 and 70 million years ago, either right before or soon after the extinction of the dinosaurs (dinosaurs didn't eat grass). Since their emergence in the fossil record, the grasses have been extraordinarily successful, becoming one of the largest families of plants on the planet and covering vast swaths of the planet in the form of prairies/savannahs/steppes.
While you may think of grass primarily as the green stuff on lawns and sports fields, remember that grasses also include species like bamboo and the grains that make up so much of what we eat. Either three (rice, wheat, and corn/maize) or four (the same three plus sugar cane) grass species provide more than half of all the calories that feed the world's population[4], and are the focus of much applied and basic scientific research. Check out the Pan-grass synteny project.
The current version of the rice genome in CoGe is v6.1 from MSU (the japonica version of the genome) which contains ~370 megabases of sequence and 40,577 non-transposon related genes spread across 12 chromosomes.
Rice Resources:
The genome paper:
Goff, S. et al. (2002). A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. japonica) Science, 296 (5565), 92-100 DOI: 10.1126/science.1068275
Yu, J. et al. (2002). A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. indica) Science, 296 (5565), 79-92 DOI: 10.1126/science.1068037
The published version of the brachy genome includes 272 megabases of sequence and 25,532 protein coding genes spread across five chromosomes. It was sequenced to a coverage of 9.4× using Sanger sequencing.
Brachy Resources:
The genome paper:
Vogel, J et al (2010). Genome sequencing and analysis of the model grass Brachypodium distachyon Nature, 463 (7282), 763-768 DOI: 10.1038/nature08747
The ancestor of maize went through a whole genome duplication between 5 and 12 million years ago. In addition, the recent history of maize has included not one but two blooms of transposon activity. The result is a genome that weighs in at ~2.5 gigabases of mostly repetitive sequence, making both sequencing and assembly major challenges.
But the maize genome sequence is now published.
The v1 sequence contains 2.3 gigabases of sequence data. Rather than shotgun sequencing the entire genome, as is now common with smaller, less repetitive genomes, maize was sequenced using a BAC[5] by BAC approach. The BACs were lined up to cover the ten chromosomes of maize, and then the sequence contained in each BAC was shotgun sequenced and assembled into contigs. What this means in practice is that a given sequence in the maize genome is usually within 300 kilobases of its correct location, but within that range may be out of order or inverted. If a gene seems to be absent from its syntenic location (or only a portion of the gene is found), it is important to search up to 500 kilobases in either direction around its expected location to make sure the apparent deletion isn't the result of incorrect ordering of the contigs.
This issue was reduced in version 2 of the genome, released in the spring of 2010: over 80% of the contigs in the v2 sequence have data on their order and orientation, up from ~30% in the v1 release.
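The 500 kilobase search described above can be sketched in a few lines of Python. This is only an illustration, not part of any maize or CoGe tool: the Hit record, the function names, and the example coordinates are all made up, and it assumes you already have a list of candidate matches to your gene (for example from a BLAST search against the maize genome) with chromosome, start, and end positions.

```python
from dataclasses import dataclass

# Search radius in base pairs, taken from the 500 kilobase advice in the text.
WINDOW_BP = 500_000


@dataclass(frozen=True)
class Hit:
    """A candidate match to the query gene (hypothetical record layout)."""
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def distance_to(hit, pos):
    """Distance in bp from pos to the nearest edge of hit (0 if pos is inside)."""
    if hit.start <= pos <= hit.end:
        return 0
    return min(abs(hit.start - pos), abs(hit.end - pos))


def hits_near_expected(hits, chrom, expected_pos, window=WINDOW_BP):
    """Return hits on chrom within window bp of expected_pos, nearest first."""
    nearby = [
        h for h in hits
        if h.chrom == chrom and distance_to(h, expected_pos) <= window
    ]
    return sorted(nearby, key=lambda h: distance_to(h, expected_pos))


if __name__ == "__main__":
    # Made-up coordinates, purely for illustration.
    example_hits = [
        Hit("chr1", 1_450_000, 1_451_000),
        Hit("chr1", 2_000_000, 2_001_000),
        Hit("chr2", 1_000_500, 1_001_000),
    ]
    for hit in hits_near_expected(example_hits, "chr1", 1_100_000):
        print(hit, distance_to(hit, 1_100_000))
```

If the function returns an empty list, the gene may really be missing from that region; if it returns hits a few hundred kilobases away, the apparent deletion is more likely a contig ordering problem.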
A word on gene models:
The maize genome was published with two sets of genome annotations, the working gene set and the filtered gene set. These two sets are based on different compromises between catching all the real genes in maize and excluding false genes.
The maize genome is divided among 10 chromosomes.
Maize Resources:
Maize Related CoGepedia Pages:
The genome paper:
Schnable, P et al (2009) The B73 Maize Genome: Complexity, Diversity, and Dynamics Science, 326(5956), 1112-1115 DOI: 10.1126/science.1178534
Companion issue in PLoS Genetics published simultaneously with the genome paper:
PLoS Genetics: 2009 Maize Genome Collection
The sorghum genome was published in 2009. The current version in CoGe (v1.4) contains ~700 megabases of sequence and 34,496 protein coding genes spread over ten chromosomes. The sorghum genome sequence is available from phytozome.
The genome paper: Paterson, A et al. (2009) The Sorghum bicolor genome and the diversification of grasses Nature, 457 (7229), 551-556 DOI: 10.1038/nature07723
(genome released but unpublished)
Foxtail millet (Setaria italica) is a C4 grass. It is the first species in Paniceae, a tribe of grasses that includes switchgrass and is sister to the Andropogoneae (the tribe that maize and sorghum belong to), to have its genome sequenced. Foxtail millet was domesticated in China and is much more distantly related to maize and sorghum than they are to each other. JGI has released an 8x assembly of the foxtail millet genome (which you can read about and download here). The current assembly is organized into ten pseudomolecules. However, this assembly was based on the sorghum genome and will not reflect inversions and translocations that occurred in the Setaria genome after the ancestors of those two species diverged. A more accurate assembly based on a Setaria genetic map is in the works.
The current version of the Setaria genome includes 406 megabases of sequence and 32,095 annotated genes.
Physcomitrella patens is a moss. Mosses, along with liverworts and hornworts, make up the bryophytes, a group of plants that have neither flowers nor vascular tissue. We think bryophytes still look a lot like the ancestors of all land plants, but it is important to remember that bryophytes alive today, like Physcomitrella patens, have been evolving from that common ancestor for just as long as rice or Arabidopsis: ~450 million years in all three cases.
The Physcomitrella genome was published in early 2008 and consists of 480 megabases of sequence and 35,938 gene models spread over 2,106 scaffolds. (Physcomitrella has 27 chromosomes.) The genome was sequenced to a depth of 8x coverage using Sanger shotgun sequencing.
Physcomitrella resources:
The genome paper:
Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H, Nishiyama T, Perroud PF, Lindquist EA, Kamisugi Y, Tanahashi T, Sakakibara K, Fujita T, Oishi K, Shin-I T, Kuroki Y, Toyoda A, Suzuki Y, Hashimoto S, Yamaguchi K, Sugano S, Kohara Y, Fujiyama A, Anterola A, Aoki S, Ashton N, Barbazuk WB, Barker E, Bennetzen JL, Blankenship R, Cho SH, Dutcher SK, Estelle M, Fawcett JA, Gundlach H, Hanada K, Heyl A, Hicks KA, Hughes J, Lohr M, Mayer K, Melkozernov A, Murata T, Nelson DR, Pils B, Prigge M, Reiss B, Renner T, Rombauts S, Rushton PJ, Sanderfoot A, Schween G, Shiu SH, Stueber K, Theodoulou FL, Tu H, Van de Peer Y, Verrier PJ, Waters E, Wood A, Yang L, Cove D, Cuming AC, Hasebe M, Lucas S, Mishler BD, Reski R, Grigoriev IV, Quatrano RS, Boore JL. (2008) The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science 319 (5859):64-9 DOI: 10.1126/science.1150646
Selaginella moellendorffii is a lycophyte, an ancient branch of the plant tree of life. Like mosses, lycophytes do not have flowers, but lycophytes do have a vascular system. Lycophytes are often grouped with ferns as vascular non-seed producing plants. Selaginella has the distinction of currently being the smallest sequenced plant genome (~110 megabases, smaller than Arabidopsis!) and having a dedicated wiki.
Less genomics related, but exciting nevertheless: plants that recognizably belong to the Selaginella genus can be found in the fossil record going back 335-350 million years. That is older than the dinosaurs!
Selaginella resources:
Genome paper: Jo Ann Banks et al. (2011) "The Selaginella Genome Identifies Genetic Changes Associated with the Evolution of Vascular Plants." Science 332:960-963 DOI: 10.1126/science.1203810
Chlamydomonas is a single-celled chlorophyte (green alga) found all over the world in many different environments. It is an important model organism because of its photosynthetic capabilities, the availability of methods for genetically modifying it, and its short generation time. These traits also make Chlamydomonas a strong candidate as a source for biofuels.
Genome published in Science: http://www.sciencemag.org/content/318/5848/245.short
Available from the Joint Genome Institute: http://genome.jgi-psf.org/Chlre3/Chlre3.home.html