Sequenced plant genomes

From CoGepedia

Revision as of 21:28, 12 March 2010

This page attempts to track all plant genomes with published sequences, as well as at least some of the genomes currently being sequenced.

Eudicots

The eudicots are the largest group of flowering plants on the planet.

Asterids

The asterids are a group within the eudicots that includes the solanaceous crops (tobacco, tomato, potato, and eggplant) and the sunflowers.

  • Tomato: The tomato genome project is not yet complete. The version of the genome currently loaded into CoGe is assembled into pseudomolecules[1] but does not contain gene models[2]. Read more about the tomato genome here (add link to their page) or see it in GenomeView here.
  • Potato: The potato genome project is not yet complete. Read more about the potato genome here (add link to their page) or see it in GenomeView here.
  • Monkey Flower: The monkey flower (Mimulus) genome is not yet complete. The version of the genome currently loaded into CoGe is not assembled into pseudomolecules[3] but does contain gene models[2]. Read more about the monkey flower genome here (add link to their page) or see it in GenomeView here.

Rosids

Grape

The genome sequence of the European grape (Vitis vinifera) was published by a group of French and Italian researchers in 2007. The variety sequenced was Pinot Noir.

Grape diverged early from the two main groups of species in the rosids (eurosids I and eurosids II) and has not experienced any whole genome duplication since that divergence. This makes it an important outgroup for comparisons with other rosid species, as well as a great resource for studying the ancient hexaploidy that preceded the radiation of the rosids (and possibly the radiation of the eudicots).

The version of the grape genome in CoGe contains ~500 megabases of sequence and X annotated genes spread across 19 chromosomes.

The genome paper:

Jaillon, O., Aury, J., Noel, B., Policriti, A., Clepet, C., Casagrande, A., Choisne, N., Aubourg, S., Vitulo, N., Jubin, C., Vezzi, A., Legeai, F., Hugueney, P., Dasilva, C., Horner, D., Mica, E., Jublot, D., Poulain, J., Bruyère, C., Billault, A., Segurens, B., Gouyvenoux, M., Ugarte, E., Cattonaro, F., Anthouard, V., Vico, V., Del Fabbro, C., Alaux, M., Di Gaspero, G., Dumas, V., Felice, N., Paillard, S., Juman, I., Moroldo, M., Scalabrin, S., Canaguier, A., Le Clainche, I., Malacrida, G., Durand, E., Pesole, G., Laucou, V., Chatelet, P., Merdinoglu, D., Delledonne, M., Pezzotti, M., Lecharny, A., Scarpelli, C., Artiguenave, F., Pè, M., Valle, G., Morgante, M., Caboche, M., Adam-Blondon, A., Weissenbach, J., Quétier, F., & Wincker, P. (2007). The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla Nature, 449 (7161), 463-467 DOI: 10.1038/nature06148

Eurosids 1

Cucumber

The genome sequence of cucumber (Cucumis sativus) was published in late 2009. The genome was sequenced using a combination of Illumina short read sequencing (68.3x coverage) and Sanger sequencing (3.9x coverage). The cucumber genome is made up of seven chromosomes, but a large fraction of the published genome sequence is still in unordered contigs. The version of the cucumber genome in CoGe contains ~200 megabases of DNA sequence and X gene models[2] spread over 4219 contigs.
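Coverage figures like these are average sequencing depth: total sequenced bases divided by genome size (C = N × L / G for N reads of length L on a genome of size G). The minimal Python sketch below illustrates that arithmetic. The read count and read length in the last example are hypothetical, and adding the Illumina and Sanger figures together assumes both were quoted against the same genome-size estimate.

```python
def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average sequencing depth, C = N * L / G."""
    return num_reads * read_length / genome_size


def bases_needed(target_coverage: float, genome_size: int) -> float:
    """Total sequenced bases required to reach a target average depth."""
    return target_coverage * genome_size


# Depths from different platforms add when measured against the same genome size.
cucumber = {"Illumina": 68.3, "Sanger": 3.9}  # figures quoted above
combined = round(sum(cucumber.values()), 1)
print(f"cucumber combined coverage: {combined}x")

# Brachypodium figures from this page, treating the 272 Mb assembly size as the genome size.
print(f"bases for 9.4x of 272 Mb: {bases_needed(9.4, 272_000_000) / 1e9:.2f} Gb")

# Hypothetical run: 1,000,000 reads of 700 bp against a 200 Mb genome.
print(f"hypothetical run: {coverage(1_000_000, 700, 200_000_000):.1f}x")
```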

The genome paper: Huang, S., Li, R., Zhang, Z., Li, L., Gu, X., Fan, W., Lucas, W., Wang, X., Xie, B., Ni, P., Ren, Y., Zhu, H., Li, J., Lin, K., Jin, W., Fei, Z., Li, G., Staub, J., Kilian, A., van der Vossen, E., Wu, Y., Guo, J., He, J., Jia, Z., Ren, Y., Tian, G., Lu, Y., Ruan, J., Qian, W., Wang, M., Huang, Q., Li, B., Xuan, Z., Cao, J., Asan, ., Wu, Z., Zhang, J., Cai, Q., Bai, Y., Zhao, B., Han, Y., Li, Y., Li, X., Wang, S., Shi, Q., Liu, S., Cho, W., Kim, J., Xu, Y., Heller-Uszynska, K., Miao, H., Cheng, Z., Zhang, S., Wu, J., Yang, Y., Kang, H., Li, M., Liang, H., Ren, X., Shi, Z., Wen, M., Jian, M., Yang, H., Zhang, G., Yang, Z., Chen, R., Liu, S., Li, J., Ma, L., Liu, H., Zhou, Y., Zhao, J., Fang, X., Li, G., Fang, L., Li, Y., Liu, D., Zheng, H., Zhang, Y., Qin, N., Li, Z., Yang, G., Yang, S., Bolund, L., Kristiansen, K., Zheng, H., Li, S., Zhang, X., Yang, H., Wang, J., Sun, R., Zhang, B., Jiang, S., Wang, J., Du, Y., & Li, S. (2009). The genome of the cucumber, Cucumis sativus L. Nature Genetics, 41(12), 1275-1281 DOI: 10.1038/ng.475

Poplar

The genome sequence of the black cottonwood tree (Populus trichocarpa) was published in 2006. The genome was originally sequenced to 7.5x coverage using Sanger sequencing. Poplar was the third plant genome to be published, and is now one of two published genomes of tree species (the other being papaya). Poplar carries a whole genome duplication that is not shared by any other plant species with a sequenced genome. The most recent version of the poplar genome in CoGe is v2, available from Phytozome, which includes ~370 megabases of sequence and 41,377 protein coding genes spread over 19 chromosomes.

The genome paper: Tuskan, G., DiFazio, S., Jansson, S., Bohlmann, J., Grigoriev, I., Hellsten, U., Putnam, N., Ralph, S., Rombauts, S., Salamov, A., Schein, J., Sterck, L., Aerts, A., Bhalerao, R., Bhalerao, R., Blaudez, D., Boerjan, W., Brun, A., Brunner, A., Busov, V., Campbell, M., Carlson, J., Chalot, M., Chapman, J., Chen, G., Cooper, D., Coutinho, P., Couturier, J., Covert, S., Cronk, Q., Cunningham, R., Davis, J., Degroeve, S., Dejardin, A., dePamphilis, C., Detter, J., Dirks, B., Dubchak, I., Duplessis, S., Ehlting, J., Ellis, B., Gendler, K., Goodstein, D., Gribskov, M., Grimwood, J., Groover, A., Gunter, L., Hamberger, B., Heinze, B., Helariutta, Y., Henrissat, B., Holligan, D., Holt, R., Huang, W., Islam-Faridi, N., Jones, S., Jones-Rhoades, M., Jorgensen, R., Joshi, C., Kangasjarvi, J., Karlsson, J., Kelleher, C., Kirkpatrick, R., Kirst, M., Kohler, A., Kalluri, U., Larimer, F., Leebens-Mack, J., Leple, J., Locascio, P., Lou, Y., Lucas, S., Martin, F., Montanini, B., Napoli, C., Nelson, D., Nelson, C., Nieminen, K., Nilsson, O., Pereda, V., Peter, G., Philippe, R., Pilate, G., Poliakov, A., Razumovskaya, J., Richardson, P., Rinaldi, C., Ritland, K., Rouze, P., Ryaboy, D., Schmutz, J., Schrader, J., Segerman, B., Shin, H., Siddiqui, A., Sterky, F., Terry, A., Tsai, C., Uberbacher, E., Unneberg, P., Vahala, J., Wall, K., Wessler, S., Yang, G., Yin, T., Douglas, C., Marra, M., Sandberg, G., Van de Peer, Y., & Rokhsar, D. (2006). The Genome of Black Cottonwood, Populus trichocarpa (Torr. & Gray) Science, 313 (5793), 1596-1604 DOI: 10.1126/science.1128691

Legumes

Legumes (the plant family Fabaceae) belong to the eurosid I clade. The family is perhaps best known because many of its species form symbiotic relationships with nitrogen-fixing bacteria. The bacteria are sheltered and feed within special nodules in the roots of these plants, and in return the plant benefits from the bacteria's ability to convert atmospheric nitrogen into bioavailable forms (bioavailable nitrogen is often a limiting nutrient for other plant species).

Medicago
Soybean

Eurosids 2

Papaya

The genome of the papaya tree (Carica papaya) was published in early 2008. Papaya was one of the earliest crops to be genetically modified (in papaya's case, to resist the devastating papaya ringspot virus), and the sequenced genome actually comes from one of the genetically modified varieties (SunUp). The papaya genome was sequenced to 3x coverage using Sanger sequencing. Papaya has not experienced any further whole genome duplications since the ancient hexaploidy shared by all currently sequenced eudicots. As the most closely related species to Arabidopsis with a sequenced genome that has not experienced the two subsequent whole genome duplications found in the Arabidopsis lineage, papaya is a useful outgroup, although the ancestors of Arabidopsis and papaya split ~72 million years ago.

The papaya genome is estimated to be 372 megabases in size, spread across nine chromosomes, and to contain (X) genes. The version of papaya within CoGe is organized into supercontigs, but still contains a number of gaps.
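One rough way to see why a 3x Sanger assembly still has gaps is the idealized Lander-Waterman model, which assumes reads land uniformly at random. At average coverage c, the expected fraction of bases not hit by any read is e^(-c). The Python sketch below applies that idealized model to coverages quoted on this page; real genomes have repeats and cloning biases, so actual gap content will differ.

```python
import math


def fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read (idealized Poisson model)."""
    return 1.0 - math.exp(-coverage)


def expected_uncovered_bases(coverage: float, genome_size: int) -> float:
    """Expected number of bases left uncovered under the same idealized model."""
    return genome_size * math.exp(-coverage)


# Sanger coverages quoted on this page.
for name, c in [("papaya", 3.0), ("poplar", 7.5), ("Brachypodium", 9.4)]:
    print(f"{name}: {c}x -> {fraction_covered(c):.4f} of the genome expected covered")

# Papaya: ~372 Mb estimated genome size at 3x.
missing_mb = expected_uncovered_bases(3.0, 372_000_000) / 1e6
print(f"papaya: ~{missing_mb:.1f} Mb expected uncovered")
```

Under this model papaya's 3x leaves roughly 5% of the genome unsampled, which is consistent with an assembly that still has a number of gaps.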

The genome paper:

Ming, R., Hou, S., Feng, Y., Yu, Q., Dionne-Laporte, A., Saw, J., Senin, P., Wang, W., Ly, B., Lewis, K., Salzberg, S., Feng, L., Jones, M., Skelton, R., Murray, J., Chen, C., Qian, W., Shen, J., Du, P., Eustice, M., Tong, E., Tang, H., Lyons, E., Paull, R., Michael, T., Wall, K., Rice, D., Albert, H., Wang, M., Zhu, Y., Schatz, M., Nagarajan, N., Acob, R., Guan, P., Blas, A., Wai, C., Ackerman, C., Ren, Y., Liu, C., Wang, J., Wang, J., Na, J., Shakirov, E., Haas, B., Thimmapuram, J., Nelson, D., Wang, X., Bowers, J., Gschwend, A., Delcher, A., Singh, R., Suzuki, J., Tripathi, S., Neupane, K., Wei, H., Irikura, B., Paidi, M., Jiang, N., Zhang, W., Presting, G., Windsor, A., Navajas-Pérez, R., Torres, M., Feltus, F., Porter, B., Li, Y., Burroughs, A., Luo, M., Liu, L., Christopher, D., Mount, S., Moore, P., Sugimura, T., Jiang, J., Schuler, M., Friedman, V., Mitchell-Olds, T., Shippen, D., dePamphilis, C., Palmer, J., Freeling, M., Paterson, A., Gonsalves, D., Wang, L., & Alam, M. (2008). The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus) Nature, 452 (7190), 991-996 DOI: 10.1038/nature06856

Arabidopsis

Arabidopsis thaliana is a popular model plant species, partly because of its short generation time and compact size. The genome of Arabidopsis was also the first plant genome to be published, in 2000. The current release of the Arabidopsis genome is TAIR9:

The TAIR9 release contains 27,379 protein coding genes, 4,827 pseudogenes or transposable elements, and 1,312 ncRNAs (33,518 genes in all, 39,640 gene models).
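The TAIR9 counts above can be checked for internal consistency: the three categories should sum to the stated gene total, and the surplus of gene models over genes presumably reflects genes with more than one annotated model (for example, alternative splice forms). A small Python check using only the numbers quoted above:

```python
# TAIR9 annotation counts quoted in the text above.
tair9 = {
    "protein_coding": 27_379,
    "pseudogene_or_te": 4_827,
    "ncRNA": 1_312,
}
gene_models = 39_640

total_genes = sum(tair9.values())
assert total_genes == 33_518  # matches the stated total

# Genes carrying more than one model push this ratio above 1.
models_per_gene = gene_models / total_genes
extra_models = gene_models - total_genes
print(f"{total_genes} genes, {gene_models} models, "
      f"{models_per_gene:.2f} models per gene, {extra_models} extra models")
```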

The Arabidopsis genome is ~120 megabases of sequence spread across five chromosomes.

Genome resources:

The TAIR homepage

Arabidopsis lyrata genome:

Arabidopsis lyrata also has a sequenced, though unpublished, genome. As a close relative of A. thaliana, A. lyrata is valuable for comparative genomics. A. lyrata is also self-incompatible, while A. thaliana reproduces primarily through self-fertilization. The lyrata genome is available within CoGe.

The 1001 Genomes Project[4] plans to sequence the genomes of 1,001 different varieties of Arabidopsis. Currently 88 are available, with more in progress.

The Genome Paper: The Arabidopsis Genome Initiative (2000). Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature, 408 (6814), 796-815 DOI: 10.1038/35048692

Monocots

  • Banana

Grasses

The grasses, a family of plants known as the Poaceae, can trace their lineages back to a common ancestor that probably lived 50-70 million years ago, either right before or soon after the extinction of the dinosaurs (dinosaurs didn't eat grass). Since their emergence in the fossil record, the grasses have been extraordinarily successful, becoming one of the largest plant families and covering vast swaths of the planet as prairies, savannahs, and steppes.

While you may think of grass primarily as the green stuff on lawns and sports fields, remember that the grasses also include species like bamboo and the grains that make up so much of what we eat. Either three (rice, wheat, and corn/maize) or four (the same three plus sugar cane) grass species provide more than half of all the calories that feed the world's population[5], and they are the focus of much applied and basic scientific research. Check out the Pan-grass synteny project.

Rice

Rice (Oryza sativa) was the second published plant genome after Arabidopsis, making it the first monocot genome, the first grass genome, the first food crop genome, and the first grain genome (and probably a whole lot of other firsts as well). The original genome, published in 2002, came from the subspecies Oryza sativa japonica; however, the genome of the other primary subspecies, Oryza sativa indica, has also been sequenced.

The current version of the rice genome in CoGe is v6.1 from MSU, which contains ~370 megabases of sequence and 40,577 non-transposon-related genes spread across 12 chromosomes.

Rice Resources:

The genome paper:

Goff, S. et al. (2002). A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. japonica) Science, 296 (5565), 92-100 DOI: 10.1126/science.1068275

Brachypodium

The brachy genome (Brachypodium distachyon) was published in early 2010. Brachypodium is a small temperate grass native to the region around the Mediterranean and east into India. It was chosen as a model organism for its small physical size, quick generation time, and small genome (many of the same reasons as Arabidopsis), as well as its membership in the Pooideae, a group of grasses that also includes the important crops wheat, barley, rye, and oats, none of whose genomes have yet been sequenced (although the last common ancestor of brachy and these crops is estimated to have lived >30 million years ago). Brachy's genome is currently the only published genome of a non-domesticated grass and the only one of a temperate (as opposed to tropical) grass.

The published version of the brachy genome includes 272 megabases of sequence and 25,532 protein coding genes spread across five chromosomes. It was sequenced to a coverage of 9.4× using Sanger sequencing.

Brachy Resources:

The genome paper:

Vogel, J., Garvin, D., Mockler, T., Schmutz, J., Rokhsar, D., Bevan, M., Barry, K., Lucas, S., Harmon-Smith, M., Lail, K., Tice, H., Schmutz (Leader), J., Grimwood, J., McKenzie, N., Bevan, M., Huo, N., Gu, Y., Lazo, G., Anderson, O., Vogel (Leader), J., You, F., Luo, M., Dvorak, J., Wright, J., Febrer, M., Bevan, M., Idziak, D., Hasterok, R., Garvin, D., Lindquist, E., Wang, M., Fox, S., Priest, H., Filichkin, S., Givan, S., Bryant, D., Chang, J., Mockler (Leader), T., Wu, H., Wu, W., Hsia, A., Schnable, P., Kalyanaraman, A., Barbazuk, B., Michael, T., Hazen, S., Bragg, J., Laudencia-Chingcuanco, D., Vogel, J., Garvin, D., Weng, Y., McKenzie, N., Bevan, M., Haberer, G., Spannagl, M., Mayer (Leader), K., Rattei, T., Mitros, T., Rokhsar, D., Lee, S., Rose, J., Mueller, L., York, T., Wicker (Leader), T., Buchmann, J., Tanskanen, J., Schulman (Leader), A., Gundlach, H., Wright, J., Bevan, M., Costa de Oliveira, A., da C. Maia, L., Belknap, W., Gu, Y., Jiang, N., Lai, J., Zhu, L., Ma, J., Sun, C., Pritham, E., Salse (Leader), J., Murat, F., Abrouk, M., Haberer, G., Spannagl, M., Mayer, K., Bruggmann, R., Messing, J., You, F., Luo, M., Dvorak, J., Fahlgren, N., Fox, S., Sullivan, C., Mockler, T., Carrington, J., Chapman, E., May, G., Zhai, J., Ganssmann, M., Guna Ranjan Gurazada, S., German, M., Meyers, B., Green (Leader), P., Bragg, J., Tyler, L., Wu, J., Gu, Y., Lazo, G., Laudencia-Chingcuanco, D., Thomson, J., Vogel (Leader), J., Hazen, S., Chen, S., Scheller, H., Harholt, J., Ulvskov, P., Fox, S., Filichkin, S., Fahlgren, N., Kimbrel, J., Chang, J., Sullivan, C., Chapman, E., Carrington, J., Mockler, T., Bartley, L., Cao, P., Jung, K., Sharma, M., Vega-Sanchez, M., Ronald, P., Dardick, C., De Bodt, S., Verelst, W., Inzé, D., Heese, M., Schnittger, A., Yang, X., Kalluri, U., Tuskan, G., Hua, Z., Vierstra, R., Garvin, D., Cui, Y., Ouyang, S., Sun, Q., Liu, Z., Yilmaz, A., Grotewold, E., Sibout, R., Hematy, K., Mouille, G., Höfte, H., Michael, T., Pelloux, J., 
O’Connor, D., Schnable, J., Rowe, S., Harmon, F., Cass, C., Sedbrook, J., Byrne, M., Walsh, S., Higgins, J., Bevan, M., Li, P., Brutnell, T., Unver, T., Budak, H., Belcram, H., Charles, M., Chalhoub, B., & Baxter, I. (2010). Genome sequencing and analysis of the model grass Brachypodium distachyon Nature, 463 (7282), 763-768 DOI: 10.1038/nature08747

Maize/Corn

The genome of the species known to most Americans as corn (Zea mays) and to biologists and Europeans as maize was published in the second half of 2009. Maize genetics has a history going back more than a century to the early work of R. A. Emerson, widely considered the founder of modern maize genetics. Maize is an important crop species, and the most prominent crop to use C4 photosynthesis (as opposed to the more common C3 photosynthesis). Its role as an important model system as well as a vital crop might have earned maize an earlier place among sequenced plant genomes if not for the complexity of the genome itself.

The ancestor of maize went through a whole genome duplication between 5 and 12 million years ago. In addition, the recent history of maize has included not one but two bursts of transposon activity. The result is a genome that weighs in at ~2.5 gigabases of mostly repetitive sequence, making both sequencing and assembly major challenges.

But the maize genome sequence is now published.

The v1 sequence contains 2.3 gigabases of sequence data. Rather than shotgun sequencing the entire genome, as is now common with smaller, less repetitive genomes, maize was sequenced using a BAC[6] by BAC approach. The BACs were lined up to cover the ten chromosomes of maize, and then each BAC was shotgun sequenced and assembled into contigs. In practice this means that a given sequence in the maize genome is usually within 300 kilobases of its correct location, but within that range it may be out of order or inverted. If a gene seems to be absent from its syntenic location (or only part of the gene is found), it is important to search up to 500 kilobases in either direction around its expected location to make sure the apparent deletion isn't the result of incorrect ordering of the contigs.
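The search rule above amounts to a simple interval check. The Python sketch below is hypothetical throughout: the Gene record, gene IDs, coordinates, and chromosome length are made up for illustration, and it is not a CoGe or MaizeGDB function. It clips a ±500 kb window to the chromosome ends and lists annotated genes that overlap it, counting partial overlaps because a gene split across misordered contigs may only partly fall inside.

```python
from dataclasses import dataclass

SEARCH_RADIUS = 500_000  # bp either side of the expected position, per the rule above


@dataclass
class Gene:
    # Hypothetical minimal gene record: ID plus 1-based inclusive coordinates.
    gene_id: str
    chrom: str
    start: int
    end: int


def search_window(expected_pos: int, chrom_length: int,
                  radius: int = SEARCH_RADIUS) -> tuple[int, int]:
    """Return a 1-based inclusive window around expected_pos, clipped to the chromosome."""
    return max(1, expected_pos - radius), min(chrom_length, expected_pos + radius)


def genes_in_window(genes: list[Gene], chrom: str, window: tuple[int, int]) -> list[Gene]:
    """Genes on chrom whose span overlaps the window at all (partial hits count)."""
    lo, hi = window
    return [g for g in genes if g.chrom == chrom and g.start <= hi and g.end >= lo]


# Made-up example: synteny predicts a gene near position 1,200,000 on chromosome 1.
genes = [
    Gene("geneA", "1", 900_000, 905_000),
    Gene("geneB", "1", 1_690_000, 1_702_000),  # straddles the window edge
    Gene("geneC", "1", 2_500_000, 2_504_000),
    Gene("geneD", "2", 1_200_000, 1_205_000),
]
window = search_window(1_200_000, chrom_length=300_000_000)
hits = genes_in_window(genes, "1", window)
print(window, [g.gene_id for g in hits])
```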

The next version of the maize sequence (v2), scheduled for release on April 1st, 2010, should substantially reduce this issue: over 80% of the contigs in v2 will have data on their order and orientation, up from ~30% in the v1 release.
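Order and orientation data matter because a chromosome-scale sequence (a pseudomolecule) is built by laying contigs end to end, reverse-complementing any contig placed in the minus orientation, and separating them with runs of N. The Python sketch below illustrates that general idea with made-up contigs. The 100-N gap length is an arbitrary placeholder, and this is not the procedure actually used to build the maize v1 or v2 releases.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCAntgcan")
GAP = "N" * 100  # placeholder gap of unknown size; the length is arbitrary


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (ambiguity codes other than N are not handled)."""
    return seq.translate(COMPLEMENT)[::-1]


def build_pseudomolecule(contigs: dict[str, str], layout: list[tuple[str, str]]) -> str:
    """Join contigs in the given order; '-' means the contig is placed reverse-complemented."""
    pieces = []
    for contig_id, strand in layout:
        seq = contigs[contig_id]
        pieces.append(reverse_complement(seq) if strand == "-" else seq)
    return GAP.join(pieces)


# Made-up contigs from one BAC; the layout supplies order and orientation.
contigs = {"ctg1": "ATGCC", "ctg2": "GGTTA", "ctg3": "CCCAT"}
layout = [("ctg2", "+"), ("ctg1", "-"), ("ctg3", "+")]
pseudo = build_pseudomolecule(contigs, layout)
print(len(pseudo), pseudo.replace(GAP, " | "))
```

Without the layout, the same three contigs could be joined in any order and either orientation, which is exactly the local uncertainty described for the v1 sequence.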

A word on gene models:

The maize genome was published with two sets of genome annotations, the working gene set and the filtered gene set. These two sets are based on different compromises between catching all the real genes in maize and excluding false genes.

  • The filtered gene set (>32,000 genes) consists of high-confidence genes. If a gene model is in the filtered gene set, it's almost certainly a gene, but there is no promise that EVERY real gene is in the filtered gene set.
  • The working gene set (~100,000 genes) includes all the genes in the filtered gene set, but also many other gene models with less supporting evidence. Almost every real gene is likely included in the working gene set, but so are many things that aren't genes, particularly gene fragments left over from the maize whole genome duplication and pieces of genes captured by transposons.
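Since every filtered-set gene is also in the working set, the two annotations form a subset relation, and the working-only models are the ones to treat with extra caution. The Python sketch below shows that bookkeeping with made-up gene IDs; real identifiers and counts would come from the published annotation files, which are not reproduced here.

```python
def classify_models(working: set[str], filtered: set[str]) -> dict[str, set[str]]:
    """Split gene models by confidence, assuming filtered is a subset of working."""
    if not filtered <= working:
        raise ValueError("filtered set should be contained in the working set")
    return {
        "high_confidence": set(filtered),
        # Lower-evidence models: may be real genes, fragments, or transposon-captured pieces.
        "working_only": working - filtered,
    }


# Made-up IDs for illustration only.
working = {"gm001", "gm002", "gm003", "gm004", "gm005"}
filtered = {"gm001", "gm003"}

groups = classify_models(working, filtered)
for label, ids in groups.items():
    print(label, sorted(ids))

# A candidate gene found only in the working set deserves a second look.
candidate = "gm004"
print(candidate, "in filtered set:", candidate in groups["high_confidence"])
```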

The maize genome is divided among 10 chromosomes.

Maize Resources:

  • MaizeGDB
  • MaizeSequence.org

Maize Related CoGepedia Pages:

  • Classical Maize Genes: ~460 maize genes that we have manually mapped to gene models in the published genome sequence, plus data on syntenic orthologs in rice, sorghum, and brachy, as well as the homeologous region of maize.
  • MaizeGDB and CoGe: Explaining how to jump between our site and MaizeGDB.
  • Maize Sorghum Syntenic Dotplot: How to compare the maize and sorghum genomes.

The genome paper:

Schnable, P., Ware, D., Fulton, R., Stein, J., Wei, F., Pasternak, S., Liang, C., Zhang, J., Fulton, L., Graves, T., Minx, P., Reily, A., Courtney, L., Kruchowski, S., Tomlinson, C., Strong, C., Delehaunty, K., Fronick, C., Courtney, B., Rock, S., Belter, E., Du, F., Kim, K., Abbott, R., Cotton, M., Levy, A., Marchetto, P., Ochoa, K., Jackson, S., Gillam, B., Chen, W., Yan, L., Higginbotham, J., Cardenas, M., Waligorski, J., Applebaum, E., Phelps, L., Falcone, J., Kanchi, K., Thane, T., Scimone, A., Thane, N., Henke, J., Wang, T., Ruppert, J., Shah, N., Rotter, K., Hodges, J., Ingenthron, E., Cordes, M., Kohlberg, S., Sgro, J., Delgado, B., Mead, K., Chinwalla, A., Leonard, S., Crouse, K., Collura, K., Kudrna, D., Currie, J., He, R., Angelova, A., Rajasekar, S., Mueller, T., Lomeli, R., Scara, G., Ko, A., Delaney, K., Wissotski, M., Lopez, G., Campos, D., Braidotti, M., Ashley, E., Golser, W., Kim, H., Lee, S., Lin, J., Dujmic, Z., Kim, W., Talag, J., Zuccolo, A., Fan, C., Sebastian, A., Kramer, M., Spiegel, L., Nascimento, L., Zutavern, T., Miller, B., Ambroise, C., Muller, S., Spooner, W., Narechania, A., Ren, L., Wei, S., Kumari, S., Faga, B., Levy, M., McMahan, L., Van Buren, P., Vaughn, M., Ying, K., Yeh, C., Emrich, S., Jia, Y., Kalyanaraman, A., Hsia, A., Barbazuk, W., Baucom, R., Brutnell, T., Carpita, N., Chaparro, C., Chia, J., Deragon, J., Estill, J., Fu, Y., Jeddeloh, J., Han, Y., Lee, H., Li, P., Lisch, D., Liu, S., Liu, Z., Nagel, D., McCann, M., SanMiguel, P., Myers, A., Nettleton, D., Nguyen, J., Penning, B., Ponnala, L., Schneider, K., Schwartz, D., Sharma, A., Soderlund, C., Springer, N., Sun, Q., Wang, H., Waterman, M., Westerman, R., Wolfgruber, T., Yang, L., Yu, Y., Zhang, L., Zhou, S., Zhu, Q., Bennetzen, J., Dawe, R., Jiang, J., Jiang, N., Presting, G., Wessler, S., Aluru, S., Martienssen, R., Clifton, S., McCombie, W., Wing, R., & Wilson, R. (2009). 
The B73 Maize Genome: Complexity, Diversity, and Dynamics Science, 326(5956), 1112-1115 DOI: 10.1126/science.1178534

Sorghum

Sorghum (Sorghum bicolor) is an important grain species. A close relative of maize, sorghum is generally considered an even more stress-tolerant crop. Like maize, it carries out C4 photosynthesis. It does not share the recent whole genome duplication seen in maize, and since the common ancestor of maize and sorghum is estimated to have lived only ~12 million years ago, sorghum makes an excellent outgroup for studies of that duplication.

The sorghum genome was published in 2009. The current version in CoGe (v1.4) contains ~700 megabases of sequence and 34,496 protein coding genes spread over ten chromosomes.

The genome paper: Paterson, A., Bowers, J., Bruggmann, R., Dubchak, I., Grimwood, J., Gundlach, H., Haberer, G., Hellsten, U., Mitros, T., Poliakov, A., Schmutz, J., Spannagl, M., Tang, H., Wang, X., Wicker, T., Bharti, A., Chapman, J., Feltus, F., Gowik, U., Grigoriev, I., Lyons, E., Maher, C., Martis, M., Narechania, A., Otillar, R., Penning, B., Salamov, A., Wang, Y., Zhang, L., Carpita, N., Freeling, M., Gingle, A., Hash, C., Keller, B., Klein, P., Kresovich, S., McCann, M., Ming, R., Peterson, D., Mehboob-ur-Rahman, ., Ware, D., Westhoff, P., Mayer, K., Messing, J., & Rokhsar, D. (2009). The Sorghum bicolor genome and the diversification of grasses Nature, 457 (7229), 551-556 DOI: 10.1038/nature07723

Foxtail Millet (in process)

Foxtail millet (Setaria italica) is a C4 grass that is more distantly related to maize and sorghum than those two species are to each other. The genome has been sequenced by JGI and is currently listed as completed in October 2009, with a release date of March 1st, 2010. As of March 12th, 2010, I haven't been able to find the genome sequence. Please write us if you know more!

Non-angiosperms

Footnotes

  1. Groupings of DNA sequence that correspond to the individual chromosomes of an organism
  2. Need to define gene models in tomato entry
  3. Groupings of DNA sequence that correspond to the individual chromosomes of an organism (see note 1)
  4. Literally one-upping the 1000 Genomes Project, which plans to sequence the genomes of 1000 people
  5. Estimated to be 6.7 billion people as of early 2010
  6. Bacterial artificial chromosome: a way of breaking a genome down into manageable chunks of ~300 kilobases