Maize v1 v2

Figure 1. Maize B73 refgen version 1 (x-axis) and version 2 (y-axis). Version 1 has gene models and version 2 is using only genomic sequence. Syntenic pairs (dots) are colored green and blue if in the same or opposite orientation respectively. Analysis can be regenerated at http://genomevolution.org/r/4dq


Figure 2. Syntenic dotplot of maize B73 chromosome 3 between refgen version 1 and version 2. Syntenic gene-pairs (dots) are colored green and blue if in the same or opposite orientation respectively. Note the large inversion near the middle (blue line) and many smaller inversions (blue dots). Results can be regenerated by visiting the master dotplot (http://genomevolution.org/r/4dq) and clicking on the chromosome 3 versus chromosome 3 comparison.
Figure 3. GEvo analysis of 1MB of chromosome 3 from maize between refgen version 1 and 2. Version 1 has gene models and non-CDS sequences masked (purple). Note the sets of genes that have been reoriented in version 2. These show up where regions of sequence similarity (pink blocks) are drawn below the dashed line in both panels. Results can be regenerated at http://genomevolution.org/r/4du

These analyses compare the genomic sequence assemblies of maize B73 refgen versions 1 and 2. Maize was sequenced BAC by BAC, with BACs chosen to tile across all of maize's chromosomes. This means that the relative order of most BACs was correctly determined, both between and within chromosomes. However, the sequences within a BAC were often unordered, so the position of contigs within a BAC relative to one another is not necessarily correct. As a result, version 1 of maize contained many localized misassemblies. Version 2 of maize aimed to correct many of these errors.

Please note that at the time of these analyses, no gene models or annotations were available for version 2 of the maize genome.

To determine the extent of these corrections, syntenic dotplots can be generated between two different versions of a genome. SynMap makes these comparisons easy to perform and provides a variety of visualization options to help identify assembly differences. Figure 1 shows a syntenic dotplot between maize genome assemblies refgen v1 and v2. In this dotplot, each syntenic region is drawn as a colored dot (dots form lines when their density is high). Dots are colored green if the two regions are in the same orientation and blue if they are in opposite orientations. There are two sets of syntenic lines in this dotplot: strong, mostly continuous lines in the chromosome-versus-chromosome grids that run from the lower-left corner to the upper-right corner, and several smaller regions with a lower density of dots. The latter regions are from the most recent whole genome duplication event in maize (for additional information on this, please see the maize versus sorghum dotplot and splitting the maize genome into its two ancestral genomes).
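The coloring rule described above can be sketched in a few lines of Python. This is only an illustration, not SynMap's code: the `SyntenicPair` record, its field names, and the `dot_color` function are hypothetical, and strands are assumed to be encoded as +1 or -1.

```python
from typing import NamedTuple


class SyntenicPair(NamedTuple):
    """Hypothetical record for one dot in a v1-versus-v2 dotplot."""
    chrom_v1: str
    pos_v1: int
    strand_v1: int  # assumed encoding: +1 forward, -1 reverse
    chrom_v2: str
    pos_v2: int
    strand_v2: int


def dot_color(pair: SyntenicPair) -> str:
    """Green for same orientation in both assemblies, blue for opposite."""
    return "green" if pair.strand_v1 == pair.strand_v2 else "blue"


if __name__ == "__main__":
    same = SyntenicPair("3", 1_000, 1, "3", 1_200, 1)
    flipped = SyntenicPair("3", 5_000, 1, "3", 5_100, -1)
    print(dot_color(same), dot_color(flipped))
```

A run of blue dots along the main diagonal of a chromosome-versus-chromosome grid is then what an inversion between the two assemblies looks like.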

This dotplot reveals that the overall structure of these two assemblies is highly similar (for an example of comparing genome assemblies with many more differences, please see medicago version 1 versus version 2). There is a large, obvious inverted region on chromosome 3 (close-up in Figure 2), and several breaks in the syntenic line showing areas where sequence was added to or removed from the assembly. In addition, close examination shows many blue dots intermixed with green. These point to regions where a small inversion was made between the two versions of the maize assembly. At this resolution, however, it is not possible to identify small movements of assembled pieces.
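One way to pick candidate inversions out of a list of dots is to look for consecutive runs of opposite-orientation dots along one chromosome. The sketch below assumes the dots have already been exported as hypothetical `(position_in_v1, same_orientation)` tuples; the `inverted_runs` function and its `min_dots` threshold are illustrative and are not part of SynMap.

```python
from typing import Iterable, List, Tuple


def inverted_runs(
    dots: Iterable[Tuple[int, bool]], min_dots: int = 2
) -> List[Tuple[int, int, int]]:
    """Return (start, end, n_dots) for each run of opposite-orientation dots.

    dots: (position_in_v1, same_orientation) tuples for one chromosome.
    min_dots: smallest run length to report (an arbitrary choice here).
    """
    runs: List[Tuple[int, int, int]] = []
    current: List[int] = []
    for pos, same in sorted(dots):
        if not same:
            current.append(pos)  # extend the current inverted run
            continue
        if len(current) >= min_dots:
            runs.append((current[0], current[-1], len(current)))
        current = []  # a same-orientation dot ends the run
    if len(current) >= min_dots:
        runs.append((current[0], current[-1], len(current)))
    return runs


if __name__ == "__main__":
    example = [(10, True), (20, False), (30, False), (40, True), (50, False), (60, True)]
    print(inverted_runs(example))              # runs of at least two dots
    print(inverted_runs(example, min_dots=1))  # also reports single blue dots
```

Long runs correspond to features like the large chromosome 3 inversion, while single blue dots correspond to the small inversions that need a higher-resolution tool to confirm.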

High-resolution analysis of these regions can show the details of these inversions as well as changes in the arrangement of contigs. Figure 3 uses GEvo to analyze a 1MB region of chromosome 3. Since maize contains many highly repetitive sequences, which would severely obscure the results of such pairwise sequence analyses, maize version 1 has all non-CDS sequences masked (top panel). These masked sequences are denoted by a purple background. Since maize version 2 has no gene annotations, the entire sequence has to be used. While difficult to see in this image, unsequenced regions are denoted with an orange background and usually represent breaks between contigs. As can be seen in Figure 3, several of the CDS regions from version 1 of maize have been moved relative to neighboring sequences, and some have also been inverted.
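The idea behind masking non-CDS sequence can be illustrated with a short Python sketch. It is not GEvo's implementation: the `mask_non_cds` function, the `X` mask character, and the 0-based half-open interval convention are all assumptions made for this example.

```python
from typing import Iterable, Tuple


def mask_non_cds(
    sequence: str,
    cds_intervals: Iterable[Tuple[int, int]],
    mask_char: str = "X",
) -> str:
    """Replace every base outside the given CDS intervals with mask_char.

    cds_intervals: (start, end) pairs, assumed 0-based and half-open.
    """
    masked = [mask_char] * len(sequence)  # start fully masked
    for start, end in cds_intervals:
        start = max(start, 0)
        end = min(end, len(sequence))
        if start < end:
            masked[start:end] = sequence[start:end]  # restore CDS bases
    return "".join(masked)


if __name__ == "__main__":
    print(mask_non_cds("ACGTACGTAC", [(2, 5)]))
```

With repeats outside of coding sequence hidden this way, a pairwise comparison only reports similarity that involves annotated CDS, which is why the masking is applied to version 1 (which has gene models) and not to version 2.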

One thing to keep in mind is that SynMap and GEvo are linked together to make it relatively easy to move from whole genome to high-resolution sub-chromosome sequence analyses. SynMap produces whole genome and chromosome level syntenic dotplots, and the chromosome level syntenic dotplot is linked to GEvo by clicking on syntenic pairs in the dotplot. This integrated linking between CoGe's tools is part of its design to create an open-ended analysis network.