For this analysis, every pairwise sequence comparison was generated using blastz (Schwartz et al., 2003). However, there are several alignment algorithms to choose from, including blastn (Altschul et al., 1990), tblastx, blastz, chaos (Brudno et al., 2003), lagan (Brudno et al., 2004), and dialign (Brudno et al., 2003).


You can find a more detailed description of how to interpret GEvo results here, but for a brief overview:

  1. Each genomic region has its own image (labeled At, Vv, Cp, Pt1, and Pt2).

  2. Each genomic region has gene models (if available) drawn as composite colored arrows. If a gene model is present, you can click on it to get its annotations in the info box.

  3. The background of a genomic region is colored orange to show unsequenced regions (Ns) and purple to show masked repetitive sequences. (We've done our own masking for some genomes and marked any sequence that occurs more than 50 times as Xs. Both masked and unmasked genomes are available in the CoGe system.)

  4. Blast (or other alignment algorithm) hits (aka HSPs) are drawn as colored boxes, with each color denoting one pairwise comparison. Hits in the (++) and (+-) orientations are drawn above and below the dashed line, respectively. When a blast hit is clicked, a line is drawn connecting it to its partner hit.

  5. A movable information box on the right of the image displays information about genes and blast hits when they are clicked.

  6. Links to the various data files are displayed at the bottom of the results.

  7. A link to regenerate the analysis is provided at the bottom of the results under "GEvo link". Use it to rerun your analysis later, since results are stored on our servers for only ~24 hours.

  8. A summary of blast hits overlapping features is displayed at the bottom of the results.

Once the analysis is complete (less than a minute for this example), the results are shown:

The image above shows a close-up of just the genomic images from the results in GEvo. When analyzing multiple genomic regions, it is usually important to get yourself oriented. The first thing you will want to do is to orient each genomic region so that all the blast hits are in the (++) orientation. Since hits in the (+-) orientation are shown below the dashed line, it is easy to figure out which regions need to be "flipped". In this example, At and Vv are in the same orientation (pink Blast hits), but all the other regions are in the (+-) orientation in relation to At and Vv. To flip a region, go to the "Sequence Submission" box below the results, and click on the "Sequence Options" for each sequence you want to change. That will open a menu for that sequence. Then select "Yes" for "Reverse complement:"


When you have finished specifying which sequences you want reverse complemented, press the "Go" button at the top of the "Options:" area of GEvo to rerun your analysis.
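The flip decision described above can be sketched in a few lines of Python. This is only an illustration, not GEvo's own code: the `(region_a, region_b, orientation)` hit layout and the function names are assumptions made for this example, and the majority-vote rule is one simple way to decide which regions to reverse complement.

```python
from collections import Counter

# Complement table for DNA; N (unsequenced) and X (masked) map to themselves.
_COMPLEMENT = str.maketrans("ACGTNXacgtnx", "TGCANXtgcanx")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def regions_to_flip(hits, reference):
    """Return regions whose hits against `reference` are mostly (+-).

    `hits` is an iterable of (region_a, region_b, orientation) tuples,
    where orientation is "++" or "+-". This layout is hypothetical.
    """
    votes = {}
    for region_a, region_b, orientation in hits:
        if reference not in (region_a, region_b):
            continue  # only hits involving the reference region count
        other = region_b if region_a == reference else region_a
        votes.setdefault(other, Counter())[orientation] += 1
    return sorted(r for r, c in votes.items() if c["+-"] > c["++"])


# Made-up hits that mirror the example: Vv agrees with At, the rest do not.
example_hits = [
    ("At", "Vv", "++"),
    ("At", "Cp", "+-"),
    ("At", "Cp", "+-"),
    ("At", "Cp", "++"),
    ("At", "Pt1", "+-"),
    ("At", "Pt2", "+-"),
]
flip = regions_to_flip(example_hits, reference="At")
```

With these made-up hits, `flip` lists Cp, Pt1, and Pt2, which matches the regions that need "Reverse complement" set to "Yes" in this example.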

Above are the new results. Here you can see that all the Blast hits for all sequences are in the (++) orientation. Now you can evaluate these regions for synteny.


First, you can see that all regions have a high degree of sequence similarity (i.e., many blast hits) to one another. At is a bit of an exception, however: you see sequence similarity only for the central set of three genes. Fortunately, this is expected given what we know about At's genome evolution and structure.


If you recall, At has had two sequential genome duplication events since it diverged from the other lineages. Following such events, its genome has fractionated to a significant degree whereby most duplicated genes are lost from one duplicated genomic region or the other (Thomas et al., 2006).


Also, At's genome is small for a plant genome. This is due to mechanisms that result in genomes with smaller introns and less intergenic sequence. Since the GEvo link used the same amount of additional genomic sequence around each anchored position, each region in the results has approximately the same amount of total sequence (~40 kb). This means that if two genomes have different gene densities, they will contain different numbers of genes for similarly sized genomic regions. As you can see (above and also summarized in the "Overlap Feature Stats"), At's region has 13 genes, Vv has 6, Cp has 7, Pt1 has 7, and Pt2 has 7. Also, you can see that in many cases At's genes are physically smaller.
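To make the gene-density comparison concrete, here is a small Python sketch that turns the gene counts quoted above into genes per kb. The 40 kb region length is the approximate figure from the text, so the results are rough estimates; the variable and function names are illustrative only.

```python
# Approximate region length quoted in the text (~40 kb per region).
REGION_KB = 40

# Gene counts per region, as summarized in the "Overlap Feature Stats".
gene_counts = {"At": 13, "Vv": 6, "Cp": 7, "Pt1": 7, "Pt2": 7}


def genes_per_kb(count, region_kb=REGION_KB):
    """Gene density for one region, in genes per kilobase."""
    return count / region_kb


density = {name: genes_per_kb(n) for name, n in gene_counts.items()}

for name, d in density.items():
    # 1 / d is the average amount of sequence per gene, in kb.
    print(f"{name}: {d:.3f} genes/kb (about one gene per {1 / d:.1f} kb)")
```

Under these numbers, At's region is roughly twice as gene-dense as the others, which is why it holds more genes in the same amount of sequence.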


To evaluate these regions for synteny, you can simply click on a blast hit and a line will be drawn connecting it to its partner region. If you hold the "shift key" then click, GEvo will draw lines for all the blast hits in that track. If you see a series of blast hits between two sequences that overlap many genes, you probably have detected synteny.
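The rule of thumb above (a series of hits that overlap many genes) can be approximated programmatically. The Python sketch below is a simplified heuristic, not how GEvo itself works: it assumes that genes are given as `(name, start, stop)` tuples sorted by position, that hits are pairs of `(start, stop)` spans, and that all regions have already been flipped into the (++) orientation. All names and coordinates are made up for illustration.

```python
def overlaps(a_start, a_stop, b_start, b_stop):
    """True if two closed intervals share at least one position."""
    return a_start <= b_stop and b_start <= a_stop


def genes_hit(genes, span):
    """Names of the genes that overlap a (start, stop) span."""
    start, stop = span
    return [name for name, g_start, g_stop in genes
            if overlaps(g_start, g_stop, start, stop)]


def linked_gene_pairs(genes_a, genes_b, hits):
    """Gene pairs connected by at least one hit, in region A order."""
    order_a = {name: i for i, (name, _, _) in enumerate(genes_a)}
    order_b = {name: i for i, (name, _, _) in enumerate(genes_b)}
    pairs = set()
    for span_a, span_b in hits:
        for gene_a in genes_hit(genes_a, span_a):
            for gene_b in genes_hit(genes_b, span_b):
                pairs.add((gene_a, gene_b))
    return sorted(pairs, key=lambda p: (order_a[p[0]], order_b[p[1]]))


def is_collinear(pairs, genes_b):
    """True if the linked genes appear in the same order in region B."""
    order_b = {name: i for i, (name, _, _) in enumerate(genes_b)}
    ranks = [order_b[gene_b] for _, gene_b in pairs]
    return all(x <= y for x, y in zip(ranks, ranks[1:]))


# Hypothetical coordinates: region B has an intervening gene (b2) with no hit.
genes_a = [("a1", 100, 900), ("a2", 1500, 2400), ("a3", 3000, 3800)]
genes_b = [("b1", 200, 1200), ("b2", 2000, 2600),
           ("b3", 5000, 6100), ("b4", 7000, 7900)]
hits = [((150, 400), (300, 600)),
        ((1600, 1900), (5200, 5500)),
        ((3100, 3300), (7100, 7400))]

pairs = linked_gene_pairs(genes_a, genes_b, hits)
collinear = is_collinear(pairs, genes_b)
```

Several linked gene pairs in conserved order suggest synteny, even when one region has intervening genes with no partner, as `b2` does here. Treat the result as a hint to be confirmed by looking at a larger region.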

The image above shows lines connecting blastz hits between At and Vv. Note that the left At gene matches two Vv genes. At is the "best" annotated plant genome and usually has the most experimental annotation data (e.g. cDNA sequences), although it does contain annotation errors. The gene model in At has blue that extends beyond the coding regions (yellow/green), which indicates that the mRNA has UTRs that were most likely experimentally determined. Vv's gene models also have blue that extends beyond (in this case, way beyond) the coding region. Even so, At's gene model is likely to be correct, and Vv's gene was probably split into two incorrectly.


Are these regions syntenic? You have two genes in At that have sequence similarity to Vv, and their order in the region is conserved. Remember that even though Vv contains intervening genes, this is to be expected since At has had two genome duplication events while Vv has had none since their divergence. There are two ways to test for synteny: (1) analyze a larger genomic region, and (2) compare multiple genomic regions. Since you have other genomic regions, let's first analyze those, then expand the region.

The image above shows lines connecting blast hits between At and Cp, and between Vv and Cp. Here, you can see a stronger synteny signal. At-Cp has three putatively homologous genes, and Cp-Vv has five. However, to be convinced, let's look at a much larger region. You can do this by padding all sequences with 150,000 bp and rerunning the analysis (or specifying additional sequences to the left and right of your anchored positions):

Since you are using much larger genomic regions, the analysis may take a couple of minutes to complete. The image above shows lines connecting blast hits between At and Cp (pink), At and Vv (red), and Vv and Cp (green). Here, there is no denying synteny. Although syntenic regions can stretch for megabases, using GEvo to analyze such large regions is usually not practical because the results take a while to generate (image generation can be quite slow) and are difficult to visualize. However, starting with a large region and then selecting a sub-region for further analysis is one of the things GEvo does well. Also, note that there appears to be a gene in At that was transposed to a new region. Although the Pt genomic regions aren't shown in the figure above, you can check if the Vv gene is syntenic with Pt (which it is).


To select a sub-region, just move the slider-bars in each region to where you want them. This will change the "left/right" additional sequence for each sequence. You will also want to turn the "Pad Sequence" option to 0:


Above shows a screenshot of all the regions being resized:


Shown above are the results of your trimmed analysis after it is rerun (http://tinyurl.com/2s28c3), with syntenic lines drawn between Vv and Cp. Just by looking at the various regions of sequence similarity and the underlying gene models, you can see several regions with possible annotation errors, some of which have been circled.


Next, since you know that At has had two sequential genome duplication events since its divergence from Cp, let's see if you can find those regions using other tools of CoGe. This is actually an academic exercise since we provide a pre-made list of GEvo links to anchor any one Cp region to what are often four At syntenic regions. This list, and others, are available for download from the green menu bar in the upper right part of CoGe's screen.

Linking to GEvo


The GEvo link will take you to CoGe (and possibly ask you to log into the system), automatically configure an analysis using the genomic anchors from the spreadsheet, and start running the analysis: