Analysis of differences found between Escherichia coli strain K12 DH10B and strain B REL606 using SynMap and GEvo analysis

From CoGepedia

In this exercise you will compare the genomes of two Escherichia coli strains, K12 DH10B and B REL606, using SynMap and GEvo. Along the way, we will observe differences between the two genomes that arose as the two E. coli lineages diverged. The same computational tools can be used to compare the genomes of any species. Even between two closely related bacterial genomes, several kinds of differences can be found, such as transpositions, insertions, deletions, duplications, inversions and translocations.


First, we need to construct a syntenic dotplot of K12 DH10B and B REL606 using SynMap. Go to SynMap. Search the database for E. coli strain K12 DH10B under Organism 1 and for E. coli strain B REL606 under Organism 2. Click "Generate SynMap". The program lays the two genomes along the axes and marks regions of similarity between them as green dots on a syntenic dotplot. Taken together, these dots appear as a green line.
Generate synmap.png
Dotplot.png
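To make the idea of a dotplot concrete, here is a minimal Python sketch. It is not how SynMap computes its plot; it only assumes two short, made-up sequences and places a dot wherever both share an exact k-mer. The function name `kmer_dots`, the parameter `k`, and the toy sequences are all illustrative assumptions.

```python
from collections import defaultdict


def kmer_dots(seq_x, seq_y, k=4):
    """Return sorted (x, y) start positions where both sequences share an exact k-mer."""
    # Index every k-mer of the first sequence by its start position.
    index = defaultdict(list)
    for i in range(len(seq_x) - k + 1):
        index[seq_x[i:i + k]].append(i)
    # Look up every k-mer of the second sequence and record the shared positions.
    dots = []
    for j in range(len(seq_y) - k + 1):
        for i in index.get(seq_y[j:j + k], []):
            dots.append((i, j))
    return sorted(dots)


# Toy sequences (made up for illustration): genome_y is genome_x with six extra bases inserted.
genome_x = "ATGGCGTACGTTAGC"
genome_y = "ATGGCGTA" + "CCCCCC" + "CGTTAGC"

dots = kmer_dots(genome_x, genome_y)
for x, y in dots:
    print(x, y, "offset", y - x)
```

In this toy case the dots first lie on the main diagonal and then continue on a second diagonal shifted by the length of the inserted bases, which is the same kind of interruption that shows up as a "break" in the green line.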


To account accurately for the "breaks", or discontinuities, in the dotplot, we need to run a GEvo analysis. GEvo uses multiple algorithms to compare the two genomic regions. More information on the GEvo software tool can be found at: GEvo. The discontinuities in this syntenic dotplot represent sites of insertions or deletions. To analyse one of these "breaks", use the dotplot's locator to click on the green dot immediately before the "break". The locator turns red once it is placed on a green dot.
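The notion of a "break" can also be sketched in code. The following Python example is only an illustration under stated assumptions: it takes a list of hypothetical dot coordinates (position in genome 1, position in genome 2) and reports where two consecutive dots stop advancing by similar amounts on both axes. The function name `find_breaks`, the `max_gap` threshold, and the coordinates are invented for this sketch and do not come from SynMap.

```python
def find_breaks(points, max_gap=10_000):
    """Return (previous_dot, next_dot) pairs where the diagonal run is interrupted."""
    ordered = sorted(points)
    breaks = []
    for (x1, y1), (x2, y2) in zip(ordered, ordered[1:]):
        # On an uninterrupted diagonal both coordinates advance by similar amounts.
        # A large difference suggests extra sequence in one genome (insertion or deletion).
        if abs((x2 - x1) - (y2 - y1)) > max_gap:
            breaks.append(((x1, y1), (x2, y2)))
    return breaks


# Hypothetical dot coordinates in base pairs, not taken from the real dotplot.
points = [(1_000, 1_000), (5_000, 5_000), (9_000, 9_000),
          (12_000, 40_000), (16_000, 44_000)]

for before, after in find_breaks(points):
    print("break between", before, "and", after)
```

The dot reported as `before` corresponds to the green dot right before a "break", which is the one to click with the locator.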

After you click, a new GEvo page opens showing the sequence information for the region of interest in the dotplot. Click "Run GEvo Analysis!". This lets you visualize and compare the genetic make-up of strain B REL606 and strain K12 DH10B in a given region. In this case, the region of interest corresponds to a discontinuity in the dotplot.
GEvo.png

Once the GEvo results appear, we can begin to look for differences between the two genomes. Click on an individual gene (green bar) and its annotation will appear in a box. Repeat the steps above for each discontinuity in the dotplot to analyse it individually.
Gene annotation1.png
Notice the fragments of green line scattered across the dotplot. These represent translocation events in the E. coli strains we are examining. Place the locator on these fragments and run GEvo to determine which genes were translocated.
Notice the pink bars over the DNA segments. Click on one of them and it will be connected to its syntenic region. The sliding windows at the sides of the diagram can be used to magnify a region so that its genes can be viewed at a higher resolution. Look at the edges of the pink bars connecting syntenic regions: they may run parallel or cross each other. Crossing edges represent an inversion event.
Sliding window1.png
Inversion1.png
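The rule "parallel edges versus crossing edges" can be written down as a small check. The Python sketch below is an assumption-laden illustration, not GEvo code: it describes one matched region by its two end coordinates on each genome (a hypothetical `Match` record) and tests whether the edge joining the two starts would cross the edge joining the two ends.

```python
from typing import NamedTuple


class Match(NamedTuple):
    """One matched region: its two end coordinates on each genome (hypothetical layout)."""
    start1: int
    end1: int
    start2: int
    end2: int


def edges_cross(m):
    """True if the edges start1-start2 and end1-end2 cross each other."""
    # The edges cross when the region runs in opposite directions on the two genomes.
    return (m.end1 - m.start1) * (m.end2 - m.start2) < 0


# Hypothetical coordinates in base pairs.
collinear = Match(1_000, 2_000, 5_000, 6_000)
inverted = Match(3_000, 4_000, 9_000, 8_000)  # runs backwards on genome 2

print(edges_cross(collinear), edges_cross(inverted))
```

Here the first match has parallel edges and the second has crossing edges, so only the second would be read as an inversion.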

Evidence for deletions and insertions can also be found in these genomes using GEvo. In several instances, you will find that a "break" in the dotplot corresponds to a transposition. Several deletion and insertion events can be explained by transposon activity in these genomes. You can also identify DNA segments whose GC content differs from that of other parts of the genome. Under GEvo Configuration, click "Results Parameters" and select "Yes" for "Color wobble codon GC content". Click "Run GEvo Analysis!". Regions whose GC content differs from that of the rest of the genome will appear red.
GC content1.png
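For readers who want to see what "wobble codon GC content" measures, the following Python sketch computes the fraction of G or C bases at the third (wobble) position of each codon in a coding sequence. It is a minimal illustration under stated assumptions: the function name `wobble_gc` and the example sequence are invented here, and how GEvo itself turns this value into a color is not covered.

```python
def wobble_gc(cds):
    """Fraction of G or C at the third position of each complete codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing incomplete codon
    third_positions = cds[2:usable:3]
    if not third_positions:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in third_positions) / len(third_positions)


# Made-up coding sequence: codons ATG, GCC, AAA, TTT.
example_cds = "ATGGCCAAATTT"
print(wobble_gc(example_cds))
```

Comparing this value for a gene against the value typical of the rest of the genome is one simple way to spot a segment with unusual composition.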

Beware of genes that may not seem syntenic (missing pink bars) at first. To locate potential syntenic regions, change the amount of sequence analysed on either genome: under GEvo Configuration, increase or decrease the amount of sequence on the left/right. Then click "Run GEvo Analysis!".
Sequence-1.png

The DNA segments can also be aligned against each other. This is particularly helpful for locating direct repeats and inverted repeats, and for determining percent identities between paralogs and orthologs. To align multiple sequences simultaneously, click "+ Add Sequence", then copy and paste the name/ID of the organism into the newly created box for the additional sequence. Click "Run GEvo Analysis!". The resulting analysis will be distinctly color-coded. Click on the color-coded bars to find the syntenic regions on each sequence.
Addseq.png
Analysis.png
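Percent identity between two aligned segments can be illustrated with a few lines of Python. This is a sketch under stated assumptions, not the calculation GEvo performs: it expects two already-aligned sequences of equal length with `-` marking gaps, and it divides the number of identical columns by the number of columns that are not gaps in both sequences. Other definitions use a different denominator. The function name `percent_identity` and the example alignments are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences carry the same base."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # a column that is a gap in both sequences carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Made-up alignments: one mismatch in four columns, then one gap in six columns.
print(percent_identity("ACGT", "ACGA"))
print(round(percent_identity("ATG-CA", "ATGTCA"), 2))
```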

A detailed analysis of this syntenic dotplot can be found at Syntenic dotplot.