Difference between revisions of "Analysis of variations found in genomes of Escherichia coli strain K12 DH10B and strain B REL606 using SynMap and GEvo analysis"

From CoGepedia

Revision as of 16:19, 28 October 2009

In this exercise you will compare the genomes of two Escherichia coli strains, K12 DH10B and B REL606, using a whole-genome syntenic comparison followed by high-resolution analyses of specific genomic regions. These analyses use CoGe's tools SynMap and GEvo, respectively, and reveal evolutionary changes that occurred after the two lineages diverged. While the nucleotide sequences of these genomes are identical over large expanses, you will discover many types of large-scale genomic change, including phage insertions, transposon transposition, and genomic insertion, deletion, inversion, and duplication events. The same computational tools can be used to compare the genomes of any organisms.

First, you are going to identify syntenic regions between these genomes. Syntenic regions are two or more genomic regions that share a common ancestry, that is, they are derived from a common ancestral region. To find them, you are going to construct a syntenic dotplot of K12 DH10B and B REL606 using SynMap. Go to SynMap. Search for these E. coli strains in CoGe's database by typing part of their names in the "Name" search boxes for Organism 1 and Organism 2. For example, search for "DH10B" and "REL606" respectively, or type "escheri" in both boxes. Once CoGe has found organisms matching these names, make sure the correct ones are selected in the Organism List. While several parameters can be configured when generating a syntenic dotplot with SynMap, the default settings work well for most situations, and very well for closely related organisms. Click "Generate SynMap" to start the analysis.

Generate synmap.png

A lot of processing happens behind the scenes, but in general a syntenic dotplot is created in two steps: the genomes are compared to one another to find putative homologous genes, and these gene pairs are then processed to find collinear series of genes in both genomes. The underlying principle is that the most likely and parsimonious explanation for two genomes sharing a collinear series of homologous genes is that those genomic regions are derived from a common ancestral genomic region (hence they are syntenic).
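The idea of finding collinear series of gene pairs can be illustrated with a small Python sketch. This is a simplified stand-in, not the algorithm SynMap actually runs: it assumes each gene has already been reduced to its order index along its genome, and the function name `collinear_runs` and the parameters `max_gap` and `min_len` are hypothetical.

```python
from typing import List, Tuple

# (gene order index in genome A, gene order index in genome B)
GenePair = Tuple[int, int]


def collinear_runs(pairs: List[GenePair], max_gap: int = 2,
                   min_len: int = 3) -> List[List[GenePair]]:
    """Group putative homologous gene pairs into collinear runs.

    A pair extends the current run when it advances 1..max_gap genes in
    genome A and 1..max_gap genes in genome B, always in the same direction
    (forward, or backward for an inverted run). Illustrative only.
    """
    runs: List[List[GenePair]] = []
    current: List[GenePair] = []
    direction = 0  # 0 = not yet known, +1 = forward in B, -1 = reverse in B
    for pair in sorted(pairs):
        if current:
            da = pair[0] - current[-1][0]
            db = pair[1] - current[-1][1]
            step_ok = 1 <= da <= max_gap and 1 <= abs(db) <= max_gap
            same_dir = direction == 0 or (db > 0) == (direction > 0)
            if step_ok and same_dir:
                direction = 1 if db > 0 else -1
                current.append(pair)
                continue
            # The run is broken: keep it only if it is long enough.
            if len(current) >= min_len:
                runs.append(current)
        current = [pair]
        direction = 0
    if len(current) >= min_len:
        runs.append(current)
    return runs


# Made-up gene pairs: one forward run, one stray hit, one inverted run.
pairs = [(1, 1), (2, 2), (3, 3), (4, 4), (10, 50), (20, 21), (21, 20), (22, 19)]
for run in collinear_runs(pairs):
    print(run)
```

Isolated pairs such as `(10, 50)` are dropped because they never form a run of `min_len` genes, which mirrors how non-syntenic gene pairs stay gray on the dotplot while collinear ones are colored green.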

When finished, SynMap will display a dotplot. Each axis of the dotplot is in nucleotide units and represents one of the two genomes laid end to end. The lower-left corner represents the start of each genome (usually 'ORI' for circular bacterial genomes), and the end of each axis is the end of each chromosome. Each putative homologous gene-pair is drawn as a gray dot whose position corresponds to the position of each gene in its respective genome. Gene-pairs that have been identified as syntenic are colored green. Together these dots appear as a green line which, for the comparison of these two genomes, is nearly continuous along the entire length of both genomes.

Dotplot.png
If you look closely at the syntenic dotplot of these genomes, you'll notice that the syntenic line is not perfect: it contains many "breaks" or discontinuities. In the figure above, each break is indicated by a numbered arrow. These breaks in the syntenic path are due to genomic changes on a larger scale than a single nucleotide polymorphism, and are most likely due to the insertion or deletion of a chunk of many nucleotides in one of the genomes. To accurately account for and characterize these discontinuities, you need to perform a high-resolution analysis of these regions using GEvo. GEvo allows you to run pairwise comparisons between multiple genomic regions, and you can specify how large a genomic region to analyze. More information on the GEvo software tool can be found at: GEvo.

To analyse each of these "breaks" in the syntenic dotplot, first click on the dotplot. This will open a new window with a close-up of the dotplot. When this window appears, use the cross-hair locator that appears when you mouse over the dotplot and place it on the green spot right before a "break". The locator will turn red when it is over a gene-pair that can be used as a link to GEvo. When the locator has turned red, click. For example, to visualize the GEvo analysis of "break" number six, position the locator where the number six arrow points on the dotplot. Click when the locator turns red, or use this address: tinyurl.com/yl4vlbb to regenerate this particular analysis.
Red cross hairs positioned at the sixth break of our dotplot. Click to view GEvo page and run GEvo analysis


After you position the locator and click, a new GEvo page will appear displaying the sequence information corresponding to our region of interest in the dotplot. These sequences are "anchor" points into the two genomes that specify the genomic regions to be compared. When linking to GEvo, SynMap automatically sets GEvo to use 50,000 nucleotides to the left and right of each anchor point. By default, GEvo uses BlastZ as its sequence comparison algorithm, which is a good choice for identifying large blocks of similar sequence. These settings (~100kb of each genome; BlastZ) usually work well for an initial analysis, and all you need to do is click "Run GEvo Analysis!".
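The arithmetic behind the "~100kb of each genome" figure can be sketched in a few lines of Python. This is only an illustration of the windowing idea, not GEvo's code: the function name `gevo_window`, the 1-based inclusive coordinates, the clamping at the sequence ends, and the example genome length are all assumptions.

```python
from typing import Tuple


def gevo_window(anchor: int, genome_length: int,
                left: int = 50_000, right: int = 50_000) -> Tuple[int, int]:
    """Return the (start, stop) region around an anchor position.

    Coordinates are treated as 1-based and inclusive, and the window is
    clamped to the ends of the sequence (an assumption of this sketch; it
    does not wrap around the origin of a circular genome).
    """
    start = max(1, anchor - left)
    stop = min(genome_length, anchor + right)
    return start, stop


# Hypothetical anchor at position 1,000,000 in a 4,600,000 nt genome.
start, stop = gevo_window(1_000_000, 4_600_000)
print(start, stop, stop - start + 1)
```

Widening `left` or `right` corresponds to asking GEvo for more sequence on one side of the anchor, which is useful when a region of interest runs off the edge of the default window.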

GEvo.png

Once the GEvo analysis appears, we can begin to look for and characterize the differences between these two genomes at this syntenic region. GEvo's results show two panels, one for each genomic region. The dashed line in the middle of each panel separates the top and bottom strands of DNA. Gene models are drawn as green arrows above and below this line, and clicking on a gene will cause its annotation to appear in a box.


Gene annotation1.png

The pink blocks in these panels are genomic regions identified by BlastZ as being similar in sequence. If you click on a pink block, a transparent wedge is drawn connecting it to its partner region in the other genome, and information about the BLAST hit (also known as an HSP, or high-scoring segment pair) is shown in an information box.

Ambreen: Show the same figure as the previous one but with one of the pink bars clicked on so that it is connected with its partner region.

Ambreen: For the rest of this tutorial, write sections that:

1. Walk the reader through clicking on all the regions of sequence similarity in order to identify that a portion of one genome is missing. Show picture

2. Walk through zooming in on a region by adjusting the bars in GEvo. Show picture

3. Characterize the genes that are missing in one genome by clicking on genes (Don't call them green lines!)

4. Determine what kind of event happened.

I want you to carefully choose several examples and walk through them slowly, with details, so that someone who has never seen or done this before can follow:

1. Phage insertion -- evidence: phage genes in new genes

2. Insertion -- evidence: direct terminal repeats; briefly talk about mechanism. Perhaps create another wiki page describing this using figures as we've drawn on the white-board

3. Deletion -- evidence: transposon at position, mechanism: probably due to insertion of two transposons in same orientation and this happens as an insertion, but in reverse

4. Inversion -- evidence: flipped genes.

Ambreen, the rest of this tutorial is confusing. It feels as though you were running out of steam by this point. I find that the best way to write these (or any scientific discourse) is to start with an outline, and then flesh out the details (as well as reorganize). Remember, this needs to be of high quality and not written just "to get the task done".


end Eric's comments

Notice the fragments of green line all over the dotplot. These represent translocation events in the E. coli strains we are examining. Place the locator on these and run GEvo to determine which genes were translocated.


Notice the pink bars over the DNA segments. Clicking on a pink bar connects it to its syntenic region in the other genome. The sliding bars at the sides of the diagram can be used to magnify a region, which lets us view these genes at a higher resolution. Notice the edges of the connectors drawn between syntenic regions. They may run parallel or cross each other. The latter represents an inversion event.

Sliding window1.png
Inversion1.png
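Whether the edges of a connector run parallel or cross comes down to the relative orientation of the match in the two genomes. A minimal Python sketch of that test follows; it assumes each match is given as start and stop coordinates reported in alignment direction (so a reverse-strand match has start greater than stop), and the function name `is_inverted` and the coordinates are hypothetical.

```python
def is_inverted(start_a: int, stop_a: int, start_b: int, stop_b: int) -> bool:
    """Return True when a match runs in opposite directions in the two genomes.

    If the match reads left-to-right in one region and right-to-left in the
    other, the lines joining start to start and stop to stop must cross.
    """
    return (stop_a - start_a) * (stop_b - start_b) < 0


# Made-up matches: the first is collinear, the second is inverted.
print(is_inverted(1_000, 5_000, 1_200, 5_200))
print(is_inverted(8_000, 9_500, 9_700, 8_200))
```

A single crossed match can be an isolated inverted element, so it is worth checking whether the neighboring matches and the gene arrows in the region are flipped as well before calling it an inversion event.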

Evidence for deletions and insertions can also be found in these genomes using GEvo. In several instances, you will find that the "breaks" in our dotplot correspond to transposition, and several deletion and insertion events could be explained by transposon activity in these genomes. The DNA segments can also be aligned against each other. This is particularly helpful when locating regions of direct repeats and inverted repeats, and when determining percent identities between paralogs and orthologs. To align multiple sequences simultaneously, click "+ Add Sequence", then copy and paste the name/ID of the organism into the newly created box for additional sequences. Click "Run GEvo Analysis!". The resulting analysis will be color-coded distinctly. Click on the color-coded bars to find the syntenic regions on each sequence.
Addseq.png
Analysis.png
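Percent identity between two aligned segments can be computed directly once you have the alignment. The Python sketch below shows one common convention, counting only columns where neither sequence has a gap; other tools divide by the full alignment length instead, so values may differ from what GEvo reports. The function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical bases over ungapped alignment columns.

    Both inputs must be rows of the same alignment, with '-' marking gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Made-up alignment: 4 matches over 5 ungapped columns.
print(percent_identity("ACGT-A", "ACCTTA"))
```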
You can also identify DNA segments whose GC content differs from that of other parts of the genomes. Under GEvo Configuration, click "Results Parameters" and select "Yes" for "Color wobble codon GC content". Click "Run GEvo Analysis!". Regions whose GC content differs from the rest of the genome will be color-coded distinctly.
GC content1.png
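The quantity being colored here is the GC content at the third ("wobble") position of each codon. A short Python sketch of that calculation for a single coding sequence follows; it is an illustration rather than GEvo's implementation, and the function name `wobble_gc_content` and the example sequence are hypothetical.

```python
def wobble_gc_content(cds: str) -> float:
    """Fraction of G or C bases at the third position of each codon.

    The sequence is assumed to start in frame; any trailing bases that do
    not form a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    third_positions = cds[2:usable:3]
    if not third_positions:
        raise ValueError("sequence is shorter than one codon")
    gc = sum(base in "GC" for base in third_positions)
    return gc / len(third_positions)


# Codons ATG GCC AAA have third bases G, C and A.
print(wobble_gc_content("ATGGCCAAA"))
```

Genes acquired from another organism, such as phage genes, often carry a wobble GC content unlike that of their new host genome, which is why this coloring helps flag candidate insertions.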

Beware of genes that do not seem syntenic (no pink bars) at first. It is easy to assume falsely that certain genes were deleted or inserted simply because too small a stretch of DNA was analyzed. To avoid this and locate potential syntenic regions, change the amount of sequence analyzed for either genome. Under GEvo Configuration, increase or decrease the amount of sequence to the left or right of the anchor. Then click "Run GEvo Analysis!".
Sequence-1.png
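One way to keep track of stretches without pink bars is to list the parts of a region that no match covers, and then re-check them after widening the window. The Python sketch below does this for one region; the function name `uncovered_regions`, the `min_size` threshold, and the coordinates are hypothetical, and the match intervals would have to be copied from the GEvo results by hand or from an exported table.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based inclusive (start, stop)


def uncovered_regions(region_start: int, region_stop: int,
                      matches: List[Interval],
                      min_size: int = 1000) -> List[Interval]:
    """Return stretches of a region not covered by any match.

    Matches may overlap and may be given in either orientation. Only gaps
    of at least min_size nucleotides are reported.
    """
    gaps: List[Interval] = []
    cursor = region_start  # first position not yet known to be covered
    for start, stop in sorted((min(s, e), max(s, e)) for s, e in matches):
        if start - cursor >= min_size:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, stop + 1)
    if region_stop - cursor + 1 >= min_size:
        gaps.append((cursor, region_stop))
    return gaps


# Made-up matches over a 100,000 nt window.
matches = [(1, 40_000), (38_000, 45_000), (60_000, 100_000)]
print(uncovered_regions(1, 100_000, matches))
```

A gap that persists after more sequence has been added on both sides is a better candidate for a genuine insertion or deletion than one seen only in the default window.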

Detailed analysis of this syntenic dotplot can be found at Syntenic dotplot