Bacteria Genomic Inversion Shewanella baltica

From CoGepedia
Syntenic dotplot generated by SynMap between two strains of Shewanella baltica. Strain OS155 is on the x-axis; strain OS185 is on the y-axis. Results can be regenerated at;dsgid2=2360;D=20;g=10;A=5;w=0;b=1;ft1=1;ft2=1;dt=geneorder
GEvo analysis at an inversion breakpoint identified in the syntenic dotplot. Pairwise regions of sequence similarity are identified by pairs of colored blocks, some of which are connected by transparent wedges. Note the repeat sequences in strain OS155 at the inversion breakpoint. This is evidence that OS155 had the inversion event. Results can be regenerated at:
GEvo analysis of two strains of Shewanella baltica, OS155 and OS185, showing many genomic differences in a syntenic region that are likely caused by insertions of DNA elements into the genome of OS155. At each putative insertion site, sequence that occurs once in OS185 is duplicated in OS155. This is the signature of an insertion event that caused a target-site duplication of genomic sequence. Results can be regenerated at:

It is easy to visualize genomic inversions with a syntenic dotplot. Because syntenic regions appear as colored lines in the dotplots generated by SynMap, an inversion shows up as a syntenic line with a discontinuity (break) next to an adjacent line of the opposite slope (e.g., a positively sloping line breaks, and adjacent to it is a negatively sloping line).
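As a rough sketch of this rule, the Python below classifies syntenic blocks by the sign of their slope and flags blocks that are adjacent along the x-axis but have opposite slopes. It assumes each block is a hypothetical list of (x, y) gene-order coordinates for its anchor gene pairs and uses a least-squares slope; it illustrates the visual rule and is not how SynMap itself computes or colors lines.

```python
def slope(points):
    """Least-squares slope of a syntenic block's (x, y) anchor points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("block needs at least two distinct x positions")
    return sxy / sxx


def orientation(points):
    """+1 for a positive slope (green in the example), -1 for negative (red)."""
    return 1 if slope(points) > 0 else -1


def inversion_candidates(blocks):
    """Return index pairs of x-adjacent blocks whose slopes have opposite signs."""
    # Order blocks along the x-axis genome by their leftmost anchor.
    order = sorted(range(len(blocks)), key=lambda i: min(x for x, _ in blocks[i]))
    signs = {i: orientation(blocks[i]) for i in order}
    return [(a, b) for a, b in zip(order, order[1:]) if signs[a] != signs[b]]


# Toy dotplot: forward block, inverted block, forward block.
blocks = [
    [(1, 1), (2, 2), (3, 3)],
    [(4, 6), (5, 5), (6, 4)],
    [(7, 7), (8, 8), (9, 9)],
]
print(inversion_candidates(blocks))  # [(0, 1), (1, 2)]
```

Each reported pair is a break in one syntenic line with an oppositely sloping neighbor, i.e. a candidate inversion boundary.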

In the example shown here, SynMap colored syntenic lines green if they have a positive slope and red if they have a negative slope. The syntenic dotplot on the right has a series of green lines running from the lower-left corner to the upper-right corner, and a series of red lines running from the upper-left corner to the lower-right corner. Together they form an "X", a pattern characteristic of bacterial inversion events known as an X-alignment. If you hold a ruler up to the dotplot at the end of one of these syntenic lines (either horizontally or vertically), you'll find that the missing piece is always on the other side of the genome. For example, if you line a ruler up with the end of a green line, it will meet the end of a red line (and vice versa). SynMap's interface for interacting with dotplots has cross-hairs that make it easy to see how the end of one syntenic region lines up with the end of another.
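The ruler check can be written down as a small, hypothetical sketch. It assumes each syntenic block is a list of (x, y) gene-order coordinates and reports pairs of block ends that line up vertically (similar x) or horizontally (similar y) within a tolerance `tol`, which is an illustrative parameter standing in for how precisely the ruler or cross-hairs are held.

```python
from itertools import combinations


def endpoints(points):
    """The two ends of a syntenic block: its leftmost and rightmost anchors."""
    ordered = sorted(points)
    return [ordered[0], ordered[-1]]


def ruler_matches(blocks, tol=2):
    """Find block ends that line up vertically (same x) or horizontally (same y)."""
    matches = []
    for (i, a), (j, b) in combinations(enumerate(blocks), 2):
        for end_a in endpoints(a):
            for end_b in endpoints(b):
                if abs(end_a[0] - end_b[0]) <= tol:
                    matches.append((i, j, "vertical", end_a, end_b))
                if abs(end_a[1] - end_b[1]) <= tol:
                    matches.append((i, j, "horizontal", end_a, end_b))
    return matches


# Toy genome of 40 genes in which genes 11-30 are inverted in one strain.
green_left = [(x, x) for x in range(1, 11)]
red = [(x, 41 - x) for x in range(11, 31)]
green_right = [(x, x) for x in range(31, 41)]

for match in ruler_matches([green_left, red, green_right]):
    print(match)
```

In this toy case, the horizontal match pairs the end of the left green line at (10, 10) with the far end of the red line at (30, 11), which is the "missing piece on the other side" described above.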

Each of these large breaks in synteny was caused by an inversion event in one of the two genomes. If you imagine flipping the central red line 180 degrees, it fits perfectly into the space between the two green lines neighboring it. Likewise, if you take the region encompassing those two green pieces and flip it 180 degrees, it fits perfectly into the next pair of outer red syntenic regions. You can continue this until you reach the ends of the genome (keep in mind that bacterial genomes are circular but are represented linearly in CoGe). Starting from the middle, it should take 5 flips to reach the ends. That means a combined total of 5 inversion events occurred in the two strains since they diverged from one another. However, the dotplot alone cannot tell you which genome had a given inversion event. To make that assessment, we need to examine the ends of an inversion in both genomes and look for a signature left over from the inversion event in one of them. Inversions usually occur through recombination between regions with some degree of sequence similarity and/or through a double-strand break repair mechanism. In both cases, there will be repeat sequences at the ends of the inverted region. If we are lucky, only one genome will have repeat sequences at the inversion breakpoints.
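To make the flipping exercise concrete, here is a toy Python model. It assumes, as the exercise does, that all inversions are centred on the same point, and represents the genome as a hypothetical list of signed syntenic blocks (a negative number marks an inverted block). It is only an illustration of counting flips from the middle, not a general method for reconstructing inversion histories.

```python
def invert(blocks, lo, hi):
    """Reverse blocks[lo..hi] (inclusive) and flip each block's orientation."""
    segment = [-b for b in reversed(blocks[lo:hi + 1])]
    return blocks[:lo] + segment + blocks[hi + 1:]


def nested_inversions(n_blocks, half_widths):
    """Apply inversions centred on the middle block of an n_blocks genome."""
    if n_blocks % 2 == 0:
        raise ValueError("use an odd number of blocks so there is one centre block")
    blocks = list(range(1, n_blocks + 1))
    centre = n_blocks // 2
    for h in half_widths:
        blocks = invert(blocks, centre - h, centre + h)
    return blocks


def count_flips_from_middle(blocks, max_flips=100):
    """Flip the central same-orientation region outward until no block is inverted."""
    centre = len(blocks) // 2
    flips = 0
    while any(b < 0 for b in blocks):
        if flips >= max_flips:
            raise ValueError("pattern does not look like nested, centred inversions")
        forward = blocks[centre] > 0
        h = 0
        # Grow the window while both flanking blocks match the centre's orientation.
        while (
            centre - h - 1 >= 0
            and centre + h + 1 < len(blocks)
            and (blocks[centre - h - 1] > 0) == forward
            and (blocks[centre + h + 1] > 0) == forward
        ):
            h += 1
        blocks = invert(blocks, centre - h, centre + h)
        flips += 1
    return flips, blocks


scrambled = nested_inversions(11, [1, 2, 3, 4, 5])
print(scrambled)  # [-11, 2, -9, 4, -7, -6, -5, 8, -3, 10, -1]
print(count_flips_from_middle(scrambled))  # (5, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
```

Because inversions centred on the same point commute, undoing them from the middle outward restores the original order in this toy model; real rearrangement histories are rarely this tidy.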

CoGe's GEvo is well suited to this kind of high-resolution sequence comparison across multiple genomic regions, and SynMap's interface for viewing syntenic dotplots is linked directly to it. When the dotplot loads, click on the large square that shows most of the genomes (the small regions above and to the right are plasmid sequences). This opens another window showing just the genomic sequence. As you move the cross-hairs around this image, they turn red when over a dot; clicking on that spot launches GEvo with those genomic regions preloaded. When GEvo loads, press the "Run GEvo Analysis!" button to start a high-resolution sequence analysis centered on the genes that make up the dot in SynMap, plus an additional 50,000 base pairs of sequence upstream and downstream.
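The extent of the region GEvo loads is simple arithmetic. The sketch below uses a hypothetical helper (not part of CoGe) that extends the span of the genes behind a dot by 50,000 bp on each side; clipping at the ends of the sequence is an assumption made for this sketch, since the text does not say how GEvo handles regions that run past the end of a linearly represented genome.

```python
def flanked_window(gene_start, gene_end, sequence_length, flank=50_000):
    """Return a 1-based, inclusive (start, end) window around a gene span.

    Extends the span by `flank` bp upstream and downstream and clips the
    result to the sequence (clipping is an assumption of this sketch).
    """
    start, end = sorted((gene_start, gene_end))
    return max(1, start - flank), min(sequence_length, end + flank)


print(flanked_window(120_000, 121_500, 5_000_000))  # (70000, 171500)
print(flanked_window(10_000, 12_000, 5_000_000))    # (1, 62000)
```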

To generate the GEvo analysis on the right, which consists of four genomic regions, launch GEvo from both ends of an inversion. In the example shown here, the two ends of the central red syntenic region were used. After both GEvo analyses have completed, copy the regeneration link from one analysis and paste it into the text box next to "Merge Previous GEvo Analysis (paste in URL):" in the other analysis. Then press the "Merge" button, and the genomic regions from the pasted-in analysis will appear in the sequence submission area. Finally, press the "Run GEvo Analysis!" button again to generate an analysis using all four genomic regions.

When the results are returned, you may need to adjust the extent of the genomic regions to get a clear view, or you can use this link to generate the results shown here: . This analysis shows extensive similar sequence among the genomic regions. These matches are drawn as colored blocks, with each color representing one of the pairwise comparisons. In GEvo, you can click on a block and a transparent wedge will be drawn connecting it to the region to which it is similar.
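Assuming GEvo compares every pair of loaded regions, a four-region analysis involves 4 × 3 / 2 = 6 pairwise comparisons, so up to six colors. The short sketch below enumerates them with hypothetical region labels and separates comparisons within a strain from comparisons between strains; the within-strain OS155 pair is the kind of comparison in which a repeat at an inversion breakpoint would appear.

```python
from itertools import combinations

# Hypothetical labels for the four regions in the merged analysis.
regions = ["OS155-A", "OS155-B", "OS185-A", "OS185-B"]

pairs = list(combinations(regions, 2))
within_strain = [(a, b) for a, b in pairs if a.split("-")[0] == b.split("-")[0]]
between_strain = [p for p in pairs if p not in within_strain]

print(len(pairs), len(within_strain), len(between_strain))  # 6 2 4
```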

It may take a few moments to get oriented, but once you do, you will see that each region of each strain is syntenic to both regions of the other strain. This is expected if, in one of the strains, an inversion breakpoint separates these genomic regions. You should also notice one anomalous region of sequence similarity between the two regions of strain OS155, labeled on the figure "Sequence Repeat/Putative inversion site". This sequence is present at the inversion breakpoint in OS155 and is absent from both genomic regions of OS185. This is the smoking gun of an inversion event and provides evidence that this inversion happened in strain OS155 and not in strain OS185.
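The reasoning above can be sketched as a small decision rule. The Python below is a crude stand-in for what GEvo shows visually: it checks whether the two breakpoint regions of each strain share an exact k-mer on either strand (a swapped-in technique far simpler than the sequence alignment GEvo performs) and attributes the inversion to a strain only if exactly one strain has such a shared repeat. The toy sequences, the strain keys, and k = 20 are hypothetical.

```python
def kmers(seq, k):
    """All length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (non-ACGT characters kept as-is)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def shares_repeat(region_a, region_b, k=20):
    """True if the two regions share an exact k-mer on either strand."""
    b_kmers = kmers(region_b, k) | kmers(reverse_complement(region_b), k)
    return not kmers(region_a, k).isdisjoint(b_kmers)


def infer_inverted_strain(breakpoints, k=20):
    """breakpoints maps strain -> (left_region, right_region) at the inversion ends.

    Returns the strain name if exactly one strain has a repeat shared by its
    two breakpoint regions; otherwise None (the evidence is ambiguous).
    """
    with_repeat = [
        strain
        for strain, (left, right) in breakpoints.items()
        if shares_repeat(left, right, k)
    ]
    return with_repeat[0] if len(with_repeat) == 1 else None


repeat = "ACGTTGCAACGGTACCTTAGGCAT"  # toy 24-bp repeat
breakpoints = {
    "OS155": ("G" * 10 + repeat + "C" * 10, "T" * 10 + repeat + "A" * 10),
    "OS185": ("G" * 24, "AT" * 12),
}
print(infer_inverted_strain(breakpoints))  # OS155
```

Returning None when both or neither strain carries a repeat mirrors the caveat in the text: the test is only informative when one genome alone has repeats at the breakpoints.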

The genes that make up this repeat sequence can be identified in GEvo by clicking on the gene models. The genes in that region are annotated as:

  • site-specific recombinase, resolvase family (only present in one region)
  • IstB domain protein, transposition helper protein
  • Integrase, transposase

It is tempting to see DNA-modifying proteins at this location and assume they were involved in the recombination event. However, these transposases could have landed in the two regions independently and set the stage for a later recombination event, or something even more exotic, such as a combined duplication/inversion event, may have occurred. In any case, this is strong evidence that the inversion seen between these two strains happened in OS155.

On a separate note, you may have noticed that these syntenic lines contain quite a few smaller discontinuities. These are not from inversion events; they result from insertions, deletions, and duplications between the genomes. Such events are often associated with transposases, phages, and other pseudo-autonomous inhabitants of the genome. The final GEvo image to the right shows an analysis of one of these regions. Strain OS155 has two genomic regions that are not present in OS185. Interestingly, each of these regions in OS155 is flanked by duplicated sequence that is present only once in OS185. During a DNA integration event, genomic DNA at the site of integration is often duplicated, a process known as target-site duplication. In addition, many of the genes in these non-syntenic genomic regions encode transposases, integrases, and phage proteins.
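A target-site duplication can be spotted with a simple prefix/suffix comparison. In this sketch (hypothetical helper names; not how GEvo works), `ancestral` is the locus in the strain without the insert (OS185 here) and `derived` is the same locus in the strain with it (OS155). If the longest shared prefix and longest shared suffix overlap within the ancestral sequence, the overlap is sequence present once in the ancestral strain but on both sides of the insert in the derived one. It assumes the two toy sequences differ only by a single insertion; bases that match by chance at the junctions can shift the inferred boundaries.

```python
def common_prefix_len(a, b):
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def find_target_site_duplication(ancestral, derived):
    """Return (duplicated_site, extra_bases) or None if no duplication is seen.

    extra_bases is the length gained in `derived`: the inserted element plus
    one extra copy of the duplicated site.
    """
    if len(derived) <= len(ancestral):
        return None
    prefix = common_prefix_len(ancestral, derived)
    suffix = common_prefix_len(ancestral[::-1], derived[::-1])
    # Keep the shared prefix and suffix from overlapping inside `derived`.
    suffix = min(suffix, len(derived) - prefix)
    overlap = prefix + suffix - len(ancestral)
    if overlap <= 0:
        return None
    site = ancestral[len(ancestral) - suffix:prefix]
    return site, len(derived) - len(ancestral)


ancestral = "AAAAAAAA" + "GATC" + "TTTTTTTT"
derived = "AAAAAAAA" + "GATC" + "CCCCGGGG" + "GATC" + "TTTTTTTT"
print(find_target_site_duplication(ancestral, derived))  # ('GATC', 12)
```

Here the 4-bp site GATC occurs once in the ancestral sequence and flanks the 8-bp insert on both sides in the derived one, which is the pattern described for the OS155 insertions.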