Bacteria Genomic Inversion E. coli K12

From CoGepedia
Master 4241 4243.CDS-CDS.blastn geneorder D20 g10 A5.w500.png

It is easy to visualize genomic inversions with a syntenic dotplot. Because syntenic regions appear as colored lines in the syntenic dotplot generated by SynMap, an inversion shows up where a syntenic line has a discontinuity or break and the adjacent line has the opposite slope (e.g., a positively sloping line breaks and a negatively sloping line lies next to it).

In the example shown here, SynMap colored the syntenic lines green if they have a positive slope and red if they have a negative slope. This plot has a nearly continuous green line starting at the origin in the lower left and sloping to the upper right. This line does have some discontinuities that result from insertions, deletions, and duplications between the genomes. In the upper right area of the dotplot, however, the green line breaks and a red line starts with the opposite slope. Beyond the red line is another green line. If you connect the two green lines, they are almost perfectly in line with one another, and the gap between them is completely filled by the red line. This offset red line shows that a genomic inversion happened in one of the two genomes.
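The visual rule described above (a negatively sloping line filling the gap between two positively sloping lines) can be sketched in a few lines of Python. This is only an illustration, not how SynMap works internally: the `orientation` and `find_inversions` helpers and the coordinates in `segments` are hypothetical, and each syntenic line is assumed to be already reduced to an ordered list of (x, y) positions.

```python
from typing import List, Tuple

# One point of a syntenic line: (position in genome X, position in genome Y).
Point = Tuple[float, float]


def orientation(segment: List[Point]) -> str:
    """Return '+' for a positively sloping segment, '-' for a negative one."""
    (x0, y0), (x1, y1) = segment[0], segment[-1]
    if x1 == x0:
        raise ValueError("segment has no extent along the x axis")
    return "+" if (y1 - y0) / (x1 - x0) > 0 else "-"


def find_inversions(segments: List[List[Point]]) -> List[int]:
    """Return indices of '-' segments flanked on both sides by '+' segments.

    Segments are assumed to be sorted by their position along the x axis.
    """
    signs = [orientation(s) for s in segments]
    return [
        i
        for i in range(1, len(signs) - 1)
        if signs[i] == "-" and signs[i - 1] == "+" and signs[i + 1] == "+"
    ]


# Made-up coordinates mimicking the plot: green, red, green.
segments = [
    [(0, 0), (3000, 3000)],        # green line from the origin
    [(3000, 4000), (4000, 3000)],  # red line with the opposite slope
    [(4000, 4000), (4600, 4600)],  # green line resuming beyond the gap
]

print(find_inversions(segments))
```

With these made-up coordinates, only the middle segment is reported as a candidate inversion. A real dotplot would also need a tolerance for the small breaks caused by insertions, deletions, and duplications.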

It is not possible to tell which of the two genomes had the inversion by examining the syntenic dotplot. To make such an assessment, we need to examine the ends of the inversion in both genomes and try to find a signature left over from the inversion event in one of the genomes. Inversions usually happen between regions with some degree of sequence similarity and/or through a double-strand break repair mechanism. In both cases, there will be repeat sequences at the ends of the inverted region. If we are lucky, only one genome will have repeat sequences at the inversion break-points.

DH10B-WG3110-Inversion.png

Unfortunately, the high-resolution analysis doesn't immediately determine which genome had the inversion. At all four break-points (two in each genome) are ribosomal gene cassettes. These ~5 kb sequences are highly similar to one another within a genome and provide the kind of sequence at which a genomic inversion can occur.

One possible line of evidence that can be used to determine which genome had the inversion is to calculate the pairwise percent sequence identity among these four ribosomal gene cassettes. If some degree of sequence repair/conversion happened in the genome in which the inversion happened, the intra-genomic identity will be higher in that genome than in the other genome:
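The page does not say how the percent identities were computed. As a minimal sketch, assuming the cassettes have already been aligned to equal length with `-` marking gaps, pairwise identity could be calculated as follows. The function names and the toy sequences are hypothetical and are not the real cassette sequences.

```python
from itertools import combinations
from typing import Dict, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are skipped, not counted as mismatches
        compared += 1
        if x.upper() == y.upper():
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def pairwise_identity(seqs: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Percent identity for every unordered pair of named sequences."""
    return {
        (n1, n2): percent_identity(seqs[n1], seqs[n2])
        for n1, n2 in combinations(seqs, 2)
    }


# Toy aligned sequences, for illustration only.
toy = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAA",
    "s3": "ACG-ACGTAC",
}

for pair, value in pairwise_identity(toy).items():
    print(pair, round(value, 2))
```

Skipping gapped columns is one convention among several. Counting gaps as mismatches would lower the values, so the same convention should be used for all six comparisons.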


             DH10B-A    DH10B-B    W3110-A    W3110-B
DH10B-A      -------    97.83%     97.83%     100%
DH10B-B                 -------    100%       97.72%
W3110-A                            -------    97.71%
W3110-B                                       -------


Unfortunately, given these data and no statistics, it is difficult to say with any certainty in which genome the inversion happened. 
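To make the size of the difference concrete, the sketch below takes the six values as read from the table and compares the two intra-genomic identities. The conversion to a number of sites assumes an aligned length of about 5,000 bp, based on the ~5 kb cassette size mentioned above. The true alignment length is not given, so that figure is only a rough guide.

```python
# Pairwise percent identities as read from the table above.
identity = {
    ("DH10B-A", "DH10B-B"): 97.83,
    ("DH10B-A", "W3110-A"): 97.83,
    ("DH10B-A", "W3110-B"): 100.0,
    ("DH10B-B", "W3110-A"): 100.0,
    ("DH10B-B", "W3110-B"): 97.72,
    ("W3110-A", "W3110-B"): 97.71,
}


def genome(name: str) -> str:
    """Strip the cassette suffix: 'DH10B-A' -> 'DH10B'."""
    return name.rsplit("-", 1)[0]


# Keep only the comparisons made within a single genome.
intra = {
    genome(a): value
    for (a, b), value in identity.items()
    if genome(a) == genome(b)
}

difference = round(intra["DH10B"] - intra["W3110"], 2)

# Assumption: ~5,000 aligned positions per cassette comparison.
ASSUMED_LENGTH = 5000
approx_sites = round(difference / 100 * ASSUMED_LENGTH)

print(intra)
print(difference, approx_sites)
```

Under that assumption, the two intra-genomic identities differ by 0.12 percentage points, or roughly six positions out of about 5,000. Without a statistical test, a gap that small cannot be taken as evidence for either genome.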


But that is how it goes sometimes. See this example between strains of Shewanella baltica, where transposition helper genes are found on both sides of an inversion in one of the strains, while the other strain lacks such repeated sequences.