Difference between revisions of "Syntenic dotplot"

From CoGepedia
 
(27 intermediate revisions by 3 users not shown)
Line 1: Line 1:
'''Syntenic dotplots''' are a type of scatter-plot. Each axis represents a sequence laid end-to-end, and each dot in the scatter-plot represents a putative [[Homologous]] match between the two sequences. Often, these dotplots are used for whole genome comparisons within the same genome or across two genomes from different taxa in order to identify [[Synteny]]. Synteny is defined as two or more genomic regions that are derived from a common ancestral genomic region. The evidence for synteny is the identification of a set of genes in each genome that have a collinear arrangement. When such a pattern of gene-order conservation is discovered, the most parsimonious explanation is that the two regions are related through a common ancestor. While syntenic dotplots are useful for identifying related genomic regions, they are also useful for identifying genomic regions that have undergone an evolutionary change in one of the two genomes being compared. For example, insertions, deletions, duplications, and inversions are readily identifiable from these plots.  
+
[[Image:Dotplot.png|thumb|right|600px| Syntenic dotplot of E. coli B strain REL606 (x-axis) and E. coli K12 strain DH10B (y-axis). The "green" line represents the regions of similarity between the two genomes, while the discontinuities in this syntenic line (marked by numbered arrows) represent regions of genomic variation at a given locus between the two substrains of E. coli. Variations of this size (10s of kb) are usually the result of phage insertions, horizontal gene transfer events, deletions, and transposon activity. More information about this comparison can be found [[Analysis of variations found in genomes of Escherichia coli strain K12 DH10B and strain B REL606 using SynMap and GEvo analysis | here]].  More examples of bacterial syntenic dotplots and [[x-alignments]] can be found [[x-alignments | here]]. This dotplot can be regenerated [http://genomevolution.org/CoGe/SynMap.pl?dsgid1=7454;dsgid2=4241;D=20;g=10;A=5;w=0;b=1;ft1=1;ft2=1;dt=geneorder here].]]
  
Below is an example comparing two closely related strains of E. coli, B REL606 and K12 DH10B. While their genomes are highly similar at the nucleotide level, there are many "breaks" in the syntenic path through their genomes, which reveal a variety of genomic changes (mostly insertions and deletions in this example).  
+
[[Image:Master 6807 8082.CDS-CDS.blastn geneorder D40 g20 A10.w1200.gene.ks.png|thumb|right|600px|Syntenic dotplot with Ks coloration of sorghum (x-axis) versus maize (y-axis). Genes are used for axis metrics; black lines separate chromosomes in each genome. Results can be regenerated at: https://genomevolution.org/r/dfjy.  Red syntenic lines are from the maize-specific [[whole genome duplication]] event and are orthologous to sorghum.  Purple lines are from the older pre-grass [[whole genome duplication]] event and are [[out-paralogs]].  More information about this analysis can be found [[Maize_Sorghum_Syntenic_dotplot | here]]. ]]
  
[[Image:Dotplot.png|thumb|center|983px]]  
+
[[Image:Master 8154 8154.CDS-CDS.blastn.dag.go c4 D40 g20 A5.aligncoords.gcoords ct0.w2000.gene.ks.png|thumb|right|600px|Syntenic dotplot of poplar versus itself.  Syntenic gene-pairs are colored by the [[synonymous mutation]] values.  This reveals intragenomic synteny derived from a recent [[whole genome duplication]] event (dark blue) and the older [[eudicot paleohexaploidy]] event (green-cyan).  This analysis can be regenerated at http://genomevolution.org/CoGe/SynMap.pl?dsgid1=8154;dsgid2=8154;c=4;D=40;g=20;A=5;Dm=;gm=;w=0;b=1;ft1=1;ft2=1;do1=1;do2=1;do=40;dt=geneorder;ks=1;am=g]]
  
<br>
+
[[Image:Master 8154 8154.CDS-CDS.blastn.dag.go c4 D40 g20 A5.aligncoords.gcoords ct0.w2000.gene.ks.hist.png|thumb|right|600px|Histogram of the [[synonymous mutation]] (Ks) values (log10 transformed) of the syntenic gene pairs within poplar.  Smaller values on the left indicate younger gene pairs, and larger values on the right indicate older gene pairs.  The two middle peaks are from poplar's recent whole genome duplication event (blue) and a more ancient [[eudicot paleohexaploidy]] event (green-cyan).  The peak on the far right, with untransformed Ks values of 50-100, is noise in the analysis, perhaps from the alignment of pseudogenes, mis-called syntenic gene pairs, and erroneous gene models.  These colors correspond to the colors used in the syntenic dotplot shown above.]]
  
{| width="1000" cellspacing="1" cellpadding="1" border="1"
+
'''Syntenic dotplots''' are a type of scatter-plot. Each axis represents a sequence laid end-to-end, and each dot in the scatter-plot represents a putative [[homologous]] match between the two sequences. Often, these dotplots are used for whole genome comparisons within the same genome or across two genomes from different taxa in order to identify [[synteny]]. Synteny is defined as two or more genomic regions that are derived from a common ancestral genomic region. The evidence for synteny is the identification of a set of homologous genes in two genomes that have a collinear arrangement. When such a pattern of gene-order conservation is discovered, the most parsimonious explanation is that the two regions are related through a common ancestor. While syntenic dotplots are useful for identifying related genomic regions, they are also useful for identifying genomic regions that have undergone an evolutionary change in one of the two genomes being compared. Examples of such events are:
|-
+
*[[insertions]]
| Variation type<br>
+
*[[horizontal gene transfers]]
| Difference in strain B REL606<br>
+
*[[deletions]]
| Difference in strain K-12 DH10B<br>
+
*duplications
| Evidence<br>
+
*[[inversions]]  
| Notes<br>
+
| Link leading to GEvo <br>
+
CoGe's tool [[SynMap]] makes it easy to create a syntenic dotplot for any two genomes in CoGe.
|-
+
| 1. Deletion<br>
+
| none<br>
+
| Deletion of ~18 genes including DNA <br>pol II, genes in metabolic pathway, thiamine ABC transporter<br><br>
+
| pseudogenes in DH10B at deletion site.<br><br>
+
| Possible additional insertion in DH10B as evidenced by <br>pseudogenes of yabP, RNA pol associated helicase and FruR that are not present in REL606<br><br>
+
| [http://tinyurl.com/ylg9qrk tinyurl.com/yexrzpb]<br>
+
|-
+
| 2. Insertion<br>
+
| Insertion of IS1 transposon<br>
+
| Insertion sequences and Prophage CP46 DNA insertion
+
| Prophage specific genes found in DH10B<br>
+
| Prophage DNA insertion and IS insertions have created pseudogenes in K-12 DH10B<br>
+
| [http://tinyurl.com/yjdqgzr tinyurl.com/yd2quy7]<br>
+
|-
+
| 3. Translocation in REL606 and insertion in DH10B<br>
+
| Insertion of IS1 sequence. Translocation of ~15 genes including lac operon and other metabolic enzymes genes
+
| Insertion of IS3 and IS2 sequences
+
|
+
Translocation in REL606 as evidenced by direct repeats. The dotplot shows that the missing genes are present in DH10B but not at this locus. The syntenic region is therefore not collinear.<br>
+
 
+
|
+
Pseudogenes of yaiT and yaiX were created in DH10B by transposon insertions.
+
 
+
Insertion by translocation in REL606 was confirmed as lac operon and other metabolic genes were found in DH10B by analyzing the translocated genes on the dotplot
+
 
+
| [http://tinyurl.com/yldc83u http://tinyurl.com/yldc83u]<br>
+
|-
+
| 4. Insertion in REL606 and DNA duplication event in DH10B. <br>
+
| Prophage DNA and transposase insertion <br>
+
| Recent DNA duplication event&nbsp;&nbsp;
+
| 100% identity between paralogs in DH10B and ~98% identity between syntenic region of DH10B and REL606<br>
+
| Possible phage DNA insertion in REL606 as "hypothetical protein"&nbsp;genes&nbsp;were found near putative prophage tail component gene in REL606. <br>
+
| [http://tinyurl.com/yk7vjgq tinyurl.com/yea8bu6]<br>
+
|-
+
| 5. Insertion<br>
+
| Bacteriophage DNA insertion <br>
+
| IS2 sequence insertion<br>
+
| Pseudogenes at IS2 insertion site in DH10B. Phage specific genes were found in REL606<br>
+
| Possible phage DNA insertion in REL606 as "Hypothetical proteins" were found near phage specific genes <br>
+
| [http://tinyurl.com/yevlb2w tinyurl.com/yevlb2w]<br>
+
|-
+
| 6. Insertion<br>
+
| Prophage DNA insertion <br>
+
| none<br>
+
| Phage specific genes were found in REL606<br>
+
| none<br>
+
| [http://tinyurl.com/ybokuag tinyurl.com/ybokuag]<br>
+
|-
+
| 7. Insertion, translocation and inversion<br>
+
| none
+
| Prophage DNA insertion and translocation of nitrite reductase 2 genes
+
| Phage specific genes found in DH10B<br>
+
| Translocation in DH10B is evident from the dotplot. Moreover, the translocated genes in DH10B were found to be inverted. It could not be determined in which genome the genes were inverted, as transposon insertions were found in both genomes.<br>
+
|
+
[http://tinyurl.com/yaxlh7o tinyurl.com/yaxlh7o]
+
 
+
[http://tinyurl.com/y9cs6ft http://tinyurl.com/y9cs6ft]
+
 
+
|-
+
| 8. Insertion and deletion<br>
+
| Transposon insertions and deletion of&nbsp;phenylacetic acid degradation genes <br>
+
| IS and Rac prophage DNA insertion
+
| Phage specific genes found in DH10B. IS or transposon insertions in REL606 might have created direct repeats and facilitated excision of phenylacetic acid degradation genes.<br>
+
| Rac prophage DNA disrupted by transposon insertion in DH10B
+
| [http://tinyurl.com/ylgv7xc tinyurl.com/yccbmsq]<br>
+
|-
+
| 9a. Insertion
+
| 9a. none
+
| 9a. Insertion of IS5 sequence
+
| 9a. none
+
| 9a. none
+
| 9a.&nbsp;[http://tinyurl.com/ylllc6u http://tinyurl.com/ylllc6u]
+
|-
+
| 9b. Insertion
+
| 9b. Insertion of IS1 transposon
+
| 9b. none
+
| 9b. none
+
| 9b. none
+
| 9b.&nbsp;[http://tinyurl.com/ygsqg2f tinyurl.com/ygsqg2f]
+
|-
+
| 9c. Insertion
+
| 9c. IS1 insertion<br>
+
| 9c. Insertion of ABC transporter, flagella encoding genes and few other enzymes
+
| 9c. Inserted DNA segment in DH10B is bordered by direct repeats at both ends. 100% identity was found between the two repeats. <br>
+
| 9c. Direct repeats (DR) indicate transposon insertion in DH10B.&nbsp;
+
| 9c[http://tinyurl.com/yza4jy3 tinyurl.com/yza4jy3][http://tinyurl.com/yjlns3p]
+
|-
+
| 9d. Insertion
+
| 9d. IS2 insertion <br>
+
| 9d. none
+
| 9d. none
+
| 9d. none
+
| 9d.&nbsp;[http://tinyurl.com/ygfgtqy tinyurl.com/ygfgtqy]
+
|-
+
| 10. Insertion and deletion<br><br>
+
| Bacteriophage DNA insertion and IS1 transposon insertion. Deletion of ~5 genes<br><br>
+
| Insertion of IS3<br>
+
| IS1 insertion at the site of deletion. Another IS1 insertion might have created direct repeats and facilitated deletion.&nbsp;
+
| &nbsp; none
+
| [http://tinyurl.com/ykynub2 tinyurl.com/ykynub2]<br>
+
|-
+
| 11. Insertion and deletion&nbsp; <br>
+
| IS1 insertion followed by insertion<br>
+
| CP4-57 prophage DNA insertion and possible deletion of ParB family protein and recombinase<br>
+
| Phage insertion in DH10B may have created pseudogene of ParB family protein genes and recombinase which later got deleted <br>
+
| Pseudogenes of yqa, yga and ypj indicated possible formation of pseudogenes of ParB and recombinase at some time prior to their deletion in DH10B.<br>
+
| [http://tinyurl.com/yg7ybg4 tinyurl.com/yg7ybg4]<br>
+
|-
+
| 12. Insertion, deletion and Inversion<br>
+
| IS1 insertion.&nbsp;
+
| IS5 and IS10 transposon insertion. Inversion of ornithine decarboxylase, M-type protein and bifunctional prepilin peptidase/methylase. Deletion of saframycin synthetase, capsule related genes, bio-film formation genes, anti-toxin system and type II secretory apparatus genes.&nbsp;
+
|
+
Insertion of pyrophosphorylase and "hypothetical protein" genes in REL606 as evidenced by their different GC content
+
 
+
Inversion in DH10B as evidenced by inverted repeats of IS10 transposon.
+
 
+
Deletion of several genes in DH10B is evidenced by IS5 trans-activator transposase and the presence of pseudogenes in DH10B.
+
 
+
|
+
Insertion of IS5 trans-activator transposase indicates possible deletion of several genes in DH10B. Also, no evidence of insertion in REL606 was found such as direct repeats.&nbsp;
+
 
+
| [http://tinyurl.com/yhyxgrq tinyurl.com/yhyxgrq]<br>
+
|-
+
|
+
13a. Deletion
+
 
+
| 13a. none
+
|
+
13a. Deletion of putative adhesin<br>
+
 
+
| 13a. No direct repeats were found to indicate insertion of putative adhesin in REL606; therefore, a deletion in DH10B may have happened
+
| 13a.none<br>
+
| 13a.[http://tinyurl.com/yjojy53 &nbsp;tinyurl.com/yjojy53]
+
|-
+
|
+
13b. Insertion<br>
+
 
+
| 13b. IS1 insertion and deletion of lipopolysaccharide genes&nbsp;
+
|
+
13b. none<br>
+
 
+
|
+
<br> 13b. IS1 insertion in REL606 indicates that deletion may have occurred by formation of direct repeats.&nbsp;
+
 
+
| 13b. IS1 insertion created pseudogene.
+
| 13b.[http://tinyurl.com/yj2yg5s http://tinyurl.com/yj2yg5s]
+
|-
+
|
+
13c. Insertion
+
 
+
and deletion<br>
+
 
+
|
+
13c. Insertion of IS30 transposon and several "hypothetical protein" genes.&nbsp;
+
 
+
|
+
13c. none<br>
+
 
+
|
+
13c. Insertion in REL606 is evidenced by direct repeats<br>
+
 
+
<br>
+
 
+
|
+
<br>
+
 
+
13c. Direct repeats were found in REL606, which indicates insertion of ShiA-like and TrbC-like genes.&nbsp;<br>
+
 
+
|
+
<br>
+
 
+
<br>
+
 
+
13c.[http://tinyurl.com/yjzdyum &nbsp;tinyurl.com/yjzdyum]<br>
+
 
+
[http://tinyurl.com/ydkrcv8 <br>]  
+
 
+
|-
+
|
+
14a. Insertion
+
 
+
| 14a. Insertion of several transposons and secondary glycine betaine transporter
+
| 14a.Insertion of several transposons. Insertion of Kple2 phage-like element
+
| 14a. Direct repeats bordering secondary glycine betaine transporter indicate its insertion
+
| 14a. none
+
| 14a. [http://tinyurl.com/yzyvunx tinyurl.com/yzyvunx]
+
|-
+
| 14b. Insertion
+
| 14b. Insertion of ~15 genes
+
| 14b. Phage insertion. Transposon insertions
+
| 14b. Insertion in REL606 is evidenced by direct repeats flanking the DNA segment containing several genes.  
+
| 14b.Phage-like genes were found in DH10B
+
| 14b. [http://tinyurl.com/yly2b6u tinyurl.com/yly2b6u]
+
|-
+
|
+
<br> 14c. Deletion
+
 
+
|
+
14c. none
+
 
+
|
+
<br>
+
 
+
14c. Deletion of ~15 genes.
+
 
+
<br>
+
 
+
<br>
+
 
+
|
+
<br>
+
 
+
14c. Deletion in DH10B is evidenced by insertion of IS10R which&nbsp; may have facilitated excision of DNA by forming direct repeats
+
 
+
|
+
<br>
+
 
+
<br>
+
 
+
14c. Pseudogenes found at the site of deletion and IS10R insertion.
+
 
+
|
+
<br>
+
 
+
<br>
+
 
+
14c. [http://tinyurl.com/yfhhsk6 tinyurl.com/yfhhsk6]
+
 
+
[http://tinyurl.com/ycwsmsl <br>]
+
 
+
|}
+
 
+
<br>
+

Latest revision as of 15:02, 24 July 2014

Syntenic dotplot of E. coli B strain REL606 (x-axis) and E. coli K12 strain DH10B (y-axis). The "green" line represents the regions of similarity between the two genomes, while the discontinuities in this syntenic line (marked by numbered arrows) represent regions of genomic variation at a given locus between the two substrains of E. coli. Variations of this size (10s of kb) are usually the result of phage insertions, horizontal gene transfer events, deletions, and transposon activity. More information about this comparison can be found here. More examples of bacterial syntenic dotplots and x-alignments can be found here. This dotplot can be regenerated here.
Syntenic dotplot with Ks coloration of sorghum (x-axis) versus maize (y-axis). Genes are used for axis metrics; black lines separate chromosomes in each genome. Results can be regenerated at: https://genomevolution.org/r/dfjy. Red syntenic lines are from the maize-specific whole genome duplication event and are orthologous to sorghum. Purple lines are from the older pre-grass whole genome duplication event and are out-paralogs. More information about this analysis can be found here.
Syntenic dotplot of poplar versus itself. Syntenic gene-pairs are colored by the synonymous mutation values. This reveals intragenomic synteny derived from a recent whole genome duplication event (dark blue) and the older eudicot paleohexaploidy event (green-cyan). This analysis can be regenerated at http://genomevolution.org/CoGe/SynMap.pl?dsgid1=8154;dsgid2=8154;c=4;D=40;g=20;A=5;Dm=;gm=;w=0;b=1;ft1=1;ft2=1;do1=1;do2=1;do=40;dt=geneorder;ks=1;am=g
Histogram of the synonymous mutation (Ks) values (log10 transformed) of the syntenic gene pairs within poplar. Smaller values on the left indicate younger gene pairs, and larger values on the right indicate older gene pairs. The two middle peaks are from poplar's recent whole genome duplication event (blue) and a more ancient eudicot paleohexaploidy event (green-cyan). The peak on the far right, with untransformed Ks values of 50-100, is noise in the analysis, perhaps from the alignment of pseudogenes, mis-called syntenic gene pairs, and erroneous gene models. These colors correspond to the colors used in the syntenic dotplot shown above.

Syntenic dotplots are a type of scatter-plot. Each axis represents a sequence laid end-to-end, and each dot in the scatter-plot represents a putative homologous match between the two sequences. Often, these dotplots are used for whole genome comparisons within the same genome or across two genomes from different taxa in order to identify synteny. Synteny is defined as two or more genomic regions that are derived from a common ancestral genomic region. The evidence for synteny is the identification of a set of homologous genes in two genomes that have a collinear arrangement. When such a pattern of gene-order conservation is discovered, the most parsimonious explanation is that the two regions are related through a common ancestor. While syntenic dotplots are useful for identifying related genomic regions, they are also useful for identifying genomic regions that have undergone an evolutionary change in one of the two genomes being compared. Examples of such events are:

CoGe's tool SynMap makes it easy to create a syntenic dotplot for any two genomes in CoGe.
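
The notion of a "collinear arrangement" of homologous genes can be illustrated with a small sketch. The Python below is not SynMap's algorithm; it is a simplified greedy grouping, written only to show how dots that line up along a diagonal (or an anti-diagonal, for an inversion) could be collected into candidate syntenic runs. The function name `collinear_runs` and the parameters `max_gap` and `min_pairs` are hypothetical, and gene positions are assumed to be gene-order indices along each axis.

```python
from typing import List, Tuple

# One dot on the plot: (gene-order position in genome 1, position in genome 2).
Pair = Tuple[int, int]


def collinear_runs(pairs: List[Pair], max_gap: int = 3,
                   min_pairs: int = 3) -> List[List[Pair]]:
    """Greedily group homologous pairs into collinear runs (illustrative only).

    max_gap and min_pairs are hypothetical parameters: the largest step allowed
    between neighbouring dots, and the fewest dots that count as a run.
    """
    runs: List[List[Pair]] = []
    current: List[Pair] = []
    direction = 0  # +1 same orientation, -1 inverted, 0 not yet known

    for pair in sorted(pairs):  # walk along the x-axis
        if current:
            dx = pair[0] - current[-1][0]
            dy = pair[1] - current[-1][1]
            step = (dy > 0) - (dy < 0)  # sign of the move along the y-axis
            extends = (
                0 < dx <= max_gap
                and 0 < abs(dy) <= max_gap
                and direction in (0, step)  # orientation must stay consistent
            )
            if extends:
                current.append(pair)
                direction = step
                continue
            if len(current) >= min_pairs:
                runs.append(current)
        # Start a new candidate run at this dot.
        current = [pair]
        direction = 0

    if len(current) >= min_pairs:
        runs.append(current)
    return runs


# Made-up example: a forward run, an inverted run, and one isolated dot.
example_pairs = [(1, 1), (2, 2), (3, 3), (4, 4),
                 (10, 20), (11, 19), (12, 18), (30, 5)]
example_runs = collinear_runs(example_pairs)
```

In this made-up example the first four dots form a run in the same orientation, the next three form an inverted run, and the isolated dot at (30, 5) is discarded as a likely spurious match. A break between two runs along the diagonal is the kind of discontinuity that the E. coli dotplot marks with numbered arrows. Because the sketch follows a single path along the x-axis, it does not handle overlapping runs such as those produced by whole genome duplications.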