SynMap: Difference between revisions
Revision as of 18:34, 4 January 2010

Overview
SynMap allows you to generate a syntenic dotplot between two organisms and identify syntenic regions. This is done by:
- Finding putative genes or regions of homology between two genomes
- Identifying collinear sets of genes or regions of sequence similarity to infer synteny
- Generating a dotplot of the results and coloring syntenic pairs.
If you choose, synonymous and non-synonymous site mutation data can be calculated for protein-coding genes that are identified as syntenic. These genes are then colored by those values in the dotplot, which allows rapid identification of different age classes of syntenic regions.
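The collinearity step in the overview can be illustrated with a minimal sketch. This is not DAGChainer's algorithm: it is a simplified dynamic-programming chain finder that assumes each homologous pair is given as a pair of gene-order indices on a single chromosome pair, handles only the forward orientation, and uses a hypothetical `max_gap` parameter for the largest allowed distance between neighboring pairs.

```python
from typing import List, Tuple

# (gene-order index in genome A, gene-order index in genome B)
Pair = Tuple[int, int]


def longest_collinear_chain(pairs: List[Pair], max_gap: int = 10) -> List[Pair]:
    """Return the longest chain of pairs that increase in both genomes.

    Neighboring pairs in the chain may be at most `max_gap` genes apart
    in either genome. Only the forward orientation is considered.
    """
    pts = sorted(set(pairs))
    if not pts:
        return []
    best_len = [1] * len(pts)  # best chain length ending at each pair
    prev = [-1] * len(pts)     # back-pointers used to rebuild the chain
    for i, (a_i, b_i) in enumerate(pts):
        for j in range(i):
            a_j, b_j = pts[j]
            da, db = a_i - a_j, b_i - b_j
            # Extend the chain ending at j if i follows it closely in both genomes.
            if 0 < da <= max_gap and 0 < db <= max_gap and best_len[j] + 1 > best_len[i]:
                best_len[i] = best_len[j] + 1
                prev[i] = j
    end = max(range(len(pts)), key=best_len.__getitem__)
    chain = []
    while end != -1:
        chain.append(pts[end])
        end = prev[end]
    return chain[::-1]
```

Pairs that fall far off the diagonal, such as an isolated hit 47 genes away in one genome, are left out of the chain and would remain plain (non-syntenic) dots in a dotplot.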
SynMap Methods
- extract sequences for comparison; build fasta files
- create blastable databases and compare using blast [1]
- identify syntenic pairs by finding collinear series of putatively homologous sequences using DAGChainer [2]
- optional: calculate synonymous and non-synonymous mutation rates for syntenic gene pairs using CodeML of the PAML package
In the example that follows, the two taxa compared (At and Al) share two sequential whole genome duplication events [3] since the divergence of their lineage from that of Carica papaya [4]. Each whole genome duplication event creates a contemporaneous copy of every chromosome and all the genetic information it contains. However, over evolutionary time, these duplicated homeologous chromosomes are fractionated and undergo rearrangements, inversions, gene transposition events, and other genomic changes. In addition, duplicated genes that are retained (as well as surrounding non-coding sequence) will diverge from one another. Coding sequence divergence can be measured by synonymous changes (Ks), and a population of contemporaneously created syntenic gene pairs from a whole genome duplication event will create a characteristic peak in a histogram of Ks values [5].
Shared whole genome duplication events can be detected through syntenic dotplot analysis (spatial analysis of gene order) and through synonymous change rate (Ks) histograms (temporal analysis of coding sequence divergence) for putatively homologous gene pairs. SynMap combines these approaches: it identifies collinear sets of putatively homologous genes (spatial detection of synteny), calculates Ks values for these syntelogous gene pairs (temporal calculation of synteny), and uses those Ks values to generate a color-coded histogram and paint the syntelogs the corresponding color on the dotplot. This combination of temporal and spatial syntenic analysis creates a final image that permits the rapid visual identification and evaluation of shared whole genome duplication events.
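The idea of painting gene pairs by their Ks value can be sketched as follows. The color scale here is an assumption made for illustration (a linear hue gradient on a log10 scale, from blue for low Ks to red for high Ks, with hypothetical bounds `log_min` and `log_max`); SynMap's actual color scheme may differ.

```python
import colorsys
import math
from typing import Tuple


def ks_to_rgb(ks: float, log_min: float = -2.0, log_max: float = 2.0) -> Tuple[int, int, int]:
    """Map a Ks value to an RGB color along a hue gradient on a log10 scale."""
    if ks <= 0:
        raise ValueError("Ks must be positive to take log10")
    # Position of log10(Ks) within the chosen range, clamped to [0, 1].
    x = (math.log10(ks) - log_min) / (log_max - log_min)
    x = min(1.0, max(0.0, x))
    # Hue 2/3 is blue (low Ks); hue 0 is red (high Ks).
    hue = (1.0 - x) * (2.0 / 3.0)
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))
```

Using the same function for both the histogram bars and the dots keeps the two views visually linked, which is the point of the combined display.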
Figure 1a shows a syntenic dotplot between the genomes of At (y-axis) and Al (x-axis). These plots are generated by comparing every coding sequence between these taxa using blastn in order to identify putatively homologous gene pairs. These results are used by DAGChainer to find collinear sets of genes shared between the taxa. The combined data set is plotted according to relative genomic position: each putatively homologous gene pair is plotted as a gray dot, and syntenic gene pairs are plotted in a color based on their Ks value. The comparison of the At and Al genomes shows two significant patterns of synteny. First, these two genomes have syntenic regions, identified by bright-green lines, that are derived from the speciation divergence of these two taxa. Second, there are smaller blocks of yellow-green lines that are derived from their shared whole genome duplication event (WGD) known as alpha [3]. Comparison to the Ks histogram (Figure 1b) shows that the bright-green lines have smaller Ks values (fewer changes) than the yellow-green lines, which is expected because the divergence of the two taxa post-dates their shared whole genome duplication event.
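Plotting by relative genomic position can be sketched as below, assuming that the chromosomes of each genome are laid end to end along an axis and that each gene is placed at its chromosome's start offset plus its position within the chromosome. The function names and the chromosome names in the usage note are hypothetical, not part of SynMap.

```python
from typing import Dict, List, Tuple


def chromosome_offsets(lengths: List[Tuple[str, int]]) -> Dict[str, int]:
    """Start offset of each chromosome when chromosomes are laid end to end."""
    offsets: Dict[str, int] = {}
    total = 0
    for name, length in lengths:
        offsets[name] = total
        total += length
    return offsets


def dot_position(
    pair: Tuple[Tuple[str, int], Tuple[str, int]],
    x_offsets: Dict[str, int],
    y_offsets: Dict[str, int],
) -> Tuple[int, int]:
    """Return the (x, y) dotplot coordinate of one gene pair.

    `pair` is ((x_chromosome, x_position), (y_chromosome, y_position)).
    """
    (x_chrom, x_pos), (y_chrom, y_pos) = pair
    return x_offsets[x_chrom] + x_pos, y_offsets[y_chrom] + y_pos
```

For example, with x-axis chromosomes `chr1` (length 100) and `chr2` (length 50), a gene at position 10 on `chr2` is drawn at x = 110.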
Generating a close-up view of the comparison of chromosome one from both taxa (Figure 2) reveals a similar pattern (light-blue orthologs, light-green out-paralogs derived from their shared most recent whole genome duplication event), with additional evidence of the more ancient shared whole genome duplication event known as beta [3]. The beta whole genome duplication event is visible as much smaller syntenic regions colored in yellow-orange. These syntenic gene pairs correspond to a smaller peak in the Ks histogram with a larger mean Ks value than either the subsequent whole genome duplication event (alpha) or the orthologs derived from the divergence of these taxa.
In order to validate and assess the types and patterns of change at these genomic loci, high-resolution analysis of these syntenic regions can be performed using GEvo by selecting the appropriate set of genomic regions through SynMap's interface. Such an analysis can be seen in Figure 3, which compares five syntenic regions from these taxa. Two pairs of regions, At1-Al1 and At2-Al2, are orthologous and derived from the speciation of these lineages. This is evidenced by the high degree of spatial evidence for synteny between these regions (pink and blue lines), where nearly every gene in these regions has an orthologous partner in a collinear arrangement. These two pairs of regions are also syntenic with respect to one another (green lines) and are derived from their shared most recent whole genome duplication event (WGD), known as alpha. These four regions are syntenic to an additional region, Al3, which is derived from these lineages' shared second most recent whole genome duplication event, known as beta. Syntenic genes between Al2 and Al3 are connected by dark blue lines; note the lower density of syntenic gene pairs compared with regions derived from the most recent WGD and from the speciation of the lineages. While not shown in this figure, there is a region in At that is syntenic to Al3, derived from the speciation of these taxa, and two additional syntenic regions (one from each lineage) derived from the alpha whole genome duplication.
Please note that the Ks histograms use log10-transformed Ks values. While many people set an upper cutoff for Ks values (usually at 2), these histograms show all values. The red peak at the far right of both Ks histograms (mean Ks ~ 65) results from mis-called syntenic gene pairs, from genes whose alignments are very poor (e.g. due to a frame-shift mutation or pseudogenization), or from errors in the Ks calculation.
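A minimal sketch of binning log10-transformed Ks values without an upper cutoff is shown below. The bin width and the decision to skip non-positive values (for which log10 is undefined) are assumptions made for illustration, not a description of how SynMap builds its histograms.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def log10_ks_histogram(ks_values: Iterable[float], bin_width: float = 0.1) -> Dict[float, int]:
    """Count Ks values in bins of log10(Ks); keys are the lower edge of each bin."""
    counts: Counter = Counter()
    for ks in ks_values:
        if ks <= 0:
            continue  # log10 is undefined here; skipped in this sketch
        bin_index = math.floor(math.log10(ks) / bin_width)
        counts[bin_index] += 1
    # Convert integer bin indices back to log10(Ks) lower edges.
    return {round(i * bin_width, 10): n for i, n in sorted(counts.items())}
```

On this axis a cutoff of Ks = 2 sits at about 0.3 and a Ks of 65 at about 1.8, so very large values stay visible as a separate peak at the far right instead of being discarded.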
- [1] Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. Journal of Molecular Biology 215:403-410
- [2] Haas BJ, Delcher AL, Wortman JR, Salzberg SL (2004) DAGchainer: a tool for mining segmental genome duplications and synteny. Bioinformatics 20:3643–3646
- [3] Bowers JE, Chapman BA, Rong JK, Paterson AH (2003) Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422:433–438
- [4] Ming R, Hou S, Feng Y, Yu Q, Dionne-Laporte A, Saw JH, Senin P, Wang W, Ly BV, Lewis KL, et al. (2008) The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus). Nature 452:991–996. doi: 10.1038/nature06856
- [5] Blanc G, Wolfe KH (2004) Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell 16:1667-1678