Plasmodium genome analysis using Syntenic Path Assembly

Using Syntenic Path Assembly (SPA) to make analysis of poor or early genome assemblies easier (SynMap - SPA tool)

Figure 1. Syntenic Path Assembly (SPA) window analysis

Many Plasmodium genomes remain to be fully sequenced, assembled, and annotated. Incomplete genomic data come from a variety of sources:

  • Genomic information published at early assembly stages.
  • Partially sequenced genomes.
  • Low-quality genome segments.

Sequencing projects can be simplified by using a reference genome as a guide for genome assembly. While unassembled and non-annotated genomes can be used in smaller-scale studies (e.g. orthologs can be identified with BLAST), their usefulness in large-scale comparative genomics is limited.

Figure 2. P. inui Syntenic Path Assembly (SPA) using P. coatneyi as a reference genome. Black circles mark putative interpretation errors. The analysis can be replicated by following this link: https://genomevolution.org/r/ljen

Tools that generate preliminary assemblies are important for comparative analyses, especially as more genomic data become available. CoGe’s Syntenic Path Assembly (SPA) tool creates a graphical display of syntenic gene pairs based on a reference genome. We will use SPA to assemble the P. inui genome (at the scaffold level as of 2016) using the fully assembled P. coatneyi genome as a reference.
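
To make the idea of a reference-guided layout concrete, the following is a minimal sketch in Python. It is not CoGe's implementation; it only illustrates one simple way to order scaffolds along a reference from a list of syntenic gene pairs. The function name `order_contigs`, the input tuple layout, and all coordinates are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from statistics import median


def order_contigs(pairs):
    """Order contigs along a reference genome.

    pairs: iterable of (contig, ref_chromosome, ref_position) tuples,
    one per syntenic gene pair (hypothetical input format).
    Returns a list of (ref_chromosome, contig) in layout order.
    """
    hits = defaultdict(list)
    for contig, ref_chr, ref_pos in pairs:
        hits[contig].append((ref_chr, ref_pos))

    placed = []
    for contig, contig_hits in hits.items():
        # Assign the contig to the reference chromosome holding most of its pairs.
        counts = defaultdict(int)
        for ref_chr, _ in contig_hits:
            counts[ref_chr] += 1
        best_chr = max(sorted(counts), key=lambda c: counts[c])
        # Use the median reference position on that chromosome as the anchor.
        anchor = median(pos for c, pos in contig_hits if c == best_chr)
        placed.append((best_chr, anchor, contig))

    # Sort by reference chromosome, then by anchor position.
    return [(chrom, contig) for chrom, _, contig in sorted(placed)]


# Made-up syntenic gene pairs for three scaffolds.
pairs = [
    ("scaffold_3", "chr1", 52000),
    ("scaffold_3", "chr1", 61000),
    ("scaffold_1", "chr1", 8000),
    ("scaffold_1", "chr1", 12000),
    ("scaffold_2", "chr2", 3000),
    ("scaffold_1", "chr2", 90000),  # a stray pair on another chromosome
]
layout = order_contigs(pairs)
print(layout)
```

In this toy input, scaffold_1 has one stray pair on chr2 but is still placed on chr1, where most of its syntenic pairs fall.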

The following steps show how to use the SynMap SPA tool:


1. Go to https://genomevolution.org/coge/ and log in to CoGe.

2. Run SynMap between an assembled and a non-assembled genome (this might take longer than analyses using two fully assembled genomes).

3. After running SynMap, click on the Display Options tab and find the SPA tool (Figure 1). Select the tool by ticking the check box next to: The Syntenic Path Assembly (SPA)?

4. After a few minutes, the incomplete genome will be assembled using the second genome as a reference.


An example analysis is available here: https://genomevolution.org/r/ljen

SPA is useful for generating quick-and-dirty genome assemblies; however, the resulting assemblies must be interpreted with care. We highlight two scenarios seen in the P. inui SPA using the P. coatneyi genome as a reference (Figure 2).

Rearrangement events such as inversions or duplications cannot be identified using SPA. First, several contigs can be syntenic to the same region of the reference genome without signaling a duplication event. Second, contigs syntenic to the reverse DNA strand might not reflect chromosome inversions (black circles, Figure 2).
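
One way to keep these two caveats in view is to flag the affected contigs for manual review rather than reading them as rearrangements. The sketch below assumes a hypothetical list of contig placements (contig name, reference chromosome, start, end, strand); neither the function `flag_ambiguous` nor the input format comes from CoGe, and the coordinates are made up.

```python
def flag_ambiguous(placements):
    """Flag contigs whose placement should not be read as a rearrangement.

    placements: list of (contig, ref_chr, ref_start, ref_end, strand) tuples
    (hypothetical format). Returns a dict mapping contig -> set of flags.
    """
    flags = {}
    for contig, _, _, _, strand in placements:
        flags[contig] = set()
        # A reverse-strand match alone is not evidence of an inversion.
        if strand == "-":
            flags[contig].add("reverse_strand")

    # Sort by chromosome and start so overlapping placements are adjacent.
    ordered = sorted(placements, key=lambda p: (p[1], p[2]))
    for i, (name_a, chr_a, _, end_a, _) in enumerate(ordered):
        for name_b, chr_b, start_b, _, _ in ordered[i + 1:]:
            if chr_b != chr_a or start_b > end_a:
                break  # no later placement can overlap this one
            # Two contigs on the same reference region are not
            # by themselves evidence of a duplication.
            flags[name_a].add("shared_region")
            flags[name_b].add("shared_region")
    return flags


placements = [
    ("scaffold_7", "chr5", 1000, 5000, "+"),
    ("scaffold_9", "chr5", 4000, 8000, "-"),
    ("scaffold_4", "chr5", 20000, 26000, "+"),
]
flags = flag_ambiguous(placements)
for contig, contig_flags in flags.items():
    print(contig, sorted(contig_flags))
```

Contigs that come back with a non-empty flag set are candidates for the kind of misinterpretation circled in Figure 2 and are worth checking by hand.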

In addition, contigs are arranged to maximize synteny between the unassembled genome and the reference genome, so different reference genomes will produce different preliminary assemblies. In the case of P. inui, using P. coatneyi (a closely related species) or P. falciparum (a more distantly related species) as the reference will result in different assemblies. Therefore, before running SPA, the reference genome should be chosen with the biological and evolutionary relationships between the species in mind. Interpretation of SPA assemblies might also be problematic when working with transposon-rich genomes.
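
The dependence on the reference can be illustrated with a small sketch: the same three scaffolds are ordered against two sets of anchor positions, standing in for a closer and a more distant reference. The function `order_by_reference` and all coordinates are hypothetical; they are not taken from the P. coatneyi or P. falciparum genomes.

```python
def order_by_reference(anchors):
    """Return contigs sorted by their (ref_chromosome, ref_position) anchor.

    anchors: dict mapping contig -> (ref_chromosome, ref_position).
    """
    return sorted(anchors, key=lambda contig: anchors[contig])


# Made-up anchor positions of the same scaffolds on two different references.
anchors_close = {
    "scaffold_1": ("chr1", 10000),
    "scaffold_2": ("chr1", 40000),
    "scaffold_3": ("chr2", 5000),
}
anchors_distant = {
    "scaffold_1": ("chr3", 70000),
    "scaffold_2": ("chr1", 15000),
    "scaffold_3": ("chr3", 20000),
}

layout_close = order_by_reference(anchors_close)
layout_distant = order_by_reference(anchors_distant)
print(layout_close)
print(layout_distant)
```

The two layouts differ even though the scaffolds are identical, which is why neither should be treated as the true chromosome order without further evidence.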