Using CoGe for the analysis of Plasmodium spp

From CoGepedia

Revision as of 13:53, 7 February 2017

About this guide

This 'cookbook' style document is meant to provide an introduction to many of our tools and services and is structured around a case study of investigating genome evolution of the malaria-causing Plasmodium spp. The small size and unique features of this pathogen's genome make it ideal for beginning to understand how our tools can be used to conduct comparative genomic analyses and uncover meaningful discoveries.

Through a number of example analyses, this guide will teach users about the following tools:

  • LoadGenome: Add a new genome to CoGe.
  • LoadAnnotation: Add structural and/or functional annotations to a genome.
  • GenomeInfo: Get information about a genome.
  • GenomeList: Get information about several genomes in a table.
  • CoGeBLAST: BLAST against any set of genomes.
  • GEvo: Microsynteny analysis.
  • SynMap: Whole genome syntenic analysis.
  • Kn/Ks Analysis (in SynMap): Calculate and display synonymous/non-synonymous (Ks, Kn) data to characterize the evolution of populations of genes.
  • SPA tool: Syntenic Path Assembly to assist in genome analysis.
  • SynFind: Identify syntenic genes across multiple genomes.
  • CodeOn: Characterize patterns of codon and amino acid evolution in coding sequence.

A brief introduction to Plasmodium genome evolution

The study of parasitic genomes via comparative genomics offers many unique challenges. Parasite genomes are characterized by a combination of gene loss and the acquisition of species- or lineage-specific genes; in particular, many specialized genes mediate host–parasite interaction [1]. The dynamic nature of parasitic genomes is particularly evident within the genus Plasmodium. The genus emerged ~40 million years ago and harbors roughly 200 species of parasitic protozoa better known as malaria parasites. All Plasmodium species have a complex life cycle involving some kind of vertebrate host and a mosquito vector of the genus Anopheles (mammals) or Culex (birds). In addition, Plasmodium species share similar life cycle characteristics, albeit with a few exceptions (e.g. hypnozoites). However, host and vector preferences differ among Plasmodium species [2].

Plasmodium genomes are tiny (17–28 Mb) in comparison to those of their vertebrate (1 Gb for birds; 2–3 Gb for mammals) and mosquito (230–284 Mbp) hosts [3]. All Plasmodium genomes consist of fourteen chromosomes (nuclear genome), as well as a mitochondrial and an apicoplast genome. Despite these shared genomic characteristics, the structural organization, gene content, and sequence of Plasmodium genomes are highly variable within the genus [4]. The exact origins and mechanisms of these differences remain largely unexplored; however, they are generally hypothesized to stem from host shift events [5][6].

An increase in funding devoted to malaria research has coincided with a dramatic increase in publicly available genomic information for Plasmodium [7]. The most prominent repository is NCBI/GenBank [8], while additional and unique sequences can be found in other databases: PlasmoDB [9], GeneDB [10], and MalAvi [11]. This wealth of genomic data facilitates detailed comparative genomic approaches, opening the possibility to:

  • Infer origins of certain traits, specialized phenotypes, and genomic features.
  • Track the maintenance of conserved genes across the genus, as well as the gain or loss of genes unique to a single species or a group of closely related species.
  • Identify the potential historical interactions that might have led to the development of genomic adaptations.

Through a case study on Plasmodium evolution, we will illustrate how CoGe can be used for the analysis of multigene families, local synteny, and whole genome comparisons (genome composition, rearrangement events, and gene order conservation).

Finding and integrating Plasmodium genomes in CoGe

You can find the details of Plasmodium genome integration in the following link: Finding and intregating Plasmodium genomes to CoGe

Using CoGe tools to perform comparative analyses


The following links direct to specific tools for the comparative analysis of Plasmodium genomes:

Plasmodium analysis workflow 1: Tools that evaluate genomic properties and amino acid usage

Identifying gene homologs (CoGeBLAST)

Figure 11. Screen capture of CoGeBLAST input. The genomes included in the analysis and the query sequence used are shown.

The identification of homology based on sequence similarity is a key tool for gaining insight into an organism’s biology and genetics. Defining evolutionary relationships and inferring common ancestry is particularly challenging when dealing with multigene families. Plasmodium multigene families perform a wide array of functions, have diverse gene organization, and distinct evolutionary histories. Here we focus on a set of multi-gene families arising from the subtelomere (e.g. var, stevor, rifin, or vir) that have very complex evolutionary patterns and organizations [12]. These four gene families are of particular interest because of their role in immune evasion and cell invasion. In addition, these families have undergone rapid sequence evolution and gene turnover [13][14][15]. These factors make inferring orthology/paralogy and gene gain/loss events in Plasmodium subtelomeric families a complex task.

The 313 members of P. vivax’s vir family are grouped into 10 subfamilies based on their sequence similarity. Gene size and structure (number of exons) are highly variable among family members [16][17][18]. The genetic diversity in the vir family is larger than that of other P. vivax families. Only fifteen of the 313 vir genes are shared across all sequenced P. vivax strains, despite the recent emergence of the species ~5 million years ago. Within this group, PVX_113230 has been proposed as a potential family founder based on its high sequence conservation [19].

Here we use CoGeBLAST to identify the proposed founder of the Plasmodium vir family (PVX_113230) in six P. vivax strains (including the recently sequenced PO1 strain). CoGeBLAST incorporates genome visualization into BLAST analyses. Therefore, this tool facilitates the study of complex evolutionary patterns.

Figure 12. Screen capture of the genomic HSP visualization section of CoGeBLAST. Salvador-1 (left) and PO1 (right) are shown side by side. Analysis can be replicated following this link: https://genomevolution.org/r/mjg3
The following steps show how to use CoGeBLAST in the CoGe platform:

1. Go to: https://genomevolution.org/coge/ and log in to CoGe.

2. Click on CoGeBLAST or follow this link: https://genomevolution.org/coge/CoGeBlast.pl

3. Type the scientific name of the Organism of interest in the Search box. All genomes with names matching the search term will appear under the Matching Organisms menu. Notebooks matching the term will appear in a new window after clicking on Import List.

4. Select all the genomes of interest and click on + Add. The genomes will now appear on the Selected Genomes menu. You can also select any of your Notebooks and include all the genomes contained in it.

5. Enter your query sequence in FASTA format. If desired, you can change the BLAST Parameters before starting the analysis.

6. Once all information is included click on Run CoGe BLAST (Figure 11).

7. The analysis output will include:

  • A table showing the high-scoring segment pairs (HSP) counts for each genome.
  • A graphic depiction of the location of BLAST hits (Genomic HSP Visualization).
  • An HSP table detailing genetic information for each hit.

You can follow a link to an example analysis here: https://genomevolution.org/r/mjg3

You can find links to the FASTA sequences used in this analysis in the "Sample data" section at the end of this page.
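Step 5 expects the query sequence in FASTA format. As a minimal sketch, a raw sequence can be wrapped into a FASTA record before pasting it into the query box. The helper name `to_fasta`, the 60-character default line width, and the placeholder sequence are assumptions made for illustration; they are not CoGe requirements.

```python
def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Return a FASTA record: a '>' header line followed by wrapped sequence lines."""
    # Drop spaces and line breaks, and normalize to upper case.
    cleaned = "".join(sequence.split()).upper()
    if not cleaned:
        raise ValueError("sequence is empty")
    # Slice the sequence into fixed-width lines.
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return "\n".join([f">{name}"] + lines)


# Hypothetical placeholder sequence; this is NOT the real PVX_113230 sequence.
record = to_fasta("example_query", "atgaaa ttt\nggc", width=5)
print(record)
```

The header text after `>` is free-form, so using the gene ID as the name makes the query easy to recognize in the results.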

Sequences with significant similarity to PVX_113230 were found in all the evaluated P. vivax strains, including PO1. However, the number of high-scoring segment pairs for each P. vivax genome was variable. The highest number of sequence homologs was observed in the strains: Mauritania, PO1, and Salvador-1. Sequence divergence of vir members within P. vivax seems to affect the number of high-scoring segment pairs per strain. Thus, the variation in the number of HSPs across strains further supports observations about the high sequence variation among vir homologs.
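The per-genome HSP counts discussed above can also be tallied outside CoGe once the HSP table has been exported. The sketch below assumes each exported row has been reduced to a (genome, chromosome) pair; that row layout, the function name `hsp_counts`, and the example rows are hypothetical and do not reproduce the actual results.

```python
from collections import Counter
from typing import Iterable, Tuple


def hsp_counts(hsp_rows: Iterable[Tuple[str, str]]) -> Counter:
    """Count high-scoring segment pairs (HSPs) per genome."""
    # Each row is assumed to be (genome name, chromosome or contig name).
    return Counter(genome for genome, _chromosome in hsp_rows)


# Made-up placeholder rows, used only to show the expected input shape.
rows = [
    ("Salvador-1", "chr1"),
    ("Salvador-1", "chr5"),
    ("PO1", "chr1"),
]
counts = hsp_counts(rows)
for genome, n in counts.most_common():
    print(genome, n)
```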

The location of HSPs appears to be slightly variable across genomes. However, we cannot confirm this pattern until the Mauritania, North Korea, Brazil I, and India VII genomes are fully assembled. Between the two fully assembled P. vivax genomes (Salvador-1 and PO1), BLAST hits were located largely in the same chromosome regions (Figure 12). As expected, a higher number of BLAST hits and a more variable genome location were observed when a less conserved vir family member (PVX_096004.1) was used as a query (analysis can be run following this link: https://genomevolution.org/r/mkcg).

Identifying microsyntenic regions (GEvo)

Figure 13. Background GC content: GC-rich regions (green), AT-rich regions (white). Wobble GC content: GC-poor (red), ~50% GC (yellow), and GC-rich (green). The location of CyRPA and Rh5 is marked with sapphire and teal lines, respectively. You can rerun the analysis following this link: https://genomevolution.org/r/m4dq

Changes in local genome organization (microsynteny) can be used to ascertain the evolutionary history of a region. In Plasmodium, many genes related to parasite-host interactions are rapidly evolving and undergo frequent rearrangements, gain/loss events, and horizontal transfer. These evolutionary processes leave "genomic signals" by altering the local genome organization. Erythrocyte invasion is a multi-step process that represents one of the most crucial steps in the Plasmodium life cycle [20]. Recently, two P. falciparum genes, the reticulocyte-binding-like homologous protein 5 (Rh5) and the cysteine-rich protective antigen (CyRPA), were shown to be the result of a horizontal gene transfer between P. falciparum and P. adleri progenitors within the Laveranian subgenus. Remarkably, comparative genomics demonstrated that this horizontal gene transfer was localized to an 8 kb segment on chromosome 4. The localized nature of this event, plus interspecific hybridization barriers, suggests that the gene transfer occurred by the capture of a small segment of P. adleri progenitor genomic DNA by the P. falciparum progenitor. As Rh5 and CyRPA are crucial for host erythrocyte invasion by P. falciparum, it has been proposed that the capture of these two genes conferred a strong fitness advantage that allowed the P. falciparum progenitor to infect humans [21]. In sum, the genomic region surrounding these two genes represents an excellent case study on how to examine microsynteny with CoGe.

Here, we will use CoGe’s tool GEvo to evaluate genomic properties within this region and assess the hypothesized horizontal transfer event.

Figure 14. The analysis shows a region of synteny loss between P. vivax (Salvador-1), P. vivax (PO1) and P. cynomolgi. Low quality segments are shown in orange. You can rerun the analysis following this link: https://genomevolution.org/r/mjjv
The following steps show how to use GEvo to analyze microsyntenic regions:

1. Go to: https://genomevolution.org/coge/ and log in to CoGe.

2. Click on GEvo or follow this link: https://genomevolution.org/coge/GEvo.pl

3. Specify a sequence for each box found under Sequence (you can specify a maximum of 25 sequences). Each box contains:

  • A drop down menu of sequence databases (CoGe database, NCBI GenBank, or Direct Submission).
  • The name of the selected sequence (e.g. gene ID numbers).
  • The length of the genome segment to display in GEvo.
  • Additional Sequence Options including: skip sequence from the analysis, set sequence as a reference, set sequence as a reverse complement, and mask the sequence.

You can either import sequences for GEvo analysis by entering their gene IDs in the Name box, or you can select gene pairs for analysis directly from SynMap.

4. Click on Run GEvo.

5. The GEvo analysis will display the syntenic region between the compared genomes.

6. You can modify the parameters of the GEvo analysis in the Algorithm tab. Also, you can modify the information of the graphical display by altering the options on the Results Visualization Options tab.

You can follow a link to an example analysis here: https://genomevolution.org/r/m4dq and here https://genomevolution.org/r/mjjv

We performed a microsynteny analysis of the genome region containing Rh5 and CyRPA. The analysis was conducted using the five fully sequenced Laveranian genomes currently available: P. falciparum strains 3D7 and IT, P. reichenowi strains CDC and SY57, and P. gaboni strain SY75. Our results show that microsynteny is largely maintained in the regions surrounding Rh5 and CyRPA. We modified the Results Visualization Options tab to display background and wobble GC content for genes in this region. Neither background GC content across the region nor wobble GC content for either Rh5 or CyRPA varies significantly (Figure 13). It has been proposed that significant changes in background or wobble GC content could be used as evidence of a horizontal transfer event. However, we did not observe such a pattern in our analyses. It is possible that a horizontal transfer event between ancestral Laveranian genomes might not be detected using this method due to the similar nucleotide composition of species in the subgenus. Therefore, an additional test might be required to further support the proposed horizontal transfer event.
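To make the two quantities compared above concrete, the following sketch computes the overall GC fraction of a sequence and the GC fraction at third (wobble) codon positions of a coding sequence. It is only an illustration of the definitions: the function names are hypothetical, the input is assumed to be an in-frame CDS with unambiguous bases, and GEvo's own calculation (for example, any windowing of the background GC content) is not described here and may differ.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return (seq.count("G") + seq.count("C")) / total


def wobble_gc_fraction(cds: str) -> float:
    """GC fraction at third codon positions of an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    third_positions = cds[2:usable:3]
    return gc_fraction(third_positions)


# Hypothetical toy CDS: codons ATG AAA TTT GGC.
toy_cds = "ATGAAATTTGGC"
print(gc_fraction(toy_cds))         # overall GC fraction
print(wobble_gc_fraction(toy_cds))  # GC fraction at wobble positions
```

A gene acquired from a donor with a different base composition would be expected to stand out on one or both measures, which is why similar composition across Laveranian species limits the power of this test.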

We also used GEvo to further analyze regions where putative inversion breakpoints are located. Comparative analyses between P. vivax (Salvador-1) and P. vivax (PO1), and between P. vivax (Salvador-1) and P. cynomolgi show two inversion events. These events are not observed in comparisons between P. cynomolgi and P. vivax (PO1). A detailed study of the inversion breakpoints using GEvo shows genome segments of low sequence quality on P. vivax (Salvador-1) (Figure 14). This suggests that the reported inversion event might be the product of a sequencing artifact instead of a real rearrangement.

Performing synteny analyses between two genomes (SynMap)

Over evolutionary time, neighboring genes often maintain their relative position and order within a chromosomal segment. Chromosomal regions from different species that contain colinear homologs are said to be syntenic, i.e., genomic regions of shared ancestry. Changes in colinearity within syntenic regions are used to ascertain the nature, location, and extent of rearrangement events between related species. The main use of CoGe’s tool, SynMap, is to find syntenic regions where gene order is preserved. SynMap’s graphical output allows for easy and fast interpretation of these results.

Figure 15. SynMap input screen. Genomes for two different species are selected: P. cynomolgi B strain (Organism 1), and P. vivax Salvador-1 strain (Organism 2).
Figure 16. Inversion events observed in SynMap Legacy. Inversions seen on pairwise comparisons with P. vivax are marked with orange circles. See steps section (green box) to find links to rerun these analyses.
Figure 17. Independent rearrangement events observed in SynMap Legacy. Identified rearrangement events: fusion/fission originated on chromosome 5 and 9 of P. malariae (red squares), fusion/fission originated on chromosome 13 and 14 of P. coatneyi (green squares), an inversion found on the central region of chromosome 4 of P. malariae (blue circle). See steps section (green box) to find links to rerun the analyses.
The following steps show how to analyze syntenic gene pairs with SynMap:

1. Go to: https://genomevolution.org/coge/ and log in to CoGe.

2. Click on Organism View or follow this link: https://genomevolution.org/coge/OrganismView.pl

3. Type a scientific name in the Search box and select the appropriate genome. Then, click on the GenomeInfo link under the Genome Information section.

4. Find the link to the SynMap tool under the Analyze section.

5. By default, SynMap will perform a self-comparison of any selected genome. This is of use when characterizing a genome or when attempting to identify the relative age of putative duplication events [22]. To analyze two different genomes, type a scientific name on the Search box of either Organism 1 or Organism 2. Once finished, click on Generate SynMap to run the analysis (Figure 15).

6. SynMap will output a graphical depiction of the syntenic regions between two genomes. There are currently two versions of SynMap:

  • SynMap2: allows the user to interact with and dynamically alter the analysis.
  • SynMap Legacy: provides static images of the analysis.

7. You can further analyze regions or genes of interest using the GEvo tool linked to SynMap. To do this, double click on a syntenic gene pair (SynMap Legacy), or select a syntenic gene pair and click on Compare in GEvo >>> (SynMap2).

You can follow a link to the first example analyses here (Figure 16):

https://genomevolution.org/r/lj12 (P. vivax vs. P. cynomolgi)

https://genomevolution.org/r/lj1x (P. knowlesi vs. P. cynomolgi)

https://genomevolution.org/r/lj1t (P. knowlesi vs. P. vivax)

You can follow a link to the second example analyses here (Figure 17):

https://genomevolution.org/r/lq5x (P. knowlesi vs. P. malariae)

https://genomevolution.org/r/lj2b (P. coatneyi vs. P. knowlesi)

https://genomevolution.org/r/lq5y (P. coatneyi vs. P. malariae)

https://genomevolution.org/r/lq5t (P. ovale vs. P. malariae)

https://genomevolution.org/r/lq65 (P. coatneyi vs. P. ovale)

https://genomevolution.org/r/lq5v (P. ovale vs. P. knowlesi)

Identifying syntenic gene pairs

Gene position can be critical in gene expression. In many eukaryotes, expression of neighboring genes is coordinated by adjacent regulatory elements [23][24][25]. Thus, changes in gene position and order can have profound effects on gene expression. In P. falciparum, subtelomeric neighboring genes are known to form small independently expressed groups in a process thought to increase the parasite’s adaptive potential [26]. It is still unknown if these transcriptional "islands" are found outside the subtelomeric regions, or even in other Plasmodium parasites. The first step to address this issue is to use tools that allow the rapid identification of changes in gene order and position. We can use SynMap to determine gene origin, establish relative location, and identify changes in position and order. This information can later be used to establish if patterns of coordinated expression, or lack thereof, are prevalent across the Plasmodium genus.

Identifying chromosomal inversions, fusions, fissions and other events between two genomes

Numerous genome rearrangements have taken place throughout the evolution of the genus Plasmodium. There is a strong correlation between synteny and divergence times. In other words, the more closely related two species are, the more likely synteny will be observed between their genomes [27]. We can use SynMap to identify rearrangement events and infer their putative evolutionary origin.

We used SynMap to confirm the location and origin of reported inversions between P. vivax, P. cynomolgi and P. knowlesi’s 3rd and 6th chromosomes. We performed pairwise comparisons to evaluate changes in genome organization among the three species (Figure 16). We only detected inversion events in pairwise comparisons with P. vivax (Figure 16, orange circles). This suggests that the inversion events reported on chromosomes 3 and 6 occurred after the split of P. cynomolgi and P. vivax (approximately 3.43-3.87 Mya) [28]. However, a detailed analysis of the breakpoint regions in P. vivax using GEvo (Figure 14) shows a genome segment of low sequence quality. Thus, it is possible that the inversion event reported on P. vivax could actually be an artifact.

We also used SynMap to infer changes in gene order and composition among another group of closely related Plasmodium species. Pairwise comparisons were performed between four closely related Plasmodium parasites from the simian clade: P. ovale curtisi, P. malariae, P. coatneyi and P. knowlesi. We identified independent sets of chromosome fusion/fission events across these species. A set of fusions/fissions was found on P. malariae’s 5th and 9th chromosomes (Figure 17, red squares); another set of fusion/fission events was found on P. coatneyi’s 13th and 14th chromosomes (Figure 17, green squares). In addition, we found an inversion event located in the central region of P. malariae’s 4th chromosome (Figure 17, blue circle).
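On a dotplot, an inversion appears as a run of syntenic gene pairs whose order is reversed in one genome relative to the other. The sketch below illustrates that reading for a single chromosome pair. It is a simplified illustration, not SynMap's algorithm: the function name, the `min_len` threshold, and the input format (the rank in genome B of each gene, listed in genome A order) are assumptions.

```python
from typing import List, Tuple


def find_inverted_runs(order_in_b: List[int], min_len: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index ranges, in genome A order, where the matching
    genes in genome B appear in strictly decreasing order."""
    runs = []
    start = 0
    for i in range(1, len(order_in_b) + 1):
        # A run ends at the end of the list or when the order stops decreasing.
        if i == len(order_in_b) or order_in_b[i] >= order_in_b[i - 1]:
            if i - start >= min_len:
                runs.append((start, i - 1))
            start = i
    return runs


# Hypothetical gene order: genes 3 to 6 (in genome A order) are reversed in genome B.
example_order = [0, 1, 2, 6, 5, 4, 3, 7, 8]
print(find_inverted_runs(example_order))
```

A candidate run found this way still needs to be inspected in GEvo, since low-quality sequence at the breakpoints can produce the same signal as a real inversion.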

Measuring Kn/Ks values between genomes (SynMap - CodeML analysis tool)

Two genomes with a common ancestor will slowly accumulate nucleotide changes over time that distinguish them from one another. Nucleotide changes that result in an amino acid change are called non-synonymous and those that do not are called synonymous. Synonymous substitutions are largely neutral (have no noticeable effect) and mostly reflect background evolutionary changes. On the other hand, non-synonymous substitutions are largely affected by natural selection, as changes in a protein can give an organism a selective advantage (or be detrimental to overall fitness). Under neutrality, the rate of synonymous (Ks) and non-synonymous (Kn) substitutions will be equivalent. Deviations from this expectation indicate a significant role of natural selection. Insights into trends of natural selection are gained from evaluating the Kn/Ks ratio. We observe Kn/Ks = 1 under neutrality; we observe Kn/Ks > 1 when non-synonymous substitutions are fixed at a faster rate than synonymous ones (positive selection); and, we observe Kn/Ks < 1 when new amino acid changes are eliminated (purifying selection).
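The distinction between synonymous and non-synonymous changes, and the interpretation of the Kn/Ks ratio, can be sketched as follows. This is an illustration only: it counts differing codons between two aligned, gap-free coding sequences using the standard genetic code, which is not the same as estimating Kn and Ks rates (CodeML estimates these by maximum likelihood). The function names and the `tolerance` used to call a ratio "neutral" are assumptions.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_codon_differences(cds_a: str, cds_b: str):
    """Count differing codons that are synonymous vs. non-synonymous.

    Assumes two aligned, gap-free, in-frame sequences of equal length
    containing only A, C, G, and T.
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("sequences must be aligned, equal in length, and in frame")
    synonymous = nonsynonymous = 0
    for i in range(0, len(cds_a), 3):
        codon_a, codon_b = cds_a[i:i + 3].upper(), cds_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


def classify_selection(kn: float, ks: float, tolerance: float = 0.05) -> str:
    """Interpret a Kn/Ks ratio; `tolerance` is an arbitrary illustrative cutoff."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = kn / ks
    if abs(ratio - 1) <= tolerance:
        return "neutral"
    return "positive" if ratio > 1 else "purifying"


# Toy example: AAA -> AAG keeps lysine; TTT -> TTA changes Phe to Leu.
print(count_codon_differences("ATGAAATTT", "ATGAAGTTA"))
print(classify_selection(kn=0.02, ks=0.2))
```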

The CoGe platform is capable of calculating the Kn/Ks ratio on syntenic gene pairs across the length of a genome. CoGe’s Kn/Ks analyses can be used to:

  • Identify hotspots of strong positive or purifying selection across the length of the genome.
  • Establish associations between genome position (e.g. telomeres vs. centromeres) and trends of natural selection.
  • Describe species- or genus-specific adaptive trends.

CoGe uses the CodeML analysis tool to measure the Kn/Ks ratio between two annotated genomes. The CodeML analysis tool can be accessed from SynMap. Here, we evaluated the selective trends of three closely related species from the Laveranian subgenus (Figure 18).

Figure 18. Phylogeny of Plasmodium species of the Laverania subgenus built using mitochondrial sequences. Species included in our analysis are marked with a red asterisk. Modified from Rayner et al. (2011) [29]
Figure 19. Paired Ks analyses between species of the Laverania subgenus. A. P. gaboni vs. P. reichenowi; B. P. falciparum vs. P. reichenowi; and, C. P. gaboni vs. P. falciparum
Figure 20. Paired Kn analyses between species of the Laverania subgenus. A. P. gaboni vs. P. reichenowi; B. P. falciparum vs. P. reichenowi; and, C. P. gaboni vs. P. falciparum
The following steps show how to perform Kn/Ks analyses using SynMap’s CodeML tool:

1. Go to: https://genomevolution.org/coge/ and login to CoGe.

2. Run SynMap or select a previous SynMap analysis from My Data (CoGe stores all completed analyses under the user's account).

3. Find the CodeML tool under the Analysis Options tab. Click on Calculate syntenic CDS pairs and color dots: substitution rates(s) and select Synonymous (Ks) from the dropdown menu. Repeat the analysis selecting the Non-synonymous (Kn) and (Kn/Ks) options. You can alter the display selecting a different Color Scheme, specifying Min Val. or Max Val. axis values, or changing the Log10 Transform. data option.

4. The analysis will modify the Syntenic_dotplot display to represent the distribution of the Ks, Kn or Kn/Ks values across syntenic gene pairs. A Histogram of Ks values (or Kn or Kn/Ks) will also be generated. In SynMap2, specific regions can be dynamically selected to view the Ks, Kn or Kn/Ks values.
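Substitution rates often span several orders of magnitude, which is why a log10 transform is useful before plotting a histogram. The sketch below bins log10-transformed values. It only illustrates the idea: the function name, the bin width, and the example values are assumptions, and SynMap's own binning is not described here.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def log10_histogram(values: Iterable[float], bin_width: float = 0.5) -> Dict[float, int]:
    """Bin log10-transformed values; returns {lower edge of bin: count}.

    Non-positive values have no logarithm and are skipped.
    """
    counts = Counter()
    for value in values:
        if value <= 0:
            continue
        lower_edge = math.floor(math.log10(value) / bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))


# Made-up Ks values, used only to show the input and output shapes.
hist = log10_histogram([0.02, 0.03, 0.2, 0.5, 2.0, 0.0])
for lower_edge, n in hist.items():
    print(f"log10(Ks) in [{lower_edge}, {lower_edge + 0.5}): {n}")
```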

You can follow a link to Ks example analyses here (Figure 19):

https://genomevolution.org/r/ljhj (P. reichenowi vs. P. falciparum)

https://genomevolution.org/r/ljhl (P. falciparum vs. P. gaboni)

https://genomevolution.org/r/ljhq (P. reichenowi vs. P. gaboni)

You can follow a link to Kn example analyses here (Figure 20):

https://genomevolution.org/r/lsyy (P. reichenowi vs. P. gaboni)

https://genomevolution.org/r/lsz2 (P. reichenowi vs. P. falciparum)

https://genomevolution.org/r/lsz5 (P. falciparum vs. P. gaboni)

P. reichenowi and P. falciparum are thought to have diverged approximately 5.28-5.93 Mya [30]. The divergence of either species from P. gaboni is estimated to be older [31]. Based on these evolutionary relationships, it is expected that the number of accumulated nucleotide differences will be smaller between P. reichenowi and P. falciparum than between either species and P. gaboni.

We found smaller Ks values between P. gaboni (SY75) - P. reichenowi (CDC) than between P. gaboni (SY75) - P. falciparum (3D7) (Figure 19). Also, smaller Ks values were observed between P. reichenowi - P. falciparum than between P. falciparum - P. gaboni. The same trends were observed when a different P. reichenowi strain (SY57) was used (results can be replicated in the following links: https://genomevolution.org/r/mr5u for P. reichenowi vs. P. gaboni, and https://genomevolution.org/r/lzrr for P. reichenowi vs. P. falciparum). The differences in Ks rates suggest that a number of recent synonymous substitutions occurred in the P. reichenowi genome. Genome composition and codon usage are largely similar amongst Laveranian species (Figures 10 and 24). Thus, this variation could indicate an increased mutation rate in P. reichenowi, resulting in a rapidly evolving genome compared to other Laveranian species. However, the reasons for this accelerated evolution remain unexplored.

Non-synonymous (Kn) substitution rates were largely similar between P. gaboni - P. falciparum and P. gaboni - P. reichenowi (Figure 20). Smaller Kn substitution values were observed between P. falciparum - P. reichenowi. Similar trends were seen when P. reichenowi (SY57) was used (results can be replicated in the following links: https://genomevolution.org/r/mr5z for P. reichenowi vs. P. gaboni, and https://genomevolution.org/r/mr5x for P. reichenowi vs. P. falciparum). These results suggest that a comparable rate of Kn changes occurred since the divergence of the P. reichenowi/P. falciparum ancestor. These changes were followed by a significant number of species-specific substitutions in both P. falciparum and P. reichenowi. Previous studies have found large Kn values in P. reichenowi - P. falciparum comparisons, particularly in genes expressed during the parasite's blood stages [32]. Thus, our results likely reflect Kn changes related to parasite-host interactions and adaptations to infection of different host types.

Identifying sets of syntenic genes amongst several genomes (SynFind)

Figure 21. Screen capture of SynFind analysis output. Additional links to CoGe's analyses can be found under Links. Results can be replicated here: https://genomevolution.org/r/moya

Small-scale genomic rearrangements are often linked to species-specific gene gain/loss events. Family-linked rearrangements are observed amongst closely related Plasmodium species and, on occasion, at an intra-specific level. CoGe’s tool, SynFind, is used to identify gene homologs across any number of genomes, and thus can be of use to identify these rearrangements.

The evolutionary trajectory of multigene families can be difficult to infer, especially in those with a scattered organization or rapid gene turnover. While this issue is particularly prevalent in species-specific gene families, genus-specific families can present intricate evolutionary patterns as well. One good example can be found in the SERA (serine repeat antigen) family, a gene family that has experienced a significant number of inter-specific contractions, expansions, and rearrangements. These patterns remain to be evaluated at an intra-specific level. We will use SynFind to study the organization of SERA paralogs in six P. vivax strains.

SERA paralogs are expressed during various stages of the Plasmodium life cycle. All SERA family members encode proteins with a papain-like cysteine protease motif [33]. These motifs are commonly found both inside and outside the genus Plasmodium [34][35]. One member (SERA-5), expressed during late trophozoite and schizont stages, has been considered as a promising malaria vaccine target [36]. We will use this gene sequence as a query for the SynFind analysis.

Figure 22. GEvo analysis using the SynFind output. The number of sequences and display order has been modified to include only the SERA family: PVX_003850 (Salvador-1, set as reference), PVP01_0417200.1 (P01), cds1276 (Brazil I), cds1241 (North Korea), cds1011 (India VII), and cds1227 (Mauritania). Connector lines show syntenic regions between SERA family members. Brazil I strain is marked with a blue diamond. Strain-specific changes in the family's organization are highlighted with a blue parallelogram. Results can be replicated here: https://genomevolution.org/r/mpdf
The following steps show how to use SynFind:

1. Go to: https://genomevolution.org/coge/ and log in to CoGe.

2. Click on SynFind or follow this link: https://genomevolution.org/CoGe/SynFind.pl.

3. Type a scientific name in the search bar under Select Target Genomes. Organisms and genomes with names matching the search term will be displayed on the Matching Organisms menu.

4. Select the genomes of interest using Ctrl+click or Command+click, then click on + Add. The genomes will appear on the Selected Genomes menu. You can also import genomes from your Notebooks.

5. Type the Name, Annotation, or Organisms in the Specify Features section. It is recommended to include as many specific terms as possible. Once done, click on Search.

6. All matches to the search term and the genome where they have been found will appear in a new menu within the same section. Select all relevant Matches and the reference Genome.

7. Click on Run SynFind to start the analysis.

8. SynFind will output all syntenic regions from the reference genome and their Syntenic depth. This output can be used as a query for other CoGe tools.

You can follow a link to a SynFind example analysis here: https://genomevolution.org/r/moya

GEvo results can be replicated here: https://genomevolution.org/r/mpdf

We used SynFind to identify genes homologous to SERA-5 across six P. vivax genomes (Figure 21). SynFind’s output was used as a query for a GEvo analysis of the region. Our results show a conserved number of SERA paralogs in all P. vivax strains. The organization of the SERA family in the Brazil I strain differed from that of the other P. vivax strains (Figure 22). Previous studies on SERA have suggested that some family members are unique to P. vivax and closely related species [37]. Our results indicate that family organization is not completely conserved at the intra-specific level. This is most evident in recently duplicated paralogs.

SynFind also identified matching segments outside the SERA multigene family. These segments belonged to hypothetical protein-coding genes, ATP proteases, and uncharacterized transcripts. Papain-like cysteine protease motifs are commonly found outside both Plasmodium and the SERA family. Thus, it is likely that these segments share a papain-like cysteine protease motif but are not evolutionarily related to SERA.

Additional tools for genome analysis with CoGe

You can learn how to use SPA on Plasmodium genomes at the following link: Plasmodium genome analysis using Syntenic Path Assembly

Overall conclusions

The number of available Plasmodium genomes has increased considerably during recent years. This wealth of genomic information creates an unprecedented opportunity to study the unique genomic qualities of this genus using comparative genomics.

There have been tremendous achievements in malaria treatment and control strategies. Thanks to worldwide efforts, there has been a significant reduction in the number of malaria cases and malaria-related deaths between 2000 and 2015. By 2015, it was estimated that the number of malaria cases decreased from 262 million to 214 million, and the number of malaria-related deaths from 839,000 to 438,000 [38]. However, there are still numerous aspects of malaria research that need to be further addressed.

The intricacies of parasite-host relations in Plasmodium infection might be more complex than previously considered [39]. Humans have recently been infected by Plasmodium species classically considered specific to non-human primates (e.g. a single infection with P. cynomolgi [40] and various infections with P. knowlesi [41]). In addition, African primates have been infected by unique strains of P. falciparum (a parasite classically considered exclusive to humans) and are proposed to act as reservoirs for this parasite [42][43]. In bird Plasmodium, the putative evolutionary time of parasite-host associations has a significant role in the development of pathogenicity and in host mortality [44]. Finally, multiple host-switch events between largely divergent host types are thought to have occurred in bat Haemosporidia [45]. These cases highlight the complexity of the Plasmodium infection landscape. Insights into the unique patterns of Plasmodium biology, epidemiology, ecology, and genetics can be obtained from molecular and comparative genomic studies.

The rapid growth of genomic information makes it imperative to implement tools that facilitate the assessment of genome evolutionary trends. The services and tools provided by the CoGe platform are of considerable use in advancing Plasmodium comparative genomics. Here, we showed how various CoGe tools can be used to assess evolutionary patterns unique to Plasmodium. We also showed how to use this platform to further characterize sequenced Plasmodium genomes. Overall, we have demonstrated that CoGe's tools can be used to address evolutionary questions such as:

  • The evolutionary origins of Laveranian AT-rich genomes.
  • The location and nature of genome rearrangements between Plasmodium species.
  • The evolutionary patterns of genes crucial in cell invasion.
  • The evolutionary trends of multigene families.

Useful links

Plasmodium Notebooks in CoGe

Link to Notebook for published Plasmodium genome data: https://genomevolution.org/coge/NotebookView.pl?lid=1753
Link to Notebook for published P. falciparum strains: https://genomevolution.org/coge/NotebookView.pl?lid=1758
Link to Notebook for published P. vivax strains: https://genomevolution.org/coge/NotebookView.pl?lid=1760
Link to Notebook for published Plasmodium apicoplast data: https://genomevolution.org/coge/NotebookView.pl?lid=1754
Link to Notebook for published Plasmodium mitochondrion data: https://genomevolution.org/coge/NotebookView.pl?lid=1756
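
If you want to keep track of these Notebooks in a script, one option is to store the links listed above and pull out the numeric `lid` query parameter that identifies each Notebook. The sketch below uses only the Python standard library; the dictionary keys and the function name `notebook_id` are illustrative choices, not part of CoGe.

```python
from urllib.parse import urlparse, parse_qs

# Notebook links copied from the list above; the short labels are our own.
NOTEBOOKS = {
    "Plasmodium genomes": "https://genomevolution.org/coge/NotebookView.pl?lid=1753",
    "P. falciparum strains": "https://genomevolution.org/coge/NotebookView.pl?lid=1758",
    "P. vivax strains": "https://genomevolution.org/coge/NotebookView.pl?lid=1760",
    "Plasmodium apicoplast": "https://genomevolution.org/coge/NotebookView.pl?lid=1754",
    "Plasmodium mitochondrion": "https://genomevolution.org/coge/NotebookView.pl?lid=1756",
}


def notebook_id(url):
    """Return the integer value of the 'lid' query parameter of a Notebook URL."""
    query = parse_qs(urlparse(url).query)
    if "lid" not in query:
        raise ValueError(f"no 'lid' parameter in {url!r}")
    return int(query["lid"][0])


for label, url in NOTEBOOKS.items():
    print(label, notebook_id(url))
```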

Sample data

  • Gene sequences used in the CoGeBLAST analysis (obtained from PlasmoDB):
PVX_113230.1 | Plasmodium vivax Sal-1 | variable surface protein Vir14-related (http://plasmodb.org/plasmo/app/record/gene/PVX_113230)
PVX_096004.1 | Plasmodium vivax Sal-1 | VIR protein (http://plasmodb.org/plasmo/app/record/gene/PVX_096004)
  • Gene sequence used in SynFind to inform the GEvo analysis (obtained from PlasmoDB):
PVX_003830.1 | Plasmodium vivax Sal-1 | serine-repeat antigen 5 (SERA) (http://plasmodb.org/plasmo/app/record/gene/PVX_003830)
  • Gene sequences used in CoGeBLAST to inform the GEvo analysis (obtained from PlasmoDB):
PF3D7_0424100.1 | Plasmodium falciparum 3D7 | reticulocyte binding protein homologue 5 (http://plasmodb.org/plasmo/app/record/gene/PF3D7_0424100)
PVX_096410.1 | Plasmodium vivax Sal-1 | cysteine repeat modular protein 2, putative (http://plasmodb.org/plasmo/app/record/gene/PVX_096410)
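
The sample data entries above follow a regular `ID | organism | description (URL)` layout. As a minimal sketch, assuming that layout holds for every entry, they can be split into fields with a regular expression. The names `SampleGene` and `parse_sample_line` are hypothetical helpers introduced here for illustration.

```python
import re
from typing import NamedTuple


class SampleGene(NamedTuple):
    gene_id: str
    organism: str
    description: str
    url: str


# "ID | organism | description (URL)": the URL is the last parenthesised group,
# so descriptions that contain their own parentheses, e.g. "(SERA)", stay intact.
_LINE = re.compile(
    r"^(?P<id>\S+)\s*\|\s*(?P<org>[^|]+?)\s*\|\s*(?P<desc>.+?)\s*"
    r"\((?P<url>https?://[^)\s]+)\)\s*$"
)


def parse_sample_line(line: str) -> SampleGene:
    """Split one sample-data entry into its four fields."""
    m = _LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised sample-data line: {line!r}")
    return SampleGene(m["id"], m["org"], m["desc"], m["url"])


entry = parse_sample_line(
    "PVX_003830.1 | Plasmodium vivax Sal-1 | serine-repeat antigen 5 (SERA) "
    "(http://plasmodb.org/plasmo/app/record/gene/PVX_003830)"
)
print(entry.gene_id, "-", entry.description)
```

The parsed `gene_id` can then be pasted into the Specify Features search of SynFind or used as a CoGeBLAST query label.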


  1. Jackson AP. 2015. Preface. The evolution of parasite genomes and the origins of parasitism. Parasitology. 142 Suppl 1:S1-5. https://www.ncbi.nlm.nih.gov/pubmed/25656359
  2. Sinka ME, Bangs MJ, Manguin S, Rubio-Palis Y, Chareonviriyaphap T, Coetzee M, Mbogo CM, Hemingway J, Patil AP, Temperley WH, Gething PW, Kabaria CW, Burkot TR, Harbach RE, Hay SI. 2012. A global map of dominant malaria vectors. Parasit Vectors. 5:69. https://www.ncbi.nlm.nih.gov/pubmed/22475528
  3. DeBarry JD, Kissinger JC. 2011. Jumbled Genomes: Missing Apicomplexan Synteny. Mol Biol Evol. 28:2855-71. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3176833/
  4. Carlton JM, Perkins SL, Deitsch KW. 2013. Malaria Parasites. Caister Academic Press
  5. Prugnolle F, Durand P, Ollomo B, Duval L, Ariey F, Arnathau C, Gonzalez JP, Leroy E, Renaud F. 2011. A Fresh Look at the Origin of Plasmodium falciparum, the Most Malignant Malaria Agent. PLoS Pathog. 7: e1001283. http://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1001283
  6. Prugnolle F, Rougeron V, Becquart P, Berry A, Makanga B, Rahola N, Arnathau C, Ngoubangoye B, Menard S, Willaume E, Ayala FJ, Fontenille D, Ollomo B, Durand P, Paupy C, Renaud F. 2013. Diversity, host switching and evolution of Plasmodium vivax infecting African great apes. Proc Natl Acad Sci U S A. 110:8123-8. https://www.ncbi.nlm.nih.gov/pubmed/23637341
  7. Buscaglia CA, Kissinger JC, Agüero F. 2015. Neglected Tropical Diseases in the Post-Genomic Era. Trends Genet. 31:539-55. https://www.ncbi.nlm.nih.gov/pubmed/26450337
  8. Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. 2016. GenBank. Nucleic Acids Res. 44: D67–D72. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702903/
  9. Aurrecoechea C, Brestelli J, Brunk BP, Dommer J, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M, Innamorato F, Iodice J, Kissinger JC, Kraemer E, Li W, Miller JA, Nayak V, Pennington C, Pinney DF, Roos DS, Ross C, Stoeckert CJ Jr, Treatman C, Wang H. 2009. PlasmoDB: a functional genomic database for malaria parasites. Nucleic Acids Res. 37:D539-43. https://www.ncbi.nlm.nih.gov/pubmed/18957442
  10. Logan-Klumpler FJ, De Silva N, Boehme U, Rogers MB, Velarde G, McQuillan JA, Carver T, Aslett M, Olsen C, Subramanian S, Phan I, Farris C, Mitra S, Ramasamy G, Wang H, Tivey A, Jackson A, Houston R, Parkhill J, Holden M, Harb OS, Brunk BP, Myler PJ, Roos D, Carrington M, Smith DF, Hertz-Fowler C, Berriman M. 2012. GeneDB--an annotation database for pathogens. Nucleic Acids Res. 40:D98-108. https://www.ncbi.nlm.nih.gov/pubmed/22116062
  11. Bensch S, Hellgren O, Pérez-Tris J. 2009. MalAvi: a public database of malaria parasites and related haemosporidian in avian hosts based on mitochondrial cytochrome b lineages. Mol Ecol Resour. 9:1353-8. https://www.ncbi.nlm.nih.gov/pubmed/21564906
  12. Singh V, Gupta P, Pande V. 2014. Revisiting the multigene families: Plasmodium var and vir genes. J Vector Borne Dis. 51:75-81. https://www.ncbi.nlm.nih.gov/pubmed/24947212
  13. Niang M, Yan Yam X, Preiser PR. 2009. The Plasmodium falciparum STEVOR multigene family mediates antigenic variation of the infected erythrocyte. PLoS Pathog. 5:e1000307. https://www.ncbi.nlm.nih.gov/pubmed/19229319
  14. Witmer K, Schmid CD, Brancucci NM, Luah YH, Preiser PR, Bozdech Z, Voss TS. 2012. Analysis of subtelomeric virulence gene families in Plasmodium falciparum by comparative transcriptional profiling. Mol Microbiol. 84:243-59. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3491689/
  15. Petter M, Bonow I, Klinkert MQ. 2008. Diverse expression patterns of subgroups of the rif multigene family during Plasmodium falciparum gametocytogenesis. PLoS One. 3:e3779. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0003779
  16. Carlton JM, Adams JH, Silva JC, Bidwell SL, Lorenzi H, Caler E, Crabtree J, Angiuoli SV, Merino EF, Amedeo P, Cheng Q, Coulson RM, Crabb BS, Del Portillo HA, Essien K, Feldblyum TV, Fernandez-Becerra C, Gilson PR, Gueye AH, Guo X, Kang'a S, Kooij TW, Korsinczky M, Meyer EV, Nene V, Paulsen I, White O, Ralph SA, Ren Q, Sargeant TJ, Salzberg SL, Stoeckert CJ, Sullivan SA, Yamamoto MM, Hoffman SL, Wortman JR, Gardner MJ, Galinski MR, Barnwell JW, Fraser-Liggett CM. 2008. Comparative genomics of the neglected human malaria parasite Plasmodium vivax. Nature. 455:757-63. https://www.ncbi.nlm.nih.gov/pubmed/18843361
  17. Lopez FJ, Bernabeu M, Fernandez-Becerra C, del Portillo HA. 2013. A new computational approach redefines the subtelomeric vir superfamily of Plasmodium vivax. BMC Genomics. 14:8. https://www.ncbi.nlm.nih.gov/pubmed/?term=A+new+computational+approach+redefines+the+subtelomeric+vir+superfamily+of+Plasmodium+vivax
  18. Fernandez-Becerra C, Yamamoto MM, Vêncio RZ, Lacerda M, Rosanas-Urgell A, del Portillo HA. 2009. Plasmodium vivax and the importance of the subtelomeric multigene vir superfamily. Trends Parasitol. 25:44-51. https://www.ncbi.nlm.nih.gov/pubmed/19036639
  19. Neafsey DE, Galinsky K, Jiang RH, Young L, Sykes SM, Saif S, Gujja S, Goldberg JM, Young S, Zeng Q, Chapman SB, Dash AP, Anvikar AR, Sutton PL, Birren BW, Escalante AA, Barnwell JW, Carlton JM. 2012. The malaria parasite Plasmodium vivax exhibits greater genetic diversity than Plasmodium falciparum. Nat Genet. 44:1046-50. https://www.ncbi.nlm.nih.gov/pubmed/22863733
  20. Cowman AF, Crabb BS. 2006. Invasion of red blood cells by malaria parasites. Cell. 124:755-66. https://www.ncbi.nlm.nih.gov/pubmed/16497586
  21. Sundararaman SA, Plenderleith LJ, Liu W, Loy DE, Learn GH, Li Y, Shaw KS, Ayouba A, Peeters M, Speede S, Shaw GM, Bushman FD, Brisson D, Rayner JC, Sharp PM, Hahn BH. 2016. Genomes of cryptic chimpanzee Plasmodium species reveal key evolutionary events leading to human malaria. Nat Commun. 7:11078. https://www.ncbi.nlm.nih.gov/pubmed/27002652
  22. Tang H, Lyons E. 2012. Unleashing the Genome of Brassica Rapa. Front Plant Sci. 3: 172. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3408644/
  23. Ghanbarian AT, Hurst LD. 2015. Neighboring Genes Show Correlated Evolution in Gene Expression. Mol Biol Evol. doi:10.1093/molbev/msv053. http://mbe.oxfordjournals.org/content/early/2015/04/01/molbev.msv053.full
  24. De S, Teichmann SA, Babu MM. 2009. The impact of genomic neighborhood on the evolution of human and chimpanzee transcriptome. Genome Res. 19(5): 785–794. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2675967/
  25. Michalak P. 2008. Coexpression, coregulation, and cofunctionality of neighboring genes in eukaryotic genomes. Genomics. 91:243-8. http://www.sciencedirect.com/science/article/pii/S0888754307002807
  26. Rovira-Graells N, Gupta AP, Planet E, Crowley VM, Mok S, Ribas de Pouplana L, Preiser PR, Bozdech Z, Cortés A. 2012. Transcriptional variation in the malaria parasite Plasmodium falciparum. Genome Res. 5:925-38. https://www.ncbi.nlm.nih.gov/pubmed/22415456
  27. Tachibana SI, Sullivan SA, Kawai S, Nakamura S, Kim HR, Goto N, Arisue N, Palacpac NM, Honma H, Yagi M, Tougan T, Katakai Y, Kaneko O, Mita T, Kita K, Yasutomi Y, Sutton PL, Shakhbatyan R, Horii T, Yasunaga T, Barnwell JB, Escalante AA, Carlton JM, Tanabe K. 2012. Plasmodium cynomolgi genome sequences provide insight into Plasmodium vivax and the monkey malaria clade. Nat Genet. 44: 1051–1055. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3759362/
  28. Pacheco MA, Reid MJ, Schillaci MA, Lowenberger CA, Galdikas BM, Jones-Engel L, Escalante AA. 2012. The origin of malarial parasites in orangutans. PLoS One. 7:e34990. https://www.ncbi.nlm.nih.gov/pubmed/22536346
  29. Rayner JC, Liu W, Peeters M, Sharp PM, Hahn BH. 2011. A plethora of Plasmodium species in wild apes: a source of human infection? Trends Parasitol. 27:222-9. https://www.ncbi.nlm.nih.gov/pubmed/21354860?dopt=Abstract&holding=npg
  30. Pacheco MA, Reid MJ, Schillaci MA, Lowenberger CA, Galdikas BM, Jones-Engel L, Escalante AA. 2012. The origin of malarial parasites in orangutans. PLoS One. 7:e34990. https://www.ncbi.nlm.nih.gov/pubmed/22536346
  31. Sundararaman SA, Plenderleith LJ, Liu W, Loy DE, Learn GH, Li Y, Shaw KS, Ayouba A, Peeters M, Speede S, Shaw GM, Bushman FD, Brisson D, Rayner JC, Sharp PM, Hahn BH. 2016. Genomes of cryptic chimpanzee Plasmodium species reveal key evolutionary events leading to human malaria. Nat Commun. 7:11078. https://www.ncbi.nlm.nih.gov/pubmed/27002652
  32. Otto TD, Rayner JC, Böhme U, Pain A, Spottiswoode N, Sanders M, Quail M, Ollomo B, Renaud F, Thomas AW, Prugnolle F, Conway DJ, Newbold C, Berriman M. 2014. Genome sequencing of chimpanzee malaria parasites reveals possible pathways of adaptation to human hosts. Nat Commun. 5:4754. https://www.ncbi.nlm.nih.gov/pubmed/25203297
  33. Arisue N, Kawai S, Hirai M, Palacpac NM, Jia M, Kaneko A, Tanabe K, Horii T. 2011. Clues to Evolution of the SERA Multigene Family in 18 Plasmodium Species. PLoS One. 6: e17775. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0017775
  34. Prasad R, Atul, Soni A, Puri SK, Sijwali PS. 2012. Expression, characterization, and cellular localization of knowpains, papain-like cysteine proteases of the Plasmodium knowlesi malaria parasite. PLoS One. 12:e51619. https://www.ncbi.nlm.nih.gov/pubmed/23251596
  35. Brömme D. 2001. Papain-like cysteine proteases. Curr Protoc Protein Sci. 21. doi: 10.1002/0471140864.ps2102s21. https://www.ncbi.nlm.nih.gov/pubmed/18429163
  36. Arisue N, Hirai M, Arai M, Matsuoka H, Horii T. 2007. Phylogeny and evolution of the SERA multigene family in the genus Plasmodium. J Mol Evol. 65:82-91. http://link.springer.com/article/10.1007%2Fs00239-006-0253-1
  37. Arisue N, Kawai S, Hirai M, Palacpac NM, Jia M, Kaneko A, Tanabe K, Horii T. 2011. Clues to Evolution of the SERA Multigene Family in 18 Plasmodium Species. PLoS One. 6: e17775. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0017775
  38. World Health Organization. (2015). World Malaria Report 2015. Retrieved from http://www.who.int/malaria/publications/world-malaria-report-2015/report/en/
  39. Garamszegi LZ. 2009. Patterns of co-speciation and host switching in primate malaria parasites. Malar J. 8:110. doi: 10.1186/1475-2875-8-110. https://www.ncbi.nlm.nih.gov/pubmed/19463162
  40. Ta TH, Hisam S, Lanza M, Jiram AI, Ismail N, Rubio JM. 2014. First case of a naturally acquired human infection with Plasmodium cynomolgi. Malar J. 13: 68. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3937822/
  41. Singh B, Daneshvar C. 2013. Human infections and detection of Plasmodium knowlesi. Clin Microbiol Rev. 26:165-84. https://www.ncbi.nlm.nih.gov/pubmed/23554413
  42. Prugnolle F, Durand P, Neel C, Ollomo B, Ayala FJ, Arnathau C, Etienne L, Mpoudi-Ngole E, Nkoghe D, Leroy E, Delaporte E, Peeters M, Renaud F. 2010. African great apes are natural hosts of multiple related malaria species, including Plasmodium falciparum. Proc Natl Acad Sci U S A. 107:1458-63. https://www.ncbi.nlm.nih.gov/pubmed/20133889
  43. Duval L, Fourment M, Nerrienet E, Rousset D, Sadeuh SA, Goodman SM, Andriaholinirina NV, Randrianarivelojosia M, Paul RE, Robert V, Ayala FJ, Ariey F. 2010. African apes as reservoirs of Plasmodium falciparum and the origin and diversification of the Laverania subgenus. Proc Natl Acad Sci U S A. 107:10561-6. https://www.ncbi.nlm.nih.gov/pubmed/20498054
  44. Krizanauskiene A, Hellgren O, Kosarev V, Sokolov L, Bensch S, Valkiunas G. 2006. Variation in host specificity between species of avian haemosporidian parasites: evidence from parasite morphology and cytochrome B gene sequences. J Parasitol. 6:1319-24. https://www.ncbi.nlm.nih.gov/pubmed/17304814
  45. Duval L, Robert V, Csorba G, Hassanin A, Randrianarivelojosia M, Walston J, Nhim T, Goodman SM, Ariey F. 2007. Multiple host-switching of Haemosporidia parasites in bats. Malar J. 6:157. https://www.ncbi.nlm.nih.gov/pubmed/18045505