Plasmodium analysis workflow 1: Tools that evaluate genomic properties and amino acid usage

From CoGepedia

Revision as of 18:12, 9 February 2017

Analyzing GC content and other genomic properties (GenomeList)

Figure 1. Genome List upload window as seen from OrganismView. Twelve Plasmodium genomes have been included. The analysis can be rerun by following this link: https://genomevolution.org/r/lys1

Average GC content and its distribution along the genome differ markedly between the two main human malaria agents, P. vivax and P. falciparum. The average GC content is 40.57% in P. vivax compared to 19.4% in P. falciparum. GC-poor regions are restricted to the subtelomeric regions of the P. vivax genome, whereas they are ubiquitous across the P. falciparum genome [1]. The current model is that AT-rich genomes represent the ancestral state and GC-rich genomes the derived state in specific Plasmodium lineages [2]. Here, we will evaluate the patterns of GC content variation across three of the four main Plasmodium clades.

Figure 2. Genome List output window shows the analysis of 12 Plasmodium genomes. Clades are indicated with colors: simian clade (brown), rodent clade (red), and Laveranian subgenus (blue). The number of columns on display has been modified.

CoGe can display a genome’s GC content in GenomeInfo. To calculate GC content, click on %GC under the Length and/or Noncoding sequence sections on the Statistics tab. You can compare and contrast GC content (and other genomic features) across several species and/or strains using GenomeList. This tool creates a list of genomes selected by the user and calculates features such as:

  • Amino acid usage.
  • Codon usage.
  • Coding sequence (CDS) GC content.
  • Number of genes.
  • Number of introns.
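As a rough illustration of two of the features listed above, the following minimal sketch computes GC content and codon counts for a nucleotide sequence. It is not CoGe's implementation; the function names `gc_content` and `codon_usage` are hypothetical, and the choice to ignore ambiguous bases such as N is an assumption.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return the GC percentage of a nucleotide sequence.

    Assumption: only A, C, G and T are counted; ambiguous bases (e.g. N)
    are ignored rather than treated as AT or GC.
    """
    counts = Counter(seq.upper())
    gc = counts["G"] + counts["C"]
    at = counts["A"] + counts["T"]
    total = gc + at
    if total == 0:
        return 0.0
    return 100.0 * gc / total


def codon_usage(cds: str) -> Counter:
    """Count codons in a coding sequence, reading in frame from position 0.

    Trailing bases that do not form a complete codon are dropped.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))
```

Calling `gc_content` on each CDS of a genome, rather than on the whole assembly, would give a value comparable in spirit to the CDS GC content column.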

GenomeList also summarizes some of the genomes' metadata including:

Figure 3. GC content is written in color text next to each analyzed Plasmodium genome. Species are colored according to their clade: simian (brown), rodent (red), Laveranian subgenus (blue), and reptile-birds (green/purple). Figure modified from Hayakawa et al. (2008) [3].

The following steps indicate how to perform comparative analyses using the GenomeList tool in CoGe:

1. Go to: https://genomevolution.org/coge/ and log in to CoGe.

2. Click on OrganismView or follow this link: https://genomevolution.org/coge/OrganismView.pl

3. Type the scientific name of any organism of interest in the Search box. Then, select a genome version.

4. Find the Tools section under Genome Information and click on Add to GenomeList. The first genome added to GenomeList will appear in a new window.

5. Without closing this window, type the scientific name of another organism in the Search box. Select the genome version and click on Add to GenomeList.

6. Once you have added all genomes, click on Send to GenomeList (Figure 1).

7. GenomeList will generate a table including all the selected genomes. You can use GenomeList to select and compare different genomic features and attributes. The analyses can be run on specific genomes or on all the included genomes. You can also select the display columns by clicking on Change Viewable Columns.

8. Click on "Send Selected Genomes to" to download the genomes included in GenomeList.

You can follow a link to an example analysis here: https://genomevolution.org/r/lys1


Comparing genomic composition sequence: GenomeList

We used GenomeList to compare 12 fully sequenced Plasmodium genomes (Figure 1). Our results show that species closely related to P. falciparum (subgenus Laverania) have similarly AT-rich genomes. GC content was higher in Plasmodium species of the simian and rodent clades (Figure 2 and Figure 3). The highest GC content values were observed in species of the simian clade (P. vivax, P. cynomolgi and P. knowlesi). Tellingly, these species all share a common ancestor and diverged from one another recently. GC content varied widely across Plasmodium species infecting humans (P. vivax, P. ovale, P. malariae, and P. falciparum) but not across species infecting rodents (P. berghei, P. chabaudi, and P. yoelii). GC content also varied in human-infecting Plasmodium within the simian clade (P. vivax = 46.89%, P. ovale = 32.83%, and P. malariae = 25.12%). Our results suggest that GC-richness (> 30%) evolved recently and is a derived state within the genus. They also suggest that GC content correlates with evolutionary relatedness, but not with host-related selective pressures.
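The comparison above can be sketched in a few lines, using only the three simian-clade values and the > 30% cutoff quoted in the text. The dictionary name, the helper `is_gc_rich`, and the use of the range as a measure of spread are illustrative assumptions, not part of GenomeList.

```python
# GC content (%) of human-infecting species within the simian clade,
# as quoted in the paragraph above.
gc_percent = {
    "P. vivax": 46.89,
    "P. ovale": 32.83,
    "P. malariae": 25.12,
}

# Cutoff for "GC-rich" taken from the text (> 30%).
GC_RICH_THRESHOLD = 30.0


def is_gc_rich(gc: float, threshold: float = GC_RICH_THRESHOLD) -> bool:
    """Return True when a genome's GC content exceeds the threshold."""
    return gc > threshold


gc_rich_species = sorted(sp for sp, gc in gc_percent.items() if is_gc_rich(gc))

# Range of GC content across the three species, in percentage points
# (an assumed, simple measure of how widely the values vary).
gc_range = max(gc_percent.values()) - min(gc_percent.values())
```

The same pattern extends to all twelve genomes once their values are copied from the GenomeList table.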

AT-richness as an ancestral state for the Plasmodium genus is unusual since closely related genera within the phylum Apicomplexa frequently have GC-rich genomes (Toxoplasma gondii = 52.28%, Cryptosporidium parvum = 30.4%, C. muris = 28.5%, Theileria orientalis = 41.58%, T. equi = 39.47%, Babesia bovis = 36.3%, Eimeria tenella = 51.07%, etc.). Our data suggest that Plasmodium GC content may be in the process of returning to values that can be considered typical for the phylum. The implications of and mechanisms behind the extreme variability in GC content within Plasmodium are currently being investigated [4].

Identifying codon and amino acid substitution frequencies (CodeOn)

Figure 4. Amino acid usage tables of simian clade Plasmodium species. Upper row: sister species P. vivax and P. cynomolgi. Bottom row: sister species P. knowlesi and P. coatneyi. See steps section (green box) to find links to rerun the analyses.

Codon and amino acid usage are significantly shaped by two factors: selection for translational efficiency and genome composition. The significance of translational selection on genome evolution varies across the genus Plasmodium. It is believed that usage of less energetically expensive amino acids provides an evolutionary advantage by decreasing energetic costs during infection [5]. In P. falciparum, many highly expressed genes are predominantly composed of C-ending codons despite the AT-rich genome. In the GC-rich P. vivax genome, translational selection and codon usage bias are not strongly related [6]. Genome composition is also a powerful force in protein evolution.

Here, we will use CodeOn to calculate amino acid usage across a range of GC-rich to GC-poor genomes. We will measure the effects of genome composition bias on amino acid usage across 7 Plasmodium genomes from two major clades (Laveranian and simian).
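To make the quantity being compared concrete, the following minimal sketch computes amino acid usage (as a percentage of all encoded residues) from a list of coding sequences. It assumes the standard genetic code and in-frame CDS, and it is not CodeOn's implementation; `CODON_TABLE` and `amino_acid_usage` are hypothetical names.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def amino_acid_usage(cds_list):
    """Return {amino acid: percentage of residues} over all given CDS.

    Stop codons and codons with ambiguous bases are skipped; trailing
    bases that do not form a full codon are ignored.
    """
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3
        for i in range(0, usable, 3):
            aa = CODON_TABLE.get(cds[i:i + 3])
            if aa is not None and aa != "*":
                counts[aa] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: 100.0 * n / total for aa, n in counts.items()}
```

In an AT-rich genome, one would expect residues encoded by AT-rich codons (for example lysine, AAA/AAG, and asparagine, AAT/AAC) to take a larger share of this table.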

Figure 5. Amino acid usage tables in Plasmodium species from the Laveranian subgenus. Upper row: sister species P. falciparum and P. reichenowi. Bottom row: P. gaboni. See steps section (green box) to find links to rerun the analyses.

The following steps indicate how to build amino acid usage tables using CodeOn:

1. Go to: https://genomevolution.org/coge/ and log in to CoGe.

2. Find the genome of interest in OrganismView or follow this link https://genomevolution.org/coge/OrganismView.pl

3. Click on CodeOn to start the analysis. After a couple of minutes, the output will show in a different tab.

You can follow links to CodeOn example analyses for the simian clade here (Figure 4):

https://genomevolution.org/coge/CodeOn.pl?oid=27002 (P. vivax)

https://genomevolution.org/coge/CodeOn.pl?dsgid=32770 (P. cynomolgi)

https://genomevolution.org/coge/CodeOn.pl?oid=26997 (P. knowlesi)

https://genomevolution.org/coge/CodeOn.pl?oid=40698 (P. coatneyi)

You can follow links to CodeOn example analyses for the Laveranian subgenus here (Figure 5):

https://genomevolution.org/coge/CodeOn.pl?oid=26992 (P. falciparum)

https://genomevolution.org/coge/CodeOn.pl?oid=40801 (P. reichenowi)

https://genomevolution.org/coge/CodeOn.pl?oid=40696 (P. gaboni)

Amino acid usage trends were markedly different in species from different clades (Figure 4 and Figure 5). On the other hand, closely related Plasmodium species showed similar amino acid usage patterns.

P. vivax (Salvador-1) had the highest number of CDS with 45-55% GC content. Closely related species (P. cynomolgi, P. knowlesi, and P. coatneyi) had a higher number of CDS in the 40-45% GC tier (Figure 4). Genome composition is similar between P. cynomolgi, P. knowlesi, and P. coatneyi (Figure 2 and Figure 3). However, patterns of amino acid usage were markedly different in P. coatneyi relative to the other simian species (Figure 4).

In the Laveranian subgenus, the number of CDS with 20-30% GC content was significantly larger. Amino acid usage was similar in P. falciparum (3D7) and P. reichenowi (SY57), but slightly different in P. gaboni (Figure 5). This variation is noteworthy given that the three species share a similar GC content (Figure 2 and Figure 3). This result suggests that GC content is a significant factor in amino acid usage in both the simian clade and the Laveranian subgenus. However, we cannot rule out additional factors not evaluated here.
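The GC tiers discussed above (for example 40-45% or 20-30%) amount to binning each CDS by its GC content and counting the CDS per bin. The sketch below shows one way to do that; the 5-point bin width, the function names, and the handling of ambiguous bases are assumptions and may not match how CodeOn defines its tiers.

```python
from collections import Counter


def gc_percent(seq: str) -> float:
    """GC percentage of a sequence, counting only A, C, G and T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def gc_tier(gc: float, width: int = 5) -> str:
    """Label the GC tier a value falls into, e.g. 42.0 -> '40-45%'.

    Each tier includes its lower bound; a value of exactly 100% is
    placed in the top tier.
    """
    low = int(gc // width) * width
    low = min(low, 100 - width)
    return f"{low}-{low + width}%"


def tier_counts(cds_list, width: int = 5) -> Counter:
    """Count how many CDS fall into each GC tier."""
    return Counter(gc_tier(gc_percent(cds), width) for cds in cds_list)
```

Summing adjacent bins (for instance `"20-25%"` and `"25-30%"`) gives the wider tiers quoted in the text.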


  1. Carlton JM, Adams JH, Silva JC, Bidwell SL, Lorenzi H, Caler E, Crabtree J, Angiuoli SV, Merino EF, Amedeo P, Cheng Q, Coulson RM, Crabb BS, Del Portillo HA, Essien K, Feldblyum TV, Fernandez-Becerra C, Gilson PR, Gueye AH, Guo X, Kang'a S, Kooij TW, Korsinczky M, Meyer EV, Nene V, Paulsen I, White O, Ralph SA, Ren Q, Sargeant TJ, Salzberg SL, Stoeckert CJ, Sullivan SA, Yamamoto MM, Hoffman SL, Wortman JR, Gardner MJ, Galinski MR, Barnwell JW, Fraser-Liggett CM. 2008. Comparative genomics of the neglected human malaria parasite Plasmodium vivax. Nature. 455:757-63. https://www.ncbi.nlm.nih.gov/pubmed/18843361
  2. Nikbakht H, Xia X, Hickey DA. 2014. The evolution of genomic GC content undergoes a rapid reversal within the genus Plasmodium. Genome. 57:507-11. https://www.ncbi.nlm.nih.gov/pubmed/25633864
  3. Hayakawa T, Culleton R, Otani H, Horii T, Tanabe K. 2008. Big bang in the evolution of extant malaria parasites. Mol Biol Evol. 25:2233-9. https://www.ncbi.nlm.nih.gov/pubmed/18687771
  4. Bensch S, Canbäck B, DeBarry JD, Johansson T, Hellgren O, Kissinger JC, Palinauskas V, Videvall E, Valkiūnas G. 2016. The Genome of Haemoproteus tartakovskyi and Its Relationship to Human Malaria Parasites. Genome Biol Evol. 8:1361-73. https://www.ncbi.nlm.nih.gov/pubmed/27190205
  5. Peixoto L, Fernández V, Musto H. 2004. The effect of expression levels on codon usage in Plasmodium falciparum. Parasitology. 128:245-51. https://www.ncbi.nlm.nih.gov/pubmed/15074874
  6. Yadav MK, Swati D. 2012. Comparative genome analysis of six malarial parasites using codon usage bias based tools. Bioinformation. 8:1230-9. https://www.ncbi.nlm.nih.gov/pubmed/23275725