Difference between revisions of "Using CoGe for the analysis of Plasmodium spp"

From CoGepedia

Revision as of 15:38, 30 November 2016

About this Guide

Welcome to the Plasmodium genus genome analysis with CoGe guide. This 'cookbook' style document is meant to provide an introduction to many of our tools and services, and is structured around a case study of investigating genome evolution of the malaria-causing Plasmodium spp. The small size and unique features of this pathogen's genome make it a great example for beginning to understand how our tools can be used to conduct comparative genomic analyses and uncover meaningful discoveries.

Through a number of guided examples, this guide will teach users how to use the following tools:

  • Kn/Ks analysis: Characterize the evolution of populations of genes
  • SPA tool: Syntenic Path Assembly to assist in genome analysis
  • SynFind: Identify syntenic genes across multiple genomes
  • CodeOn: Characterize patterns of codon and amino acid evolution in coding sequences

A brief introduction to Plasmodium genome evolution

The unique features found in many parasitic genomes create particular challenges when using comparative genomics to study their evolution. Parasite genomes are characterized by genome reduction associated with gene loss (e.g. homeobox genes), but also by the development of specialized genes. Many of the genes gained in parasitic genomes are involved in different aspects of host-parasite interaction and are, for the most part, species or lineage specific [1]. This dynamic nature of parasitic genomes is especially evident within the phylum Apicomplexa, and particularly within the genus Plasmodium. A marked loss of synteny between different Apicomplexa genera has been previously reported [2], although syntenic relationships between species within a single genus are largely conserved. While this finding remains true for many genera, the increasing number of sequenced Plasmodium genomes has shown that numerous clade-specific and species-specific gain/loss events and chromosome rearrangements have occurred [3]. The exact origins and mechanisms of these rearrangements remain largely unexplored, but they are generally hypothesized to stem from different host shift events [4][5], which have led to diverse types of host-parasite interactions.

Despite the enormous diversity of Plasmodium parasites, all studies to date (2016) show conservation of certain genomic characteristics. Fourteen chromosomes, a mitochondrial genome, and an apicoplast genome compose the entire repertoire of the Plasmodium genome in all sequenced species. This conservation in genomic complement is remarkable, especially considering the potential for altering the number of chromosomes without compromising genome size. As in the case of other parasites, Plasmodium genomes are relatively small (approximately 17-28Mb) in comparison to those of their hosts (1Gb for birds; 2-3Gb for mammals), but larger than those of other Apicomplexan parasites (Theileria orientalis and Cryptosporidium parvum have genomes of approximately 9Mb) [6]. All Plasmodium species have a complex life cycle involving some kind of vertebrate host and a mosquito vector of the genus Anopheles. Though host and vector preferences differ among species within the genus [7], all Plasmodium species share life cycle characteristics, which suggests the existence of a set of preserved core genes necessary for them to complete their life cycle. These core genes are pivotal elements for the use of comparative genomics to study the evolution of Plasmodium.

An increase in funding devoted to malaria research during recent years has come hand in hand with an increased understanding of Plasmodium genetics [8]. At the moment, an unprecedented number of Plasmodium genomes and gene sequences are publicly available. The most prominent repository is NCBI/GenBank [9], while additional and unique sequences can also be found in other databases: PlasmoDB, GeneDB and MalAvi [10][11][12]. The availability of genomic data from Plasmodium species opens the possibility to:

  • identify the likely origin of certain traits, specialized phenotypes, and genomic landscapes
  • track the maintenance of conserved genes across the genus, as well as the rise and loss of genes unique to only a single or a group of closely related species
  • infer the potential historical interactions which might have led to the development of adaptations as well as their putative consequences.

One of the many remarkable trends of Plasmodium genome evolution is the rapid change in GC content. P. falciparum and closely related parasites have a remarkably AT-rich genome compared to other Plasmodium species [13]. While significant shifts in GC content have been reported in other parts of the tree of life such as Bacteria [14][15] and monocots [16], the short evolutionary time during which this change has occurred in Plasmodium is noteworthy. Moreover, the GC content variability observed amongst Plasmodium species has not yet been observed in other Apicomplexan genera. AT-rich genomes not only present challenges for sequencing [17], but also result in entirely different trends of codon and amino acid usage. Furthermore, patterns of genome mutability and of the evolution of repetitive elements can also be markedly different in AT-rich genomes. By utilizing various analysis tools for comparative genomics, it is possible to assess the evolutionary origins and trace patterns of GC content shift across the Plasmodium genus.
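
To make the link between GC content and codon usage concrete, the following is a minimal Python sketch. It is not part of CoGe and does not reproduce how any CoGe tool computes these values; the function names gc_content and codon_counts and both toy sequences are hypothetical, made up for illustration only. It computes the GC fraction of a sequence and tallies the codons of a coding sequence read in frame.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    counts = Counter(seq.upper())
    gc = counts["G"] + counts["C"]
    total = gc + counts["A"] + counts["T"]
    # Sequences without any unambiguous base (or empty ones) yield 0.0.
    return gc / total if total else 0.0


def codon_counts(cds: str) -> Counter:
    """Count codons in a coding sequence, reading in frame from position 0.

    A trailing partial codon (1 or 2 bases) is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


# Toy sequences (hypothetical, not real Plasmodium genes). Under the
# standard genetic code both encode the same peptide, M-K-N-L-L-K,
# followed by a stop codon; they differ only in synonymous codon choice.
at_rich = "ATGAAAAATTTATTAAAATAA"
gc_rich = "ATGAAGAACCTGCTCAAGTGA"

print(f"AT-rich toy CDS: GC = {gc_content(at_rich):.2f}")
print(f"GC-rich toy CDS: GC = {gc_content(gc_rich):.2f}")
print("AT-rich codons:", dict(codon_counts(at_rich)))
print("GC-rich codons:", dict(codon_counts(gc_rich)))
```

The two toy sequences encode the same amino acids with different synonymous codons, which is the simplest way a shift in base composition can change codon usage without changing the protein.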

Another important aspect of Plasmodium evolution is the unique patterns of genome variability and the diverse responses to selective pressures observed in different Plasmodium genomes. In this regard, comparative analyses between Plasmodium species and strains can elucidate the key elements behind these differences (e.g. different host pressures or an earlier species split) and identify the genomic regions and elements where this type of change is most prominent. Perhaps even more significant in Plasmodium evolution, and in that of parasites in general [18], may be the origin and evolution of multigene families. Within the Plasmodium genome, numerous multigene families show specific patterns of gene gain/loss events and can be associated with variable syntenic changes. The differences in the ancestry of these families are also noteworthy: many are observed only in a single Plasmodium species or in closely related species, while others are observed across the entire genus but not in other Apicomplexa parasites [19]. In this sense, each multigene family can illustrate a different aspect of the evolutionary history of the genus.

In the following paper, we will demonstrate how to use the CoGe platform to analyze genomes and evaluate diverse evolutionary hypotheses. Through a case study on Plasmodium evolution, we will illustrate how CoGe can be used for the analysis of both genes (specifically multigene families) and whole genomes (genome composition, rearrangement events, conservation).

Finding and importing data into CoGe

An increasing number of Plasmodium genomes have been sequenced in recent years, and the amount of genomic data available for the genus will continue to increase in upcoming years. Tools that allow the rapid incorporation of genomic information and its subsequent analysis are essential in Plasmodium research. Moreover, online platforms that reduce computational time and costs, and that foster collaborative initiatives worldwide, are of particular interest in the study of malaria.

The first step in sequence analysis using CoGe is the import of new sequences to the platform.


Finding out about the Plasmodium genomes already present in CoGe

Figure 1. Search bar on top of most CoGe windows

While the amount of Plasmodium genomic data has risen significantly during the past few years, important advances in Plasmodium genomics have been occurring for approximately two decades. An extensive amount of historical genomic data can be found in CoGe’s repositories.

One of the most significant accomplishments in the study of Plasmodium genomics has been the sequencing and assembly of the P. falciparum genome [20]. Subsequent technological improvements have led to the re-annotation and re-evaluation of this genome. The CoGe platform incorporates new versions of a genome without removing previous ones; thus, you can find the originally sequenced P. falciparum genome, as well as later re-annotations.

Before importing a genome into CoGe, and to prevent redundancy of genomic information, it is recommended to identify which Plasmodium genomic data have already been incorporated (Figure 1). You can search CoGe’s Plasmodium genomes by typing the word "Plasmodium" into the Search bar at the top of most pages. This will retrieve all organisms and genomes with names matching the search term. Clicking on any organism will display the details of the upload. Alternatively, you can find the Tools section on the main CoGe page and click on Organism View (https://genomevolution.org/coge/OrganismView.pl) to explore CoGe’s Plasmodium genomes.

Figure 2. CoGe main page

All publicly available genomes imported into CoGe and their corresponding metadata can be found in the Organism View section (Figure 2). To find any genome in Organism View, type the organism's scientific name into the Search box. You will find the following information (Figure 3):

Figure 3. Screen capture of OrganismView
  • Organisms: In the case of Plasmodium spp., lists the different parasite strains already imported, as well as any imported organelle genomes (mitochondrial and apicoplast).
  • Organism Information: provides an outline of the organisms’ taxonomy (as published on NCBI/Genbank). This section also includes links to some of CoGe's main analysis tools.
  • Genomes: All genome versions available for the organism of interest. Note that by selecting different genome versions, all the other associated genomic information also changes. You can select different genome versions in this section.
  • Genome information: Includes genome IDs, type of sequences uploaded and their length. You can also access CoGe's genome analysis tools in this section.
  • Datasets: This section includes the number of datasets for the specified genome. In the case of completely sequenced genomes imported from NCBI/GenBank it will indicate the accession numbers of each chromosome.
  • Dataset information: Provides information for each dataset including: accession numbers (if available), source of the import, chromosome length, and GC%.
  • Chromosomes: Shows the number of chromosomes in the selected genome. However, depending on the method used to import the genome into CoGe and on the dataset itself, the number shown may be higher than the true chromosome count (e.g. the number of contigs is displayed in lieu of the number of chromosomes).
  • Chromosome information: Shows each chromosome's ID and number of base pairs (bp).

You can access a more detailed description of any genome through the Genome Info section within Genome Information. This section also provides links to the majority of CoGe’s comparative analysis tools. Keep in mind that genomes imported into CoGe can have a Public or Restricted display. Genomes made public can be seen and analyzed by anyone using the CoGe platform. Restricted genomes, on the other hand, can only be seen and/or analyzed by the user who imported them or by those with whom they have been shared (see Sharing_data).

Importing Plasmodium genomes into CoGe

While data can be uploaded into CoGe using a variety of methods, we will focus on the two methods most likely to be used to incorporate Plasmodium genomes. For additional information, please check the following link: How_to_load_genomes_into_CoGe. Depending on your interests and hypotheses, it might be necessary to perform analyses using complete Plasmodium genomes or to focus only on specific organelles and chromosomes. The methods described here can be used to upload either of these types of data:

Figure 4: Screen capture of P. vivax genome's webpage on NCBI
1. Go to the genome database on NCBI/GenBank and type "Plasmodium" in the search box. You can select any genome of interest.
2. Find the Representative Genome section in the upper section of your screen. Below you will find the Download Sequences in FASTA format and Download Genome Annotation sections (Figure 4).
- To download a complete P. vivax genome, click on Genome under Download Sequences in FASTA
- To download a complete annotation for the P. vivax genome, click on GFF under Download Genome Annotation
Alternatively, you can use the RefSeq and INSDC numbers for each chromosome and, if available, for the organelles.
3. Go to CoGe and login. You can follow this link: https://genomevolution.org/coge/
4. Click on the MyData section on the upper left part of the screen. This will lead to the Data section of your personal CoGe page (Figure 5). This section will fill up as genomes of interest are uploaded into CoGe.
5. On the upper left section of the screen, click the NEW button and select New Genome from the dropdown menu.
Figure 5: Screen capture of researcher's CoGe MyData tab
6. In the Create a New Genome window, enter information about the organism's taxonomy and the genome's origin (Figure 6). Keep in mind that, depending on the type of organism being uploaded, taxonomic information might not have been incorporated into CoGe yet (e.g. a private species or strain). If this is the case, make sure to create a new organism by following these steps:
a. Click on NEW on the "Organism:" section
b. In the Search NCBI box, type the scientific name of the organism to be uploaded. If the organism of interest is not on NCBI yet, select its closest taxonomic relative. In the case of Plasmodium, several strains might be available for a given species (particularly P. vivax and P. falciparum); make sure to select the correct strain or, if a new strain is being uploaded, to add the new strain's name.
c. Click Create
Figure 6: Screen capture of Create New Organism window at CoGe. Notice the different name of the selected strain and the one written under "Name"
7. After successfully creating a new strain/genome, it is time to include any additional information that might be needed in the future. The number typed under Version depends on how many versions of the selected genome are already available in CoGe. Thus, it is important to check the latest genome version available in CoGe before importing a new version of the same genome (e.g. P. falciparum currently has 5 versions, so any new version incorporated should be numbered as version 6). Under the Type section, select the appropriate sequence type from the dropdown menu (most sequences can be identified as unmasked or masked). Select the Source in the next dropdown menu (in this case the source is NCBI, but other databases as well as Private sources are also available). Finally, tick the check box if you want your genome to be Restricted. Remember that:
- Restricted genomes can only be seen and analyzed by the user and those with whom the genome has been shared.
- Unrestricted genomes are available to anybody using CoGe.
8. Click Next
9. This new window allows you to import genome files using four different strategies: first, importing the data directly from the CyVerse Data Store (if the data is not already on the Data Store it can be easily imported from CoGe afterwards); second, providing an HTTP/FTP link directly to the data; third, uploading the data from a private computer; and fourth, importing the data using GenBank accession numbers.
  • To import genomes using Upload:
a. Select a genome file from your local computer and wait for it to be read by CoGe. Once the process is completed, select Next. Note that you should select a FASTA, FST or FAA file.
b. Click Start on the next screen to begin the upload.
c. Once the file upload has concluded, all information included by the user, as well as any specifics regarding the FASTA file itself, will be visible on the Genome Information page. Note that genomes in earlier stages of assembly (e.g. scaffolds) can be easily uploaded into CoGe by this method.
  • To import genomes using NCBI/GenBank:
a. Select the GenBank accession numbers option. Type or copy and paste the INSDC numbers for each Plasmodium chromosome (or for specific Plasmodium organelles) and click the Get button. Note that genomes can only be uploaded one at a time using this method. Information from each imported genome should appear under Selected file(s). Once all genomes have been imported (14 chromosomes in the case of Plasmodium), click on the Next button.
b. After the genome has been imported, all information included by the user, as well as any specifics regarding the genome FASTA file itself, will be visible on the Genome Information page. Note that uploading chromosomes/genomes using this method also imports any genome annotation information already included in NCBI/GenBank. Also note that genomes uploaded using this method will be unrestricted and thus visible to all CoGe users.
Figure 7: Complete genome and annotation upload into CoGe
c. At this point, genome annotation files can also be uploaded into CoGe for this genome. These files can be included by clicking on the green Load Sequence Annotation button under the Sequence & Gene Annotation menu. Note that some analyses can be performed in CoGe even when genome annotation data is not yet available. Also, any specific upload can be updated at any point in time. Thus, genome annotation data, metadata or experimental data can be included for a genome already imported into CoGe as soon as they become available.
10. The process of importing annotations is similar to that of importing genomes. On the Describe your annotation page, select the version and source of the annotation data and click Next. As previously described, the data can be uploaded directly from the CyVerse Data Store, by providing an HTTP/FTP link, or by using the Upload option. Note that both GFF and GTF files can be used for uploading genome annotation data. Click Next and the annotation data associated with the genome will be imported into CoGe. This information should now be visible on the Genome Information page under the Sequence & Gene Annotation menu (Figure 7). For more details about uploading genome annotations follow this link: LoadAnnotation
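Before using the Upload option in step 9, it can be useful to confirm that a FASTA file contains the expected sequences (for example, 14 nuclear chromosomes for a complete Plasmodium assembly). The following is a minimal sketch of such a check and is not part of CoGe; the function name `summarize_fasta` and the two demo records are hypothetical, and the code assumes an uncompressed, plain-text FASTA file.

```python
import io


def summarize_fasta(handle):
    """Return a list of (record_id, length) tuples for a FASTA text stream."""
    records = []
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, length))
            # Use the first whitespace-delimited token of the header as the ID.
            parts = line[1:].split()
            name, length = (parts[0] if parts else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        records.append((name, length))
    return records


# Hypothetical two-record example standing in for a downloaded genome file.
example = io.StringIO(">chr1 demo\nACGT\nAC\n>chr2 demo\nGGCC\n")
summary = summarize_fasta(example)
for record_id, length in summary:
    print(record_id, length)
```

For a real file, the same function can be called on an open file handle, e.g. `with open(path) as fh: summarize_fasta(fh)`, and the number of returned records compared against the expected number of chromosomes or scaffolds.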

Exporting genomes from CoGe to CyVerse

Data can be exported to CyVerse for easy sharing and storage after it has been imported into CoGe. While this is not needed to use CoGe or perform any analyses, it is a highly recommended step for complete and Certified genomes (those which represent the latest and most complete version of a given species' genome to date). You can use CoGe to export data into the CyVerse Data Store by following these steps:
1. While logged into CoGe, go to the Genome Information page on your genome of interest.
2. Under the Tools menu, find the Export to CyVerse Data Store option. Click either on the FASTA or the GFF file options to upload genomic data and its annotation, respectively. Make sure to specify a name for the GFF file before performing the export.
3. Wait until the export is completed. From this point forward, your FASTA and GFF files will also be found in the CyVerse Data Store. Note that uploaded genomes cannot be modified, so it is recommended to keep a list of the genome codes provided by CyVerse and their associated organism or strain.

Using CoGe tools to perform comparative analyses

Analyzing GC content and other genomic properties (GenomeList)

Figure 8: Upload of 12 Plasmodium genomes to Genome List

There are significant variations in average GC content and GC content distribution between the two main agents of human malaria: P. vivax and P. falciparum. In P. vivax, the average GC content is 42.3%, while in P. falciparum it is 19.4%. GC-poor regions are mostly located in the subtelomeres of P. vivax, but they are widespread across the entire P. falciparum genome [21]. Changes in GC content within the genus Plasmodium are considered evidence of a genome composition reversal, shifting from an AT-rich ancestor to GC-rich species [22]. Thanks to the increasing number of fully sequenced Plasmodium genomes, we can evaluate the patterns of GC content variation across three of the four main currently described Plasmodium clades.

The CoGe platform can calculate GC content by using the GenomeInfo tool found on Genome Information. By default, GC content will be displayed for genomes imported from GenBank; however, genomes uploaded from private computers, obtained from other databases, or in earlier stages of assembly will not have GC content on display. To calculate GC content, click on %GC under the Length and/or Noncoding sequence sections on the Statistics tab.
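For reference, GC content is simply the percentage of G and C bases in a sequence. The sketch below illustrates the calculation outside of CoGe; it assumes that ambiguous bases such as N are excluded from the denominator, which may differ from how CoGe treats them, and the function name `gc_percent` is hypothetical.

```python
def gc_percent(sequence):
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # assumption: N and other ambiguity codes are ignored
    if total == 0:
        return 0.0
    return 100.0 * gc / total


# Toy sequence: 2 G/C bases out of 8 unambiguous bases.
print(gc_percent("ATGCATNNAT"))
```

Applied per chromosome or per genome, this gives values comparable in spirit to the %GC figures displayed on the Statistics tab, although small differences are possible depending on how masked or ambiguous bases are counted.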

You can also compare and contrast GC content (and other genomic features) across several species and/or strains by using GenomeList. This tool creates a list of genomes selected by the user and calculates features such as: amino acid usage, codon usage, CDS GC content, number of genes, and number of introns. GenomeList also summarizes the metadata included by the user during genome import including: sequence type, sequence origin, taxonomy, provenance, version uploaded to CoGe, etc.

Figure 9: Genome List used to compare 12 Plasmodium species. The number of columns on display has been modified
The following steps indicate how to perform comparative analyses using the GenomeList tool in CoGe:

1. Go to: https://genomevolution.org/coge/ and log in to CoGe

2. Find the Tools section and click on Organism View. You can also follow this link: https://genomevolution.org/coge/OrganismView.pl

3. Type the scientific name of the organism of interest on the Search box and select the desired genome version.

4. Find the Genome Information tile on the right side of the screen. Under Tools, find and click on Add to GenomeList. This will automatically generate a new window indicating that the selected genome has been added.

5. Without closing this window, type the scientific name of other organisms of interest on the Search bar. Once you have selected your other genomes, click on Add to GenomeList (Figure 8).

6. Once you have included all genomes of interest click on the green Send to Genome list button.

7. After a few seconds, a table including all the selected genomes will appear in a new window. Here you can select and compare the different features of the genomes on your list. Moreover, links to different types of calculations (e.g. amino acid composition, %AT, etc.) are available for each genome. You can either select specific genomes for analysis or analyze all genomes at the same time by clicking on the green Get All found below each column's title. Depending on the number of genomes included and their quality, calculations might take a couple of minutes. You can also select the columns on display by clicking on the Change Viewable Columns button (Figure 9).

8. You can download information from the genomes in GenomeList using "Send Selected Genomes to". Note that the information downloaded will correspond to the genomes themselves and not to the calculations and analyses performed on GenomeList.


You can follow a link to an example analysis here: https://genomevolution.org/r/lys1

We used GenomeList to compare 12 fully sequenced Plasmodium genomes. Our results show that species closely related to P. falciparum share equally AT-rich genomes, while GC content steadily increases in more recently diverged clades (simian and rodent); in particular, GC content was highest in the P. vivax, P. cynomolgi and P. knowlesi genomes (divergence estimated at approximately 6-14 Mya). These results support the hypothesis that GC content is undergoing a reversal in the genus Plasmodium. We observed variation in GC content across Plasmodium species infecting humans but not across those infecting rodents; this suggests that GC content is a genomic property strongly influenced by evolutionary relationships and not by host-environmental pressures.

GC content is thought to have a significant role in genome evolution, and in the development and maintenance of genome variability. Our results indicate that the evolutionary strategies for maintaining genome variability employed by different Plasmodium species infecting humans are largely different [23]. Interestingly, GC content was markedly low in P. malariae (another human parasite) in comparison to other species of the simian clade. This could suggest that P. malariae might follow different evolutionary trends than other simian Plasmodium parasites.

Identifying gene homologs (CoGeBlast)

Figure 10: Screen capture of CoGeBlast input window. Genomes of interest and the query sequence are shown

A particular challenge in comparative genomics is the correct identification of multigene family members. In Plasmodium, multigene families perform a wide array of functions and have diverse evolutionary patterns. Many Plasmodium families are arranged in tandem; however, many others are organized in more complex patterns. Families located in the chromosome subtelomeres are commonly associated with parasitic functions such as antigenic variation and immune evasion (var, stevor, rifin in P. falciparum and pir in P. vivax). Subtelomeric families also undergo rapid sequence evolution, making the identification of ortholog/paralog relations a difficult task [24][25][26][27]. This is especially problematic when family members are scattered across different genome regions and chromosomes.

We will use CoGeBlast to identify multigene family members of the Plasmodium vir family [28][29]. CoGeBlast incorporates visualization into BLAST analyses, and thus facilitates the study of complex evolutionary patterns. There are 313 members in the vir family [30] which, based on sequence similarity, can be grouped into 10 subfamilies [31]. Only fifteen vir genes are shared across all sequenced P. vivax strains. The genetic diversity of these 15 genes is lower than that of other vir family members. Within this group, sequence similarity is highest for PVX_113230, suggesting that this gene might have been the founder of the vir family [32]. As such, PVX_113230 was used as an example to demonstrate the functionality and features of CoGeBlast.

Figure 11: Screen capture of CoGeBlast output. The relative position of hits to the query sequence is shown for the PO1 and Salvador-1 P. vivax strains. https://genomevolution.org/r/mjg3
The following steps show how to use the CoGeBlast tool in the CoGe platform:

1. Go to: https://genomevolution.org/coge/ and log in to CoGe.

2. Click on CoGeBlast (Alternatively, you can follow this link: https://genomevolution.org/coge/CoGeBlast.pl).

3. Under Select Target Genomes, type the scientific name of the organism of interest in the Search box. All organisms and genomes with names matching the search term will appear under the Matching Organisms menu. Also, any Notebooks matching the term will appear in a different window named Import List.

4. Select all the organisms of interest by using Ctrl+click or Command+click, and click on the green + Add button. The added organisms will appear on the Selected Genomes menu on the right. Alternatively, you can select any of the Notebooks found on Import List, and all genomes included in the Notebook will be automatically selected.

5. Paste the query sequence in FASTA format into the Query Sequence(s) section at the bottom of the screen. If desired, the BLAST analysis itself can be modified by changing the BLAST Parameters (Figure 10).

6. Once the analysis has been completed the output will include: a table showing the number of hits to the query sequence in the analyzed genomes, a graphic depiction of the location of these hits on the genome, and a list showing information for each hit including their similarity index.


You can follow a link to the example analysis here: https://genomevolution.org/r/mjg3

CoGeBlast detected the highly conserved PVX_113230 gene across the evaluated P. vivax strains [33]. Our results suggest that even within relatively conserved family members, the vir superfamily is still highly diverse. We observed a variable number of detected homologs across strains. The Mauritania, PO1, and Salvador-1 strains showed the largest numbers of identified homologs. Sequence hits in P. vivax PO1 (not included in previous studies) and the Salvador-1 strain were located in the same chromosome regions (Figure 11). As expected, the number of BLAST hits and their location varied largely across P. vivax strains when less conserved vir family members were analyzed using CoGeBlast.

Identifying microsyntenic regions (GEvo)

Large genome rearrangements are not prominent amongst closely related Plasmodium species; however, small rearrangements in specific genome regions are common. Plasmodium microsynteny is usually lost in regions of significant evolutionary interest, such as those of high recombination frequency or rapid gene turnover. In species of the subgenus Laverania, gene order and location are lost for genes involved in parasite-host interaction. Among these, members of the RBL family are essential for successful erythrocyte invasion. Two genes, the reticulocyte-binding-like homologous protein 5 (Rh5) and the cysteine-rich protective antigen (CyRPA), are thought to have originated via horizontal gene transfer (HGT) early in the evolution of the subgenus. Evidence for this event comes from differences in gene tree and species tree topology [34]. We will use the CoGe tool GEvo to evaluate genome properties of the region where Rh5 and CyRPA are located and to search for evidence of HGT.

Figure 12: GC content is shown in the background (green for GC rich regions and white for AT rich regions). Gene's wobble GC content is shown by a color gradient (low GC content in red, ~50% GC content in yellow, and high GC content in green). Rerun the analysis using this link: https://genomevolution.org/r/m4dq
The following steps show how to use GEvo to analyze microsyntenic regions:

1. Go to: https://genomevolution.org/coge/ and log in to CoGe.

2. Click on the GEvo tool on the main CoGe page (Alternatively, you can follow this link: https://genomevolution.org/coge/GEvo.pl).

3. Each box displayed under Sequence Submission allows you to select a sequence. You can specify as many as 25 sequences before performing a GEvo analysis. In each box you will find: a dropdown menu of sequence databases (CoGe database, NCBI GenBank or Direct Submission), the name of the selected sequence (e.g. gene ID numbers), the length of the genome segment to be displayed to the left and right of the sequence, and a green button used to specify additional Sequence Options (skip sequence from the analysis, set sequence as reference, set sequence as reverse complement, or mask the sequence). You can import sequences for analysis by entering their gene IDs in the Name: bar. Alternatively, you can select pairs of genes for microsynteny analysis directly from SynMap, either by zooming (SynMap2) or clicking (SynMap Legacy) on specific regions of the SynMap display.

4. Once you have selected your sequences, click on the red Run GEvo button.

5. The GEvo analysis will display the syntenic regions between the compared genome regions. Genes are shown in green at their genome location, and syntenic regions are shown as light red bars on top of each genome. You can connect syntenic regions between genomes by clicking on these bars.

6. The GEvo analysis itself can be modified by changing the parameters on the Algorithm tab. Also, you can modify the information of the graphical display by altering the options on the Results Visualization Options tab.


You can follow a link to an example analysis here: https://genomevolution.org/r/m1qw (CoGeBlast) and here https://genomevolution.org/r/m4dq (GEvo).

Figure 13: The analysis shows a region of synteny loss between P. vivax (Salvador-1), P. vivax (PO1) and P. cynomolgi. Poorly sequenced segments are shown in orange

We searched for Rh5 orthologs in five fully sequenced Laveranian Plasmodium genomes (P. falciparum strains 3D7 and IT, P. reichenowi strains CDC and SY57, and P. gaboni strain SY75) with CoGeBlast. We used the CoGeBlast output to perform a microsynteny analysis of the genome region using GEvo. Our results show that microsynteny is largely maintained in the regions surrounding Rh5 and CyRPA. There do not appear to be marked differences in background GC content in the region either. Changes in GC content that do not correspond to background GC content could suggest an HGT event (Figure 12). We modified the GEvo display to show variation in wobble GC content and did not observe any patterns suggesting an HGT event for either Rh5 or CyRPA [35]. It is possible that an HGT event between genomes of similar nucleotide composition might not be detected, and thus additional tests might be required. However, it should be noted that genes expressed during blood parasitic stages and involved in erythrocyte invasion are expected to be largely affected by host selective pressure [36]. Thus, differences in gene tree topology could be the result of factors not related to HGT.

We also used GEvo to evaluate regions of synteny loss in more detail. Synteny between P. vivax (Salvador-1 and PO1 strains) and its sister species P. cynomolgi (B-strain) shows an inversion event in P. vivax (Salvador-1). Synteny is maintained between P. cynomolgi and P. vivax (PO1). A microsynteny analysis of the border regions of the inversion event shows a poorly sequenced region in P. vivax (Salvador-1) (Figure 13). This suggests that the inversion event observed in P. vivax (Salvador-1) might be the product of a poorly sequenced genome segment.
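Wobble GC content refers to the GC content at the third position of each codon (often called GC3). The sketch below shows one way to compute it for a coding sequence; it is an illustration only and assumes an in-frame CDS starting at the first base, with the function name `gc3_percent` being hypothetical. GEvo's own implementation may differ in how it handles partial codons or ambiguous bases.

```python
def gc3_percent(cds):
    """Percent G/C at the third position of each complete codon."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    third_positions = seq[2:usable:3]
    counted = [base for base in third_positions if base in "ACGT"]
    if not counted:
        return 0.0
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Toy CDS: codons ATG AAA TTC GCA have third bases G, A, C, A.
print(gc3_percent("ATGAAATTCGCA"))
```

A gene whose GC3 departs strongly from that of its neighbors would be a candidate for closer inspection, keeping in mind the caveat above that transfers between genomes of similar composition would not stand out this way.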

Performing syntenic analyses between two genomes (SynMap)

One of the most important tools found in the CoGe platform is SynMap. This tool is used to identify syntenic orthologous genes between two genomes and to provide a graphical output across the entire genome. Information obtained in SynMap is useful for identifying both highly conserved genome regions and sections where synteny has been lost, as well as for providing a starting point for the analysis of the events leading to loss of synteny (e.g. gene duplication events) and their consequences for genome evolution (e.g. neighboring gene effects on gene expression and transcription). There are two types of information which can be obtained by using SynMap:

Figure 14: SynMap input screen. Genomes for two different species are selected as an example: P. cynomolgi B strain (Organism 1), and P. vivax Salvador-1 strain (Organism 2)
The following steps can be followed to perform comparative analyses using the SynMap tool on CoGe:

1. Go to: https://genomevolution.org/coge/ and log in to CoGe

2. On the main CoGe page find the Tools section and click on Organism View (Alternatively, you can also follow this link: https://genomevolution.org/coge/OrganismView.pl)

3. Type the scientific name of a species on the Search box and select the appropriate genome. Then, click on the GenomeInfo link under the Genome Information section.

4. Find the link to the SynMap tool under the Analyze section on Tools.

5. By default, SynMap will allow you to evaluate the synteny of a genome with itself. This can be useful when characterizing a genome or when attempting to identify putative duplication events [37]. Alternatively, two different genomes or two different organisms can be analyzed. Genomes for Organism 1 and for Organism 2 can be selected by typing the species' scientific name in the Search bar and then selecting the genome. Once you have selected both organisms, run the analysis by clicking on Generate SynMap (Figure 14).

6. Once the analysis has been completed, SynMap will output a graphical depiction of the syntenic regions between the two genomes. There are currently two versions of SynMap: the default version, SynMap2, allows the user to interact with the analysis and dynamically alter the output (e.g. zoom into a particular region), while the older version, SynMap Legacy, provides static images of the analysis. You can switch between versions after performing the analysis.

7. Specific gene pairs of interest observed in SynMap can be analyzed in more detail in GEvo. The syntenic gene pair can be selected by zooming on the SynMap plot, either by clicking on the region of interest in SynMap Legacy or by dragging the mouse over the region in SynMap2. GEvo can then be run for specific gene pairs by double clicking on their syntenic point (SynMap Legacy), or by selecting the point and clicking on Compare in GEvo >>> (SynMap2).


You can follow a link to an example analysis here: https://genomevolution.org/r/lj12 (P. vivax vs. P. cynomolgi), https://genomevolution.org/r/lj1x (P. knowlesi vs. P. cynomolgi), https://genomevolution.org/r/lj1t (P. knowlesi vs. P. vivax), https://genomevolution.org/r/lq5x (P. knowlesi vs. P. malariae), https://genomevolution.org/r/lj2b (P. coatneyi vs. P. knowlesi), https://genomevolution.org/r/lq5y (P. coatneyi vs. P. malariae), https://genomevolution.org/r/lq5t (P. ovale vs. P. malariae), https://genomevolution.org/r/lq65 (P. coatneyi vs. P. ovale), and https://genomevolution.org/r/lq5v (P. ovale vs. P. knowlesi).

Figure 15: Independent rearrangement events observed in SynMap2

Identifying syntenic gene pairs

Comparative analyses in Plasmodium can be used to study the origin and evolution of novel genes and to identify changes in gene position and gene order between genomes. Significant variations in gene order have the potential to affect neighboring genes in a process known as XXXX. Specifically, gene expression is affected by genome position [38] [39]. In eukaryotes, gene expression and gene regulation are largely dependent on genome location. Co-expression clusters have a significant role in eukaryotic gene regulation [40]. In Plasmodium, there is evidence that certain genes are strictly up-regulated during specific parasite life stages [41], and that up-regulation can be affected by changes in gene order. Tools that can identify syntenic gene pairs across multiple paired genome combinations can be used to pinpoint regions where changes in gene order have occurred. After these areas have been identified, the effect of gene order on gene expression and regulation in Plasmodium could be evaluated in the laboratory.

Figure 16: Independent rearrangement events observed in SynMap Legacy

Identifying chromosomal inversions, fusions, fissions and other events between two genomes

For the most part, synteny in the phylum Apicomplexa is lost among genera but conserved among species of the same genus. As a larger number of Plasmodium genomes have become available, it has become evident that several genome rearrangements occurred within the genus. Overall, closely related Plasmodium species have largely syntenic genomes, while more divergent species from different Plasmodium clades show numerous rearrangements [42]. SynMap can be used to identify genome rearrangements caused by duplications, inversions, fusion or fission events. Moreover, the origin of many species-specific genomic rearrangement events can be estimated by performing paired SynMap comparisons across different genome sets.

There are two previously reported inversion events on the 3rd and 6th chromosomes between P. vivax, P. cynomolgi and P. knowlesi. We used SynMap to evaluate the three species pairs and detected no inversion events between P. cynomolgi and P. knowlesi. This could suggest that the inversion events reported on chromosomes 3 and 6 might have occurred after the split of P. cynomolgi and P. vivax (approximately 3.43-3.87 Mya) [43]. However, a detailed analysis of this region using GEvo shows a poorly sequenced genome segment in P. vivax (Salvador-1), making it possible that the detected inversion event is an artifact.

SynMap can be used to identify sets of chromosome fusion/fission events. Pairwise comparisons between the genomes of four closely related Plasmodium parasites (P. ovale curtisi, P. malariae, P. coatneyi and P. knowlesi) show one inversion and two sets of fusion events. The first set of fusions is located on the 5th and 9th chromosomes of P. malariae (Figure 15 and 16, red squares); the second, on the 13th and 14th chromosomes of P. coatneyi (Figure 15 and 16, green squares). The inversion event is located in the central region of chromosome 4 of P. malariae (Figure 15 and 16, blue circle).
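On a syntenic dotplot, an inversion appears as a run of gene pairs whose order in one genome is reversed relative to the other. The sketch below illustrates that idea on a list of (position in genome A, position in genome B) pairs for a single chromosome pair. It is not SynMap's algorithm; the function name `find_inverted_runs`, the `min_genes` threshold and the example coordinates are all hypothetical.

```python
def find_inverted_runs(pairs, min_genes=3):
    """Return (start_a, end_a) spans where gene order in genome B is reversed.

    pairs: iterable of (position_in_a, position_in_b) for syntenic gene pairs.
    A run is reported when at least `min_genes` consecutive genes (ordered by
    their position in genome A) have strictly decreasing positions in genome B.
    """
    ordered = sorted(pairs)
    runs = []
    run_start = 0
    for i in range(1, len(ordered) + 1):
        descending = i < len(ordered) and ordered[i][1] < ordered[i - 1][1]
        if not descending:
            # The candidate run covers ordered[run_start .. i - 1].
            if i - run_start >= min_genes:
                runs.append((ordered[run_start][0], ordered[i - 1][0]))
            run_start = i
    return runs


# Hypothetical gene order: genes 3 to 5 are reversed in genome B.
example_pairs = [(1, 10), (2, 20), (3, 50), (4, 40), (5, 30), (6, 60), (7, 70)]
print(find_inverted_runs(example_pairs))
```

As the P. vivax (Salvador-1) case shows, a run flagged this way is only a candidate: assembly gaps or poorly sequenced segments at the borders can produce the same pattern, so the region should still be inspected in GEvo.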

Measuring Kn/Ks values between genomes (SynMap - CodeML analysis tool)

Differences at nucleotide loci accumulate between two genomes as the result of evolution. The nature of the changes accumulated between homologous coding sequences can be evaluated to infer the evolutionary forces at play. Nucleotide changes that do not change the amino acid sequence are called synonymous, and changes that do change the amino acid sequence are called non-synonymous. Synonymous substitutions are largely neutral and reflect background evolutionary changes in the genome. In contrast, non-synonymous substitutions are largely affected by natural selection. As such, the Kn/Ks ratio can be used to determine the role of selection in gene evolution by measuring the rate of non-synonymous changes relative to background (synonymous) changes. Under neutrality, synonymous and non-synonymous changes are expected to occur at the same rate (Kn/Ks = 1). When non-synonymous substitutions are fixed at a faster rate than synonymous ones (Kn/Ks > 1), this indicates positive selection. When the rate of fixation of amino acid changes is reduced because new amino acid changes are eliminated (Kn/Ks < 1), this indicates purifying selection.

The CoGe platform has the unique capability of calculating Ks, Kn and the Kn/Ks ratio for syntenic gene pairs across the genome. Thus, the Ks, Kn and Kn/Ks analyses generated in CoGe can be used to: determine the role of natural selection in relation to the relative position of syntenic gene pairs, identify genome regions evolving at an accelerated or reduced rate in comparison to the rest of the genome, infer the relative age of genome rearrangement events (e.g. duplications), and establish genome-specific evolutionary trends. In Plasmodium parasites, variation in Ks, Kn and the Kn/Ks ratio across species can suggest species- or genus-specific adaptive trends or be the product of parasite-host interactions.
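The interpretation of the Kn/Ks ratio described above can be written as a small helper. This is only a sketch: the function name `classify_selection` is hypothetical, and the `tolerance` around 1 is an arbitrary assumption for illustration, not a statistical test. In practice, claims of positive or purifying selection require significance testing rather than a fixed cutoff.

```python
def classify_selection(kn, ks, tolerance=0.05):
    """Return (ratio, label) for a pair of Kn and Ks estimates.

    The ratio is undefined when Ks is zero or negative, which can happen
    for very similar sequences with no synonymous differences.
    """
    if ks <= 0:
        return None, "undefined"
    ratio = kn / ks
    if abs(ratio - 1.0) <= tolerance:  # assumption: arbitrary band around 1
        label = "neutral"
    elif ratio > 1.0:
        label = "positive selection"
    else:
        label = "purifying selection"
    return ratio, label


# Hypothetical values for three gene pairs.
for kn, ks in [(0.02, 0.10), (0.30, 0.10), (0.10, 0.10)]:
    print(kn, ks, classify_selection(kn, ks)[1])
```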

Kn/Ks analyses can be performed for two annotated genomes after a SynMap analysis has been completed. The output analysis will modify the Syntenic_dotplot display to represent the distribution of the Ks, Kn or Kn/Ks values across syntenic pairs.

Figure 17: Paired Ks analyses between Plasmodium species of the Laverania subgenus. From right to left: P. gaboni vs. P. reichenowi; P. falciparum vs. P. reichenowi; P. gaboni vs. P. falciparum
The following steps show how to perform Kn/Ks analyses using the CodeML tool available on SynMap:

1. Go to: https://genomevolution.org/coge/ and log in to CoGe.

2. Perform a SynMap analysis between two genomes. CoGe stores all analyses performed under a user's account, so previously generated SynMap analyses remain available for further testing down the line. Note that, regardless of their level of assembly, Ks, Kn, and Kn/Ks ratios can only be calculated for annotated genomes (genomes with imported .gff files).

3. Once SynMap has been generated, find the CodeML tool under the Analysis Options tab at the bottom of the screen. Click on the Calculate syntenic CDS pairs and color dots:________ substitution rates(s) section and select Synonymous (Ks) from the dropdown menu. You can repeat the analyses by selecting the Non-synonymous (Kn) and Kn/Ks options. The display can be modified by choosing a different Color Scheme, specifying the axis default Min Val. or Max Val., or by changing the Log10 Transform data option.

4. The resulting output will display the distribution of Ks values (or Kn or Kn/Ks) across the syntenic regions between the two evaluated genomes. In addition, the output will include a Histogram of Ks values (or Kn or Kn/Ks). In SynMap2, specific regions/chromosomes can be dynamically selected to view the Ks, Kn or Kn/Ks values in that region.
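The histogram and Log10 Transform options mentioned in the steps above can be mimicked offline once substitution values have been extracted from a results file. The following Python sketch assumes the values are already available as a plain list of numbers; it does not parse any CoGe output format, and the function name and `bin_width` parameter are hypothetical.

```python
import math
from collections import Counter


def log10_histogram(values, bin_width=0.5):
    """Bin positive values by log10, returning {bin lower edge: count}.

    Zero, negative and missing values are skipped because log10 is
    undefined for them.
    """
    counts = Counter()
    for value in values:
        if value is None or value <= 0:
            continue
        counts[math.floor(math.log10(value) / bin_width)] += 1
    return {round(index * bin_width, 10): counts[index] for index in sorted(counts)}


if __name__ == "__main__":
    # Made-up Ks values for illustration only.
    example_ks = [0.02, 0.05, 0.3, 2.0, 0.0]
    for lower_edge, count in log10_histogram(example_ks, bin_width=1.0).items():
        print(f"log10(Ks) >= {lower_edge:+.1f}: {count}")
```

A log10 scale is useful here because substitution values between closely related genomes often span several orders of magnitude, so a linear histogram would compress most gene pairs into a single bin.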


You can follow links to example analyses here: https://genomevolution.org/r/ljhj (Kn, P. reichenowi vs. P. falciparum) and https://genomevolution.org/r/lsz2 (Ks, P. reichenowi vs. P. falciparum); https://genomevolution.org/r/ljhq (Kn, P. reichenowi vs. P. gaboni) and https://genomevolution.org/r/lsyy (Ks, P. reichenowi vs. P. gaboni); https://genomevolution.org/r/ljhl (Kn, P. falciparum vs. P. gaboni) and https://genomevolution.org/r/lsz5 (Ks, P. falciparum vs. P. gaboni).

Figure 18: Paired Kn analyses between Plasmodium species of the Laverania subgenus. From right to left: P. gaboni vs. P. reichenowi; P. falciparum vs. P. reichenowi; P. gaboni vs. P. falciparum

Ks analyses showed more recent synonymous substitutions between P. gaboni (SY57) and P. reichenowi (CDC) than between P. gaboni and P. falciparum (3D7) (Figure 17). P. reichenowi and P. falciparum share a more recent common ancestor with each other (divergence time is estimated at approximately 5.28-5.93 Mya [44]) than either does with P. gaboni [45]. The different Ks rates in P. falciparum and P. reichenowi vs. P. gaboni suggest that the rate of synonymous substitution changed after the split of P. reichenowi and P. falciparum. Additionally, Ks values between P. reichenowi and P. falciparum are slightly smaller than those seen between P. falciparum and P. gaboni. This suggests that Ks rates have increased in P. reichenowi and that syntenic P. reichenowi genes evolve at a more rapid rate than those of other Laveranian species.

Non-synonymous (Kn) substitution rates between P. gaboni and P. falciparum and between P. gaboni and P. reichenowi are largely similar (Figure 18). This suggests that non-synonymous substitutions have accumulated at a comparable rate since the split of P. reichenowi and P. falciparum from P. gaboni. The number of recent non-synonymous substitutions observed between P. falciparum and P. reichenowi indicates that a significant number of non-synonymous substitutions are unique to each species. Previous studies have shown that non-synonymous substitution rates are particularly high in a significant number of orthologous sequences between P. reichenowi and P. falciparum, especially during blood stages of the parasite's life cycle [46].

Our results support this hypothesis, suggesting that natural selection and host-parasite interactions have a significant role in the evolution of Laveranian Plasmodium. Nonetheless, our results also show that different Laveranian species have intrinsically different rates of background evolution, which should also be considered in future evolutionary studies, particularly on P. reichenowi.

Identifying sets of syntenic genes amongst several genomes (SynFind)

Figure 19: Screen capture of GEvo analysis using the output from SynFind. Lines connect syntenic regions between members of the SERA multigene family

In comparative genomics, tools that identify the location of syntenic regions can also help pinpoint regions with complex evolutionary patterns. Within the genus Plasmodium, genome rearrangements are predominantly found between clades; however, they also occur, albeit to a lesser degree, amongst species found within the same clade. Species-specific gene gain/loss events and changes in gene organization often lead to these small-scale genomic rearrangements. These events are frequently confined to nearby genomic segments; however, genes with the same evolutionary origin can also be found scattered in different parts of the genome. Small-scale genomic rearrangements can be detected with the careful examination of selected genome regions and with powerful bioinformatics tools capable of identifying genes related to a specific query gene in a reference genome. As an example, the point of origin and evolutionary trajectory of small-scale changes in genome organization can be readily detected in multigene families whose members are arranged in tandem. We will use the Plasmodium-specific SERA (serine repeat antigen) multigene family to illustrate how CoGe’s SynFind tool can be used to identify syntenic genes across any number of genomes.
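To make the idea of a tandemly arranged multigene family concrete, the Python sketch below groups family members into tandem clusters based on their ordinal position along a chromosome. It is an illustration only and is unrelated to how SynFind works internally; the gene names, chromosome labels, positions and the `max_intervening` parameter are all hypothetical.

```python
from collections import defaultdict


def tandem_clusters(genes, max_intervening=1):
    """Group family members into tandem clusters.

    genes: iterable of (name, chromosome, gene_index), where gene_index is
    the ordinal position of the gene along its chromosome.
    Two members join the same cluster when at most `max_intervening`
    unrelated genes lie between them.
    """
    by_chromosome = defaultdict(list)
    for name, chromosome, gene_index in genes:
        by_chromosome[chromosome].append((gene_index, name))

    clusters = []
    for chromosome in sorted(by_chromosome):
        members = sorted(by_chromosome[chromosome])
        current = [members[0]]
        for previous, following in zip(members, members[1:]):
            if following[0] - previous[0] - 1 <= max_intervening:
                current.append(following)
            else:
                clusters.append((chromosome, [name for _, name in current]))
                current = [following]
        clusters.append((chromosome, [name for _, name in current]))
    return clusters


if __name__ == "__main__":
    # Hypothetical family members; not real annotation data.
    family = [
        ("seraA", "chr4", 10),
        ("seraB", "chr4", 11),
        ("seraC", "chr4", 13),
        ("seraD", "chr4", 40),
        ("seraE", "chr9", 5),
    ]
    for chromosome, names in tandem_clusters(family):
        print(chromosome, names)
```

Members that fall outside the main cluster (such as the hypothetical `seraD` and `seraE` above) correspond to the scattered family members mentioned in the text, and are the kind of case where a synteny-based search is more informative than position alone.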

SERA multigene family members are expressed during various stages of the Plasmodium life cycle. Members of this family share the distinguishing attribute of encoding proteins with a papain-like cysteine protease motif [47]. One member (SERA-5), expressed during the late trophozoite and schizont stages, has been considered a promising malaria vaccine target [48]. The evolutionary history of the SERA family is highly dynamic. The family has experienced a significant number of contractions, expansions and rearrangements amongst Plasmodium species. However, it remains to be assessed whether this variability is also observed at an intraspecific level. Here, we assess changes in the organization of the SERA multigene family among six P. vivax strains.

Figure 20: Screen capture of GEvo analysis using SynFind output. Lines connect syntenic regions. Small syntenic fragments are found across intergenic regions
These steps show how to use SynFind to search for syntenic regions associated with particular sets of genes from a reference genome:

1. Go to: https://genomevolution.org/coge/ and log in to CoGe.

2. On the main CoGe page, find and click the SynFind tool tile (alternatively, you can follow this link: https://genomevolution.org/CoGe/SynFind.pl).

3. Type the scientific name of your desired organism in the search bar under the Search tab in the Select Target Genomes section. Organisms and genomes with names matching the search term will be displayed in the Matching Organisms menu.

4. Select all the genomes of interest using Ctrl+click or Command+click. After you have selected all genomes of interest, click on the green + Add button and the added genomes will appear in the Selected Genomes menu on the right.

5. Type the Name, Annotation or Organisms of interest in the Specify Features section. It is recommended to provide as many specifics for this query as possible; nonetheless, the analysis can be performed even with less specific terms (e.g. it is possible to retrieve the sequences of interest just by typing "sera" on the Name box). Once you have specified the features click on the green Search button.

6. All matches to the search term, and the genomes where they have been found, will appear in a new menu within the same section. Select all relevant Matches (e.g. all SERA genes) and your reference Genome (e.g. P. falciparum strain 3D7 v5).

7. Click the red Run SynFind button to start the analysis.

8. SynFind will output all syntenic regions to the query sequence found on the reference genome and their Syntenic depth. Using this output, sequences can be further analyzed with any of the tools available on CoGe (generate SynMap dotplots, perform a microsynteny analysis with GEvo, etc.).


You can follow a link to an example analysis here: https://genomevolution.org/r/lszj (SynFind) and https://genomevolution.org/r/lszj (GEvo).

We used SynFind to identify genes evolutionarily related to SERA-5 across six P. vivax genomes. We used the output from SynFind to perform a more detailed analysis of the identified syntenic regions with GEvo. Our results show that there are 12 SERA paralogs in all evaluated P. vivax strains. Specific members of the SERA family are only found in P. vivax and closely related species, suggesting that some paralogs have a recent evolutionary origin [49]. We observed a small change in genomic location and gene order in the P. vivax Brazil-1 strain (Figure 19, shown second from the top of the screen). This suggests that there might be some degree of intraspecific variation in the organization of recently duplicated paralogs. Alternatively, it is possible that changes in family organization and location provide a certain evolutionary advantage.

Multigene families are largely characterized by shared family motifs (e.g. a papain-like cysteine protease motif characterizes SERA family members). However, similar motifs can also be found in non-family members. We found that SynFind is capable of identifying genome segments outside the SERA multigene family (Figure 20 shows a conserved motif shared by a non-member of the SERA family). It is possible that these regions are incorrectly identified as members of the SERA family due to a shared papain-like cysteine protease motif.

Identifying codon and amino acid substitution frequencies (CodeOn)

Figure 21: Amino acid usage tables in Plasmodium species from the simian clade

The extreme changes in compositional bias observed in Plasmodium parasites have significant effects on codon and amino acid usage. Consequently, compositional bias also has an effect on translational accuracy and efficiency. Despite the AT-rich genome of P. falciparum, many highly expressed genes are known to be composed mainly of C-ended codons. This pattern could reflect a certain level of translational selection, where decreasing the energetic costs during infection, through the use of less energetically expensive amino acids, can provide an evolutionary advantage [50]. On the other hand, in the GC-rich P. vivax genome, translational selection has been shown to have a small role in codon usage bias [51].

The large number of Plasmodium genomes currently available opens the possibility of measuring the effects of compositional bias on codon and amino acid usage in detail. This information can later be used to explore the effects of codon and amino acid usage on natural selection trends and on other evolutionary forces. We will use CoGe’s CodeOn tool to calculate amino acid usage across different %GC levels, and to determine the number of CDS in different %GC tiers. We will evaluate the role of compositional bias in six representative Plasmodium species from three major Plasmodium clades (Laveranian, simian and rodent).
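The kind of table described above, amino acid usage grouped by the %GC of each CDS, can be sketched in a few lines of Python. This is an illustration of the idea rather than CodeOn's implementation: the 5% tier width, the function names and the use of the standard genetic code are assumptions made here.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def gc_percent(seq):
    """Percentage of G and C among the unambiguous bases of a sequence."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / total


def gc_tier(pct, width=5):
    """Return the (low, high) bounds of the %GC tier containing pct."""
    low = int(pct // width) * width
    low = min(low, 100 - width)  # place exactly 100% in the top tier
    return (low, low + width)


def amino_acid_usage_by_tier(cds_list, width=5):
    """Tabulate amino acid counts for CDS grouped by %GC tier."""
    usage = defaultdict(Counter)
    for cds in cds_list:
        cds = cds.upper()
        tier = gc_tier(gc_percent(cds), width)
        for i in range(0, len(cds) - 2, 3):
            amino_acid = CODON_TABLE.get(cds[i:i + 3])
            if amino_acid and amino_acid != "*":  # skip stops and ambiguous codons
                usage[tier][amino_acid] += 1
    return {tier: dict(counts) for tier, counts in usage.items()}


if __name__ == "__main__":
    # Two toy sequences, not real genes: one AT-rich, one GC-rich.
    toy_cds = ["ATGAAAAAATAA", "ATGGCCGGCTGA"]
    for tier, counts in sorted(amino_acid_usage_by_tier(toy_cds).items()):
        print(tier, counts)
```

Counting the number of CDS that fall in each tier, as reported in the results below, only requires tallying the value returned by `gc_tier` for every CDS.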

Figure 22: Amino acid usage tables in Plasmodium species from the Laveranian subgenus
The following steps indicate how to build amino acid usage tables for any given genome:

1. Go to: https://genomevolution.org/coge/ and log in to CoGe.

2. Find the organism and genome of interest in Organism View (https://genomevolution.org/coge/OrganismView.pl).

3. Find the Genome Information section on the right side of the screen. Find the CodeOn tool under Tools and click to start the analysis. After a couple of minutes, the analysis output will be shown in a different tab.

Closely related Plasmodium species showed more similar patterns of %GC and amino acid usage than species from different clades (Figure 21 and Figure 22). Within the simian clade, the number of CDS in the 45-55% GC tier was largest in P. vivax, while closely related species had a distribution of CDS slightly skewed towards the 40-45% GC tier. In contrast, the number of CDS with a reduced GC content of 20-30% was significantly large in Plasmodium species of the Laveranian subgenus. Regardless of overall trends, variations in amino acid usage in relation to the genome’s compositional bias were unique to each species.

The pattern of amino acid usage was similar between P. vivax and P. coatneyi; moreover, both species presented different amino acid usage patterns when compared to their respective sister species, P. cynomolgi and P. knowlesi (Figure 21). Nonetheless, genome composition is more similar between both sister species pairs. This result suggests that compositional genome bias might be one of numerous factors influencing amino acid usage bias inside the simian clade. Thus, it is possible that translational selection, and other evolutionary forces, might have different patterns even among closely related species. In species of the Laveranian subgenus, P. falciparum and P. reichenowi showed similar amino acid usage and number of CDS in low %GC tiers (Figure 22). P. gaboni showed similar %GC patterns but dissimilar trends in amino acid composition. This pattern suggests that compositional bias is a significant factor in determining amino acid usage within the Laveranian subgenus. It is possible that the changes in amino acid usage observed in P. reichenowi and P. falciparum represent a trait that originated after the split from P. gaboni.

Using Syntenic Path Assembly (SPA) to make analysis of poor or early genome assemblies easier (SynMap - SPA tool)

Figure 23: Syntenic Path Assembly (SPA) window analysis

There are a large number of Plasmodium genomes that remain to be fully sequenced, assembled and annotated. Incomplete genomic data comes from a variety of sources: published genomic information in early assembly stages, partially sequenced genomes, poorly sequenced genome segments, etc. The successful sequencing of Plasmodium genomes is a difficult task. However, sequencing projects can be somewhat simplified by using a reference genome as a guideline for genome assembly. While unassembled and non-annotated genomes can be of use in smaller-scale studies (orthologous genes can be identified using BLAST), there are some significant limitations in their use for large-scale comparative genomics.

Figure 24: Syntenic Path Assembly (SPA) of P. inui contigs using P. coatneyi genome as a reference

Tools capable of quickly generating preliminary genome assemblies and finding syntenic orthologs to a reference genome provide a foundation for comparative analyses, even before official assemblies and annotations are made publicly available. CoGe’s Syntenic_path_assembly (SPA) tool provides a graphical display of syntenic gene pairs between two genomes, which can be used to quickly generate a genome assembly based on any selected reference genome. Alternatively, SPA can also be used to correct the orientation of syntenic regions that were annotated using reverse DNA strands. We will use SPA to assemble the P. inui genome (currently at scaffold level) against the assembled P. coatneyi genome.
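The general idea of ordering and orienting contigs against a reference can be sketched in Python as follows. This is a deliberately simplified illustration and not CoGe's SPA algorithm: it assumes that syntenic gene pairs are already available as tuples of (contig, position on contig, reference chromosome, position on reference), and the function name and input layout are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import median


def order_contigs(hits):
    """Order and orient contigs along a reference using syntenic hits.

    hits: iterable of (contig, contig_pos, ref_chr, ref_pos) tuples.
    Returns a list of (contig, ref_chr, strand) sorted by reference position.
    """
    by_contig = defaultdict(list)
    for contig, contig_pos, ref_chr, ref_pos in hits:
        by_contig[contig].append((contig_pos, ref_chr, ref_pos))

    placed = []
    for contig, rows in by_contig.items():
        # Assign the contig to the reference chromosome holding most of its hits.
        best_chr = Counter(ref_chr for _, ref_chr, _ in rows).most_common(1)[0][0]
        pairs = sorted((cpos, rpos) for cpos, rchr, rpos in rows if rchr == best_chr)
        # Use the median reference coordinate as the placement anchor.
        anchor = median(rpos for _, rpos in pairs)
        # If reference positions decrease along the contig, flip it.
        reverse = len(pairs) > 1 and pairs[-1][1] < pairs[0][1]
        placed.append((best_chr, anchor, contig, "-" if reverse else "+"))

    placed.sort()
    return [(contig, ref_chr, strand) for ref_chr, _, contig, strand in placed]


if __name__ == "__main__":
    # Made-up hits for three hypothetical contigs.
    example_hits = [
        ("ctg2", 100, "chr1", 9000), ("ctg2", 500, "chr1", 8000),
        ("ctg1", 100, "chr1", 1000), ("ctg1", 400, "chr1", 2000),
        ("ctg3", 50, "chr2", 300),
    ]
    for contig, ref_chr, strand in order_contigs(example_hits):
        print(contig, ref_chr, strand)
```

Because every contig is placed wherever it best matches the reference, a sketch like this always produces an ordering that maximizes apparent synteny with whichever reference is supplied, so the choice of reference genome directly shapes the result.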

The following steps show how to use the SPA tool found in SynMap:

1. Go to: https://genomevolution.org/coge/ and log in to CoGe.

2. Run a SynMap analysis between an assembled genome and a non-assembled one (this might take longer than analyses between two fully assembled genomes).

3. Once SynMap has been generated go to the Display Options tab and find the SPA tool (Figure 23). Select the tool by clicking on the check mark next to: The Syntenic Path Assembly (SPA)?

4. After a few minutes (depending on the number of contigs), the incomplete genome will be assembled using the second genome as a reference.


You can follow a link to an example analysis here: https://genomevolution.org/r/ljen

There are some limitations regarding the interpretation of assemblies generated with SPA. First, incomplete genomes will be assembled using the provided reference genome; thus, contigs will be arranged to increase synteny between the incomplete genome and the reference. As a result, using different reference genomes will likely result in different preliminary assemblies. In the case of P. inui, analyses performed using P. coatneyi (a closely related species) or P. falciparum (a species from the Laveranian subgenus) as a reference will result in significantly different assemblies. In both cases, synteny between the non-assembled genome and the reference will be maximized, even though significant rearrangement events have occurred between P. coatneyi and P. falciparum. Therefore, SPA reference genomes should be selected after consideration of the biological and evolutionary relationship between species.

Second, rearrangement events such as inversions or duplications between genomes cannot be identified using SPA. Several contigs can be syntenic to the same region of the reference genome, and this should not be confused with duplication of genome regions. In addition, contigs sequenced using reverse DNA strands should not be confused with genome inversions. Both scenarios are shown in the P. inui SPA assembly performed using the P. coatneyi genome as a reference (Figure 24, events are indicated with black circles).

Overall conclusions

The number of available Plasmodium genomes has increased markedly during recent years. This increase in genomic information creates an unprecedented opportunity to study the unique qualities observed in Plasmodium genomes and to understand the evolutionary patterns shaping this genus. Comparative analyses of Plasmodium genomes with different levels of relatedness allow for a better understanding of the origin, nature and predominance of these evolutionary forces.

Thanks to worldwide efforts, there was a large reduction in the number of malaria cases and deaths between 2000 and 2015. By 2015, it was estimated that the number of malaria cases had decreased from 262 million to 214 million, and the number of malaria-related deaths from 839,000 to 438,000 [52]. While this is an enormous achievement for malaria treatment and control strategies, numerous aspects of malaria and of the Plasmodium parasite itself remain to be fully understood. Human infections with P. cynomolgi [53] and P. knowlesi [54] have been reported in Southeast Asia. Also, various Plasmodium species from the Laveranian subgenus, including P. falciparum strains, have been found in African primates [55][56], suggesting a potential role of wild primates as malaria reservoirs. Both cases illustrate the plasticity of the Plasmodium genome and show how feeble species barriers and host specificity can be within the genus. Consequently, molecular studies on Plasmodium would benefit greatly from a genus-level approach instead of a more limited species-specific one; moreover, the implementation of tools that permit the straightforward assessment of genome-level trends across the genus is imperative. Thus, the use of platforms like CoGe, where genomes can be easily imported, analyzed, visualized and made public, represents an essential step in furthering comparative genomics in the genus Plasmodium.

Here we demonstrated how different tools available on CoGe can be used to successfully test a number of hypotheses and patterns relevant to understanding Plasmodium genome evolution. We have also used this platform to further characterize both general and specific genome elements in sequenced Plasmodium species and strains. Nevertheless, the present study is not without its limitations, given the lack of fully sequenced non-mammal Plasmodium species. In order to provide a more complete picture of the complex evolutionary history of this genus, genomes from Plasmodium species ancestral to the Laveranian subgenus will be required. Evolutionary questions such as the origins of the AT richness observed in the Laveranian subgenus, the potential changes in synteny between mammal and non-mammal infecting Plasmodium species, the role of genome elements in the development of host specificity and in virulence, and the expansion/contraction/origin of multigene families can be more clearly evaluated once these genomes are available. When this time comes, their incorporation into the CoGe platform and subsequent analysis using CoGe's tools will aid in the evaluation of these hypotheses. Overall, our results show that the complexities of the Plasmodium genome can be effectively analyzed in CoGe, and that doing so opens more opportunities for furthering our understanding of malaria evolution.


Useful links

Plasmodium Notebooks in CoGe

Link to Notebook for published Plasmodium genome data: https://genomevolution.org/coge/NotebookView.pl?lid=1753
Link to Notebook for published P. falciparum strains: https://genomevolution.org/coge/NotebookView.pl?lid=1758
Link to Notebook for published P. vivax strains: https://genomevolution.org/coge/NotebookView.pl?lid=1760
Link to Notebook for published Plasmodium apicoplast data: https://genomevolution.org/coge/NotebookView.pl?lid=1754
Link to Notebook for published Plasmodium mitochondrion data: https://genomevolution.org/coge/NotebookView.pl?lid=1756

Sample data

Gene sequence used on CoGeBlast analysis (obtained from PlasmoDB):
PVX_113230.1 | Plasmodium vivax Sal-1 | variable surface protein Vir14-related (http://plasmodb.org/plasmo/app/record/gene/PVX_113230)
PVX_003830.1 | Plasmodium vivax Sal-1 | serine-repeat antigen 5 (SERA) (http://plasmodb.org/plasmo/app/record/gene/PVX_003830)
Gene sequences used on CoGeBlast used to inform GEvo analysis (obtained from PlasmoDB):
PF3D7_0424100.1 | Plasmodium falciparum 3D7 | reticulocyte binding protein homologue 5 (http://plasmodb.org/plasmo/app/record/gene/PF3D7_0424100)
PVX_096410.1 | Plasmodium vivax Sal-1 | cysteine repeat modular protein 2, putative (http://plasmodb.org/plasmo/app/record/gene/PVX_096410)

References

  1. Jackson AP. 2015. Preface. The evolution of parasite genomes and the origins of parasitism. Parasitology. 142 Suppl 1:S1-5. https://www.ncbi.nlm.nih.gov/pubmed/25656359
  2. Carlton JM, Perkins SL, Deitsch KW. 2013. Malaria Parasites. Caister Academic Press
  3. Tachibana SI, Sullivan SA, Kawai S, Nakamura S, Kim HR, Goto N, Arisue N, Palacpac NM, Honma H, Yagi M, Tougan T, Katakai Y, Kaneko O, Mita T, Kita K, Yasutomi Y, Sutton PL, Shakhbatyan R, Horii T, Yasunaga T, Barnwell JB, Escalante AA, Carlton JM, Tanabe K. 2012. Plasmodium cynomolgi genome sequences provide insight into Plasmodium vivax and the monkey malaria clade. Nat Genet. 44: 1051–1055. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3759362/
  4. Prugnolle F, Durand P, Ollomo B, Duval L, Ariey F, Arnathau C, Gonzalez JP, Leroy E, Renaud F. 2011. A Fresh Look at the Origin of Plasmodium falciparum, the Most Malignant Malaria Agent. PLoS Pathog. 7: e1001283. http://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1001283
  5. Prugnolle F, Rougeron V, Becquart P, Berry A, Makanga B, Rahola N, Arnathau C, Ngoubangoye B, Menard S, Willaume E, Ayala FJ, Fontenille D, Ollomo B, Durand P, Paupy C, Renaud F. 2013. Diversity, host switching and evolution of Plasmodium vivax infecting African great apes. Proc Natl Acad Sci U S A. 110:8123-8. https://www.ncbi.nlm.nih.gov/pubmed/23637341
  6. DeBarry JD, Kissinger JC. 2011. Jumbled Genomes: Missing Apicomplexan Synteny. Mol Biol Evol. 2011 Oct; 28(10): 2855–2871. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3176833/
  7. Sinka ME, Bangs MJ, Manguin S, Rubio-Palis Y, Chareonviriyaphap T, Coetzee M, Mbogo CM, Hemingway J, Patil AP, Temperley WH, Gething PW, Kabaria CW, Burkot TR, Harbach RE, Hay SI. 2012. A global map of dominant malaria vectors. Parasit Vectors. 5:69. https://www.ncbi.nlm.nih.gov/pubmed/22475528
  8. Buscaglia CA, Kissinger JC, Agüero F. 2015. Neglected Tropical Diseases in the Post-Genomic Era. Trends Genet. 31:539-55. https://www.ncbi.nlm.nih.gov/pubmed/26450337
  9. Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. 2016. GenBank. Nucleic Acids Res. 44: D67–D72. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702903/
  10. Aurrecoechea C, Brestelli J, Brunk BP, Dommer J, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M, Innamorato F, Iodice J, Kissinger JC, Kraemer E, Li W, Miller JA, Nayak V, Pennington C, Pinney DF, Roos DS, Ross C, Stoeckert CJ Jr, Treatman C, Wang H. 2009. PlasmoDB: a functional genomic database for malaria parasites. Nucleic Acids Res. 37:D539-43. https://www.ncbi.nlm.nih.gov/pubmed/18957442
  11. Logan-Klumpler FJ, De Silva N, Boehme U, Rogers MB, Velarde G, McQuillan JA, Carver T, Aslett M, Olsen C, Subramanian S, Phan I, Farris C, Mitra S, Ramasamy G, Wang H, Tivey A, Jackson A, Houston R, Parkhill J, Holden M, Harb OS, Brunk BP, Myler PJ, Roos D, Carrington M, Smith DF, Hertz-Fowler C, Berriman M. 2012. GeneDB--an annotation database for pathogens. Nucleic Acids Res. 40:D98-108. https://www.ncbi.nlm.nih.gov/pubmed/22116062
  12. Bensch S, Hellgren O, Pérez-Tris J. 2009. MalAvi: a public database of malaria parasites and related haemosporidians in avian hosts based on mitochondrial cytochrome b lineages. Mol Ecol Resour. 9:1353-8. https://www.ncbi.nlm.nih.gov/pubmed/21564906
  13. Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DM, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM, Barrell B. 2002. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 419:498-511
  14. Wu H, Zhang Z, Hu S, Yu J. 2012. On the molecular mechanism of GC content variation among eubacterial genomes. Biol Direct. 2012; 7: 2. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3274465/
  15. Lassalle F, Périan S, Bataillon T, Nesme X, Duret L, Daubin V. 2015. GC-Content Evolution in Bacterial Genomes: The Biased Gene Conversion Hypothesis Expands. PLoS Genet. 11: e1004941. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4450053/
  16. Šmarda P, Bureš P, Horová L, Leitch IJ, Mucina L, Pacini E, Tichý L, Grulich V, Rotreklová O. 2014. Ecological and evolutionary significance of genomic GC content diversity in monocots. Proc Natl Acad Sci U S A. 111: E4096–E4102. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4191780/
  17. Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DM, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM, Barrell B. 2002. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 419:498-511
  18. Jackson AP. 2015. Preface. The evolution of parasite genomes and the origins of parasitism. Parasitology. 142 Suppl 1:S1-5. https://www.ncbi.nlm.nih.gov/pubmed/25656359
  19. DeBarry JD, Kissinger JC. 2011. Jumbled Genomes: Missing Apicomplexan Synteny. Mol Biol Evol. 2011 Oct; 28(10): 2855–2871. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3176833/
  20. Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DM, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM, Barrell B. 2002. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 419:498-511
  21. Carlton JM, Adams JH, Silva JC, Bidwell SL, Lorenzi H, Caler E, Crabtree J, Angiuoli SV, Merino EF, Amedeo P, Cheng Q, Coulson RM, Crabb BS, Del Portillo HA, Essien K, Feldblyum TV, Fernandez-Becerra C, Gilson PR, Gueye AH, Guo X, Kang'a S, Kooij TW, Korsinczky M, Meyer EV, Nene V, Paulsen I, White O, Ralph SA, Ren Q, Sargeant TJ, Salzberg SL, Stoeckert CJ, Sullivan SA, Yamamoto MM, Hoffman SL, Wortman JR, Gardner MJ, Galinski MR, Barnwell JW, Fraser-Liggett CM. 2008. Comparative genomics of the neglected human malaria parasite Plasmodium vivax. Nature. 455:757-63. https://www.ncbi.nlm.nih.gov/pubmed/18843361
  22. Nikbakht H, Xia X, Hickey DA. 2014. The evolution of genomic GC content undergoes a rapid reversal within the genus Plasmodium. Genome. 9:507-511. https://www.ncbi.nlm.nih.gov/pubmed/25633864
  23. Das A, Sharma M, Gupta B, Dash AP. 2009. Plasmodium falciparum and Plasmodium vivax: so similar, yet very different. Parasitol Res. 105:1169-71. https://www.ncbi.nlm.nih.gov/pubmed/19543915
  24. Niang M, Yan Yam X, Preiser PR. 2009. The Plasmodium falciparum STEVOR multigene family mediates antigenic variation of the infected erythrocyte. PLoS Pathog. 5:e1000307. https://www.ncbi.nlm.nih.gov/pubmed/19229319
  25. Witmer K, Schmid CD, Brancucci NM, Luah YH, Preiser PR, Bozdech Z, Voss TS. 2012. Analysis of subtelomeric virulence gene families in Plasmodium falciparum by comparative transcriptional profiling. Mol Microbiol. 84:243-59. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3491689/
  26. Petter M, Bonow I, Klinkert MQ. 2008. Diverse expression patterns of subgroups of the rif multigene family during Plasmodium falciparum gametocytogenesis. PLoS One. 3:e3779. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0003779
  27. Singh V, Gupta P, Pande V. 2014. Revisiting the multigene families: Plasmodium var and vir genes. J Vector Borne Dis. 51:75-81. https://www.ncbi.nlm.nih.gov/pubmed/24947212
  28. Fernandez-Becerra C, Yamamoto MM, Vêncio RZ, Lacerda M, Rosanas-Urgell A, del Portillo HA. 2009. Plasmodium vivax and the importance of the subtelomeric multigene vir superfamily. Trends Parasitol. 2009 25:44-51. https://www.ncbi.nlm.nih.gov/pubmed/19036639
  29. Lopez FJ, Bernabeu M, Fernandez-Becerra C, del Portillo HA. 2013. A new computational approach redefines the subtelomeric vir superfamily of Plasmodium vivax. BMC Genomics. 14:8. https://www.ncbi.nlm.nih.gov/pubmed/?term=A+new+computational+approach+redefines+the+subtelomeric+vir+superfamily+of+Plasmodium+vivax
  30. Carlton JM, Adams JH, Silva JC, Bidwell SL, Lorenzi H, Caler E, Crabtree J, Angiuoli SV, Merino EF, Amedeo P, Cheng Q, Coulson RM, Crabb BS, Del Portillo HA, Essien K, Feldblyum TV, Fernandez-Becerra C, Gilson PR, Gueye AH, Guo X, Kang'a S, Kooij TW, Korsinczky M, Meyer EV, Nene V, Paulsen I, White O, Ralph SA, Ren Q, Sargeant TJ, Salzberg SL, Stoeckert CJ, Sullivan SA, Yamamoto MM, Hoffman SL, Wortman JR, Gardner MJ, Galinski MR, Barnwell JW, Fraser-Liggett CM. 2008. Comparative genomics of the neglected human malaria parasite Plasmodium vivax. Nature. 455:757-63. https://www.ncbi.nlm.nih.gov/pubmed/18843361
  31. Lopez FJ, Bernabeu M, Fernandez-Becerra C, del Portillo HA. 2013. A new computational approach redefines the subtelomeric vir superfamily of Plasmodium vivax. BMC Genomics. 14:8. https://www.ncbi.nlm.nih.gov/pubmed/?term=A+new+computational+approach+redefines+the+subtelomeric+vir+superfamily+of+Plasmodium+vivax
  32. Neafsey DE, Galinsky K, Jiang RH, Young L, Sykes SM, Saif S, Gujja S, Goldberg JM, Young S, Zeng Q, Chapman SB, Dash AP, Anvikar AR, Sutton PL, Birren BW, Escalante AA, Barnwell JW, Carlton JM. 2012. The malaria parasite Plasmodium vivax exhibits greater genetic diversity than Plasmodium falciparum. Nat Genet. 44:1046-50. https://www.ncbi.nlm.nih.gov/pubmed/22863733
  33. Neafsey DE, Galinsky K, Jiang RH, Young L, Sykes SM, Saif S, Gujja S, Goldberg JM, Young S, Zeng Q, Chapman SB, Dash AP, Anvikar AR, Sutton PL, Birren BW, Escalante AA, Barnwell JW, Carlton JM. 2012. The malaria parasite Plasmodium vivax exhibits greater genetic diversity than Plasmodium falciparum. Nat Genet. 44:1046-50. https://www.ncbi.nlm.nih.gov/pubmed/22863733
  34. Sundararaman SA, Plenderleith LJ, Liu W, Loy DE, Learn GH, Li Y, Shaw KS, Ayouba A, Peeters M, Speede S, Shaw GM, Bushman FD, Brisson D, Rayner JC, Sharp PM, Hahn BH. 2016. Genomes of cryptic chimpanzee Plasmodium species reveal key evolutionary events leading to human malaria. Nat Commun. 7:11078. https://www.ncbi.nlm.nih.gov/pubmed/27002652
  35. Sundararaman SA, Plenderleith LJ, Liu W, Loy DE, Learn GH, Li Y, Shaw KS, Ayouba A, Peeters M, Speede S, Shaw GM, Bushman FD, Brisson D, Rayner JC, Sharp PM, Hahn BH. 2016. Genomes of cryptic chimpanzee Plasmodium species reveal key evolutionary events leading to human malaria. Nat Commun. 7:11078. https://www.ncbi.nlm.nih.gov/pubmed/27002652
  36. Forni D, Pontremoli C, Cagliani R, Pozzoli U, Clerici M, Sironi M. 2015. Positive selection underlies the species-specific binding of Plasmodium falciparum RH5 to human basigin. Mol Ecol. 24:4711-22. https://www.ncbi.nlm.nih.gov/pubmed/26302433
  37. Tang H, Lyons E. 2012. Unleashing the genome of Brassica rapa. Front Plant Sci. 3:172. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3408644/
  38. Ghanbarian AT, Hurst LD. 2015. Neighboring Genes Show Correlated Evolution in Gene Expression. Mol Biol Evol. doi: 10.1093/molbev/msv053. http://mbe.oxfordjournals.org/content/early/2015/04/01/molbev.msv053.full
  39. De S, Teichmann SA, Babu MM. 2009. The impact of genomic neighborhood on the evolution of human and chimpanzee transcriptome. Genome Res. 19:785-794. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2675967/
  40. Michalak P. 2008. Coexpression, coregulation, and cofunctionality of neighboring genes in eukaryotic genomes. Genomics. 91:243-248. http://www.sciencedirect.com/science/article/pii/S0888754307002807
  41. Lanfrancotti A, Bertuccini L, Silvestrini F, Alano P. 2007. Plasmodium falciparum: mRNA co-expression and protein co-localisation of two gene products upregulated in early gametocytes. Exp Parasitol. 116:497-503. https://www.ncbi.nlm.nih.gov/pubmed/17367781
  42. Tachibana SI, Sullivan SA, Kawai S, Nakamura S, Kim HR, Goto N, Arisue N, Palacpac NM, Honma H, Yagi M, Tougan T, Katakai Y, Kaneko O, Mita T, Kita K, Yasutomi Y, Sutton PL, Shakhbatyan R, Horii T, Yasunaga T, Barnwell JW, Escalante AA, Carlton JM, Tanabe K. 2012. Plasmodium cynomolgi genome sequences provide insight into Plasmodium vivax and the monkey malaria clade. Nat Genet. 44:1051-1055. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3759362/
  43. Pacheco MA, Reid MJ, Schillaci MA, Lowenberger CA, Galdikas BM, Jones-Engel L, Escalante AA. 2012. The origin of malarial parasites in orangutans. PLoS One. 7:e34990. https://www.ncbi.nlm.nih.gov/pubmed/22536346
  44. Pacheco MA, Reid MJ, Schillaci MA, Lowenberger CA, Galdikas BM, Jones-Engel L, Escalante AA. 2012. The origin of malarial parasites in orangutans. PLoS One. 7:e34990. https://www.ncbi.nlm.nih.gov/pubmed/22536346
  45. Rayner JC, Liu W, Peeters M, Sharp PM, Hahn BH. 2011. A plethora of Plasmodium species in wild apes: a source of human infection? Trends Parasitol. 27:222-9. https://www.ncbi.nlm.nih.gov/pubmed/21354860?dopt=Abstract&holding=npg
  46. Otto TD, Rayner JC, Böhme U, Pain A, Spottiswoode N, Sanders M, Quail M, Ollomo B, Renaud F, Thomas AW, Prugnolle F, Conway DJ, Newbold C, Berriman M. 2014. Genome sequencing of chimpanzee malaria parasites reveals possible pathways of adaptation to human hosts. Nat Commun. 5:4754. https://www.ncbi.nlm.nih.gov/pubmed/25203297
  47. Arisue N, Kawai S, Hirai M, Palacpac NM, Jia M, Kaneko A, Tanabe K, Horii T. 2011. Clues to Evolution of the SERA Multigene Family in 18 Plasmodium Species. PLoS One. 6:e17775. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0017775
  48. Arisue N, Hirai M, Arai M, Matsuoka H, Horii T. 2007. Phylogeny and evolution of the SERA multigene family in the genus Plasmodium. J Mol Evol. 65:82-91. http://link.springer.com/article/10.1007%2Fs00239-006-0253-1
  49. Arisue N, Kawai S, Hirai M, Palacpac NM, Jia M, Kaneko A, Tanabe K, Horii T. 2011. Clues to Evolution of the SERA Multigene Family in 18 Plasmodium Species. PLoS One. 6:e17775. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0017775
  50. Peixoto L, Fernández V, Musto H. 2004. The effect of expression levels on codon usage in Plasmodium falciparum. Parasitology. 128:245-51. https://www.ncbi.nlm.nih.gov/pubmed/15074874
  51. Yadav MK, Swati D. 2012. Comparative genome analysis of six malarial parasites using codon usage bias based tools. Bioinformation. 8:1230-9. https://www.ncbi.nlm.nih.gov/pubmed/23275725
  52. World Health Organization. 2015. World Malaria Report 2015. http://www.who.int/malaria/publications/world-malaria-report-2015/report/en/
  53. Ta TH, Hisam S, Lanza M, Jiram AI, Ismail N, Rubio JM. 2014. First case of a naturally acquired human infection with Plasmodium cynomolgi. Malar J. 13:68. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3937822/
  54. Singh B, Daneshvar C. 2013. Human infections and detection of Plasmodium knowlesi. Clin Microbiol Rev. 26:165-84. https://www.ncbi.nlm.nih.gov/pubmed/23554413
  55. Prugnolle F, Durand P, Neel C, Ollomo B, Ayala FJ, Arnathau C, Etienne L, Mpoudi-Ngole E, Nkoghe D, Leroy E, Delaporte E, Peeters M, Renaud F. 2010. African great apes are natural hosts of multiple related malaria species, including Plasmodium falciparum. Proc Natl Acad Sci U S A. 107:1458-63. https://www.ncbi.nlm.nih.gov/pubmed/20133889
  56. Duval L, Fourment M, Nerrienet E, Rousset D, Sadeuh SA, Goodman SM, Andriaholinirina NV, Randrianarivelojosia M, Paul RE, Robert V, Ayala FJ, Ariey F. 2010. African apes as reservoirs of Plasmodium falciparum and the origin and diversification of the Laverania subgenus. Proc Natl Acad Sci U S A. 107:10561-6. https://www.ncbi.nlm.nih.gov/pubmed/20498054