Difference between revisions of "Using CoGe for the analysis of Plasmodium spp"

From CoGepedia
 
=='''About this guide'''==
 
This 'cookbook' style document is meant to provide an introduction to many of our tools and services, and is structured around a case study of investigating genome evolution of the malaria-causing ''Plasmodium'' spp. The small size and unique features of this pathogen's genome make it ideal for beginning to understand how our tools can be used to conduct comparative genomic analyses and uncover meaningful discoveries.  
+
This 'cookbook' style document is meant to provide an introduction to many of our tools and services and is structured around a case study of investigating genome evolution of the malaria-causing ''Plasmodium'' spp. The small size and unique features of this pathogen's genome make it ideal for beginning to understand how our tools can be used to conduct comparative genomic analyses and uncover meaningful discoveries.  
  
 
Through a number of example analyses, this guide will teach users about the following tools:
 
 
* '''[[GEvo]]''': Microsynteny analysis.
 
 
* '''[[SynMap]]''': Whole genome syntenic analysis.
 
:- '''[[Calculating and displaying synonymous/non-synonymous (Ks, Kn) data]]''': Characterize the evolution of populations of genes.
+
:- '''[[SynMap#Calculating_and_displaying_synonymous.2Fnon-synonymous_.28Ks.2C_Kn.29_data]]''': Characterize the evolution of populations of genes.
 
:- '''[[SPA]]''' tool: Syntenic Path Assembly to assist in genome analysis.
 
 
* '''[[SynFind]]''': Identify syntenic genes across multiple genomes.
 
 
* '''[[CodeOn]]''': Characterize patterns of codon and amino acid evolution in coding sequence.
 
 +
 +
 +
<span style="color:#006F00">'''FOLLOW THIS LINK FOR A QUICK OVERVIEW OF [[Plasmodia comparative genomics]] WITH COGE.'''</span>
 +
  
 
=='''A brief introduction to ''Plasmodium'' genome evolution'''==
 
  
The study of parasitic genomes via comparative genomics offers many unique challenges. Parasite genomes are characterized by a combination of gene loss and the acquisition of species- or lineage-specific genes; in particular, many specialized genes mediate host–parasite interaction <ref>Jackson AP. 2015. Preface. The evolution of parasite genomes and the origins of parasitism. Parasitology. 142 Suppl 1:S1-5. https://www.ncbi.nlm.nih.gov/pubmed/25656359</ref>. The dynamic nature of parasitic genomes is particularly evident within the genus ''Plasmodium''. The genus emerged ~40 million years ago and harbors roughly 200 species of parasitic protozoa better known as malaria parasites. All ''Plasmodium'' species have a complex life cycle involving some kind of vertebrate host and a mosquito vector of the genus ''Anopheles'' (mammals) or ''Culex'' (birds). In addition, ''Plasmodium'' species share similar life cycle characteristics, albeit with a few exceptions (''e.g.'' hypnozoites). However, host and vector preferences differ among ''Plasmodium'' species <ref>Sinka ME, Bangs MJ, Manguin S, Rubio-Palis Y, Chareonviriyaphap T, Coetzee M, Mbogo CM, Hemingway J, Patil AP, Temperley WH, Gething PW, Kabaria CW, Burkot TR, Harbach RE, Hay SI. 2012. A global map of dominant malaria vectors. Parasit Vectors. 5:69. https://www.ncbi.nlm.nih.gov/pubmed/22475528</ref>.  
+
The genus ''Plasmodium'' emerged ~40 million years ago and harbors roughly 200 species of parasitic protozoa better known as malaria parasites. All ''Plasmodium'' species have a complex life cycle involving a vertebrate host and a mosquito vector. In addition, ''Plasmodium'' species share similar life cycle characteristics, albeit with a few exceptions (''e.g.'' hypnozoites). ''Plasmodium'' genomes are tiny (17–28 Mb) in comparison to those of their vertebrate (1 Gb for birds; 2–3 Gb for mammals) and mosquito (230–284 Mb) hosts <ref>DeBarry JD, Kissinger JC. 2011. Jumbled Genomes: Missing Apicomplexan Synteny. Mol Biol Evol. 2011 Oct; 28(10): 2855–2871. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3176833/</ref>. All ''Plasmodium'' genomes consist of fourteen chromosomes (nuclear genome), as well as a mitochondrial and an apicoplast genome. Despite these shared genomic characteristics, the structural organization, gene content, and sequence of ''Plasmodium'' genomes are highly variable within the genus <ref>Carlton JM, Perkins SL, Deitsch KW. 2013. '''''Malaria Parasites'''''. Caister Academic Press</ref>. The exact origins and mechanisms of these differences remain largely unexplored; however, they are generally hypothesized to stem from host shift events <ref>Prugnolle F, Durand P, Ollomo B, Duval L, Ariey F, Arnathau C, Gonzalez JP, Leroy E, Renaud F. 2011. A Fresh Look at the Origin of Plasmodium falciparum, the Most Malignant Malaria Agent. PLoS Pathog. 7: e1001283. http://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1001283</ref><ref>Prugnolle F, Rougeron V, Becquart P, Berry A, Makanga B, Rahola N, Arnathau C, Ngoubangoye B, Menard S, Willaume E, Ayala FJ, Fontenille D, Ollomo B, Durand P, Paupy C, Renaud F. 2013. Diversity, host switching and evolution of Plasmodium vivax infecting African great apes. Proc Natl Acad Sci U S A. 110:8123-8. https://www.ncbi.nlm.nih.gov/pubmed/23637341</ref>.
  
''Plasmodium'' genomes are tiny (17–28 Mb) in comparison to those of their vertebrate (1 Gb for birds; 2–3 Gb for mammals) and mosquito (230–284 Mb) hosts <ref>DeBarry JD, Kissinger JC. 2011. Jumbled Genomes: Missing Apicomplexan Synteny. Mol Biol Evol. 2011 Oct; 28(10): 2855–2871. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3176833/</ref>. All ''Plasmodium'' genomes consist of fourteen chromosomes (nuclear genome), as well as a mitochondrial and an apicoplast genome. Despite these shared genomic characteristics, the structural organization, gene content, and sequence of ''Plasmodium'' genomes are highly variable within the genus <ref>Carlton JM, Perkins SL, Deitsch KW. 2013. '''''Malaria Parasites'''''. Caister Academic Press</ref>. The exact origins and mechanisms of these differences remain largely unexplored; however, they are generally hypothesized to stem from host shift events <ref>Prugnolle F, Durand P, Ollomo B, Duval L, Ariey F, Arnathau C, Gonzalez JP, Leroy E, Renaud F. 2011. A Fresh Look at the Origin of Plasmodium falciparum, the Most Malignant Malaria Agent. PLoS Pathog. 7: e1001283. http://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1001283</ref><ref>Prugnolle F, Rougeron V, Becquart P, Berry A, Makanga B, Rahola N, Arnathau C, Ngoubangoye B, Menard S, Willaume E, Ayala FJ, Fontenille D, Ollomo B, Durand P, Paupy C, Renaud F. 2013. Diversity, host switching and evolution of Plasmodium vivax infecting African great apes. Proc Natl Acad Sci U S A. 110:8123-8. https://www.ncbi.nlm.nih.gov/pubmed/23637341</ref>.
+
An increase in funding devoted to malaria research has coincided with a dramatic increase in publicly available genomic information for ''Plasmodium'' <ref>Buscaglia CA, Kissinger JC, Agüero F. 2015. Neglected Tropical Diseases in the Post-Genomic Era. Trends Genet. 31:539-55. https://www.ncbi.nlm.nih.gov/pubmed/26450337</ref>. The most prominent repository is found at NCBI/Genbank <ref>Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. 2016. GenBank. Nucleic Acids Res. 44: D67–D72. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702903/</ref>; while additional and unique sequences can also be found on other databases:  [http://plasmodb.org/plasmo/ PlasmoDB] <ref>Aurrecoechea C, Brestelli J, Brunk BP, Dommer J, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M, Innamorato F, Iodice J, Kissinger JC, Kraemer E, Li W, Miller JA, Nayak V, Pennington C, Pinney DF, Roos DS, Ross C, Stoeckert CJ Jr, Treatman C, Wang H. 2009. PlasmoDB: a functional genomic database for malaria parasites. Nucleic Acids Res. 37:D539-43. https://www.ncbi.nlm.nih.gov/pubmed/18957442</ref>, [http://www.genedb.org/Homepage GeneDB] <ref>Logan-Klumpler FJ, De Silva N, Boehme U, Rogers MB, Velarde G, McQuillan JA, Carver T, Aslett M, Olsen C, Subramanian S, Phan I, Farris C, Mitra S, Ramasamy G, Wang H, Tivey A, Jackson A, Houston R, Parkhill J, Holden M, Harb OS, Brunk BP, Myler PJ, Roos D, Carrington M, Smith DF, Hertz-Fowler C, Berriman M. 2012. GeneDB--an annotation database for pathogens. Nucleic Acids Res. 40:D98-108. https://www.ncbi.nlm.nih.gov/pubmed/22116062</ref>, and [http://mbio-serv2.mbioekol.lu.se/Malavi/ MalAvi] <ref>Bensch S, Hellgren O, Pérez-Tris J. 2009. MalAvi: a public database of malaria parasites and related haemosporidian in avian hosts based on mitochondrial cytochrome b lineages. Mol Ecol Resour. 9:1353-8. https://www.ncbi.nlm.nih.gov/pubmed/21564906</ref>. 
This wealth of genomic data facilitates detailed comparative genomic approaches, opening the possibility to:  
 
+
An increase in funding devoted to malaria research has coincided with a dramatic increase in publicly available genomic information for ''Plasmodium'' <ref>Buscaglia CA, Kissinger JC, Agüero F. 2015. Neglected Tropical Diseases in the Post-Genomic Era. Trends Genet. 31:539-55. https://www.ncbi.nlm.nih.gov/pubmed/26450337</ref>. The most prominent repository is found in NCBI/Genbank <ref>Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. 2016. GenBank. Nucleic Acids Res. 44: D67–D72. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702903/</ref>; while additional and unique sequences can also be found on other databases:  [http://plasmodb.org/plasmo/ PlasmoDB] <ref>Aurrecoechea C, Brestelli J, Brunk BP, Dommer J, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M, Innamorato F, Iodice J, Kissinger JC, Kraemer E, Li W, Miller JA, Nayak V, Pennington C, Pinney DF, Roos DS, Ross C, Stoeckert CJ Jr, Treatman C, Wang H. 2009. PlasmoDB: a functional genomic database for malaria parasites. Nucleic Acids Res. 37:D539-43. https://www.ncbi.nlm.nih.gov/pubmed/18957442</ref>, [http://www.genedb.org/Homepage GeneDB] <ref>Logan-Klumpler FJ, De Silva N, Boehme U, Rogers MB, Velarde G, McQuillan JA, Carver T, Aslett M, Olsen C, Subramanian S, Phan I, Farris C, Mitra S, Ramasamy G, Wang H, Tivey A, Jackson A, Houston R, Parkhill J, Holden M, Harb OS, Brunk BP, Myler PJ, Roos D, Carrington M, Smith DF, Hertz-Fowler C, Berriman M. 2012. GeneDB--an annotation database for pathogens. Nucleic Acids Res. 40:D98-108. https://www.ncbi.nlm.nih.gov/pubmed/22116062</ref>, and [http://mbio-serv2.mbioekol.lu.se/Malavi/ MalAvi] <ref>Bensch S, Hellgren O, Pérez-Tris J. 2009. MalAvi: a public database of malaria parasites and related haemosporidian in avian hosts based on mitochondrial cytochrome b lineages. Mol Ecol Resour. 9:1353-8. https://www.ncbi.nlm.nih.gov/pubmed/21564906</ref>. 
This wealth of genomic data facilitates detailed comparative genomic approaches, opening the possibility to:  
+
 
* Infer origins of certain traits, specialized phenotypes, and genomic features.
 
 
* Track the maintenance of conserved genes across the genus, as well as the gain or loss of genes unique to a single species or a group of closely related species.
 
 
* Identify the potential historical interactions that might have led to the development of genomic adaptations.
 
  
One of the many remarkable trends of ''Plasmodium'' genome evolution is the rapid change in GC content. ''Plasmodium falciparum'' and closely related parasites have a remarkably AT-rich genome compared to other ''Plasmodium'' species <ref>Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DM, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM, Barrell B. 2002. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 419:498-511</ref>. While significant shifts in GC content have been reported in other parts of the tree of life such as ''Bacteria'' <ref>Wu H, Zhang Z, Hu S, Yu S. 2012. On the molecular mechanism of GC content variation among eubacterial genomes. Biol Direct. 2012; 7: 2. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3274465/</ref><ref>Lassalle F, Périan S, Bataillon T, Nesme X, Duret L, Daubin V. 2015. GC-Content Evolution in Bacterial Genomes: The Biased Gene Conversion Hypothesis Expands. PLoS Genet. 11: e1004941. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4450053/</ref> and monocots <ref>Šmarda P, Bureš P, Horová L, Leitch IJ, Mucina L, Pacini E, Tichý L, Grulich V, Rotreklováa O. 2014. Ecological and evolutionary significance of genomic GC content diversity in monocots. Proc Natl Acad Sci U S A. 111: E4096–E4102. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4191780/</ref>, the short evolutionary time during which this change has occurred in ''Plasmodium'' is noteworthy. Moreover, the GC content variability observed amongst ''Plasmodium'' species has not yet been observed within other closely related genera. 
AT-rich genomes are not only challenging to sequence relative to GC-rich genomes<ref>Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DM, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM, Barrell B. 2002. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 419:498-511</ref>, but also differ in codon usage, patterns of genome mutability, and evolution of repetitive elements. A comparative genomic approach makes it possible to assess the evolutionary origins and trace the patterns of GC content shift across the ''Plasmodium'' genus. 
 
  
Perhaps one of the most significant aspects of ''Plasmodium'' evolution, and of parasites in general <ref>Jackson AP. 2015. Preface. The evolution of parasite genomes and the origins of parasitism. Parasitology. 142 Suppl 1:S1-5. https://www.ncbi.nlm.nih.gov/pubmed/25656359</ref>, is the evolution of multigene families. Within ''Plasmodium'', numerous multigene families show specific patterns of gene gain/loss. The differences in the ancestry of these families are also noteworthy, with many gain/loss events being observed in a single species <ref>DeBarry JD, Kissinger JC. 2011. Jumbled Genomes: Missing Apicomplexan Synteny. Mol Biol Evol. 2011 Oct; 28(10): 2855–2871. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3176833/</ref>, a single clade, or the entire ''Plasmodium'' genus. In this sense, each multigene family illustrates a different aspect of the evolutionary history of ''Plasmodium'' and its adaptation to different hosts and vectors.
+
== '''Finding and integrating Plasmodium genomes in CoGe ''' ==
  
Through a case study on  ''Plasmodium'' evolution, we will illustrate how CoGe can be used for the analysis of multigene families, local synteny, and whole genome comparisons (genome composition, rearrangement events, and gene order conservation).
+
You can find the details of ''Plasmodium'' spp. genome integration at the following link: [[Finding and intregating Plasmodium genomes to CoGe|Finding and integrating Plasmodium genomes in CoGe]]
  
== '''Finding and integrating genomes in CoGe ''' ==
 
  
A growing number of ''Plasmodium'' genomes have been sequenced in recent years, and this trend is likely to continue. Thus, tools that permit rapid integration of genomic information and its subsequent analysis are essential for ''Plasmodium'' research. Online platforms help reduce computational time and costs and foster worldwide collaborations. CoGe is one of these platforms.
+
=='''Comparative analyses workflows'''==
  
The first step in analyzing ''Plasmodium'' genomes with CoGe is determining which genomes are already included in the data repository.
+
The following links direct to specific tools for the comparative analysis of ''Plasmodium'' genomes:
  
=== ''Finding the Plasmodium genomes already present in CoGe'' ===
+
[[Plasmodium analysis workflow 1: Tools that evaluate genomic properties and amino acid usage]]
  
[[File:Screen Shot 2016-09-29 at 1.43.09 PM.png|thumb|250px|'''Figure 1.''' ''Search'' bar on top of most CoGe windows]]
+
[[Plasmodium analysis workflow 2: Tools for the syntenic analysis of whole genomes and microsyntenic regions]]
  
A significant accomplishment in the study of ''Plasmodium'' genomics was the full sequencing and assembly of the ''P. falciparum'' genome <ref>Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DM, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM, Barrell B. 2002. Genome sequence of the human malaria parasite ''Plasmodium falciparum''. Nature. 419:498-511</ref>. Over the years, this genome has been revised and re-annotated, resulting in different "releases", or versions, of the ''P. falciparum'' genome. CoGe’s repositories contain each of these releases with a unique version identifier (e.g., v5, v4). This is because the CoGe platform incorporates new versions of a genome without deleting previous ones. Thus, you can find the initial ''P. falciparum'' sequenced genome loaded onto CoGe (v3) alongside the more current releases (v5).
+
[[Plasmodium analysis workflow 3: Tools useful on the study of multigene families]]
 
+
Before importing a genome into CoGe, and to prevent redundancy of genomic information, it is recommended to identify what data has previously been imported. You can search CoGe’s ''Plasmodium'' genomes by typing the word "Plasmodium" into the ''Search'' bar at the top of most pages ('''Figure 1'''). This will retrieve all organisms and genomes with names matching the search term. For instance, when searching the term "plasmodium falciparum 3D7", you will see that there are currently eight publicly available genomes associated with this specific strain of ''P. falciparum''.  Clicking on any organism will produce the details of the upload. Alternatively, you can find the '''Tools''' section on the main CoGe page ('''Figure 2''') and click on '''OrganismView''' (https://genomevolution.org/coge/OrganismView.pl). 
+
 
+
[[File:Newcoge.png|thumb|250px|'''Figure 2.''' CoGe main page]]
+
 
+
All publicly available genomes imported into CoGe, and their corresponding metadata, can be found in '''OrganismView'''. To search for any genome on '''OrganismView''', type a scientific name into the ''Search'' box. The following information will be displayed ('''Figure 3'''):
+
 
+
[[File:Intro2.png|thumb|250px|'''Figure 3.''' Screen capture of '''OrganismView''']]
+
 
+
:* '''Organisms''': In the case of ''Plasmodium'' spp., the different parasitic strains are already imported. In addition, organellar genomes (mitochondrial and apicoplast) have also been imported.
+
:* '''Organism Information''': An outline of the organism's taxonomy (as published on NCBI/Genbank). This section also includes links to some of CoGe's main analysis tools. 
+
 
+
:* '''Genomes''': All genome versions available. Note that by selecting different genome versions, all associated genomic information changes.
+
:* '''Genome information''': Includes genome IDs, type of sequences uploaded, and sequence length. You can also access CoGe's genome analysis tools in this section.
+
 
+
:* '''Datasets''': This section includes the number of datasets for the specified genome. In the case of completely sequenced genomes imported from NCBI/GenBank, it will indicate the chromosomes' accession numbers. 
+
:* '''Dataset information''': Provides information for each dataset including accession numbers (if available), the source of the import, chromosome length, and GC%.
+
 
+
:* '''Chromosomes''': Shows the number of chromosomes in the selected genome. However, depending on the method used to import the genome into CoGe and the dataset itself, the number and length of the chromosomes will vary.
+
:* '''Chromosome information''': Shows each chromosome's ID and length in base pairs (bp).
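The per-chromosome length and GC% values listed above can also be reproduced offline from a downloaded FASTA file. The following is a minimal sketch and not CoGe's own implementation: the function names (<code>read_fasta</code>, <code>gc_percent</code>, <code>chromosome_table</code>) and the toy sequences are hypothetical, and GC% is computed here over unambiguous bases only, which may differ from how CoGe treats N or masked positions.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (sequence_id, sequence) pairs from a FASTA-formatted text handle."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Keep only the first word of the header as the sequence ID.
            seq_id, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def gc_percent(sequence):
    """Return GC% over unambiguous bases (A, C, G, T); 0.0 if there are none."""
    counts = {base: sequence.count(base) for base in "ACGT"}
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total if total else 0.0


def chromosome_table(handle):
    """Return a list of (ID, length in bp, GC%) tuples, one per FASTA record."""
    return [(seq_id, len(seq), round(gc_percent(seq), 2))
            for seq_id, seq in read_fasta(handle)]


# Toy input standing in for a downloaded genome file (hypothetical sequences).
example = StringIO(">chr1 toy\nATATATGC\nATAT\n>chr2 toy\nGGCCAATT\n")
for row in chromosome_table(example):
    print(row)
```

For a real genome, replace the <code>StringIO</code> object with an opened file, for example <code>open("genome.fasta")</code>, where the file name is a placeholder.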
+
 
+
You can find a more detailed description of any genome by accessing the '''Genome Info''' section within '''Genome Information'''. You can also access links to the majority of CoGe’s comparative analysis tools in this section. <span style="color:green">Keep in mind that genomes imported to CoGe can be made “Public” or “Restricted”. Genomes made “Public” can be seen and analyzed by anyone using the CoGe platform. “Restricted” genomes can only be seen and/or analyzed by the user and shared accounts ('''[[Sharing_data]]''').</span>
+
 
+
=== ''Importing Plasmodium genomes into CoGe'' ===
+
 
+
If a genome is not found in CoGe's repository, it must be imported before analysis. Genomic data can be imported into CoGe using a variety of methods. We will focus on the two methods most likely to be used when importing genomes. For additional information about other methods please see '''[[How_to_load_genomes_into_CoGe]]'''. Depending on your intended analyses, you might want to use a complete ''Plasmodium'' genome, a specific chromosome, or a single organelle genome. The methods described here can be used to upload any of these data types. To import a genome into CoGe, follow these steps:
+
 
+
[[File:PVXgenomeNCBI.png|thumb|250px|'''Figure 4.''' ''P. vivax'' genome's page on NCBI.]]
+
 
+
:'''1.'''    Go to the genome database on NCBI/GenBank (or your favorite database) and type "Plasmodium" in the search box.
+
 
+
:'''2.'''    In the '''Representative Genome''' section you will find links to ''Download Sequences in FASTA'' format and ''Download Genome Annotation'' ('''Figure 4'''). 
+
::- To download a complete ''Plasmodium'' genome click on '''Genome''' under ''Download Sequences in FASTA''.
+
::- To download a complete annotation for a ''Plasmodium'' genome click on '''GFF''' under ''Download Genome Annotation''.
+
 
+
:You can also download single chromosomes and, if available, organellar genomes by clicking on their respective '''RefSeq''' or '''INSDC''' numbers.
+
 
+
:'''3.'''    Go to CoGe and log in. You can follow this link: https://genomevolution.org/coge/
+
 
+
:'''4.'''    Click on '''MyData''' to reach the ''Data'' section of your personal CoGe page ('''Figure 5'''). This section will fill up as you import genomes and load '''[[Experiments]]''' into CoGe.
+
 
+
:'''5.'''    Click on '''NEW''' and select ''New Genome'' from the dropdown menu.
+
 
+
[[File:MyDatasectiononCoGe.png|thumb|250px|'''Figure 5.''' MyData tab in CoGe.]]
+
 
+
:'''6.'''    Input information about the organism's taxonomy and the genome's source on the '''Create a New Genome''' window ('''Figure 6'''). Consider that taxonomic information for that genome might not have been incorporated into CoGe yet. If this is the case, follow these steps to create a "new organism":
+
 
+
:::'''a.'''    Click on '''NEW''' on the "'''Organism:'''" section.
+
:::'''b.'''    Type the scientific name of the organism to be imported on the ''Search NCBI'' box. If the organism does not show up select its closest taxonomic relative. In the case of ''Plasmodium'', several strains might be available for a given species (particularly ''P. vivax'' and ''P. falciparum''). Make sure to select the correct strain or, if a new strain is being imported, to add its name.
+
:::'''c.'''    Click '''Create'''.
+
 
+
[[File:NamingnewstraininCOGE.png|thumb|250px|'''Figure 6.''' CoGe’s ''Create New Organism'' window. Notice the difference between the name of the selected strain and the one under "'''Name'''".]]
+
 
+
:'''7.'''    After creating a new strain/genome, you must also include the import’s metadata. Type the import's genome version in '''Version''' after confirming which genome versions are available on CoGe. If this is the first genome imported, the version number should be “1”. Select the sequence type from the dropdown menu in the '''Type''' section. Most sequences can be identified as unmasked (check this wiki’s '''[[Masked]]''' section for further details). Select the '''Source''' in the next dropdown menu (in this case NCBI). Finally, tick the check box if you want your genome to be '''Restricted'''.
+
 
+
:'''8.'''    Click '''Next'''.
+
 
+
:'''9.'''    Genome files can be imported to CoGe using four different strategies: 1) import directly from the '''CyVerse Data Store'''; 2) create a direct '''HTTP/FTP''' link to the data; 3) import the files from a private computer using '''Upload'''; and 4) use '''GenBank''' accession numbers.
+
 
+
:*To import genomes using '''Upload''':
+
 
+
:::'''a.'''    Select a genome file from your local computer and wait for it to be read by CoGe. Once the process is completed select '''Next'''.
+
 
+
:::'''b.'''    Click '''Start''' to begin the import.
+
 
+
:::'''c.'''    When the import has concluded, the file’s metadata will be visible in the '''Genome Information''' page.
+
 
+
[[File:Completeuplatedgenomeandannotation.png|thumb|250px|'''Figure 7.''' Complete genome and annotation upload.]]
+
 
+
:::'''d.'''    To import annotation data click on '''Load Sequence Annotation''' under the '''Sequence & Gene Annotation''' menu. Note that any upload can be updated at any point. Thus, genome annotations or experimental data can be added later to any genome already in CoGe.
+
 
+
:::'''e.'''    In the '''Describe your annotation''' page, select the version and source of the annotation data and click '''Next'''. The data can be uploaded from the '''CyVerse Data Store''', by creating a '''HTTP/FTP''' link, or by using the '''Upload''' option. Once concluded, the genome annotation should be visible on the '''Genome Information''' page under the '''Sequence & Gene Annotation''' menu ('''Figure 7'''). For more details about uploading genome annotations please check '''[[LoadAnnotation]]'''.
+
 
+
:*To import genomes using '''NCBI/GenBank''':
+
 
+
:::'''a.'''        Select the '''GenBank''' accession numbers option. Type or Copy/Paste the '''RefSeq''' or '''INSDC''' numbers for each chromosome or organelle and click '''Get'''. Information from each imported genome should appear under '''Selected file(s)'''. Once all genomes have been imported (''e.g.'' the 14 ''Plasmodium'' chromosomes) click on '''Next'''.
+
 
+
:::'''b.'''        Once the import has concluded, the file’s metadata will be visible in the '''Genome Information''' page. Note that NCBI/GenBank genome annotations will be automatically imported to CoGe when using this method and that genomes uploaded using this method will be automatically made “Public”.
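When many accession numbers have to be gathered (''e.g.'' 14 chromosomes plus organelles), it can help to check each record outside CoGe before pasting the list into the import form. The sketch below only builds download links for NCBI's E-utilities <code>efetch</code> service; it does not contact NCBI and is independent of CoGe's importer. The accession numbers are hypothetical placeholders and the function name <code>efetch_fasta_url</code> is an assumption made for illustration.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build an efetch URL that returns one nucleotide record as FASTA text."""
    params = {
        "db": "nuccore",      # nucleotide database
        "id": accession,      # RefSeq or INSDC accession number
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


# Hypothetical placeholder accessions; replace them with the real RefSeq or
# INSDC numbers shown on the genome's NCBI page.
accessions = ["NC_000000.1", "NC_000001.1"]
for accession in accessions:
    print(efetch_fasta_url(accession))
```

Opening one of the printed links in a browser should return the corresponding sequence, which is a quick way to confirm that an accession number is valid before importing it.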
+
 
+
===''Exporting genomes from CoGe to CyVerse''===
+
 
+
:Data can be exported into CyVerse for easy sharing and storage after it has been imported onto CoGe. While this is not required to use any of CoGe's tools, it is a recommended step. You can export data from CoGe into the ''CyVerse Data Store'' by following these steps:
+
 
+
:'''1.'''    While logged into CoGe, go to the genome's '''Genome Information''' page.
+
 
+
:'''2.'''    Under the '''Tools''' menu, find the ''Export to CyVerse Data Store'' option. Click either on the FASTA or the GFF file options to upload genomic data and/or its annotation.
+
 
+
:'''3.'''    Wait until the export is completed. From this point forward, your FASTA and GFF files will also be found in the ''CyVerse Data Store''.
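After exporting, a quick consistency check between the FASTA and GFF files can catch mismatched chromosome names before any further analysis. The sketch below is an illustration only and is not part of CoGe or CyVerse: it assumes a tab-delimited GFF3 file with nine columns, and the function names (<code>gff_rows</code>, <code>count_features</code>, <code>missing_seqids</code>) and toy records are hypothetical.

```python
from collections import Counter


def gff_rows(lines):
    """Yield the nine tab-separated fields of each GFF3 feature line."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives, and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:
            yield fields  # rows with another column count are ignored


def count_features(lines, feature_type="gene"):
    """Count features of one type (column 3) per sequence ID (column 1)."""
    counts = Counter()
    for fields in gff_rows(lines):
        if fields[2] == feature_type:
            counts[fields[0]] += 1
    return counts


def missing_seqids(lines, fasta_ids):
    """Return sequence IDs used in the GFF3 lines but absent from the FASTA IDs."""
    gff_ids = {fields[0] for fields in gff_rows(lines)}
    return sorted(gff_ids - set(fasta_ids))


# Toy annotation standing in for an exported GFF file (hypothetical records).
toy_gff = [
    "##gff-version 3\n",
    "chr1\tsrc\tgene\t1\t9\t.\t+\t.\tID=g1\n",
    "chr1\tsrc\tCDS\t1\t9\t.\t+\t0\tParent=g1\n",
    "chr2\tsrc\tgene\t5\t20\t.\t-\t.\tID=g2\n",
]
print(count_features(toy_gff))
print(missing_seqids(toy_gff, {"chr1"}))
```

An empty list from <code>missing_seqids</code> indicates that every annotated feature refers to a sequence present in the FASTA file.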
+
 
+
== '''Using CoGe tools to perform comparative analyses''' ==
+
 
+
[[File:Genomelistnew.png|thumb|250px|'''Figure 8. Genome List''' upload window as seen from '''OrganismView'''. Twelve ''Plasmodium'' genomes have been included. Analysis can be run following this link:  https://genomevolution.org/r/lys1]]
+
 
+
=== ''Analyzing GC content and other genomic properties (GenomeList)'' ===
+
 
+
There are significant variations in average GC content and GC content distribution between the two main human malaria agents: ''P. vivax'' and ''P. falciparum''. The average GC content is markedly higher in ''P. vivax'' than in ''P. falciparum'' (19.4%). GC-poor regions are restricted to the subtelomeric regions of the ''P. vivax'' genome, whereas they are ubiquitous across the ''P. falciparum'' genome <ref>Carlton JM, Adams JH, Silva JC, Bidwell SL, Lorenzi H, Caler E, Crabtree J, Angiuoli SV, Merino EF, Amedeo P, Cheng Q, Coulson RM, Crabb BS, Del Portillo HA, Essien K, Feldblyum TV, Fernandez-Becerra C, Gilson PR, Gueye AH, Guo X, Kang'a S, Kooij TW, Korsinczky M, Meyer EV, Nene V, Paulsen I, White O, Ralph SA, Ren Q, Sargeant TJ, Salzberg SL, Stoeckert CJ, Sullivan SA, Yamamoto MM, Hoffman SL, Wortman JR, Gardner MJ, Galinski MR, Barnwell JW, Fraser-Liggett CM. 2008. Comparative genomics of the neglected human malaria parasite ''Plasmodium vivax''. Nature. 455:757-63. https://www.ncbi.nlm.nih.gov/pubmed/18843361</ref>. The current model is that AT-rich genomes represent the ancestral state and GC-rich genomes the derived state in specific ''Plasmodium'' lineages <ref>Nikbakht H, Xia X, Hickey DA. 2014. The evolution of genomic GC content undergoes a rapid reversal within the genus Plasmodium. Genome. 9:507-511. https://www.ncbi.nlm.nih.gov/pubmed/25633864</ref>. Here, we will evaluate the patterns of GC content variation across three of the four main ''Plasmodium'' clades.
+
 
+
[[File:Genomelistmakedwithcorrectcolors.png|thumb|250px|'''Figure 9. Genome List''' output window shows the analysis of 12 ''Plasmodium'' genomes. Clades are indicated with colors: simian clade (brown), rodent clade (red), and subgenus ''Laverania'' (blue). The number of columns on display has been modified.]]
+
 
+
CoGe can display a genome’s GC content in '''[[GenomeInfo]]'''. To calculate GC content, click on '''%GC''' under the Length and/or Noncoding sequence sections on the ''Statistics tab''. You can compare and contrast GC content (and other genomic features) across several species and/or strains using '''[[GenomeList]]'''. This tool creates a list of genomes selected by the user and calculates features such as:
+
*Amino acid usage.
+
*Codon usage.
+
*Coding sequence (CDS) GC content.
+
*Number of genes.
+
*Number of introns.
+
 
+
'''GenomeList''' also summarizes some of the genomes' metadata, including:
+
*[[Sequence type]].
+
*[[Sequence origin]].
+
*Taxonomy.
+
*Provenance.
+
*Genome version.
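To make the sequence-derived features in the lists above more concrete, the following minimal sketch computes codon usage and coding sequence (CDS) GC content for a single toy sequence. It is not how '''GenomeList''' is implemented; the function names (<code>codon_usage</code>, <code>codon_frequencies</code>, <code>cds_gc_percent</code>) and the example sequence are hypothetical, and the code assumes the CDS starts in frame.

```python
from collections import Counter


def codon_usage(cds):
    """Count in-frame codons in a coding sequence; a trailing partial codon is ignored."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def codon_frequencies(cds):
    """Return each codon's share of all codons in the sequence."""
    counts = codon_usage(cds)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


def cds_gc_percent(cds):
    """Return GC% of a coding sequence over unambiguous bases (A, C, G, T)."""
    cds = cds.upper().replace("U", "T")
    acgt = sum(cds.count(base) for base in "ACGT")
    return 100.0 * (cds.count("G") + cds.count("C")) / acgt if acgt else 0.0


# Toy AT-rich coding sequence (hypothetical): ATG AAA AAA TTT GGC TAA
toy_cds = "ATGAAAAAATTTGGCTAA"
print(codon_usage(toy_cds).most_common(3))
print(round(cds_gc_percent(toy_cds), 2))
```

Amino acid usage could be derived in the same way by translating each codon with a genetic code table before counting.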
+
 
+
[[File:Genoemlistresultsonphylogeny.png|thumb|250px|'''Figure 10.''' GC content is written in colored text next to each analyzed ''Plasmodium'' genome. Species are colored according to their clade: simian (brown), rodent (red), subgenus ''Laverania'' (blue), and reptile-birds (green/purple). Figure modified from Hayakawa et al. (2008) <ref>Hayakawa T, Culleton R, Otani H, Horii T, Tanabe K. 2008. Big bang in the evolution of extant malaria parasites. Mol Biol Evol. 10:2233-9. https://www.ncbi.nlm.nih.gov/pubmed/18687771</ref>]]
+
 
+
{| class=wikitable align=center style="background: #F5FFF5;"
+
 
+
|The following steps indicate how to perform comparative analyses using the '''GenomeList''' tool in CoGe:
+
 
+
 
+
'''1.''' Go to: https://genomevolution.org/coge/ and log in to CoGe.
+
 
+
'''2.''' Click on '''OrganismView''' or follow this link: https://genomevolution.org/coge/OrganismView.pl
+
 
+
'''3.''' Type the scientific name of any organism of interest in the ''Search'' box. Then, select a genome version.
+
 
+
'''4.''' Find the '''Tools''' section under '''Genome Information''' and click on '''Add to GenomeList'''. The first genome added to '''GenomeList''' will appear in a new window.
+
 
+
'''5.''' Without closing this window, type the scientific name of another organism in the ''Search'' box. Select the genome version and click on '''Add to GenomeList'''.
+
 
+
'''6.''' Once you have added all genomes click on '''Send to GenomeList''' ('''Figure 8''').
+
 
+
'''7.''' '''GenomeList''' will generate a table including all the selected genomes. You can use '''GenomeList''' to select and compare different genomic features and attributes. The analyses can be run on specific genomes or on all the included genomes. You can also select the display columns by clicking on '''Change Viewable Columns'''.
+
 
+
'''8.''' Click on "''Send Selected Genomes to''" to download the genomes included on '''GenomeList'''.
+
 
+
 
+
<span style="color:#8B008B">'''You can follow a link to an example analysis here:'''</span> https://genomevolution.org/r/lys1
+
 
+
|}
+
 
+
==== ''Comparing genomic composition sequence: GenomeList'' ====
+
 
+
We used '''GenomeList''' to compare 12 fully sequenced ''Plasmodium'' genomes ('''Figure 8'''). Our results show that species closely related to ''P. falciparum'' (subgenus ''Laverania'') have similarly AT-rich genomes. GC content was higher in ''Plasmodium'' species of the simian and rodent clades ('''Figure 9''' and '''Figure 10'''). The highest GC content values were observed in species of the simian clade (''P. vivax'', ''P. cynomolgi'' and ''P. knowlesi''). Tellingly, these species all share a common ancestor and diverged from one another recently. GC content varied widely across ''Plasmodium'' species infecting humans (''P. vivax'', ''P. ovale'', ''P. malariae'', and ''P. falciparum'') but not across species infecting rodents (''P. berghei'', ''P. chabaudi'', and ''P. yoelii''). GC content also varied in human-infecting ''Plasmodium'' within the simian clade (''P. vivax'' = 46.89%, ''P. ovale'' = 32.83%, and ''P. malariae'' = 25.12%). Our results suggest that GC-richness (> 30%) evolved recently and is a derived state within the genus. They also suggest that GC content correlates with evolutionary relatedness, but not with host-related selective pressures.
+
 
+
AT-richness as an ancestral state for the ''Plasmodium'' genus is unusual since closely related genera within the phylum Apicomplexa frequently have GC-rich genomes (''Toxoplasma gondii'' = 52.28%, ''Cryptosporidium parvum'' = 30.4%, ''C. muris'' = 28.5%, ''Theileria orientalis'' = 41.58%, ''T. equi'' = 39.47%, ''Babesia bovis'' = 36.3%, ''Eimeria tenella'' = 51.07%, etc.). Our data suggest that ''Plasmodium'' GC content may be in the process of returning to values that can be considered typical for the phylum. The implications of and mechanisms behind the extreme variability in GC content within ''Plasmodium'' are currently being investigated <ref>Bensch S, Canbäck B, DeBarry JD, Johansson T, Hellgren O, Kissinger JC, Palinauskas V, Videvall E, Valkiūnas G. 2016. The Genome of Haemoproteus tartakovskyi and Its Relationship to Human Malaria Parasites. Genome Biol Evol. 8:1361-73. https://www.ncbi.nlm.nih.gov/pubmed/27190205</ref>.
+
 
+
===''Identifying gene homologs (CoGeBLAST)''===
+
 
+
[[File:Input.png|thumb|250px|'''Figure 11.''' Screen capture of '''CoGeBLAST''' input. Genomes included in the analysis and the used query sequence are shown.]]
+
 
+
The identification of homology based on sequence similarity is a key tool for gaining insight into an organism’s biology and genetics. Defining evolutionary relationships and inferring common ancestry is particularly challenging when dealing with multigene families. ''Plasmodium'' multigene families perform a wide array of functions, have diverse gene organization, and distinct evolutionary histories. Here we focus on a set of multi-gene families arising from the subtelomere (''e.g.'' ''var'', ''stevor'', ''rifin'', or ''vir'') that have very complex evolutionary patterns and organizations <ref>Singh V, Gupta P, Pande V. 2014. Revisiting the multigene families: ''Plasmodium var'' and ''vir'' genes. J Vector Borne Dis. 51:75-81. https://www.ncbi.nlm.nih.gov/pubmed/24947212</ref>. These four gene families are of particular interest because of their role in immune evasion and cell invasion. In addition, these families have undergone rapid sequence evolution and gene turnover <ref>Niang M, Yan Yam X, Preiser PR. 2009. The ''Plasmodium falciparum'' STEVOR multigene family mediates antigenic variation of the infected erythrocyte. PLoS Pathog. 5:e1000307. https://www.ncbi.nlm.nih.gov/pubmed/19229319</ref><ref>Witmer K, Schmid CD, Brancucci NM, Luah YH, Preiser PR, Bozdech Z, Voss TS. 2012. Analysis of subtelomeric virulence gene families in ''Plasmodium falciparum'' by comparative transcriptional profiling. Mol Microbiol. 84:243-59. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3491689/</ref><ref>Petter M, Bonow I, Klinkert MQ. 2008. Diverse expression patterns of subgroups of the ''rif'' multigene family during ''Plasmodium falciparum'' gametocytogenesis. PLoS One. 3:e3779. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0003779</ref>. These factors make inferring orthology/paralogy and gene gain/loss events in ''Plasmodium'' subtelomeric families a complex task.
+
 
+
The 313 members of ''P. vivax''’s ''vir'' family are grouped into 10 subfamilies based on their sequence similarity. Gene size and structure (number of exons) are highly variable among family members <ref>Carlton JM, Adams JH, Silva JC, Bidwell SL, Lorenzi H, Caler E, Crabtree J, Angiuoli SV, Merino EF, Amedeo P, Cheng Q, Coulson RM, Crabb BS, Del Portillo HA, Essien K, Feldblyum TV, Fernandez-Becerra C, Gilson PR, Gueye AH, Guo X, Kang'a S, Kooij TW, Korsinczky M, Meyer EV, Nene V, Paulsen I, White O, Ralph SA, Ren Q, Sargeant TJ, Salzberg SL, Stoeckert CJ, Sullivan SA, Yamamoto MM, Hoffman SL, Wortman JR, Gardner MJ, Galinski MR, Barnwell JW, Fraser-Liggett CM. 2008. Comparative genomics of the neglected human malaria parasite ''Plasmodium vivax''. Nature. 455:757-63. https://www.ncbi.nlm.nih.gov/pubmed/18843361</ref><ref>Lopez FJ, Bernabeu M, Fernandez-Becerra C, del Portillo HA. 2013. A new computational approach redefines the subtelomeric ''vir'' superfamily of ''Plasmodium vivax''. BMC Genomics. 14:8. https://www.ncbi.nlm.nih.gov/pubmed/?term=A+new+computational+approach+redefines+the+subtelomeric+vir+superfamily+of+Plasmodium+vivax</ref><ref>Fernandez-Becerra C, Yamamoto MM, Vêncio RZ, Lacerda M, Rosanas-Urgell A, del Portillo HA. 2009. ''Plasmodium vivax'' and the importance of the subtelomeric multigene ''vir'' superfamily. Trends Parasitol. 25:44-51. https://www.ncbi.nlm.nih.gov/pubmed/19036639</ref>. The genetic diversity in the ''vir'' family is larger than that of other ''P. vivax'' families. Only fifteen of the 313 ''vir'' genes are shared across all sequenced ''P. vivax'' strains, despite the recent emergence of the species approximately five million years ago. 
Within this group, PVX_113230 has been proposed as a potential family founder based on its high sequence conservation <ref>Neafsey DE, Galinsky K, Jiang RH, Young L, Sykes SM, Saif S, Gujja S, Goldberg JM, Young S, Zeng Q, Chapman SB, Dash AP, Anvikar AR, Sutton PL, Birren BW, Escalante AA, Barnwell JW, Carlton JM. 2012. The malaria parasite ''Plasmodium vivax'' exhibits greater genetic diversity than ''Plasmodium falciparum''. Nat Genet. 44:1046-50. https://www.ncbi.nlm.nih.gov/pubmed/22863733</ref>.
+
 
+
Here we use '''[[CoGeBLAST]]''' to identify the proposed founder of the ''Plasmodium vir'' family (PVX_113230) in six ''P. vivax'' strains (including the recently sequenced PO1 strain). '''CoGeBLAST''' incorporates genome visualization into BLAST analyses. Therefore, this tool facilitates the study of complex evolutionary patterns.
+
 
+
[[File:Position.png|thumb|250px|'''Figure 12.''' Screen capture of the genomic HSP visualization section of '''CoGeBLAST'''. Salvador-1 (left) and PO1 (right) are shown side by side. Analysis can be replicated following this link:  https://genomevolution.org/r/mjg3]]
+
 
+
{| class=wikitable align=center style="background: #F5FFF5;"
+
 
+
|The following steps show how to use '''CoGeBLAST''' in the CoGe platform:
+
 
+
 
+
'''1.''' Go to: https://genomevolution.org/coge/ and log in to CoGe.
+
 
+
'''2.''' Click on '''CoGeBLAST''' or follow this link: https://genomevolution.org/coge/CoGeBlast.pl
+
 
+
'''3.''' Type the scientific name of the ''Organism'' of interest in the ''Search'' box. All genomes with names matching the search term will appear under the '''Matching Organisms''' menu. [[Notebooks]] matching the term will appear in a new window after clicking on '''Import List'''.
+
 
+
'''4.''' Select all the genomes of interest and click on '''+ Add'''. The genomes will now appear on the '''Selected Genomes''' menu. You can also select any of your Notebooks and include all the genomes contained in it.
+
 
+
'''5.''' Enter your query sequence in FASTA format. If desired, you can change the '''BLAST Parameters''' before starting the analysis.
+
 
+
'''6.''' Once all information is included click on '''Run CoGe BLAST''' ('''Figure 11''').
+
 
+
'''7.''' The analysis output will include:
+
*A table showing the high-scoring segment pairs (HSP) counts for each genome.
+
*A graphic depiction of the location of BLAST hits (Genomic HSP Visualization).
+
*An HSP table detailing genetic information for each hit. 
+
 
+
 
+
<span style="color:#8B008B">'''You can follow a link to an example analysis here:'''</span> https://genomevolution.org/r/mjg3
+
 
+
<span style="color:#8B008B">'''You can find links to the FASTA sequences used in this analysis in the "Sample data" section at the end of this page.'''</span>
+
|}
+
 
+
Sequences with significant similarity to PVX_113230 were found in all the evaluated ''P. vivax'' strains, including PO1. However, the number of high-scoring segment pairs for each ''P. vivax'' genome was variable. The highest number of sequence homologs was observed in the strains: Mauritania, PO1, and Salvador-1.
+
Sequence divergence of ''vir'' members within ''P. vivax'' seems to affect the number of high-scoring segment pairs per strain. Thus, the variation in the number of HSPs across strains further supports observations about the high sequence variation among ''vir'' homologs.
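The per-genome HSP counts discussed above can be tallied from any exported list of hits. The sketch below is only an illustration: it assumes the hits have already been reduced to (genome name, e-value) pairs, and both the function name and the e-value cutoff are hypothetical choices rather than '''CoGeBLAST''' defaults. The hit values in the example are invented and are not results from the analysis.

```python
from collections import Counter


def count_hsps(hits, max_evalue=1e-5):
    """Count HSPs per genome, keeping only hits at or below max_evalue.

    hits: iterable of (genome_name, evalue) tuples.
    """
    return Counter(genome for genome, evalue in hits if evalue <= max_evalue)


# Invented hits for illustration only
example_hits = [
    ("Salvador-1", 1e-40),
    ("Salvador-1", 3e-12),
    ("PO1", 2e-35),
    ("PO1", 0.5),  # discarded: e-value above the cutoff
]
for genome, n in sorted(count_hsps(example_hits).items()):
    print(genome, n)
```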
+
 
+
The location of HSPs appears to be slightly variable across genomes. However, we cannot confirm this pattern until the Mauritania, North Korea, Brazil I, and India VII genomes are fully assembled. Between the two fully assembled ''P. vivax'' genomes (Salvador-1 and PO1), BLAST hits were located largely in the same chromosome regions ('''Figure 12'''). As expected, a higher number of BLAST hits and a more variable genome location were observed when a less conserved ''vir'' family member (PVX_096004.1) was used as a query (analysis can be run following this link: https://genomevolution.org/r/mkcg).
+
 
+
===''Identifying microsyntenic regions (GEvo)''===
+
 
+
[[File:Marked.png|thumb|250px|'''Figure 13.''' Background GC content: GC-rich regions (green), AT-rich regions (white). Wobble GC content: GC-poor (red), ~50% GC (yellow), and GC-rich (green). The location of CyRPA and Rh5 is marked with sapphire and teal lines, respectively. You can rerun the analysis following this link:  https://genomevolution.org/r/m4dq]]
+
 
+
Changes in local genome organization (microsynteny) can be used to ascertain the evolutionary history of a region. In ''Plasmodium'', many genes related to parasite-host interactions are rapidly evolving and undergo frequent rearrangements, gain/loss events, and horizontal transfer. These evolutionary processes leave "genomic signals" by altering the local genome organization. Erythrocyte invasion is a multi-step process that represents one of the most crucial steps in the ''Plasmodium'' life cycle <ref>Cowman AF, Crabb BS. 2006. Invasion of red blood cells by malaria parasites. Cell. 124:755-66. https://www.ncbi.nlm.nih.gov/pubmed/16497586</ref>. Recently, two ''P. falciparum'' genes, the reticulocyte-binding-like homologous protein 5 (Rh5) and the cysteine-rich protective antigen (CyRPA), were shown to be the result of a horizontal gene transfer between ''P. falciparum'' and ''P. adleri'' progenitors within the ''Laveranian'' subgenus. Remarkably, comparative genomics demonstrated that this horizontal gene transfer was localized to an 8 kb segment on chromosome 4. The localized nature of this event, together with interspecific hybridization barriers, suggests that the gene transfer occurred by the capture of a small segment of ''P. adleri'' progenitor genomic DNA by the ''P. falciparum'' progenitor within the ''Laveranian'' subgenus. As Rh5 and CyRPA are crucial for host erythrocyte invasion by ''P. falciparum'', it has been proposed that the capture of these two genes conferred a strong fitness advantage that allowed the ''P. falciparum'' progenitor to infect humans <ref>Sundararaman SA, Plenderleith LJ, Liu W, Loy DE, Learn GH, Li Y, Shaw KS, Ayouba A, Peeters M, Speede S, Shaw GM, Bushman FD, Brisson D, Rayner JC, Sharp PM, Hahn BH. 2016. Genomes of cryptic chimpanzee Plasmodium species reveal key evolutionary events leading to human malaria. Nat Commun. 7:11078. https://www.ncbi.nlm.nih.gov/pubmed/27002652</ref>. 
In sum, the genomic region surrounding these two genes represents an excellent case study on how to examine microsynteny with CoGe.
+
 
+
Here, we will use CoGe’s tool '''[[GEvo]]''' to evaluate genomic properties within this region and assess the hypothesized horizontal transfer event.
+
 
+
[[File:Pvsal1pvpo1pcy.png|thumb|250px|'''Figure 14.''' The analysis shows a region of synteny loss between ''P. vivax'' (Salvador-1), ''P. vivax'' (PO1) and ''P. cynomolgi''. Low quality segments are shown in orange. You can rerun the analysis following this link: https://genomevolution.org/r/mjjv]]
+
 
+
{| class=wikitable align=center style="background: #F5FFF5;"
+
 
+
|The following steps show how to use '''GEvo''' to analyze microsyntenic regions:
+
 
+
 
+
'''1.''' Go to: https://genomevolution.org/coge/ and log in to CoGe.
+
 
+
'''2.''' Click on '''GEvo''' or follow this link: https://genomevolution.org/coge/GEvo.pl
+
 
+
'''3.''' Specify a sequence for each box found under '''Sequence''' (you can specify a maximum of 25 sequences). Each box contains:
+
*A drop down menu of sequence databases (CoGe database, NCBI GenBank, or Direct Submission).
+
*The name of the selected sequence (''e.g.'' gene ID numbers).
+
*The length of the genome segment to display in GEvo.
+
*Additional ''Sequence Options'' including: skip sequence from the analysis, set sequence as a reference, set sequence as a reverse complement, and mask the sequence.
+
 
+
You can either import sequences for GEvo analysis by entering their gene IDs in the ''Name'' box, or you can select gene pairs for analysis directly from '''SynMap'''.
+
 
+
'''4.''' Click on '''Run GEvo'''.
+
 
+
'''5.''' The '''GEvo''' analysis will display the syntenic region between the compared genomes.
+
 
+
'''6.''' You can modify the parameters of the '''GEvo''' analysis in the '''Algorithm''' tab. Also, you can modify the information of the graphical display by altering the options on the '''Results Visualization Options''' tab.
+
 
+
 
+
<span style="color:#8B008B">'''You can follow a link to an example analysis here: '''</span> https://genomevolution.org/r/m4dq  <span style="color:#8B008B">'''and here'''</span> https://genomevolution.org/r/mjjv
+
 
+
|}
+
 
+
We performed a microsynteny analysis of the genome region containing Rh5 and CyRPA. The analysis was conducted using the five fully sequenced ''Laveranian'' genomes currently available: ''P. falciparum'' strains 3D7 and IT, ''P. reichenowi'' strains CDC and SY57, and ''P. gaboni'' strain SY75. Our results show that microsynteny is largely maintained in the regions surrounding Rh5 and CyRPA. We modified the '''Results Visualization Options''' tab to display background and wobble GC content for genes in this region. Neither background GC content across the region, nor wobble GC content for either Rh5 or CyRPA vary significantly ('''Figure 13'''). It has been proposed that significant changes in background or wobble GC content could be used as evidence of a horizontal transfer event. However, we did not observe such a pattern in our analyses. It is possible that a horizontal transfer event between ancestral ''Laveranian'' genomes might not be detected using this method due to the similar nucleotide composition of species in the subgenus. Therefore, an additional test might be required to further support the proposed horizontal transfer event.
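The idea of looking for shifts in background GC content across a region can be sketched with a simple sliding-window calculation. This is an illustrative approximation only and is not how '''GEvo''' renders its GC tracks; the function name, window size, and step are hypothetical, and ambiguous bases are counted in the window length.

```python
def sliding_gc(seq, window=1000, step=500):
    """Return a list of (start, percent_gc) for each full window in seq."""
    seq = seq.upper()
    results = []
    # Only full-length windows are reported; a trailing partial window is skipped
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, 100.0 * gc / window))
    return results


# Toy sequence: an AT-only stretch followed by a GC-only stretch
toy = "AT" * 10 + "GC" * 10
for start, pct in sliding_gc(toy, window=10, step=10):
    print(start, pct)
```

An abrupt change between neighboring windows would point to a compositionally distinct segment. As noted above, such a signal may be absent when donor and recipient genomes have similar nucleotide composition.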
+
 
+
We also used '''GEvo''' to further analyze regions where putative inversion breakpoints are located. Comparative analyses between ''P. vivax'' (Salvador-1) and ''P. vivax'' (PO1), and between ''P. vivax'' (Salvador-1) and ''P. cynomolgi'' show two inversion events. These events are not observed in comparisons between ''P. cynomolgi'' and ''P. vivax'' (PO1). A detailed study of the inversion breakpoints using '''GEvo''' shows genome segments of low sequence quality on ''P. vivax'' (Salvador-1) ('''Figure 14'''). This suggests that the reported inversion event might be the product of a sequencing artifact instead of a real rearrangement.
+
 
+
===''Performing synteny analyses between two genomes (SynMap)''===
+
 
+
Over evolutionary time, neighboring genes often maintain their relative position and order within a chromosomal segment. Chromosomal regions from different species that contain colinear homologs are said to be syntenic, i.e., genomic regions of shared ancestry. Changes in colinearity within syntenic regions are used to ascertain the nature, location, and extent of rearrangement events between related species. The main use of CoGe’s tool, '''[[SynMap]]''', is to find syntenic regions where gene order is preserved. '''SynMap'''’s graphical output allows for easy and fast interpretation of these results.
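To make the notion of colinearity concrete, the following Python sketch scans a list of homologous gene pairs for stretches in which gene order is reversed in the second genome, the signature of an inversion on a dot plot. It is a deliberately simplified illustration and not the algorithm '''SynMap''' uses to build syntenic blocks; the function name, the input format (gene order index in genome A, gene order index in genome B), and the minimum run length are all assumptions.

```python
def find_inverted_runs(pairs, min_len=3):
    """Return (first_A, last_A) for runs where order in B decreases as order in A increases.

    pairs: list of (order_in_A, order_in_B) for homologous genes on one chromosome pair.
    """
    ordered = sorted(pairs)  # walk along genome A
    runs = []
    run_start = 0
    for i in range(1, len(ordered) + 1):
        # The run continues while the position in B keeps decreasing
        decreasing = i < len(ordered) and ordered[i][1] < ordered[i - 1][1]
        if not decreasing:
            if i - run_start >= min_len:
                runs.append((ordered[run_start][0], ordered[i - 1][0]))
            run_start = i
    return runs


# Toy example: genes 4 to 7 of genome A appear in reverse order in genome B
toy_pairs = list(zip(range(1, 11), [1, 2, 3, 7, 6, 5, 4, 8, 9, 10]))
print(find_inverted_runs(toy_pairs))
```

Fully colinear input returns an empty list, which corresponds to an unbroken diagonal on the dot plot.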
+
 
+
[[File:Synmappvvspcy.png|thumb|250px|'''Figure 15.''' SynMap input screen. Genomes for two different species are selected: ''P. cynomolgi'' B strain ('''Organism 1'''), and ''P. vivax'' Salvador-1 strain ('''Organism 2''').]]
+
 
+
[[File:Synmapexample1.png|thumb|250px|'''Figure 16.''' Inversion events observed in '''SynMap Legacy'''. Inversions seen on pairwise comparisons with ''P. vivax'' are marked with orange circles. See steps section (green box) to find links to rerun these analyses.]]
+
 
+
[[File:2.png|thumb|250px|'''Figure 17.''' Independent rearrangement events observed in '''SynMap Legacy'''. Identified rearrangement events: fusion/fission originated on chromosome 5 and 9 of ''P. malariae'' (red squares), fusion/fission originated on chromosome 13 and 14 of ''P. coatneyi'' (green squares), an inversion found on the central region of chromosome 4 of ''P. malariae'' (blue circle). See steps section (green box) to find links to rerun the analyses.]]
+
 
+
{| class=wikitable align=center style="background: #F5FFF5;"
+
 
+
|The following steps show how to analyze syntenic gene pairs with '''SynMap''':
+
 
+
 
+
'''1.''' Go to: https://genomevolution.org/coge/ and log in to CoGe.
+
 
+
'''2.''' Click on '''Organism View''' or follow this link: https://genomevolution.org/coge/OrganismView.pl
+
 
+
'''3.''' Type a scientific name in the ''Search'' box and select the appropriate genome. Then, click on the '''GenomeInfo''' link under the '''Genome Information''' section.
+
 
+
'''4.''' Find the link to the '''SynMap''' tool under the '''Analyze''' section.
+
 
+
'''5.''' By default, '''SynMap''' will perform a self-comparison of any selected genome. This is useful when characterizing a genome or when attempting to identify the relative age of putative duplication events <ref>Tang H, Lyons E. 2012. Unleashing the Genome of Brassica Rapa. Front Plant Sci. 3: 172. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3408644/</ref>. To analyze two different genomes, type a scientific name in the ''Search'' box of either Organism 1 or Organism 2. Once finished, click on '''Generate SynMap''' to run the analysis ('''Figure 15''').
+
 
+
'''6.''' '''SynMap''' will output a graphical depiction of the syntenic regions between two genomes. There are currently two versions of '''SynMap''':
+
*''SynMap2'', allows the user to interact and dynamically alter the analysis.
+
*''SynMap Legacy'', provides static images of the analysis.
+
 
+
'''7.''' You can further analyze regions or genes of interest using the '''GEvo''' tool linked to '''SynMap'''. To do this, double click on a syntenic gene pair ('''SynMap Legacy'''), or select a syntenic gene pair and click on ''Compare in GEvo >>>'' ('''SynMap2''').
+
 
+
 
+
<span style="color:#8B008B">'''You can follow a link to the first example analyses here (Figure 16):'''</span>
+
 
+
https://genomevolution.org/r/lj12 <span style="color:#8B008B">('''''P. vivax'' vs. ''P. cynomolgi''''')</span>
+
 
+
https://genomevolution.org/r/lj1x <span style="color:#8B008B">('''''P. knowlesi'' vs. ''P. cynomolgi''''')</span>
+
 
+
https://genomevolution.org/r/lj1t <span style="color:#8B008B">('''''P. knowlesi'' vs. ''P. vivax''''')</span> 
+
 
+
 
+
<span style="color:#8B008B">'''You can follow a link to the second example analyses here (Figure 17):'''</span>
+
 
+
https://genomevolution.org/r/lq5x <span style="color:#8B008B">('''''P. knowlesi'' vs. ''P. malariae''''')</span> 
+
 
+
https://genomevolution.org/r/lj2b <span style="color:#8B008B">('''''P. coatneyi'' vs. ''P. knowlesi''''')</span>
+
 
+
https://genomevolution.org/r/lq5y <span style="color:#8B008B">('''''P. coatneyi'' vs. ''P. malariae''''')</span>
+
 
+
https://genomevolution.org/r/lq5t <span style="color:#8B008B">('''''P. ovale'' vs. ''P. malariae''''')</span> 
+
 
+
https://genomevolution.org/r/lq65 <span style="color:#8B008B">('''''P. coatneyi'' vs. ''P. ovale''''')</span>
+
 
+
https://genomevolution.org/r/lq5v <span style="color:#8B008B">('''''P. ovale'' vs. ''P. knowlesi''''')</span>
+
|}
+
 
+
====''Identifying syntenic gene pairs''====
+
 
+
Gene position can be critical in gene expression. In many eukaryotes, expression of neighboring genes is coordinated by adjacent regulatory elements <ref>Ghanbarian AT, Hurst LD. 2015. Neighboring Genes Show Correlated Evolution in Gene Expression. Mol Biol Evol. doi:10.1093/molbev/msv053 http://mbe.oxfordjournals.org/content/early/2015/04/01/molbev.msv053.full</ref><ref>De S, Teichmann SA, Babu MM. 2009. The impact of genomic neighborhood on the evolution of human and chimpanzee transcriptome. Genome Res. 19(5): 785–794. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2675967/</ref><ref>Michalak P. 2008. Coexpression, coregulation, and cofunctionality of neighboring genes in eukaryotic genomes. Genomics. 91:(43–248) http://www.sciencedirect.com/science/article/pii/S0888754307002807</ref>. Thus, changes in gene position and order can potentially alter gene expression inside the genomic neighborhood. In ''P. falciparum'', there is evidence that coordinated expression is absent in the highly dynamic subtelomeric regions. Furthermore, subtelomeric neighboring genes are known to form small independently expressed groups in a process thought to increase the parasite’s adaptive potential <ref>Rovira-Graells N, Gupta AP, Planet E, Crowley VM, Mok S, Ribas de Pouplana L, Preiser PR, Bozdech Z, Cortés A. 2012. Transcriptional variation in the malaria parasite ''Plasmodium falciparum''. Genome Res. 5:925-38. https://www.ncbi.nlm.nih.gov/pubmed/22415456</ref>. It is still unknown whether these transcriptional "islands" are found outside the subtelomeric regions, or even in other ''Plasmodium'' parasites. The first step to address this issue is to use tools that allow the rapid identification of changes in gene order and position. We can use '''SynMap''' to determine a gene’s origin, establish its relative location, and identify changes in gene position and order. 
This information can later be used to establish whether patterns of coordinated expression, or the lack thereof, are prevalent across the ''Plasmodium'' genus.
+
 
+
==== ''Identifying chromosomal inversions, fusions, fissions and other events between two genomes'' ====
+
 
+
Numerous genome rearrangements have taken place throughout the evolution of the genus ''Plasmodium''. There is a strong correlation between synteny and divergence times. In other words, the more closely related two species are, the more likely synteny will be observed between their genomes <ref>Tachibana SI, Sullivan SA, Kawai S, Nakamura S, Kim HR, Goto N, Arisue N, Palacpac NM, Honma H, Yagi M, Tougan T, Katakai Y, Kaneko O, Mita T, Kita K, Yasutomi Y, Sutton PL, Shakhbatyan R, Horii T, Yasunaga T, Barnwell JB, Escalante AA, Carlton JM, Tanabe K. 2012. Plasmodium cynomolgi genome sequences provide insight into Plasmodium vivax and the monkey malaria clade. Nat Genet. 44: 1051–1055. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3759362/</ref>. We can use '''SynMap''' to infer the putative evolutionary origin and relative location of rearrangement events across the length of the genome.
+
 
+
We used '''SynMap''' to confirm the relative genome location and origin of reported inversions between ''P. vivax'', ''P. cynomolgi'' and ''P. knowlesi''’s 3rd and 6th chromosomes. We performed pairwise comparisons to evaluate changes in genome organization amongst the three species ('''Figure 16'''). We only detected inversion events in pairwise comparisons with ''P. vivax'' ('''Figure 16''', orange circles). This suggests that the inversion events reported on chromosomes 3 and 6 occurred after the split of ''P. cynomolgi'' and ''P. vivax'' (approximately 3.43-3.87 Mya) <ref>Pacheco MA, Reid MJ, Schillaci MA, Lowenberger CA, Galdikas BM, Jones-Engel L, Escalante AA. 2012. The origin of malarial parasites in orangutans. PLoS One. 7:e34990. https://www.ncbi.nlm.nih.gov/pubmed/22536346</ref>. However, a detailed analysis of the breakpoint regions in ''P. vivax'' using '''GEvo''' ('''Figure 14''') shows a genome segment of low sequence quality. Thus, it is possible that the inversion event reported on ''P. vivax'' could actually be an artifact.
+
 
+
We also used '''SynMap''' to infer changes in gene order and composition amongst another group of closely related ''Plasmodium'' species. Pairwise comparisons were performed between four closely related ''Plasmodium'' parasites from the simian clade: ''P. ovale curtisi'', ''P. malariae'', ''P. coatneyi'' and ''P. knowlesi''. We identified independent sets of chromosome fusion/fission events across these species. A set of fusions/fissions was found on ''P. malariae''’s 5th and 9th chromosomes ('''Figure 17''', red squares); another set of fusion/fission events was found on ''P. coatneyi''’s 13th and 14th chromosomes ('''Figure 17''', green squares). In addition, we found an inversion event located in the central region of ''P. malariae''’s 4th chromosome ('''Figure 17''', blue circle).
+
 
+
=== ''Measuring Kn/Ks values between genomes (SynMap - CodeML analysis tool)'' ===
+
 
+
Nucleotide differences accumulate between two genomes as the result of evolution. Nucleotide changes that alter the encoded amino acid are called non-synonymous and those that do not are called synonymous. Synonymous substitutions are largely neutral and mostly reflect background evolutionary changes. On the other hand, non-synonymous substitutions are largely affected by natural selection. Under neutrality, the rates of synonymous (Ks) and non-synonymous (Kn) substitutions will be equivalent. Deviations from this expectation indicate a significant role of natural selection. Insights into trends of natural selection are gained from evaluating the Kn/Ks ratio. We observe Kn/Ks = 1 under neutrality; we observe Kn/Ks > 1 when non-synonymous substitutions are fixed at a faster rate than synonymous ones (positive selection); and we observe Kn/Ks < 1 when new amino acid changes are eliminated (purifying selection).
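The interpretation of the Kn/Ks ratio described above can be written as a small helper. This is a minimal Python sketch for post-processing Kn and Ks values once they have been calculated; it does not estimate substitution rates itself. The function name and the tolerance around 1 are hypothetical, and in practice deciding whether a ratio differs from 1 requires a statistical test rather than a fixed cutoff.

```python
def classify_selection(kn, ks, tolerance=0.05):
    """Classify a gene pair by its Kn/Ks ratio.

    tolerance is an arbitrary band around 1.0 treated as neutral.
    """
    if ks <= 0:
        # The ratio cannot be computed without synonymous substitutions
        return "undefined"
    ratio = kn / ks
    if abs(ratio - 1.0) <= tolerance:
        return "neutral"
    return "positive" if ratio > 1.0 else "purifying"


# Invented values for illustration only
print(classify_selection(0.1, 0.5))
print(classify_selection(0.6, 0.3))
print(classify_selection(0.2, 0.2))
```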
+
 
+
The CoGe platform has the capability of calculating the Kn/Ks ratio on syntenic gene pairs across the length of a genome. CoGe’s Kn/Ks analyses can be used to:
+
*Identify hotspots of strong positive or purifying selection across the length of the genome. 
+
*Establish associations between genome position (''e.g. '' telomeres vs. centromeres) and trends of natural selection.
+
*Describe species- or genus-specific adaptive trends.
+
 
+
CoGe uses the CodeML analysis tool to measure the Kn/Ks ratio between two annotated genomes. The CodeML analysis tool can be accessed from '''[[SynMap]]'''. Here, we evaluated the selective trends of three closely related species from the ''Laveranian'' subgenus ('''Figure 18''').
+
 
+
[[File:Tree.png|thumb|250px|'''Figure 18.''' Phylogeny of ''Plasmodium'' species of the ''Laverania'' subgenus built using mitochondrial sequences. Species included in our analysis are marked with a red asterisk. Modified from Rayner et al. (2011) <ref>Rayner JC, Liu W, Peeters M, Sharp PM, Hahn BH. 2011. A plethora of Plasmodium species in wild apes: a source of human infection? Trends Parasitol. 27:222-9. https://www.ncbi.nlm.nih.gov/pubmed/21354860?dopt=Abstract&holding=npg </ref>]]
+
 
+
[[File:Ks.png|thumb|250px|'''Figure 19.''' Paired Ks analyses between species of the ''Laverania'' subgenus. '''A'''. ''P. gaboni'' vs. ''P. reichenowi''; '''B'''. ''P. falciparum'' vs. ''P. reichenowi''; and, '''C'''. ''P. gaboni'' vs. ''P. falciparum'']]
+
 
+
[[File:Kn.png|thumb|250px|'''Figure 20.''' Paired Kn analyses between species of the ''Laverania'' subgenus. '''A'''. ''P. gaboni'' vs. ''P. reichenowi''; '''B'''. ''P. falciparum'' vs. ''P. reichenowi''; and, '''C'''. ''P. gaboni'' vs. ''P. falciparum'']]
+
 
+
{| class=wikitable align=center style="background: #F5FFF5;"
+
 
+
|The following steps show how to perform Kn/Ks analyses using '''SynMap'''’s CodeML tool:
+
 
+
 
+
'''1.''' Go to: https://genomevolution.org/coge/ and log in to CoGe.
+
 
+
'''2.''' Run '''SynMap''' or select a previous '''SynMap''' analysis from ''My Data'' (CoGe stores all previously run analyses under the user's account).
+
 
+
'''3.''' Find the '''CodeML tool''' under the '''Analysis Options''' tab. Click on ''Calculate syntenic CDS pairs and color dots:      substitution rates(s)'' and select ''Synonymous (Ks)'' from the dropdown menu. Repeat the analysis selecting the ''Non-synonymous (Kn)'' and ''(Kn/Ks)'' options. You can alter the display selecting a different ''Color Scheme'', specifying ''Min Val.'' or ''Max Val.'' axis values, or changing the ''Log10 Transform.'' data option.
+
 
+
'''4.''' The analysis will modify the ''' [[Syntenic_dotplot]] ''' display to represent the distribution of the Ks, Kn or Kn/Ks values across syntenic gene pairs. A ''Histogram of Ks values'' (or Kn or Kn/Ks) will also be generated. In '''SynMap2''', specific regions can be dynamically selected to view the Ks, Kn or Kn/Ks values.
+
 
+
 
+
<span style="color:#8B008B">'''You can follow a link to Ks example analyses here (Figure 19):'''</span>
+
 
+
https://genomevolution.org/r/ljhj <span style="color:#8B008B">('''''P. reichenowi'' vs. ''P. falciparum''''')</span>
+
 
+
https://genomevolution.org/r/ljhl <span style="color:#8B008B">('''''P. falciparum'' vs. ''P. gaboni''''')</span>
+
 
+
https://genomevolution.org/r/ljhq <span style="color:#8B008B">('''''P. reichenowi'' vs. ''P. gaboni''''') </span>
+
 
+
 
+
<span style="color:#8B008B">'''You can follow a link to Kn example analyses here (Figure 20):'''</span>
+
 
+
https://genomevolution.org/r/lsyy <span style="color:#8B008B">('''''P. reichenowi'' vs. ''P. gaboni''''')</span>
+
 
+
https://genomevolution.org/r/lsz2 <span style="color:#8B008B">('''''P. reichenowi'' vs. ''P. falciparum''''')</span>
+
+
https://genomevolution.org/r/lsz5 <span style="color:#8B008B">('''''P. falciparum'' vs. ''P. gaboni''''')</span>
+
 
+
|}
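As a rough illustration of what a ''Histogram of Ks values'' combined with the ''Log10 Transform.'' option summarizes, the following Python sketch bins a list of Ks values on a log10 scale and reports their median. The values, the bin width, and the function name <code>log10_histogram</code> are assumptions made for illustration; they are not CoGe output, and the sketch is not the CodeML implementation.

```python
import math
from collections import Counter
from statistics import median


def log10_histogram(values, bin_width=0.5):
    """Count positive values per log10 bin; each bin is labelled by its lower edge."""
    counts = Counter()
    for v in values:
        if v <= 0:
            continue  # log10 is undefined for zero or negative values
        lower_edge = math.floor(math.log10(v) / bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))


# Hypothetical Ks values for syntenic gene pairs (illustration only).
ks_values = [0.02, 0.05, 0.08, 0.15, 0.3, 0.0, 1.2]

hist = log10_histogram(ks_values)
ks_median = median(v for v in ks_values if v > 0)

for lower_edge, count in hist.items():
    print(f"log10(Ks) in [{lower_edge}, {lower_edge + 0.5}): {count}")
print("median Ks:", ks_median)
```

Comparing such summaries between species pairs is one simple way to check whether one comparison shows consistently smaller Ks values than another.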
+
 
+
''P. reichenowi'' and ''P. falciparum'' are thought to have diverged approximately 5.28-5.93 Mya <ref>Pacheco MA, Reid MJ, Schillaci MA, Lowenberger CA, Galdikas BM, Jones-Engel L, Escalante AA. 2012. The origin of malarial parasites in orangutans. PLoS One. 7:e34990. https://www.ncbi.nlm.nih.gov/pubmed/22536346</ref>. The divergence time of either species from ''P. gaboni'' is estimated to be older <ref>Sundararaman SA, Plenderleith LJ, Liu W, Loy DE, Learn GH, Li Y, Shaw KS, Ayouba A, Peeters M, Speede S, Shaw GM, Bushman FD, Brisson D, Rayner JC, Sharp PM, Hahn BH. 2016. Genomes of cryptic chimpanzee Plasmodium species reveal key evolutionary events leading to human malaria. Nat Commun. 7:11078. https://www.ncbi.nlm.nih.gov/pubmed/27002652</ref>. Based on these evolutionary relationships, the number of accumulated nucleotide differences is expected to be smaller between ''P. reichenowi'' and ''P. falciparum'' than between either species and ''P. gaboni''.
+
 
+
We found smaller Ks values between ''P. gaboni'' (SY57) - ''P. reichenowi'' (CDC) than between ''P. gaboni'' (SY57) - ''P. falciparum'' (3D7) ('''Figure 19'''). Also, smaller Ks values were observed between ''P. reichenowi'' - ''P. falciparum'' than between ''P. falciparum'' - ''P. gaboni''. The same trends were observed when a different ''P. reichenowi'' strain (SY75) was used (results can be replicated using the following links: https://genomevolution.org/r/mr5u for ''P. reichenowi'' vs. ''P. gaboni'', and https://genomevolution.org/r/lzrr for ''P. reichenowi'' vs. ''P. falciparum''). The differences in Ks rates suggest that a number of recent synonymous substitutions occurred in the ''P. reichenowi'' genome. Genome composition and codon usage are largely similar amongst ''Laveranian'' species ('''Figures 10''' and '''24'''). Thus, this variation could indicate an increased mutation rate in ''P. reichenowi'', resulting in a rapidly evolving genome compared to other ''Laveranian'' species. However, the reasons for this putative increase remain unexplored.
+
 
+
Non-synonymous (Kn) substitution rates were largely similar between ''P. gaboni'' - ''P. falciparum'' and ''P. gaboni'' - ''P. reichenowi'' ('''Figure 20'''). Smaller Kn substitution values were observed between ''P. falciparum'' - ''P. reichenowi''. Similar trends were seen when ''P. reichenowi'' (SY75) was used (results can be replicated using the following links: https://genomevolution.org/r/mr5z for ''P. reichenowi'' vs. ''P. gaboni'', and https://genomevolution.org/r/mr5x for ''P. reichenowi'' vs. ''P. falciparum''). These results suggest that a comparable rate of Kn changes occurred since the divergence of the ''P. reichenowi''/''P. falciparum'' ancestor. These changes were followed by a significant number of species-specific substitutions in both ''P. falciparum'' and ''P. reichenowi''. Previous studies have found large Kn values in ''P. reichenowi'' - ''P. falciparum'' comparisons, particularly in genes expressed during the parasite's blood stages <ref>Otto TD, Rayner JC, Böhme U, Pain A, Spottiswoode N, Sanders M, Quail M, Ollomo B, Renaud F, Thomas AW, Prugnolle F, Conway DJ, Newbold C, Berriman M. 2014. Genome sequencing of chimpanzee malaria parasites reveals possible pathways of adaptation to human hosts. Nat Commun. 5:4754. https://www.ncbi.nlm.nih.gov/pubmed/25203297</ref>. Thus, our results likely reflect Kn changes related to parasite-host interactions and adaptations to infection of different host types.
+
 
+
 
+
===''Identifying sets of syntenic genes amongst several genomes (SynFind)''===
+
 
+
[[File:Synfind.png|thumb|250px|'''Figure 21.''' Screen capture of '''SynFind''' analysis output. Additional links to CoGe's analyses can be found under '''Links'''. Results can be replicated here: https://genomevolution.org/r/moya]]
+
 
+
Small-scale genomic rearrangements are often linked to species-specific gene gain/loss events. Family-linked rearrangements are observed amongst closely related ''Plasmodium'' species and, occasionally, at an intra-specific level. CoGe’s tool, '''[[SynFind]]''', is used to identify gene homologs across any number of genomes, and thus can help identify these rearrangements.
+
 
+
The evolutionary trajectory of multigene families can be difficult to infer, especially in those with a scattered organization or rapid gene turnover. While this issue is particularly prevalent in species-specific families, genus-specific families can present intricate evolutionary patterns as well. Among these, the evolutionary history of the SERA (serine repeat antigen) family is highly dynamic. This family has experienced a significant number of inter-specific contractions, expansions, and rearrangements. These patterns remain to be evaluated at an intra-specific level. We will use '''SynFind''' to study the organization of SERA paralogs in 6 ''P. vivax'' strains.
+
 
+
SERA paralogs are expressed during various stages of the ''Plasmodium'' life cycle. All SERA family members encode proteins with a papain-like cysteine protease motif
+
<ref>Arisue N, Kawai S, Hirai M, Palacpac NM, Jia M, Kaneko A, Tanabe K, Horii T. 2011. Clues to Evolution of the SERA Multigene Family in 18 Plasmodium Species. PLoS One. 6: e17775. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0017775</ref>. These motifs are commonly found both inside and outside the genus ''Plasmodium'' <ref>Prasad R, Atul, Soni A, Puri SK, Sijwali PS. 2012. Expression, characterization, and cellular localization of knowpains, papain-like cysteine proteases of the Plasmodium knowlesi malaria parasite. PLoS One. 12:e51619. https://www.ncbi.nlm.nih.gov/pubmed/23251596</ref><ref>Brömme D. 2001. Papain-like cysteine proteases. Curr Protoc Protein Sci. 21. doi: 10.1002/0471140864.ps2102s21. https://www.ncbi.nlm.nih.gov/pubmed/18429163</ref>. One member (SERA-5), expressed during the late trophozoite and schizont stages, has been considered a promising malaria vaccine target <ref>Arisue N, Hirai M, Arai M, Matsuoka H, Horii T. 2007. Phylogeny and evolution of the SERA multigene family in the genus Plasmodium. J Mol Evol. 65:82-91. http://link.springer.com/article/10.1007%2Fs00239-006-0253-1</ref>. We will use this gene sequence as a query for the '''SynFind''' analysis.
+
 
+
[[File:Newsynfind.png|thumb|250px|'''Figure 22. GEvo''' analysis using the '''SynFind''' output. The number of sequences and display order have been modified to include only the SERA family: PVX_003850 (Salvador-1, set as reference), PVP01_0417200.1 (P01), cds1276 (Brazil I), cds1241 (North Korea), cds1011 (India VII), and cds1227 (Mauritania). Connector lines show syntenic regions between SERA family members. The Brazil I strain is marked with a blue diamond. Strain-specific changes in the family's organization are highlighted with a blue parallelogram. Results can be replicated here: https://genomevolution.org/r/mpdf]]
+
 
+
{| class=wikitable align=center style="background: #F5FFF5;"
+
 
+
|The following steps show how to use '''SynFind''':
+
 
+
 
+
'''1.''' Go to: https://genomevolution.org/coge/ and log in to CoGe.
+
 
+
'''2.''' Click on '''SynFind''' or follow this link: https://genomevolution.org/CoGe/SynFind.pl.
+
 
+
'''3.''' Type a scientific name in the search bar under '''Select Target Genomes'''. Organisms and genomes with names matching the search term will be displayed in the '''Matching Organisms''' menu.
+
 
+
'''4.''' Select the genomes of interest using Ctrl+click or Command+click, then click on '''+ Add'''. The genomes will appear in the '''Selected Genomes''' menu. You can also import genomes from your Notebooks.
+
 
+
'''5.''' Type the ''Name'', ''Annotation'', or ''Organisms'' in the '''Specify Features''' section. It is recommended to include as many specific terms as possible. Once done, click on '''Search'''.
+
 
+
'''6.''' All matches to the search term and the genome where they have been found will appear in a new menu within the same section. Select all relevant '''Matches''' and the reference '''Genome'''.
+
 
+
'''7.''' Click on '''Run SynFind''' to start the analysis. 
+
 
+
'''8.''' '''SynFind''' will output all syntenic regions from the reference genome and their [[Syntenic depth]]. This output can be used as a query for other CoGe tools.
+
 
+
 
+
<span style="color:#8B008B">'''You can follow a link to a SynFind example analysis here:'''</span> https://genomevolution.org/r/moya
+
 
+
<span style="color:#8B008B">'''GEvo results can be replicated here:''' </span> https://genomevolution.org/r/mpdf
+
+
|}
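To make the idea of counting syntenic matches per genome more concrete, the Python sketch below tallies how many matches each target genome contributes for a single query gene. It is a simplified illustration only: the genome names, the gene identifiers, and the function <code>matches_per_genome</code> are hypothetical, and the tally is a loose stand-in for, not a reimplementation of, how '''SynFind''' computes syntenic depth.

```python
from collections import defaultdict


def matches_per_genome(hits, genomes):
    """Return {genome: number of syntenic matches} for every genome searched.

    `hits` is an iterable of (genome_name, gene_id) pairs for one query gene.
    Genomes without any match are reported with a count of zero.
    """
    grouped = defaultdict(list)
    for genome, gene_id in hits:
        grouped[genome].append(gene_id)
    return {genome: len(grouped[genome]) for genome in genomes}


# Hypothetical search: three target genomes, three matches in total.
genomes = ["strain_1", "strain_2", "strain_3"]
hits = [
    ("strain_1", "gene_a"),
    ("strain_1", "gene_b"),
    ("strain_2", "gene_c"),
]

counts = matches_per_genome(hits, genomes)
print(counts)
```

A genome with a count of zero lacks a detectable syntenic match for the query, while counts above one may point to duplicated regions that deserve a closer look in '''GEvo'''.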
+
 
+
We used '''SynFind''' to identify genes homologous to SERA-5 across 6 ''P. vivax'' genomes ('''Figure 21'''). '''SynFind'''’s output was used as a query for a '''GEvo''' analysis of the region. Our results show a conserved number of SERA paralogs in all ''P. vivax'' strains. The organization of the SERA family differed in the Brazil I strain with respect to other ''P. vivax'' strains ('''Figure 22'''). Previous studies on SERA have suggested that some family members are unique to ''P. vivax'' and closely related species <ref>Arisue N, Kawai S, Hirai M, Palacpac NM, Jia M, Kaneko A, Tanabe K, Horii T. 2011. Clues to Evolution of the SERA Multigene Family in 18 Plasmodium Species. PLoS One. 6: e17775. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0017775</ref>. Our results indicate that family organization is not completely conserved at the intra-specific level. This is most evident in recently duplicated paralogs.
+
 
+
'''SynFind''' also identified matching segments outside the SERA multigene family. These segments belonged to hypothetical protein coding genes, ATP proteases, and uncharacterized transcripts. Papain-like cysteine protease motifs are commonly found outside both ''Plasmodium'' and the SERA family. Thus, it is likely that these segments share a papain-like cysteine protease motif but are not evolutionarily related to SERA.
+
 
+
 
+
=== ''Identifying codon and amino acid substitution frequencies (CodeOn)'' ===
+
 
+
[[File:Sco.png|thumb|250px|'''Figure 23.''' Amino acid usage tables of simian clade ''Plasmodium'' species. '''Upper row:''' sister species ''P. vivax'' and ''P. cynomolgi''. '''Bottom row:''' sister species ''P. knowlesi'' and ''P. coatneyi''. See steps section (green box) to find links to rerun the analyses.]]
+
 
+
Codon and amino acid usage are significantly shaped by two factors: selection for translational efficiency and genome composition. The significance of translational selection on genome evolution varies across the genus ''Plasmodium''. It is believed that usage of less energetically expensive amino acids provides an evolutionary advantage by decreasing energetic costs during infection <ref>Peixoto L, Fernández V, Musto H. 2004. The effect of expression levels on codon usage in Plasmodium falciparum. Parasitology. 128:245-51. https://www.ncbi.nlm.nih.gov/pubmed/15074874</ref>. 
+
In ''P. falciparum'', many highly expressed genes are composed mainly of C-ending codons despite the AT-rich genome. In the GC-rich ''P. vivax'' genome, translational selection and codon usage bias are not strongly related <ref>Yadav MK, Swati D. 2012. Comparative genome analysis of six malarial parasites using codon usage bias based tools. Bioinformation. 8:1230-9. https://www.ncbi.nlm.nih.gov/pubmed/23275725</ref>. Genome composition is also a powerful force in protein evolution.
+
 
+
Here, we will use '''[[CodeOn]]''' to calculate amino acid usage across a range of GC-rich to GC-poor genomes. We will measure the effects of genome composition bias on amino acid usage across 7 ''Plasmodium'' genomes from two major clades (''Laveranian'' and simian).
+
 
+
[[File:Lco.png|thumb|250px|'''Figure 24.''' Amino acid usage tables in ''Plasmodium'' species from the ''Laveranian'' subgenus. '''Upper row:''' sister species ''P. falciparum'' and ''P. reichenowi''. '''Bottom row:''' ''P. gaboni''. See steps section (green box) to find links to rerun the analyses.]]
+
 
+
{| class=wikitable align=center style="background: #F5FFF5;"
+
 
+
|The following steps indicate how to build amino acid usage tables using '''CodeOn''':
+
 
+
 
+
'''1.''' Go to: https://genomevolution.org/coge/ and log in to CoGe.
+
 
+
'''2.''' Find the genome of interest in '''OrganismView''' or follow this link https://genomevolution.org/coge/OrganismView.pl
+
 
+
'''3.''' Click on '''CodeOn''' to start the analysis. After a couple of minutes, the output will show in a different tab.
+
 
+
 
+
<span style="color:#8B008B">'''You can follow links to CodeOn example analyses for the simian clade here (Figure 23):'''</span>
+
 
+
https://genomevolution.org/coge/CodeOn.pl?oid=27002 <span style="color:#8B008B">('''''P. vivax''''')</span>
+
 
+
https://genomevolution.org/coge/CodeOn.pl?dsgid=32770 <span style="color:#8B008B">('''''P. cynomolgi''''')</span>
+
 
+
https://genomevolution.org/coge/CodeOn.pl?oid=26997 <span style="color:#8B008B">('''''P. knowlesi''''')</span>
+
 
+
https://genomevolution.org/coge/CodeOn.pl?oid=40698 <span style="color:#8B008B">('''''P. coatneyi''''')</span>
+
 
+
 
+
<span style="color:#8B008B">'''You can follow links to CodeOn example analyses for the ''Laveranian'' subgenus here (Figure 24):'''</span>
+
 
+
https://genomevolution.org/coge/CodeOn.pl?oid=26992 <span style="color:#8B008B">('''''P. falciparum''''')</span>
+
 
+
https://genomevolution.org/coge/CodeOn.pl?oid=40801 <span style="color:#8B008B">('''''P. reichenowi''''')</span>
+
 
+
https://genomevolution.org/coge/CodeOn.pl?oid=40696 <span style="color:#8B008B">('''''P. gaboni''''')</span>
+
 
+
|}
+
 
+
Amino acid usage trends were markedly different in species from different clades ('''Figure 23''' and '''Figure 24'''). On the other hand, closely related ''Plasmodium'' species showed similar amino acid usage patterns.
+
 
+
''P. vivax'' (Salvador-1) had the highest number of CDS with 45-55% GC content. Closely related species (''P. cynomolgi'', ''P. knowlesi'', and ''P. coatneyi'') had a higher number of CDS in the 40-45% GC tier ('''Figure 23'''). Genome composition is similar between ''P. cynomolgi'', ''P. knowlesi'', and ''P. coatneyi'' ('''Figure 9''' and '''Figure 10'''). However, patterns of amino acid usage were markedly different in ''P. coatneyi'' with respect to other simian species ('''Figure 23''').
+
 
+
In the ''Laveranian'' subgenus, the number of CDS with 20-30% GC content was significantly larger. Amino acid usage was similar in ''P. falciparum'' (3D7) and ''P. reichenowi'' (SY57), but slightly different in ''P. gaboni'' ('''Figure 24'''). This variation is noteworthy given that the three species share a similar compositional bias ('''Figure 9''' and '''Figure 10'''). This result suggests that compositional genome bias is a significant factor in amino acid usage in both the simian clade and the ''Laveranian'' subgenus. However, we cannot rule out the significance of additional factors not evaluated here.
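The kind of table discussed here, amino acid counts grouped by the GC content of each CDS, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: it uses the standard genetic code, an arbitrary tier width of 5%, two made-up toy sequences, and hypothetical function names. It does not reproduce how '''CodeOn''' bins or reports its results.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def gc_percent(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def translate(cds):
    """Translate a CDS codon by codon, skipping stops; unknown codons become 'X'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    residues = (CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))
    return "".join(aa for aa in residues if aa != "*")


def usage_by_gc_tier(cds_list, tier_width=5):
    """Return {lower bound of GC tier: Counter of amino acids} for a list of CDS."""
    tiers = {}
    for cds in cds_list:
        tier = int(gc_percent(cds) // tier_width) * tier_width
        tiers.setdefault(tier, Counter()).update(translate(cds))
    return tiers


# Two toy sequences (not real genes): one AT-rich, one GC-rich.
toy_cds = ["ATGAAAAATTAA", "ATGGCCGGCTGA"]
table = usage_by_gc_tier(toy_cds)
for tier, usage in sorted(table.items()):
    print(f"{tier}-{tier + 5}% GC:", dict(usage))
```

In the toy input, the AT-rich sequence contributes lysine and asparagine (AAA, AAT) while the GC-rich one contributes alanine and glycine (GCC, GGC), which mirrors the way compositional bias can shift amino acid usage.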
+
 
+
 
+
=== ''Using Syntenic Path Assembly (SPA) to make analysis of poor or early genome assemblies easier (SynMap - SPA tool)'' ===
+
 
+
[[File:Spacapture.png|thumb|250px|'''Figure 25. Syntenic Path Assembly (SPA)''' window analysis]]
+
 
+
A large number of ''Plasmodium'' genomes remain to be fully sequenced, assembled, and annotated. Incomplete genomic data comes from a variety of sources:
+
*Genomic information published at early assembly stages.
+
*Partially sequenced genomes.
+
*Low-quality genome segments.
+
 
+
Sequencing projects can be simplified somewhat by using a reference genome as a guide for genome assembly. While unassembled and non-annotated genomes can be used in smaller-scale studies (''e.g.'', orthologs can be identified with BLAST), their usability in large-scale comparative genomics is limited.
+
 
+
[[File:SyntenicPathAssembly.png|thumb|250px|'''Figure 26. ''P. inui'' Syntenic Path Assembly (SPA)''' using ''P. coatneyi'' as a reference genome. Black circles show putative interpretation errors. The analysis can be replicated following this link: https://genomevolution.org/r/ljen]]
+
 
+
Tools that generate preliminary assemblies have great significance in comparative analyses, especially when large amounts of genomic data become available. CoGe’s tool, '''[[Syntenic_path_assembly]]''' ('''SPA'''), creates a graphical display of syntenic gene pairs based on a reference genome. We will use '''SPA''' to assemble the ''P. inui'' genome (at scaffold level as of 2016) using the fully assembled ''P. coatneyi'' genome as a reference.
+
 
+
{| class=wikitable align=center style="background: #F5FFF5;"
+
 
+
|The following steps show how to use ''' SynMap - SPA tool''':
+
 
+
 
+
'''1.''' Go to: https://genomevolution.org/coge/ and log in to CoGe.
+
 
+
'''2.''' Run '''SynMap''' between an assembled and a non-assembled genome (this might take longer than analyses using two fully assembled genomes).
+
 
+
'''3.''' After running '''SynMap''' click on the ''Display Options'' tab and find the '''SPA''' tool ('''Figure 25'''). Select the tool by clicking on the check mark next to: <span style="color:blue">The Syntenic Path Assembly (SPA)?</span>
+
 
+
'''4.''' After a few minutes, the incomplete genome will be assembled using the second genome as a reference.
+
 
+
 
+
<span style="color:#8B008B">'''You can follow a link to an example analysis here:'''</span> https://genomevolution.org/r/ljen
+
|}
+
 
+
'''SPA''' is extremely useful for generating quick and dirty genome assemblies; however, there are some limitations regarding assembly interpretation. We highlight two scenarios seen in the '''SPA''' of ''P. inui'' using the ''P. coatneyi'' genome as a reference ('''Figure 26''').
+
 
+
Rearrangement events such as inversions or duplications cannot be identified using '''SPA'''. For one, several contigs can be syntenic to the same region of the reference genome without signaling a duplication event. Also, contigs syntenic to a reverse DNA strand might not reflect chromosome inversions (black circles, '''Figure 26''').
+
 
+
In addition, contigs will be arranged to increase synteny between the unassembled and the reference genome. Thus, using different reference genomes will result in different preliminary assemblies. In the case of ''P. inui'', using ''P. coatneyi'' (a closely related species) or ''P. falciparum'' (a distant species) as reference genomes will result in different assemblies. Therefore, before running '''SPA''', the reference genome should be selected after considering the biological and evolutionary relationship between species. Also, interpretation of '''SPA''' assemblies might be problematic when working with transposon-rich genomes.
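The dependence on the reference can be illustrated with a simplified Python sketch of reference-guided contig ordering. It places each contig on the reference chromosome where most of its syntenic hits fall, sorts contigs by the median hit position, and orients them by the majority strand. The input tuples, contig names, and the function <code>order_contigs</code> are hypothetical, and this heuristic is an assumption made for illustration rather than the algorithm used by '''SPA'''.

```python
from collections import Counter
from statistics import median


def order_contigs(hits):
    """Order and orient contigs along a reference genome.

    `hits` is a list of (contig, ref_chromosome, ref_position, same_strand)
    tuples, one per syntenic gene pair. Returns a list of
    (ref_chromosome, contig, orientation) tuples in reference order.
    """
    by_contig = {}
    for contig, chrom, pos, same_strand in hits:
        by_contig.setdefault(contig, []).append((chrom, pos, same_strand))

    placed = []
    for contig, records in by_contig.items():
        # Assign the contig to the chromosome holding most of its hits
        # (ties are broken alphabetically).
        chrom_counts = Counter(chrom for chrom, _, _ in records)
        best_chrom = max(sorted(chrom_counts), key=chrom_counts.get)
        on_chrom = [r for r in records if r[0] == best_chrom]
        anchor = median(pos for _, pos, _ in on_chrom)
        forward_hits = sum(1 for _, _, same in on_chrom if same)
        orientation = "+" if 2 * forward_hits >= len(on_chrom) else "-"
        placed.append((best_chrom, anchor, contig, orientation))

    placed.sort()
    return [(chrom, contig, orient) for chrom, _, contig, orient in placed]


# Hypothetical syntenic hits between three scaffolds and a reference.
hits = [
    ("scaf_2", "chr1", 500, True),
    ("scaf_2", "chr1", 700, True),
    ("scaf_1", "chr1", 100, False),
    ("scaf_1", "chr1", 200, False),
    ("scaf_1", "chr2", 50, True),
    ("scaf_3", "chr2", 900, True),
]

layout = order_contigs(hits)
print(layout)
```

Because the layout is derived entirely from hits against the chosen reference, a different reference yields different hits and therefore a different ordering. Likewise, a contig placed in reverse orientation here reflects only strand agreement with the reference and does not by itself indicate an inversion.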
+
  
  
 
=='''Overall conclusions'''==
 
  
The number of available ''Plasmodium'' genomes has increased considerably during recent years. The increase in genomic information creates an unprecedented opportunity to study the unique genomic qualities of this genus.
+
Insights into the unique patterns of ''Plasmodium'' biology, epidemiology, ecology, and genetics can be obtained from molecular and comparative genomic studies. The rapid growth of genomic information makes implementing tools that facilitate assessing genome evolutionary trends an imperative task. The services and tools provided by the CoGe platform are of considerable use in advancing ''Plasmodium'' comparative genomics. Here, we showed how various CoGe tools could be used to assess evolutionary patterns unique to ''Plasmodium''. We also showed how to use this platform to further characterize sequenced ''Plasmodium'' genomes. Overall, we have demonstrated that CoGe’s tools can be used to address evolutionary questions such as:  
 
+
Thanks to worldwide efforts, there was a significant reduction in the number of malaria cases and malaria-related deaths between 2000 and 2015. By 2015, it was estimated that the number of malaria cases had decreased from 262 million to 214 million, and the number of malaria-related deaths from 839,000 to 438,000 <ref>World Health Organization. (2015). World Malaria Report 2015. Retrieved from http://www.who.int/malaria/publications/world-malaria-report-2015/report/en/</ref>. There have been tremendous achievements in malaria treatment and control strategies. However, numerous aspects of malaria research still need to be addressed.
+
 
+
The intricacies of parasite-host relations in ''Plasmodium'' infection might be more complex than previously considered <ref>Garamszegi LZ. 2009. Patterns of co-speciation and host switching in primate malaria parasites. Malar J. 110. doi: 10.1186/1475-2875-8-110. https://www.ncbi.nlm.nih.gov/pubmed/19463162</ref>. Humans have been infected by ''Plasmodium'' species classically considered specific to non-human primates (''e.g.'' a single infection with ''P. cynomolgi'' <ref>Ta TH, Hisam S, Lanza M, Jiram AI, Ismail N, Rubio JM. 2014. First case of a naturally acquired human infection with ''Plasmodium cynomolgi''. Malar J. 13: 68. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3937822/</ref> and various infections with ''P. knowlesi'' <ref>Singh B, Daneshvar C. 2013. Human infections and detection of Plasmodium knowlesi. Clin Microbiol Rev. 26:165-84.  https://www.ncbi.nlm.nih.gov/pubmed/23554413</ref>). African primates have been infected by unique ''P. falciparum'' strains (a parasite classically considered exclusive to humans) and are proposed to act as reservoirs for this parasite <ref>Prugnolle F, Durand P, Neel C, Ollomo B, Ayala FJ, Arnathau C, Etienne L, Mpoudi-Ngole E, Nkoghe D, Leroy E, Delaporte E, Peeters M, Renaud F. 2010. African great apes are natural hosts of multiple related malaria species, including ''Plasmodium falciparum''. Proc Natl Acad Sci U S A. 107:1458-63. https://www.ncbi.nlm.nih.gov/pubmed/20133889</ref><ref>Duval L, Fourment M, Nerrienet E, Rousset D, Sadeuh SA, Goodman SM, Andriaholinirina NV, Randrianarivelojosia M, Paul RE, Robert V, Ayala FJ, Ariey F. 2010. African apes as reservoirs of ''Plasmodium falciparum'' and the origin and diversification of the ''Laverania'' subgenus. Proc Natl Acad Sci U S A. 107:10561-6. https://www.ncbi.nlm.nih.gov/pubmed/20498054</ref>.
In bird ''Plasmodium'', the putative evolutionary time of parasite-host associations has a significant role in the development of pathogenicity and in host mortality <ref>Krizanauskiene A, Hellgren O, Kosarev V, Sokolov L, Bensch S, Valkiunas G. 2006. Variation in host specificity between species of avian haemosporidian parasites: evidence from parasite morphology and cytochrome B gene sequences. J Parasitol. 6:1319-24. https://www.ncbi.nlm.nih.gov/pubmed/17304814</ref>. Finally, multiple host-switch events between largely divergent host types are thought to have occurred on bat ''Haemosporidia'' <ref>Duval L, Robert V, Csorba G, Hassanin A, Randrianarivelojosia M, Walston J, Nhim T, Goodman SM, Ariey F. 2007. Multiple host-switching of Haemosporidia parasites in bats. Malar J. 6:157. https://www.ncbi.nlm.nih.gov/pubmed/18045505</ref>. These cases highlight the complexity of the ''Plasmodium'' infection landscape. Insights into the unique patterns of ''Plasmodium'' biology, epidemiology, ecology, and genetics can be obtained from molecular and comparative genomic studies.  
+
 
+
The rapid growth of genomic information makes implementing tools that facilitate assessing genome evolutionary trends an imperative task. The services and tools provided by the CoGe platform are of considerable use in advancing ''Plasmodium'' comparative genomics. Here, we showed how various CoGe tools could be used to assess evolutionary patterns unique to ''Plasmodium''. We also showed how to use this platform to further characterize sequenced ''Plasmodium'' genomes. Overall, we have demonstrated that CoGe’s tools can be used to address evolutionary questions such as:  
+
 
*The evolutionary origins of ''Laveranian'' AT-rich genomes.  
 
 
*The location and nature of genome rearrangements between ''Plasmodium''.
 

Latest revision as of 13:29, 14 February 2017

About this guide

This 'cookbook' style document is meant to provide an introduction to many of our tools and services and is structured around a case study of investigating genome evolution of the malaria-causing Plasmodium spp. The small size and unique features of this pathogen's genome make it ideal for beginning to understand how our tools can be used to conduct comparative genomic analyses and uncover meaningful discoveries.

Through a number of example analyses, this guide will teach users about the following tools:

  • LoadGenome: Add a new genome to CoGe.
  • LoadAnnotation: Add structural and/or functional annotations to a genome.
  • GenomeInfo: Get information about a genome.
  • GenomeList: Get information about several genomes in a table.
  • CoGeBLAST: BLAST against any set of genomes.
  • GEvo: Microsynteny analysis.
  • SynMap: Whole genome syntenic analysis.
- SynMap, Calculating and displaying synonymous/non-synonymous (Ks, Kn) data: Characterize the evolution of populations of genes.
- SPA tool: Syntenic Path Assembly to assist in genome analysis.
  • SynFind: Identify syntenic genes across multiple genomes.
  • CodeOn: Characterize patterns of codon and amino acid evolution in coding sequence.


FOLLOW THIS LINK FOR A QUICK OVERVIEW OF Plasmodia comparative genomics WITH COGE.


A brief introduction to Plasmodium genome evolution

The genus Plasmodium emerged ~40 million years ago and harbors roughly 200 species of parasitic protozoa better known as malaria parasites. All Plasmodium species have a complex life cycle involving some kind of vertebrate host and a mosquito vector. In addition, Plasmodium species share similar life cycle characteristics, albeit with a few exceptions (e.g. hypnozoites). Plasmodium genomes are tiny (17-28 Mb) in comparison to those of their vertebrate (1 Gb for birds; 2-3 Gb for mammals) and mosquito (230–284 Mb) hosts [1]. All Plasmodium genomes consist of fourteen chromosomes (nuclear genome), as well as a mitochondrial and an apicoplast genome. Despite these shared genomic characteristics, the structural organization, gene content, and sequence of Plasmodium genomes are highly variable within the genus [2]. The exact origins and mechanisms of these differences remain largely unexplored; however, they are generally hypothesized to stem from host shift events [3][4].

An increase in funding devoted to malaria research has coincided with a dramatic increase in publicly available genomic information for Plasmodium [5]. The most prominent repository is found at NCBI/Genbank [6]; while additional and unique sequences can also be found on other databases: PlasmoDB [7], GeneDB [8], and MalAvi [9]. This wealth of genomic data facilitates detailed comparative genomic approaches, opening the possibility to:

  • Infer origins of certain traits, specialized phenotypes, and genomic features.
  • Track the maintenance of conserved genes across the genus, as well as the gain or loss of genes unique to a single species or a group of closely related species.
  • Identify the potential historical interactions that might have led to the development of genomic adaptations.


Finding and integrating Plasmodium genomes in CoGe

You can find the details of Plasmodium spp. genome integration in the following link: Finding and intregating Plasmodium genomes to CoGe


Comparative analyses workflows

The following links direct to specific tools for the comparative analysis of Plasmodium genomes:

Plasmodium analysis workflow 1: Tools that evaluate genomic properties and amino acid usage

Plasmodium analysis workflow 2: Tools for the syntenic analysis of whole genomes and microsyntenic regions

Plasmodium analysis workflow 3: Tools useful on the study of multigene families


Overall conclusions

Insights into the unique patterns of Plasmodium biology, epidemiology, ecology, and genetics can be obtained from molecular and comparative genomic studies. The rapid growth of genomic information makes implementing tools that facilitate assessing genome evolutionary trends an imperative task. The services and tools provided by the CoGe platform are of considerable use in advancing Plasmodium comparative genomics. Here, we showed how various CoGe tools could be used to assess evolutionary patterns unique to Plasmodium. We also showed how to use this platform to further characterize sequenced Plasmodium genomes. Overall, we have demonstrated that CoGe’s tools can be used to address evolutionary questions such as:

  • The evolutionary origins of Laveranian AT-rich genomes.
  • The location and nature of genome rearrangements between Plasmodium.
  • The evolutionary patterns of genes crucial in cell invasion.
  • The evolutionary trends of multigene families.


Useful links

Plasmodium Notebooks in CoGe

Link to Notebook for published Plasmodium genome data: https://genomevolution.org/coge/NotebookView.pl?lid=1753
Link to Notebook for published P. falciparum strains: https://genomevolution.org/coge/NotebookView.pl?lid=1758
Link to Notebook for published P. vivax strains: https://genomevolution.org/coge/NotebookView.pl?lid=1760
Link to Notebook for published Plasmodium apicoplast data: https://genomevolution.org/coge/NotebookView.pl?lid=1754
Link to Notebook for published Plasmodium mitochondrion data: https://genomevolution.org/coge/NotebookView.pl?lid=1756

Sample data

  • Gene sequences used on CoGeBLAST analysis (obtained from PlasmoDB):
PVX_113230.1 | Plasmodium vivax Sal-1 | variable surface protein Vir14-related (http://plasmodb.org/plasmo/app/record/gene/PVX_113230)
PVX_096004.1 | Plasmodium vivax Sal-1 | VIR protein (http://plasmodb.org/plasmo/app/record/gene/PVX_096004)
  • Gene sequence used on SynFind to inform GEvo analysis (obtained from PlasmoDB):
PVX_003830.1 | Plasmodium vivax Sal-1 | serine-repeat antigen 5 (SERA) (http://plasmodb.org/plasmo/app/record/gene/PVX_003830)
  • Gene sequences used on CoGeBLAST to inform GEvo analysis (obtained from PlasmoDB):
PF3D7_0424100.1 | Plasmodium falciparum 3D7 | reticulocyte binding protein homologue 5 (http://plasmodb.org/plasmo/app/record/gene/PF3D7_0424100)
PVX_096410.1 | Plasmodium vivax Sal-1 | cysteine repeat modular protein 2, putative (http://plasmodb.org/plasmo/app/record/gene/PVX_096410)


References

  1. DeBarry JD, Kissinger JC. 2011. Jumbled Genomes: Missing Apicomplexan Synteny. Mol Biol Evol. 2011 Oct; 28(10): 2855–2871. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3176833/
  2. Carlton JM, Perkins SL, Deitsch KW. 2013. Malaria Parasites. Caister Academic Press
  3. Prugnolle F, Durand P, Ollomo B, Duval L, Ariey F, Arnathau C, Gonzalez JP, Leroy E, Renaud F. 2011. A Fresh Look at the Origin of Plasmodium falciparum, the Most Malignant Malaria Agent. PLoS Pathog. 7: e1001283. http://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1001283
  4. Prugnolle F, Rougeron V, Becquart P, Berry A, Makanga B, Rahola N, Arnathau C, Ngoubangoye B, Menard S, Willaume E, Ayala FJ, Fontenille D, Ollomo B, Durand P, Paupy C, Renaud F. 2013. Diversity, host switching and evolution of Plasmodium vivax infecting African great apes. Proc Natl Acad Sci U S A. 110:8123-8. https://www.ncbi.nlm.nih.gov/pubmed/23637341
  5. Buscaglia CA, Kissinger JC, Agüero F. 2015. Neglected Tropical Diseases in the Post-Genomic Era. Trends Genet. 31:539-55. https://www.ncbi.nlm.nih.gov/pubmed/26450337
  6. Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. 2016. GenBank. Nucleic Acids Res. 44: D67–D72. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702903/
  7. Aurrecoechea C, Brestelli J, Brunk BP, Dommer J, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M, Innamorato F, Iodice J, Kissinger JC, Kraemer E, Li W, Miller JA, Nayak V, Pennington C, Pinney DF, Roos DS, Ross C, Stoeckert CJ Jr, Treatman C, Wang H. 2009. PlasmoDB: a functional genomic database for malaria parasites. Nucleic Acids Res. 37:D539-43. https://www.ncbi.nlm.nih.gov/pubmed/18957442
  8. Logan-Klumpler FJ, De Silva N, Boehme U, Rogers MB, Velarde G, McQuillan JA, Carver T, Aslett M, Olsen C, Subramanian S, Phan I, Farris C, Mitra S, Ramasamy G, Wang H, Tivey A, Jackson A, Houston R, Parkhill J, Holden M, Harb OS, Brunk BP, Myler PJ, Roos D, Carrington M, Smith DF, Hertz-Fowler C, Berriman M. 2012. GeneDB--an annotation database for pathogens. Nucleic Acids Res. 40:D98-108. https://www.ncbi.nlm.nih.gov/pubmed/22116062
  9. Bensch S, Hellgren O, Pérez-Tris J. 2009. MalAvi: a public database of malaria parasites and related haemosporidian in avian hosts based on mitochondrial cytochrome b lineages. Mol Ecol Resour. 9:1353-8. https://www.ncbi.nlm.nih.gov/pubmed/21564906