Difference between revisions of "Using CoGe for the analysis of Plasmodium spp"

From CoGepedia
(About this Guide)
 
(511 intermediate revisions by 5 users not shown)
Line 1: Line 1:
=='''About this Guide'''==
+
=='''About this guide'''==
Welcome to the Plasmodium spp. genome analysis with CoGe guide. This 'cookbook' style document is meant to provide an introduction to many of our tools and services, and is structured around a case study of investigating genome evolution of the malaria-causing Plasmodium spp. The small size and unique features of this pathogen's genome make it a great example for beginning to understand how our tools can be used to conduct comparative genomic analyses and uncover meaningful discoveries.  
+
This 'cookbook' style document is meant to provide an introduction to many of our tools and services and is structured around a case study of investigating genome evolution of the malaria-causing ''Plasmodium'' spp. The small size and unique features of this pathogen's genome make it ideal for beginning to understand how our tools can be used to conduct comparative genomic analyses and uncover meaningful discoveries.  
  
Through a number of guided examples, this guide will teach users how to use the following tools:
+
Through a number of example analyses, this guide will teach users about the following tools:
* LoadGenome
+
* '''[[LoadGenome]]''': Add a new genome to CoGe.
* GenomeInfo
+
* '''[[LoadAnnotation]]''': Add structural and/or functional annotations to a genome.
* GenomeList
+
* '''[[GenomeInfo]]''': Get information about a genome.
* CoGeBLAST
+
* '''[[GenomeList]]''': Get information about several genomes in a table.
* GEvo
+
* '''[[CoGeBLAST]]''': BLAST against any set of genomes.
* SynMap
+
* '''[[GEvo]]''': Microsynteny analysis.
* CodeOn
+
* '''[[SynMap]]''': Whole genome syntenic analysis.
 +
:- '''[[SynMap#Calculating_and_displaying_synonymous.2Fnon-synonymous_.28Ks.2C_Kn.29_data]]''': Characterize the evolution of populations of genes.
 +
:- '''[[SPA]]''' tool: Syntenic Path Assembly to assist in genome analysis.
 +
* '''[[SynFind]]''': Identify syntenic genes across multiple genomes.
 +
* '''[[CodeOn]]''': Characterize patterns of codon and amino acid evolution in coding sequence.
  
=='''A brief introduction to ''Plasmodium'' genome evolution'''==
 
  
 +
<span style="color:#006F00">'''FOLLOW THIS LINK FOR A QUICK OVERVIEW OF [[Plasmodia comparative genomics]] WITH COGE.'''</span>
  
The unique features of most parasitic genomes create particular challenges for their evolutionary study using comparative genomics. Parasite genomes are characterized by a mixture of genome reduction associated with gene loss (''e.g.'' homeobox genes) and the development of specialized genes. Many of the genes gained in parasitic genomes are involved in different aspects of host-parasite interaction and are, for the most part, species or lineage specific <ref>Jackson AP. 2015. Preface. The evolution of parasite genomes and the origins of parasitism. Parasitology. 142 Suppl 1:S1-5. https://www.ncbi.nlm.nih.gov/pubmed/25656359</ref>. The dynamism of parasitic genomes is evident within the phylum ''Apicomplexa'', and particularly within the genus ''Plasmodium''. A marked loss of synteny between different ''Apicomplexa'' genera has been previously reported <ref>Carlton JM, Perkins SL, Deitsch KW. 2013. '''''Malaria Parasites'''''. Caister Academic Press</ref>, with the arrangement of genes within species of a single genus being conserved to a larger degree. While this remains true for many genera, the increasing number of sequenced ''Plasmodium'' genomes has shown that numerous clade- and species-specific gain/loss events and chromosome rearrangements have occurred <ref>Tachibana SI, Sullivan SA, Kawai S, Nakamura S, Kim HR, Goto N, Arisue N, Palacpac NM, Honma H, Yagi M, Tougan T, Katakai Y, Kaneko O, Mita T, Kita K, Yasutomi Y, Sutton PL, Shakhbatyan R, Horii T, Yasunaga T,  Barnwell JB, Escalante AA, Carlton JM, Tanabe K. 2012. Plasmodium cynomolgi genome sequences provide insight into Plasmodium vivax and the monkey malaria clade. Nat Genet. 44: 1051–1055. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3759362/</ref>. 
The origins and mechanisms of this level of rearrangement remain to be fully explored, but are likely related to the different host shift events <ref>Prugnolle F, Durand P, Ollomo B, Duval L, Ariey F, Arnathau C, Gonzalez JP, Leroy E, Renaud F. 2011. A Fresh Look at the Origin of Plasmodium falciparum, the Most Malignant Malaria Agent. PLoS Pathog. 7: e1001283. http://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1001283</ref><ref>Prugnolle F, Rougeron V, Becquart P, Berry A, Makanga B, Rahola N, Arnathau C, Ngoubangoye B, Menard S, Willaume E, Ayala FJ, Fontenille D, Ollomo B, Durand P, Paupy C, Renaud F. 2013. Diversity, host switching and evolution of Plasmodium vivax infecting African great apes. Proc Natl Acad Sci U S A. 110:8123-8. https://www.ncbi.nlm.nih.gov/pubmed/23637341</ref> and the diverse types of host-parasite interactions that pervade the evolutionary history of the genus.
 
  
Despite the enormous diversity of ''Plasmodium'' parasites, it remains true so far that they all share certain characteristics. Fourteen chromosomes, a mitochondrial genome, and an apicoplast genome compose the entire repertoire of the ''Plasmodium'' genome in all sequenced species described so far. As in the case of other parasites, ''Plasmodium'' genomes are relatively small (approximately 17-28Mb) in comparison to those of the hosts, but larger than those of other ''Apicomplexan'' parasites (''Theileria orientalis'' and ''Cryptosporidium parvum'' have genomes of approximately 9Mb) <ref>DeBarry JD, Kissinger JC. 2011. Jumbled Genomes: Missing Apicomplexan Synteny. Mol Biol Evol. 2011 Oct; 28(10): 2855–2871. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3176833/</ref>. Moreover, a potential increase in the number of chromosomes within the genus ''Plasmodium'' without compromising genome size can also be observed (''e.g.'' 4 chromosomes and approximately 13Mb in ''Babesia bovis'' vs. 14 chromosomes and approximately 18Mb in the smallest ''Plasmodium'' genome). In addition, all ''Plasmodium'' species have a complex life cycle involving some kind of vertebrate host and a mosquito vector of the genus ''Anopheles''. Though specificities and preferences during the infection process are prevalent within the genus <ref>Sinka ME, Bangs MJ, Manguin S, Rubio-Palis Y, Chareonviriyaphap T, Coetzee M, Mbogo CM, Hemingway J, Patil AP, Temperley WH, Gething PW, Kabaria CW, Burkot TR, Harbach RE, Hay SI. 2012. A global map of dominant malaria vectors. Parasit Vectors. 5:69. https://www.ncbi.nlm.nih.gov/pubmed/22475528</ref>, the overall preservation of the life cycle characteristics indicates the existence of a set of preserved core genes. While these core genes are also affected by events leading to loss of synteny and can experience species-specific substitution rates, they represent pivotal elements for the use of comparative genomics in the study of ''Plasmodium'' evolution.
+
=='''A brief introduction to ''Plasmodium'' genome evolution'''==
  
The increase in funding devoted to malaria research during recent years has come hand in hand with the augmented understanding of ''Plasmodium'' genetics <ref>Buscaglia CA, Kissinger JC, Agüero F. 2015. Neglected Tropical Diseases in the Post-Genomic Era. Trends Genet. 31:539-55. https://www.ncbi.nlm.nih.gov/pubmed/26450337</ref>. At the moment, there is an unprecedented amount of ''Plasmodium'' genomes and gene sequences publicly available in diverse databases. The most prominent repository is found in NCBI/Genbank <ref>Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. 2016. GenBank. Nucleic Acids Res. 44: D67–D72. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702903/</ref>; while additional and unique sequences can also be found on other databases:  [http://plasmodb.org/plasmo/ PlasmoDB], [http://www.genedb.org/Homepage GeneDB] and [http://mbio-serv2.mbioekol.lu.se/Malavi/ MalAvi] <ref>Aurrecoechea C, Brestelli J, Brunk BP, Dommer J, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M, Innamorato F, Iodice J, Kissinger JC, Kraemer E, Li W, Miller JA, Nayak V, Pennington C, Pinney DF, Roos DS, Ross C, Stoeckert CJ Jr, Treatman C, Wang H. 2009. PlasmoDB: a functional genomic database for malaria parasites. Nucleic Acids Res. 37:D539-43. https://www.ncbi.nlm.nih.gov/pubmed/18957442</ref><ref>Logan-Klumpler FJ, De Silva N, Boehme U, Rogers MB, Velarde G, McQuillan JA, Carver T, Aslett M, Olsen C, Subramanian S, Phan I, Farris C, Mitra S, Ramasamy G, Wang H, Tivey A, Jackson A, Houston R, Parkhill J, Holden M, Harb OS, Brunk BP, Myler PJ, Roos D, Carrington M, Smith DF, Hertz-Fowler C, Berriman M. 2012. GeneDB--an annotation database for pathogens. Nucleic Acids Res. 40:D98-108. https://www.ncbi.nlm.nih.gov/pubmed/22116062</ref><ref>Bensch S, Hellgren O, Pérez-Tris J. 2009. MalAvi: a public database of malaria parasites and related haemosporidians in avian hosts based on mitochondrial cytochrome b lineages. Mol Ecol Resour. 9:1353-8. 
https://www.ncbi.nlm.nih.gov/pubmed/21564906</ref>. The growing number of available ''Plasmodium'' sequences and genomes opens the possibility to: identify the likely origin of certain traits, specialized phenotypes, and genomic landscapes; track the maintenance of conserved genes across the genus, as well as the rise and loss of genes unique to a single species or a group of closely related species; and infer the potential historical interactions that might have led to the development of adaptations, as well as their putative consequences.
+
The genus ''Plasmodium'' emerged ~40 million years ago and harbors roughly 200 species of parasitic protozoa better known as malaria parasites. All ''Plasmodium'' species have a complex life cycle involving some kind of vertebrate host and a mosquito vector. In addition, ''Plasmodium'' species share similar life cycle characteristics, albeit with a few exceptions (''e.g.'' hypnozoites).  ''Plasmodium'' genomes are tiny (between 17-28Mb) in comparison to those of their vertebrate (1Gb for birds; 2-3Gb for mammals) and mosquito (230–284Mbp) hosts <ref>DeBarry JD, Kissinger JC. 2011. Jumbled Genomes: Missing Apicomplexan Synteny. Mol Biol Evol. 2011 Oct; 28(10): 2855–2871. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3176833/</ref>. All ''Plasmodium'' genomes consist of fourteen chromosomes (nuclear genome), as well as a mitochondrial and an apicoplast genome. Despite these shared genomic characteristics, the structural organization, gene content, and sequence of ''Plasmodium'' genomes are highly variable within the genus <ref>Carlton JM, Perkins SL, Deitsch KW. 2013. '''''Malaria Parasites'''''. Caister Academic Press</ref>. The exact origins and mechanisms of these differences remain largely unexplored; however, they are generally hypothesized to stem from host shift events <ref>Prugnolle F, Durand P, Ollomo B, Duval L, Ariey F, Arnathau C, Gonzalez JP, Leroy E, Renaud F. 2011. A Fresh Look at the Origin of Plasmodium falciparum, the Most Malignant Malaria Agent. PLoS Pathog. 7: e1001283. http://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1001283</ref><ref>Prugnolle F, Rougeron V, Becquart P, Berry A, Makanga B, Rahola N, Arnathau C, Ngoubangoye B, Menard S, Willaume E, Ayala FJ, Fontenille D, Ollomo B, Durand P, Paupy C, Renaud F. 2013. Diversity, host switching and evolution of Plasmodium vivax infecting African great apes. Proc Natl Acad Sci U S A. 110:8123-8. https://www.ncbi.nlm.nih.gov/pubmed/23637341</ref>.
  
Specifically, one of the many remarkable trends of ''Plasmodium'' genome evolution is the rapid change in GC content. Particularly, ''P. falciparum'' and closely related parasites have a remarkably AT rich genome compared to other ''Plasmodium'' species <ref>Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DM, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM, Barrell B. 2002. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 419:498-511</ref>. While significant shifts in GC content have been reported in both ''Bacteria'' <ref>Wu H, Zhang Z, Hu S, Yu J. 2012. On the molecular mechanism of GC content variation among eubacterial genomes. Biol Direct. 2012; 7: 2. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3274465/</ref><ref>Lassalle F, Périan S, Bataillon T, Nesme X, Duret L, Daubin V. 2015. GC-Content Evolution in Bacterial Genomes: The Biased Gene Conversion Hypothesis Expands. PLoS Genet. 11: e1004941. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4450053/</ref> and monocots <ref>Šmarda P, Bureš P, Horová L, Leitch IJ, Mucina L, Pacini E, Tichý L, Grulich V, Rotreklová O. 2014. Ecological and evolutionary significance of genomic GC content diversity in monocots. Proc Natl Acad Sci U S A. 111: E4096–E4102. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4191780/</ref>, the short evolutionary time during which this change has occurred in ''Plasmodium'' is noteworthy. Moreover, the GC content variability observed amongst ''Plasmodium'' species has not yet been observed in other ''Apicomplexan'' genera. 
AT-rich genomes not only present particular challenges for sequencing <ref>Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DM, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM, Barrell B. 2002. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 419:498-511</ref>, but also have entirely different trends of codon and amino acid usage. Furthermore, patterns of genome mutability and of repetitive element evolution can also be markedly different in AT-rich genomes. By implementing novel and nontraditional analysis tools for comparative genomics, it is possible to assess the evolutionary origins and trace the patterns of GC content shift across the genus ''Plasmodium''.
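As a rough illustration of the quantity discussed above, GC content can be computed directly from a nucleotide sequence, either for a whole chromosome or in windows along it. The following is a minimal Python sketch and not CoGe's own implementation; the function names <code>gc_content</code> and <code>windowed_gc</code>, and the default window size, are assumptions made only for this example.

```python
from typing import Iterator, Tuple


def gc_content(sequence: str) -> float:
    """Return the GC fraction of a nucleotide sequence.

    Only unambiguous bases (A, C, G, T) are counted, so runs of N
    or other ambiguity codes do not dilute the result.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def windowed_gc(sequence: str, window: int = 1000, step: int = 1000) -> Iterator[Tuple[int, float]]:
    """Yield (start, GC fraction) for windows along the sequence.

    A sequence shorter than one window is reported as a single window.
    """
    last_start = max(len(sequence) - window + 1, 1)
    for start in range(0, last_start, step):
        yield start, gc_content(sequence[start:start + window])
```

Comparing the values returned by <code>gc_content</code> for homologous chromosomes of two species, or plotting the output of <code>windowed_gc</code>, gives a first impression of how AT-rich a genome is before running a full analysis.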
+
An increase in funding devoted to malaria research has coincided with a dramatic increase in publicly available genomic information for ''Plasmodium'' <ref>Buscaglia CA, Kissinger JC, Agüero F. 2015. Neglected Tropical Diseases in the Post-Genomic Era. Trends Genet. 31:539-55. https://www.ncbi.nlm.nih.gov/pubmed/26450337</ref>. The most prominent repository is NCBI/GenBank <ref>Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. 2016. GenBank. Nucleic Acids Res. 44: D67–D72. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702903/</ref>, while additional and unique sequences can also be found in other databases: [http://plasmodb.org/plasmo/ PlasmoDB] <ref>Aurrecoechea C, Brestelli J, Brunk BP, Dommer J, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M, Innamorato F, Iodice J, Kissinger JC, Kraemer E, Li W, Miller JA, Nayak V, Pennington C, Pinney DF, Roos DS, Ross C, Stoeckert CJ Jr, Treatman C, Wang H. 2009. PlasmoDB: a functional genomic database for malaria parasites. Nucleic Acids Res. 37:D539-43. https://www.ncbi.nlm.nih.gov/pubmed/18957442</ref>, [http://www.genedb.org/Homepage GeneDB] <ref>Logan-Klumpler FJ, De Silva N, Boehme U, Rogers MB, Velarde G, McQuillan JA, Carver T, Aslett M, Olsen C, Subramanian S, Phan I, Farris C, Mitra S, Ramasamy G, Wang H, Tivey A, Jackson A, Houston R, Parkhill J, Holden M, Harb OS, Brunk BP, Myler PJ, Roos D, Carrington M, Smith DF, Hertz-Fowler C, Berriman M. 2012. GeneDB--an annotation database for pathogens. Nucleic Acids Res. 40:D98-108. https://www.ncbi.nlm.nih.gov/pubmed/22116062</ref>, and [http://mbio-serv2.mbioekol.lu.se/Malavi/ MalAvi] <ref>Bensch S, Hellgren O, Pérez-Tris J. 2009. MalAvi: a public database of malaria parasites and related haemosporidians in avian hosts based on mitochondrial cytochrome b lineages. Mol Ecol Resour. 9:1353-8. https://www.ncbi.nlm.nih.gov/pubmed/21564906</ref>. 
This wealth of genomic data facilitates detailed comparative genomic approaches, opening the possibility to:
 +
* Infer origins of certain traits, specialized phenotypes, and genomic features.
 +
* Track the maintenance of conserved genes across the genus, as well as the gain or loss of genes unique to a single species or a group of closely related species.
 +
* Identify the potential historical interactions that might have led to the development of genomic adaptations.
  
Another important aspect of ''Plasmodium'' evolution is the unique patterns of genome variability and the diverse responses to numerous selective pressures observed in different ''Plasmodium'' genomes. In this regard, comparative analyses performed between ''Plasmodium'' species and strains can elucidate the key elements behind these differences (''e.g.'' different host pressures or an earlier species split), as well as identify genomic regions and elements where this type of change is more prominent. But perhaps more significant in ''Plasmodium'' evolution, and in that of parasites in general <ref>Jackson AP. 2015. Preface. The evolution of parasite genomes and the origins of parasitism. Parasitology. 142 Suppl 1:S1-5. https://www.ncbi.nlm.nih.gov/pubmed/25656359</ref>, is the origin and evolution of multigene families. Within the ''Plasmodium'' genome, numerous multigene families show specific tracks of gene gain/loss events and can be associated with variable syntenic changes. Moreover, the differences in the ancestry of these families are also noteworthy: many are observed only in a single ''Plasmodium'' species or in closely related species, while others are observed across the entire genus but not in other ''Apicomplexa'' parasites <ref>DeBarry JD, Kissinger JC. 2011. Jumbled Genomes: Missing Apicomplexan Synteny. Mol Biol Evol. 2011 Oct; 28(10): 2855–2871. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3176833/</ref>. In this sense, each multigene family can illustrate a different aspect of the evolutionary history of the genus.
 
  
In the following paper, we will demonstrate how to use the CoGe platform to analyze ''Plasmodium'' genomes and evaluate diverse evolutionary hypotheses. The CoGe platform can be used to perform numerous comparative and evolutionary analyses across two or more genomes while taking into account the nature of orthologous genes and their position in the genome. It therefore adds a layer of complexity to any analysis performed. In the following pages, we will illustrate how CoGe can be used for the analysis of whole genomes, the early assembly of sequences, the analysis of genome composition, the tracking of rearrangement events, and, finally, the study of multigene families.
+
== '''Finding and integrating Plasmodium genomes in CoGe ''' ==
  
 +
You can find the details of ''Plasmodium'' spp. genome integration in the following link: [[Finding and intregating Plasmodium genomes to CoGe]]
  
== '''Finding and importing data into CoGe''' ==
 
  
The analysis of ''Plasmodium'' parasites using comparative genomics can be a challenging task due to the previously mentioned particularities of their genomes. Considering that an increasing number of ''Plasmodium'' genomes have become available in recent years, and that the genomic information for the genus is likely to increase in the near future, it is fundamental to seek new alternatives for the incorporation, analysis, and visualization of ''Plasmodium'' genomic data. In particular, tools that allow the rapid analysis of numerous sequences at various levels, and that permit the identification of potentially relevant patterns on which novel analyses can be focused, are currently of high relevance for ''Plasmodium'' research. Additionally, the use of online platforms where complex genomic data can be incorporated and analyzed facilitates the start and continuation of collaborative initiatives. These platforms allow data to be analyzed regardless of differences in operating system, geographic location, or access to high-performance equipment. This is of great significance for a genus like ''Plasmodium'', which in humans causes diseases associated with developing tropical countries, where access to some equipment and software can be limited.
+
=='''Comparative analyses workflows'''==
  
The initial step in the analysis of sequences using CoGe is the import of new sequences to the  platform.
+
The following links direct to specific tools for the comparative analysis of ''Plasmodium'' genomes:
  
 +
[[Plasmodium analysis workflow 1: Tools that evaluate genomic properties and amino acid usage]]
  
=== ''Finding about the Plasmodium genomes already present in CoGe'' ===
+
[[Plasmodium analysis workflow 2: Tools for the syntenic analysis of whole genomes and microsyntenic regions]]
  
While the amount of ''Plasmodium'' genomic data has risen during the past years, important advances in ''Plasmodium'' genomics have been occurring since the publication of the ''P. falciparum'' genome <ref>Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DM, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM, Barrell B. 2002. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 419:498-511</ref>. Thus, there is a considerable amount of historical data that can also be used for analysis and that, depending on the hypotheses of interest, might be more relevant than later versions of the same data. As a result, a number of ''Plasmodium'' genomes at different stages of development have already been imported into CoGe.
+
[[Plasmodium analysis workflow 3: Tools useful on the study of multigene families]]
  
Before importing any genome into the CoGe database, and in order to prevent potential redundancy of genomic information, it is recommended to identify the ''Plasmodium'' genomic data already available. You can identify these genomes by:
 
  
 +
=='''Overall conclusions'''==
  
'''A.''' Typing "plasmod" into the ''Search'' bar at the top of most pages. This will retrieve all organisms and genomes with names matching the search term. 
+
Insights into the unique patterns of ''Plasmodium'' biology, epidemiology, ecology, and genetics can be obtained from molecular and comparative genomic studies. The rapid growth of genomic information makes implementing tools that facilitate assessing genome evolutionary trends an imperative task. The services and tools provided by the CoGe platform are of considerable use in advancing ''Plasmodium'' comparative genomics. Here, we showed how various CoGe tools could be used to assess evolutionary patterns unique to ''Plasmodium''. We also showed how to use this platform to further characterize sequenced ''Plasmodium'' genomes. Overall, we have demonstrated that CoGe’s tools can be used to address evolutionary questions such as:  
 
+
*The evolutionary origins of ''Laveranian'' AT-rich genomes.  
[[File:Screen Shot 2016-09-29 at 1.43.09 PM.png|600px]]
+
*The location and nature of genome rearrangements between ''Plasmodium''.
 
+
*The evolutionary patterns of genes crucial in cell invasion.
 
+
*The evolutionary trends of multigene families.
'''B.''' For a more detailed description regarding the presentation and acquisition of the genomic information available in CoGe, follow these steps:
+
 
+
'''1.''' Go to: https://genomevolution.org/coge/
+
 
+
'''2.''' Create an account or log in to CoGe. See the [[How to get a CoGe account]] section on this wiki for more information.
+
 
+
'''3.''' On the main CoGe page, find the '''Tools''' tile and click on '''Organism View'''. This page can also be reached by following this link: https://genomevolution.org/coge/OrganismView.pl
+
 
+
'''4.''' All publicly available genomes uploaded into CoGe and any corresponding information attached to them can be found in the '''Organism View''' section. You can find any published genome by typing a scientific name into the ''Search'' box. For each organism uploaded to CoGe you will find the following information:
+
 
+
:'''Organisms''': In the case of ''Plasmodium'' spp., the different parasitic strains currently uploaded. Any organelle genomes independently uploaded (mitochondrial and apicoplast) can also be found in this section.
+
:'''Organism Information''': Provides an outline of the organism's taxonomy (following that published on NCBI/GenBank). This section also includes quick links to some of the main CoGe analysis tools and additional search engines. 
+
 
+
:'''Genomes''': All the genome versions for the species of interest. Note that when a different genome version is selected, all other genomic information associated with that species is updated on the page. This section allows you to access previous versions of a published genome (''e.g.'' scaffolds from an earlier version of a genome that is currently at the chromosome assembly level).
+
:'''Genome information''': Shows the genome IDs, type of sequences uploaded and the length of these sequences. In this tab you will also be able to directly perform analyses using the CoGe platform.
+
 
+
:'''Datasets''': This section shows the number of datasets included for the specified genome. In the case of completely sequenced ''Plasmodium'' genomes obtained from NCBI/GenBank, it will indicate the accession numbers for each individual chromosome. 
+
:'''Dataset information''': Provides specific information for each individually selected dataset including accession numbers (if available), source of the upload, chromosome length, and GC%.
+
 
+
:'''Chromosomes''': Shows the number of available chromosomes for the selected genome. However, depending on the method used to import the data into CoGe and the nature of the dataset itself, the count of chromosomes shown may be larger than expected (''e.g.'' the number of contigs ''in lieu'' of the number of chromosomes).
+
:'''Chromosome information''': Shows the chromosome ID and the number of base pairs (bp) for that chromosome.
+
 
+
'''5.''' Clicking on the '''Genome Info''' link within the '''Genome Information''' section provides a more detailed description of the genome of interest and gives access to quick links to most comparative analysis tools available on CoGe.
+
 
+
 
+
<span style="color:green">'''Keep in mind that only publicly available genomes imported to CoGe can have a Public or Restricted display. Genomes made public can be seen and analyzed by anyone using the CoGe platform. On the other hand, Restricted genomes can only be seen and analyzed by the user or by those with whom the information has been shared: [[Sharing_data]]'''</span>
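The '''Chromosomes''' entry above notes that the reported count can reflect contigs rather than chromosomes. One way to anticipate this before importing a file is to count the sequence records in the FASTA file and their lengths locally. The following is a minimal Python sketch under stated assumptions: the function name <code>fasta_lengths</code> is hypothetical, it relies only on the standard FASTA convention that each record starts with a <code>></code> header line, and it is not how CoGe itself computes these values.

```python
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record name to its sequence length in base pairs.

    The record name is the first whitespace-delimited token after '>'.
    Sequence lines found before any header are ignored.
    """
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            # Fall back to a placeholder name if the header is empty.
            name = tokens[0] if tokens else f"unnamed_{len(lengths) + 1}"
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Small in-memory example; a real file could be passed as an open file handle.
example = [">chr1 example record", "ACGTAC", "GT", ">chr2", "GGCC"]
record_lengths = fasta_lengths(example)
record_count = len(record_lengths)
```

For a ''Plasmodium'' nuclear genome, a record count well above the fourteen expected chromosomes suggests that the file contains contigs or scaffolds.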
+
 
+
 
+
=== ''Importing Plasmodium genomes into CoGe'' ===
+
 
+
 
+
While data can be uploaded into CoGe using a variety of methods, we will focus on two of the most likely to be used in the incorporation of ''Plasmodium'' genomes, due to their intuitive use for most databases. For additional information, please check the following link: [[How_to_load_genomes_into_CoGe]]
+
 
+
====''Importing genomes using the "Upload" method''====
+
 
+
:Depending on your interests and hypotheses, it might be necessary to perform analyses using complete ''Plasmodium'' genomes or to focus only on specific parasite organelles and chromosomes. If you wish to import a complete ''Plasmodium'' genome (including all organelles and chromosomes), follow these steps:
+
 
+
[[File:PVXgenomeNCBI.png|thumb|200px|Screen capture of ''P. vivax'' genome's webpage on NCBI]]
+
 
+
:'''1.''' Go to the genome database on NCBI/GenBank and type "Plasmodium" in the search box. You can select any genome of interest, but in this example we will use that of ''P. vivax'' (Salvador I strain). 
+
 
+
:'''2.''' Find the '''Representative Genome''' section in the upper section of your screen. Below you will find the ''Download Sequences in FASTA'' format and ''Download Genome Annotation'' sections. 
+
::- To download a complete ''P. vivax'' genome, click on '''Genome''' under ''Download Sequences in FASTA''
+
::- To download a complete annotation for the ''P. vivax'' genome, click on '''GFF''' under ''Download Genome Annotation''
+
 
+
:'''3.''' Both files will be downloaded to your chosen folder on your local computer.
+
 
+
[[File:MyDatasectiononCoGe.png|thumb|200px|'''Step 7''': Screen capture of researcher's CoGe MyData tab]]
+
 
+
:'''4.''' Go to CoGe or follow this link: https://genomevolution.org/coge/
+
 
+
:'''5.''' Log in to CoGe.
+
 
+
:'''6.''' Click on the '''MyData''' section on the upper left part of the screen. This will lead to the ''Data'' section of your personal CoGe page. This section will fill up as genomes of interest are uploaded into CoGe.
+
 
+
:'''7.''' On the upper left section of the screen, click the '''NEW''' button and select ''New Genome'' from the dropdown menu.
+
 
+
[[File:NamingnewstraininCOGE.png|thumb|200px|'''Step 8''': Screen capture of Create New Organism window at CoGe. Notice the different name of the selected strain and the one written under "'''Name'''"]]
+
 
+
:'''8.''' Once the '''Create a New Genome''' window has appeared, information about the organism's taxonomy and the genome's origin must be entered. Keep in mind that, depending on the type of organism being uploaded, its taxonomic information might not have been incorporated into CoGe yet (''e.g.'' a private species or strain). If this is the case, create a new organism by following these steps:
+
 
+
::a. Click on '''NEW''' on the "'''Organism:'''" section
+
::b. In the Search NCBI box, type the scientific name of the organism to be uploaded. If the organism of interest is not on NCBI yet, select its closest taxonomic relative. In the case of ''Plasmodium'', several strains might be available for a given species (particularly ''P. vivax'' and ''P. falciparum''). Make sure to select the correct strain or, if a new strain is being uploaded, to add the new strain's name.
+
::c. Click '''Create'''
+
 
+
:'''9.''' After successfully creating a new strain/genome, it is time to include any additional information that might be needed in the future. The number to enter under '''Version''' depends on how many versions of the selected genome are already available at CoGe. Thus, it is important to check the latest genome version available on CoGe before importing a new version of the same genome (''e.g. P. falciparum'' currently has '''5''' versions, so any new version incorporated should be number '''6'''). Under the '''Type''' section, select the appropriate sequence type from the dropdown menu (most sequences can be identified as unmasked,  [[Masked]]). Select the '''Source''' in the next dropdown menu (in this case the source is NCBI, but other databases as well as ''Private sources'' are also available). Finally, tick the check box if you want your genome to be '''Restricted'''. Remember that:
+
:- Restricted genomes can only be seen and analyzed by the user and those with whom the genome has been shared.
+
:- Unrestricted genomes are available to anybody using CoGe.
+
 
+
:'''10.''' Click '''Next'''
+
 
+
:'''11.''' This new window allows you to import genome files using four different strategies: first, importing data directly from the '''CyVerse Data Store''' (if the data is not already on the ''Data Store'' it can be easily imported from CoGe afterwards); second, providing an '''HTTP/FTP''' link directly to the data; third, using '''Upload''' to send the data from a private computer; and fourth, importing the data using '''GenBank''' accession numbers. We will continue this example by using the '''Upload''' option.
+
 
+
:'''12.''' Select the downloaded genome file and wait for it to be read by CoGe. Once the process is completed, select '''Next'''. Note that you should select your FASTA, FST or FAA file and not the GFF file (genome annotation).
+
 
+
:'''13.''' Click '''Start''' on the next screen to begin the upload.
+
 
+
:'''14.''' Once the file upload has concluded, all information included by the user, as well as any specifics regarding the genome FASTA file itself, will be visible on the '''Genome Information''' page. Note that genomes in earlier stages of assembly (''e.g.'' scaffolds) can be easily uploaded into CoGe by this method.
+
 
+
[[File:Completeuplatedgenomeandannotation.png|thumb|200px|'''Step 16''': Complete genome and annotation upload into CoGe]]
+
 
+
:'''15.''' At this point, genome annotation files can also be uploaded into CoGe for that same genome. These files can be added by clicking on the green '''Load Sequence Annotation''' button under the '''Sequence & Gene Annotation''' menu. Note that some limited analyses can be performed in CoGe even when genome annotation data is not yet available. Also, any upload can be updated at any time in CoGe. Thus, genome annotation data, metadata or experimental data can be added to a genome already imported into CoGe as soon as they become available.
+
 
+
:'''16.''' The process of importing annotations is similar to that of importing genomes. On the '''Describe your annotation''' page, select the version and source of the annotation data and click '''Next'''. As previously described, the data can be imported directly from the '''CyVerse Data Store''', by providing an '''HTTP/FTP''' link, or by using the '''Upload''' option. Note that both GFF and GTF files can be used for uploading genome annotation data from a private computer. Click '''Next''' and the annotation data associated with the genome will be imported into CoGe.  This information should now be visible on the '''Genome Information''' page under the '''Sequence & Gene Annotation''' menu. For more details about uploading genome annotations follow this link: [[LoadAnnotation]]
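
The upload steps above distinguish between the genome sequence file (FASTA) and the annotation file (GFF or GTF). As a minimal sketch, and not part of CoGe itself, the following Python function guesses which kind of file you are about to upload by inspecting its first meaningful line. The function name <code>guess_genomic_file_type</code> and the labels it returns are assumptions made for illustration only.

```python
def guess_genomic_file_type(text):
    """Guess whether text looks like a FASTA sequence file or a GFF/GTF annotation.

    Returns 'fasta', 'annotation' or 'unknown'. This is a heuristic based only
    on the first non-empty, non-comment line; it does not validate the file.
    """
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if stripped.startswith(">"):
            return "fasta"  # FASTA records begin with a '>' header line
        if stripped.startswith("##gff-version"):
            return "annotation"  # GFF files usually declare their version first
        if stripped.startswith("#"):
            continue  # other comment lines carry no format information
        # GFF and GTF feature lines both have nine tab-separated columns
        if len(line.rstrip("\n").split("\t")) == 9:
            return "annotation"
        return "unknown"
    return "unknown"


if __name__ == "__main__":
    print(guess_genomic_file_type(">chr1 example\nATGCATGC\n"))
    print(guess_genomic_file_type("##gff-version 3\nchr1\t.\tgene\t1\t100\t.\t+\t.\tID=g1\n"))
```

A check like this only helps to avoid selecting the annotation file in the genome upload step (or the reverse); CoGe performs its own parsing once the file is submitted.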
+
 
+
 
+
[[File:NCBIPchabaudichromosomes.png|thumb|200px|'''Step 1''': Screen capture of NCBI chromosome section under the ''P. chabaudi'' genome tab on NCBI]]
+
 
+
====''Importing genomes using the "NCBI/GenBank" method''====
+
 
+
:You can also upload specific chromosomes and organelles into CoGe. The following steps show how to import chromosomes one by one into CoGe from NCBI/GenBank:
+
 
+
:'''1.''' Go to the genome database on NCBI/GenBank and type "Plasmodium" on the search box. You can select any genome of interest, but in this example we will use that of ''P. chabaudi'' (AS strain). 
+
 
+
:'''2.''' Find the '''Reference Genome''' section in the lower portion of your screen. Here you will find the '''RefSeq''' and '''INSDC''' numbers for each chromosome and, if available, organelles.
+
 
+
:'''3.''' Follow steps 4 through 10 from the previous section.
+
 
+
[[File:UploadusingGenbank.png|thumb|200px|'''Step 4''': Screen capture of genome upload to CoGe using GenBank ID numbers]]
+
 
+
:'''4.''' Select the '''GenBank''' accession numbers option. Type or copy/paste the '''INSDC''' number for each ''Plasmodium'' chromosome (or for specific ''Plasmodium'' organelles) and click the '''Get''' button after each one. Information from each imported sequence should appear under '''Selected file(s)'''. Once all sequences have been imported (14 chromosomes in the case of ''Plasmodium''), click on the '''Next''' button.
+
 
+
:'''5.''' After the genome has been imported, all information included by the user, as well as any specifics regarding the genome FASTA file itself, will be visible on the '''Genome Information''' page. Note that uploading chromosomes/genomes using this method also imports any genome annotation information already included in NCBI/GenBank. Also, note that genomes uploaded using this method will be unrestricted, and thus visible to all CoGe users.
+
 
+
 
+
====''Exporting genomes from CoGe to Cyverse''====
+
 
+
:Data can be exported into CyVerse for easy sharing and storage after it has been imported into CoGe. While this is not needed to use CoGe or perform any analyses, it is a highly recommended step for complete and '''Certified''' genomes (those which represent the latest and most complete version of a given species' genome to date). You can use CoGe to export data into the ''CyVerse Data Store'' by following these steps:
+
 
+
:'''1.''' While logged into CoGe, go to the '''Genome Information''' page on your genome of interest.
+
 
+
:'''2.''' Under the '''Tools''' menu, find the ''Export to CyVerse Data Store'' option. Click either on the FASTA or the GFF file options to upload genomic data and its annotation, respectively. Make sure to specify a name for the GFF file before performing the export.
+
 
+
:'''3.''' Wait until the export is completed. From this point forward, your FASTA and GFF data will also be found in the ''CyVerse Data Store''. Note that the uploaded genomes cannot be modified, so it is recommended to keep a list of the uploaded genome codes provided by CyVerse and their associated organism or strain.
+
 
+
 
+
== '''Using CoGe tools to perform comparative analyses''' ==
+
 
+
=== ''Analyzing GC content and other genomic properties (GenomeList)'' ===
+
 
+
[[File:Genomelistnew.png|thumb|200px|'''Step 5''': Upload of 12 ''Plasmodium'' genomes to '''Genome List''']]
+
 
+
Initial comparative genomic studies pointed to significant variation in GC content between ''Plasmodium'' species and even within a single genome. The average GC content of ''P. vivax'' and ''P. falciparum'', two of the major causal agents of human malaria, is 42.3% and 19.4%, respectively. Changes in GC content have also been reported in different regions of ''P. vivax'' chromosomes, with subtelomeric regions being largely GC poor; in contrast, AT-rich regions are widespread in the ''P. falciparum'' genome <ref>Carlton JM, Adams JH, Silva JC, Bidwell SL, Lorenzi H, Caler E, Crabtree J, Angiuoli SV, Merino EF, Amedeo P, Cheng Q, Coulson RM, Crabb BS, Del Portillo HA, Essien K, Feldblyum TV, Fernandez-Becerra C, Gilson PR, Gueye AH, Guo X, Kang'a S, Kooij TW, Korsinczky M, Meyer EV, Nene V, Paulsen I, White O, Ralph SA, Ren Q, Sargeant TJ, Salzberg SL, Stoeckert CJ, Sullivan SA, Yamamoto MM, Hoffman SL, Wortman JR, Gardner MJ, Galinski MR, Barnwell JW, Fraser-Liggett CM. 2008. Comparative genomics of the neglected human malaria parasite Plasmodium vivax. Nature. 455:757-63. https://www.ncbi.nlm.nih.gov/pubmed/18843361</ref>. The evolutionary history of the GC change in ''Plasmodium'' has been a topic of interest since the sequencing of the first human malaria genomes. Taking into consideration the proposed order of speciation events within the genus, it has been proposed that the genome of the ''Plasmodium'' common ancestor might have been AT rich, a trait which has been maintained in ''P. falciparum''. Therefore, the higher GC content reported in ''P. vivax'' and closely related species might actually be a derived trait <ref>Nikbakht H, Xia X, Hickey DA. 2014. The evolution of genomic GC content undergoes a rapid reversal within the genus Plasmodium. Genome. 9:507-511. https://www.ncbi.nlm.nih.gov/pubmed/25633864</ref>. Alternatively, the AT richness of the ''P. falciparum'' genome might be a trait of the common ancestor of the ''Laverania'' subgenus alone. 
In order to confidently address these hypotheses, a larger number of ''Plasmodium'' species, preferably those belonging to clades ancestral to the split of the ''Laverania'' subgenus, should be evaluated. Regrettably, no fully sequenced genomes for such ''Plasmodium'' species are currently available; nonetheless, a more complete perspective of GC content evolution can be obtained thanks to the increasing number of sequenced ''Plasmodium'' genomes.
+
 
+
It is possible to calculate the GC content for each ''Plasmodium'' genome in CoGe by using the '''GenomeInfo''' tool found on '''Genome Information'''. GC content will be displayed for genomes imported from GenBank; however, genomes uploaded from private computers or in earlier stages of assembly will not have the GC content information on display. This information can be calculated on the '''Genome Information''' page itself and for each specific genome/chromosome, by clicking on %GC on the ''Length'' and/or ''Noncoding sequence'' sections under the '''Statistics''' tab. 
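
For readers who want to check a %GC value outside CoGe, the calculation can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names <code>parse_fasta</code> and <code>gc_percent</code> are hypothetical, the toy sequences are made up, and the choice to ignore ambiguous bases such as N is an assumption that may not match how CoGe computes its own statistics.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {sequence name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else "unnamed"
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(chunks) for n, chunks in records.items()}


def gc_percent(seq):
    """Return GC content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # ambiguous bases such as N are ignored (an assumption)
    return 100.0 * gc / total if total else 0.0


if __name__ == "__main__":
    # Toy sequences for illustration; not real Plasmodium data
    toy = ">chr1\nATGC\nGGNN\n>chr2\nATAT\n"
    for name, seq in parse_fasta(toy).items():
        print(name, round(gc_percent(seq), 2))
```

To use it on a downloaded genome, read the FASTA file into a string and pass it to <code>parse_fasta</code>; each chromosome or scaffold is then reported separately, which mirrors the per-chromosome view on the '''Genome Information''' page.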
+
 
+
However, a better tool to compare GC content and other genomic features across several species/strains imported into CoGe is '''GenomeList'''. This tool creates a list of desired genomes and calculates various features for each one of them. Features which can be comparatively evaluated using '''GenomeList''' include amino acid usage, codon usage, CDS GC content, and genome features such as number of genes, introns, etc. In addition, this list also summarizes genome information included by the user: sequence type, sequence origin, taxonomy, provenance, version uploaded to CoGe, etc. 
+
 
+
 
+
The following steps indicate how to perform comparative analyses using the '''GenomeList''' tool in CoGe:
+
 
+
[[File:Genomelistnewresults.png|thumb|200px|'''Step 7''': '''Genome List''' used to compare 12 ''Plasmodium'' species. Note that some columns are not on display. Link to this analysis: https://genomevolution.org/r/lys1]]
+
 
+
'''1.''' Go to: https://genomevolution.org/coge/ and log in to CoGe
+
 
+
'''2.''' On the main CoGe page, find the '''Tools''' tile and click on '''Organism View'''.  You can also follow this link: https://genomevolution.org/coge/OrganismView.pl
+
 
+
'''3.''' Type the scientific name of the organism of interest in the ''Search'' box and select the desired version of the uploaded genome.
+
 
+
'''4.''' Find the '''Genome Information''' tile on the right side of the screen. Under '''Tools''' find and click on '''Add to GenomeList'''. This will automatically generate a new window indicating that the selected genome has been added to a list.
+
 
+
'''5.''' Without closing this window, type the scientific name of other organisms of interest in the ''Search'' box. Once you have selected your second genome, click on '''Add to GenomeList'''. The second selected organism should appear in the same window alongside the first. You can add as many genomes as desired by using this method.
+
 
+
'''6.''' Once you have included all your genomes of interest click on the green '''Send to Genome list''' button.
+
 
+
'''7.''' A new window should appear after a couple of seconds containing all your selected genomes in a table. Different features and information can be calculated and compared here, including information related to the uploaded genome. Most importantly, each genome has quick links that allow you to perform certain calculations (amino acid composition, %AT, etc.). Keep in mind that it is possible to perform specific analyses on certain genomes, but you can also perform the same analysis for all genomes on the '''GenomeList''' by clicking on the green '''Get All''' section below each column's title. Depending on the number and quality of the included genomes, this calculation might take a couple of minutes. Note that the green '''Change Viewable Columns''' button in the upper right part of the screen lets you select which columns are displayed.
+
 
+
'''8.''' It is possible to download information from the selected genomes in a variety of formats using "''Send Selected Genomes to''". Note that the information downloaded will correspond to the genomes themselves and not to the calculations and analyses performed on '''GenomeList'''.
+
 
+
 
+
We have used '''GenomeList''' to compare 12 ''Plasmodium'' species with largely complete genomes. Results show that species closely related to ''P. falciparum'' share equally AT-rich genomes. Moreover, GC content appears to gradually increase both in more recently diverged clades (rodent and simian) and within species from the simian clade. ''P. vivax'', ''P. cynomolgi'' and ''P. knowlesi'' show the highest %GC of all analyzed species, with ''P. vivax'' surpassing these species by at least 6%. These results are in agreement with previous suggestions that GC content is currently undergoing a reversal in recently diverging ''Plasmodium'' species. It has been proposed that the increase in GC content in ''P. vivax'', while maintaining GC-poor subtelomeric regions, might be indicative of an efficient genome organization <ref>Das A, Sharma M, Gupta B, Dash AP. 2009. Plasmodium falciparum and Plasmodium vivax: so similar, yet very different. Parasitol Res. 105:1169-71. https://www.ncbi.nlm.nih.gov/pubmed/19543915</ref>. Interestingly, GC content was shown to be markedly low in ''P. malariae'' (another species causing human malaria) compared to other species of the simian clade, suggesting that this genome might have a GC content organization similar to that seen in species from the ''Laverania'' subgenus. It should also be noted that none of the major human malarias showed identical GC content, and thus they are likely to have different GC content organization. Therefore, while GC content does have an important role in the development and maintenance of variability in these genomes, particularly with regard to antigenic variation <ref>Bull PC, Buckee CO, Kyes S, Kortok MM, Thathy V, Guyah B, Stoute JA, Newbold CI, Marsh K. 2008. Plasmodium falciparum antigenic variation. Mapping mosaic var gene sequences onto a network of shared, highly polymorphic sequence blocks. Mol Microbiol. 68:1519-34. 
https://www.ncbi.nlm.nih.gov/pubmed/18433451?dopt=Abstract</ref>, it does not seem to be associated with the infection of specific host types. Moreover, these patterns indicate that the four major human malarias might follow different evolutionary strategies more strongly related to their phylogenetic relationships than to their host.
+
 
+
 
+
===''Identifying gene homologs (CoGeBlast)''===
+
 
+
[[File:Input.png|thumb|200px|Screen capture of '''CoGeBlast''' input window. Genomes of interest and the query sequence are shown]]
+
 
+
Broadly speaking, genes belonging to the ''Plasmodium'' core genome and those unique to certain clades or species showcase different elements of the ''Plasmodium'' evolutionary panorama. There is no question, then, that a significant step in the study of a genome, or even a group of genes, is the correct identification of homologous sequences. In this regard, the identification of multigene family members poses a particular challenge in the study of ''Plasmodium'' evolution. Multigene families are formed by two types of genes: orthologs (homologous genes related by speciation events) and paralogs (homologous genes related by duplication events). Within the genus ''Plasmodium'', multigene families perform a wide variety of functions, showcase unique evolutionary patterns, and present diverse genomic arrangements. While many family members are arranged in tandem and can be easily associated with regions of microsynteny loss, others show far more complex patterns. A particular challenge is presented by subtelomeric families associated with antigenic variation and immune evasion (''var'', ''stevor'', ''rifin'' in ''P. falciparum'' and closely related species, and ''pir'' in ''P. vivax'' and closely related species), which can have members distributed across the genome and also present rapid sequence variation, making it difficult to identify homologies that aid the establishment of ortholog/paralog relations. <ref>Niang M, Yan Yam X, Preiser PR. 2009. The Plasmodium falciparum STEVOR multigene family mediates antigenic variation of the infected erythrocyte. PLoS Pathog. 5:e1000307. https://www.ncbi.nlm.nih.gov/pubmed/19229319</ref><ref>Witmer K, Schmid CD, Brancucci NM, Luah YH, Preiser PR, Bozdech Z, Voss TS. 2012. Analysis of subtelomeric virulence gene families in Plasmodium falciparum by comparative transcriptional profiling. Mol Microbiol. 84:243-59. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3491689/</ref><ref>Petter M, Bonow I, Klinkert MQ. 2008. 
Diverse expression patterns of subgroups of the rif multigene family during Plasmodium falciparum gametocytogenesis. PLoS One. 3:e3779. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0003779</ref><ref>Singh V, Gupta P, Pande V. 2014. Revisiting the multigene families: Plasmodium var and vir genes. J Vector Borne Dis. 51:75-81. https://www.ncbi.nlm.nih.gov/pubmed/24947212</ref>
+
 
+
In this sense, BLAST tools which can identify multigene family members and permit the easy visualization of homologous regions between two or more genomes can have a large impact on the study of complex ''Plasmodium'' multigene families. We will evaluate how one of CoGe's tools ([[CoGeBlast]]) performs when identifying multigene family members and indicating their location across several ''Plasmodium'' genomes. In the following example, we will attempt to identify members of one of the most extended and challenging multigene superfamilies within ''Plasmodium'': ''vir''. <ref>Fernandez-Becerra C, Yamamoto MM, Vêncio RZ, Lacerda M, Rosanas-Urgell A, del Portillo HA. 2009. Plasmodium vivax and the importance of the subtelomeric multigene vir superfamily. Trends Parasitol. 2009 25:44-51. https://www.ncbi.nlm.nih.gov/pubmed/19036639</ref><ref>Lopez FJ, Bernabeu M, Fernandez-Becerra C, del Portillo HA. 2013. A new computational approach redefines the subtelomeric vir superfamily of Plasmodium vivax. BMC Genomics. 14:8. https://www.ncbi.nlm.nih.gov/pubmed/?term=A+new+computational+approach+redefines+the+subtelomeric+vir+superfamily+of+Plasmodium+vivax</ref>
+
 
+
[[File:Position.png|thumb|200px|Screen capture of '''CoGeBlast''' output. The relative position of hits to the query sequence is shown for the PO1 and Salvador-1 ''P. vivax'' strains]]
+
 
+
The ''vir'' superfamily is composed of 313 members <ref>Carlton JM, Adams JH, Silva JC, Bidwell SL, Lorenzi H, Caler E, Crabtree J, Angiuoli SV, Merino EF, Amedeo P, Cheng Q, Coulson RM, Crabb BS, Del Portillo HA, Essien K, Feldblyum TV, Fernandez-Becerra C, Gilson PR, Gueye AH, Guo X, Kang'a S, Kooij TW, Korsinczky M, Meyer EV, Nene V, Paulsen I, White O, Ralph SA, Ren Q, Sargeant TJ, Salzberg SL, Stoeckert CJ, Sullivan SA, Yamamoto MM, Hoffman SL, Wortman JR, Gardner MJ, Galinski MR, Barnwell JW, Fraser-Liggett CM. 2008. Comparative genomics of the neglected human malaria parasite Plasmodium vivax. Nature. 455:757-63. https://www.ncbi.nlm.nih.gov/pubmed/18843361</ref>, with paralogs being grouped into 10 different subfamilies or remaining independent according to sequence similarity analyses <ref>Lopez FJ, Bernabeu M, Fernandez-Becerra C, del Portillo HA. 2013. A new computational approach redefines the subtelomeric vir superfamily of Plasmodium vivax. BMC Genomics. 14:8. https://www.ncbi.nlm.nih.gov/pubmed/?term=A+new+computational+approach+redefines+the+subtelomeric+vir+superfamily+of+Plasmodium+vivax</ref>. A previous study has found that fewer than a third of ''vir'' genes are found in other ''P. vivax'' strains, demonstrating how rapidly this family evolves. Nonetheless, the same study also found 15 ''vir'' genes shared across all ''P. vivax'' strains, which also presented low sequence polymorphism; in particular, one of these genes (PVX_113230) was found to share higher sequence similarity than any other family member and even to maintain synteny in other ''Plasmodium'' species <ref>Neafsey DE, Galinsky K, Jiang RH, Young L, Sykes SM, Saif S, Gujja S, Goldberg JM, Young S, Zeng Q, Chapman SB, Dash AP, Anvikar AR, Sutton PL, Birren BW, Escalante AA, Barnwell JW, Carlton JM. 2012. The malaria parasite Plasmodium vivax exhibits greater genetic diversity than Plasmodium falciparum. Nat Genet. 44:1046-50. https://www.ncbi.nlm.nih.gov/pubmed/22863733</ref>. 
We have used this gene as a query sequence for the following example.
+
 
+
 
+
The following steps show how to use the '''CoGeBlast''' tool in the CoGe platform:
+
 
+
'''1.''' Go to: https://genomevolution.org/coge/ and log in to CoGe.
+
 
+
'''2.''' On the main CoGe page, click on '''CoGeBlast''' under the '''Tools''' tile (alternatively, you can follow this link: https://genomevolution.org/coge/CoGeBlast.pl).
+
 
+
'''3.''' Type the scientific name of your ''Organism'' of interest in the ''Search'' box under '''Select Target Genomes'''. All organisms and genomes with names matching the search term will appear under the '''Matching Organisms''' menu. Also, any [[Notebooks]] matching the term will appear in a new window named '''Import List'''.
+
 
+
'''4.''' Select all the organisms of interest by using <span style="color:gray">Ctrl+click</span> or <span style="color:gray">Command+click</span> and then click on the green '''+ Add''' button. The added organisms will appear on the '''Selected Genomes''' menu on the right. Alternatively, you can select any of the Notebooks found on '''Import List''', and all genomes included in it will be automatically selected.
+
 
+
'''5.''' Paste the query sequence in FASTA format into the '''Query Sequence(s)''' section at the bottom of the screen. If desired, the BLAST analysis itself can be modified by changing the '''BLAST Parameters'''.
+
 
+
'''6.''' Once the analysis has been completed the output will include: a table showing the number of hits to the query sequence in the analyzed genomes, a graphic depiction of the location of these hits on the genome, and a list showing information for each hit including their similarity index. 
+
 
+
 
+
In agreement with previous results, we found PVX_113230 to be highly conserved across ''P. vivax'' strains <ref>Neafsey DE, Galinsky K, Jiang RH, Young L, Sykes SM, Saif S, Gujja S, Goldberg JM, Young S, Zeng Q, Chapman SB, Dash AP, Anvikar AR, Sutton PL, Birren BW, Escalante AA, Barnwell JW, Carlton JM. 2012. The malaria parasite Plasmodium vivax exhibits greater genetic diversity than Plasmodium falciparum. Nat Genet. 44:1046-50. https://www.ncbi.nlm.nih.gov/pubmed/22863733</ref>. Interestingly, there was some small variation in the number of reported homologs across strains within the analyzed subfamily, with the Mauritania, PO1, and Salvador-1 strains showing the largest numbers of homologs. This shows that even within conserved subfamilies, the ''vir'' superfamily is highly diverse. Furthermore, a comparison of the location of sequence hits between the PO1 (not available for inclusion in the previous study) and Salvador-1 strains shows highly conserved synteny: hits are located in approximately the same positions and, unless absent, on the same chromosomes. This pattern can also be observed, in less detail, in other strains with a scaffold assembly level. These results could suggest that while PVX_113230 appears to be the founder of the ''vir'' superfamily and to perform an ancestral role, other regions or members of the family could also have functions outside the established role in immune evasion (results for this test can be replicated following this link: https://genomevolution.org/r/lyvj). As expected, when using another ''vir'' member outside this subfamily, the number of family members and their chromosome location vary largely across ''P. vivax'' strains.
+
 
+
 
+
===''Identifying microsyntenic regions (GEvo)''===
+
 
+
Different patterns of genome evolution can be observed among closely related species or even at an intraspecific level. These small genome rearrangements cause loss of synteny between a few genes ([[Microsynteny]]), even if gene positions in large portions of a chromosome tend to be preserved. For instance, within ''Plasmodium'' microsynteny can be lost in regions of high recombination frequency or where rapid genomic changes are evolutionarily advantageous. Thus, changes in microsynteny can point to regions of great evolutionary interest where a more detailed evaluation might be informative. In addition, careful assessment of genome properties in defined regions can aid in testing hypotheses about the evolutionary origins of genes. In this regard, it has been hypothesized that some of the genes essential for successful erythrocyte invasion found in ''Laverania'' (the reticulocyte-binding-like homologous protein 5 or Rh5, and the cysteine-rich protective antigen or CyRPA) might have originated via a horizontal gene transfer (HGT) event early in the evolution of the clade <ref>Sundararaman SA, Plenderleith LJ, Liu W, Loy DE, Learn GH, Li Y, Shaw KS, Ayouba A, Peeters M, Speede S, Shaw GM, Bushman FD, Brisson D, Rayner JC, Sharp PM, Hahn BH. 2016. Genomes of cryptic chimpanzee Plasmodium species reveal key evolutionary events leading to human malaria. Nat Commun. 7:11078. https://www.ncbi.nlm.nih.gov/pubmed/27002652</ref>. [[GEvo]] can be used to identify and visually represent patterns of genome evolution across multiple genomic regions and for any number of genomes, which can aid in testing this hypothesis.
+
 
+
[[File:Gcwobble.png|thumb|200px|GC content is shown as green bars in each genome background. The wobble codon GC content of each gene has also been colored]]
+
 
+
 
+
The following steps show how to use '''GEvo''' to analyze microsyntenic regions:
+
 
+
'''1.''' Go to: https://genomevolution.org/coge/ and login into CoGe.
+
 
+
'''2.''' On the main CoGe page click on '''GEvo''' under the '''Tools''' tile (alternatively, you can follow this link: https://genomevolution.org/coge/GEvo.pl).
+
 
+
'''3.''' Each displayed box found under '''Sequence Submission''' allows you to select a sequence. You can specify as many as 25 sequences to perform a '''GEvo''' analysis. Each box is composed of several elements: a drop-down menu of sequence databases (CoGe database, NCBI GenBank or Direct Submission), the name of the selected sequence (''e.g.'' gene ID numbers), the length of genome segment to be displayed to the left and right of the sequence, and a green button used to specify additional ''Sequence Options'' (''e.g.'' skip sequence from the analysis, set sequence as reference, set sequence as reverse complement, or mask the sequence). You can input sequences for analysis by entering their corresponding IDs on the Name: bar. Alternatively, you can select pairs of genes for microsynteny analysis by zooming or clicking on specific regions of the SynMap display.
+
 
+
'''4.''' Once you have selected sequences for each display box (you can simply se), click on the red '''Run GEvo''' button at the bottom of the screen.
+
 
+
'''5.''' The '''GEvo''' analysis will output a display of the syntenic regions between the compared genomes. Genes are displayed in green at their corresponding genome position. Syntenic genome regions are marked by light red bars on top of each genome. These bars can be clicked to display connectors to the corresponding regions on all analyzed genomes.
+
 
+
'''6.''' You can modify the analysis by changing the parameters displayed on the '''Algorithm''' tab. Also, you can modify the information of the graphical display by altering the options on the '''Results Visualization Options''' tab. We have modified the display to show GC content (GC rich regions are shown as green in the background and AT rich regions are shown in white) and wobble GC content (indicated by a colored gradient: low GC content is displayed in red, ~50% GC content in yellow, and high GC content in green).
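
The wobble GC content mentioned in the last step is the GC content of the third position of each codon in a coding sequence. As a rough sketch of that calculation (not GEvo's own implementation), the hypothetical function <code>wobble_gc_percent</code> below assumes the input is an in-frame CDS starting at the first codon position and skips ambiguous bases.

```python
def wobble_gc_percent(cds):
    """Return the GC content (%) of third codon positions in an in-frame CDS.

    Assumes the sequence starts at codon position 1. Trailing bases that do
    not form a complete codon are ignored, as are ambiguous third bases.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop an incomplete final codon
    third_bases = [cds[i + 2] for i in range(0, usable, 3)]
    counted = [base for base in third_bases if base in "ACGT"]
    if not counted:
        return 0.0
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


if __name__ == "__main__":
    # Toy CDS for illustration: codons ATG, GCC, TAA have third bases G, C, A
    print(round(wobble_gc_percent("ATGGCCTAA"), 2))
```

Comparing this value between a gene and its neighbors gives a numerical counterpart to the red-yellow-green gradient shown in the '''GEvo''' display.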
+
 
+
[[File:Pvsal1new.png|thumb|200px|The analysis shows a region where synteny is lost between ''P. vivax'' Sal-1 strain, and the ''P. vivax'' PO1 strain and ''P. cynomolgi'' genomes.]]
+
 
+
 
+
We searched for Rh5 orthologs in all fully sequenced ''Plasmodium'' genomes (''P. falciparum'' strains 3D7 and IT, ''P. reichenowi'' strains CDC and SY57, and ''P. gaboni'' strain SY75) from the ''Laverania'' subgenus by using '''CoGeBlast'''. We then used the provided output to perform a microsynteny analysis of these genome regions using '''GEvo'''. Our results show that microsynteny is largely maintained in the regions surrounding Rh5 and CyRPA; furthermore, there does not appear to be a marked difference in GC content inside and outside the region containing these genes for any of the evaluated genomes. It has been suggested that a GC content within a given genome region that does not match the background GC content, or the wobble GC content of surrounding genes, could be indicative of an HGT event. Such a pattern is not observed for either Rh5 or CyRPA, and therefore our results do not support the previously suggested HGT event <ref>Sundararaman SA, Plenderleith LJ, Liu W, Loy DE, Learn GH, Li Y, Shaw KS, Ayouba A, Peeters M, Speede S, Shaw GM, Bushman FD, Brisson D, Rayner JC, Sharp PM, Hahn BH. 2016. Genomes of cryptic chimpanzee ''Plasmodium'' species reveal key evolutionary events leading to human malaria. Nat Commun. 7:11078. https://www.ncbi.nlm.nih.gov/pubmed/27002652</ref>. It is possible that an HGT event occurring between genomes of similar composition might not be detected by this analysis, and thus additional testing might be required. However, differences in topology between specific gene trees and the species tree might have causes other than HGT. In particular, genes expressed during blood parasitic stages, and involved in erythrocyte invasion, are expected to be largely affected by selective pressures imposed by the host's immune system <ref>Forni D, Pontremoli C, Cagliani R, Pozzoli U, Clerici M, Sironi M. 2015. 
Positive selection underlies the species-specific binding of ''Plasmodium'' falciparum RH5 to human basigin. Mol Ecol. 24:4711-22. https://www.ncbi.nlm.nih.gov/pubmed/26302433</ref>. Therefore, they could present unique evolutionary patterns not related to HGT. The '''CoGeBlast''' analysis can be regenerated following this link: https://genomevolution.org/r/m1qw. The '''GEvo''' analysis can be run following this link: https://genomevolution.org/r/m4dq.
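
The comparison described above, GC content inside a region of interest versus its genomic surroundings, can also be approximated numerically. The following Python sketch is an illustration under stated assumptions, not the method used by '''GEvo''': the function names, the 0-based half-open coordinates, and the flank size are all hypothetical choices, and a large difference would only be suggestive of unusual composition, never proof of HGT.

```python
def gc_percent(seq):
    """Return GC content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def region_vs_flank_gc(seq, start, end, flank):
    """Compare %GC of seq[start:end] with %GC of its flanking sequence.

    Coordinates are 0-based and half-open. 'flank' is the number of bases
    taken on each side of the region (fewer if the sequence ends first).
    Returns (region_gc, flank_gc, difference).
    """
    left = seq[max(0, start - flank):start]
    right = seq[end:end + flank]
    region_gc = gc_percent(seq[start:end])
    flank_gc = gc_percent(left + right)
    return region_gc, flank_gc, region_gc - flank_gc


if __name__ == "__main__":
    # Toy sequence for illustration: a GC-rich block inside AT-rich flanks
    toy = "AT" * 10 + "GC" * 5 + "AT" * 10
    print(region_vs_flank_gc(toy, 20, 30, 10))
```

In the Rh5/CyRPA case discussed above, the expectation under the reported results would be a difference close to zero, since no marked contrast in GC content was observed between these genes and their surroundings.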
+
 
+
In addition, '''GEvo''' can be used to identify potentially poorly sequenced genome regions which can influence the identification of larger syntenic patterns. A microsynteny analysis of border regions where an inversion event has been detected between the ''P. vivax'' Salvador-1 strain, and the ''P. vivax'' PO1 strain and ''P. cynomolgi'', shows that loss of synteny correlates with the location of a poorly sequenced region in the ''P. vivax'' Salvador-1 strain (shown in orange). Synteny is maintained in the region between ''P. cynomolgi'' and the ''P. vivax'' PO1 strain, but lost when ''P. vivax'' Salvador-1 is compared with either the ''P. vivax'' PO1 strain or the sister species ''P. cynomolgi''.  This suggests that the inversion event observed in ''P. vivax'' Salvador-1 might be unique to this genome, or it might be an artifact due to poor sequencing of this region.
+
 
+
 
+
===''Performing syntenic analyses between two genomes (SynMap)''===
+
 
+
====''Identifying syntenic gene pairs''====
+
 
+
There are approximately 1787 protein family members thought to have originated after ''Plasmodium'' and ''Theileria'' split from their common ancestor <ref>Wasmuth J, Daub J, Peregrín-Alvarez JM, Finney CA, Parkinson J. 2009. The origins of apicomplexan sequence innovation. Genome Res. 19:1202-13. https://www.ncbi.nlm.nih.gov/pubmed/19363216</ref>. As expected, the number of orthologous genes increases the more closely related two species are <ref>DeBarry JD, Kissinger JC. 2011. Jumbled Genomes: Missing Apicomplexan Synteny. Mol Biol Evol. 2011 Oct; 28(10): 2855–2871. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3176833/</ref>. The sequencing of new ''Plasmodium'' genomes provides the opportunity to identify syntenic gene pairs across different pairwise genome combinations. It also permits the identification of positional changes for certain genes and makes it possible to infer their potential evolutionary origin. Furthermore, changes in the order of genes in a genome can affect not only the gene of interest but also neighboring sequences. It has been shown in several eukaryotic organisms that gene expression and gene regulation might be largely dependent on genome location; furthermore, it has been proposed that gene co-expression clusters might be a significant element in eukaryotic gene regulation programs <ref>Michalak P. 2008. Coexpression, coregulation, and cofunctionality of neighboring genes in eukaryotic genomes. Genomics. 91:(43–248) http://www.sciencedirect.com/science/article/pii/S0888754307002807</ref>. Moreover, previous studies have shown that gene expression and transcriptome evolution are affected by genome position <ref>Ghanbarian AT, Hurst LD. 2015. Neighboring Genes Show Correlated Evolution in Gene Expression. Mol Biol Evol. doi: 10.1093/molbev/msv053 http://mbe.oxfordjournals.org/content/early/2015/04/01/molbev.msv053.full</ref><ref>De S, Teichmann SA, Babu MM. 2009. The impact of genomic neighborhood on the evolution of human and chimpanzee transcriptome. Genome Res. 19(5): 785–794. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2675967/</ref>. While comparatively few studies have evaluated potential relations between gene co-expression and genomic location in the genus ''Plasmodium'', there is evidence that certain genes are strictly up-regulated during specific parasite life stages <ref>Lanfrancotti A, Bertuccini L, Silvestrini F, Alano P. 2007. Plasmodium falciparum: mRNA co-expression and protein co-localisation of two gene products upregulated in early gametocytes. Exp Parasitol. 116:497-503. https://www.ncbi.nlm.nih.gov/pubmed/17367781</ref>. The identification of syntenic gene pairs in ''Plasmodium'' can provide the putative location of functionally advantageous clusters preserved by natural selection, as well as suggest sites of interest for evaluating the role that changes in gene order could have on gene expression.
+
 
+
One of the most significant tools in the CoGe platform is [[SynMap]]. This tool identifies syntenic orthologous genes between two genomes and provides a graphical output for genes across the entire genome. In ''Plasmodium'', such information can be used to identify highly conserved regions between two genomes, as well as sections where synteny has been lost. These types of regions can later be analyzed in search of patterns suggesting neighboring effects on gene expression and transcription, such as those described for other eukaryotes.
+
 
+
 
+
The following steps can be followed to perform comparative analyses using the '''SynMap''' tool on CoGe:
+
 
+
'''1.''' Go to: https://genomevolution.org/coge/ and log in to CoGe.
+
 
+
'''2.''' On the main CoGe page, find the '''Tools''' section and click on '''Organism View''' (alternatively, you can follow this link: https://genomevolution.org/coge/OrganismView.pl).
+
 
+
[[File:Synmappvvspcy.png|thumb|200px|'''Step 5''': SynMap input screen. Genomes for two different species are selected as an example: ''P. cynomolgi'' B strain ('''Organism 1'''), and ''P. vivax'' Salvador-1 strain ('''Organism 2''')]]
+
 
+
'''3.''' Type the scientific name of the desired species in the ''Search'' box and select the appropriate genome. Then, click on the '''GenomeInfo''' link under the '''Genome Information''' section.
+
 
+
'''4.''' Find the link to the '''SynMap''' tool under the '''Analyze''' section on '''Tools'''.
+
 
+
'''5.''' By default, '''SynMap''' will allow you to evaluate the synteny of a genome with itself. This can be of use when characterizing a genome or when attempting to identify putative duplication events <ref>Tang H, Lyons E. 2012. Unleashing the Genome of Brassica Rapa. Front Plant Sci. 3: 172. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3408644/</ref>. Alternatively, two different genomes or two different organisms can be analyzed with '''SynMap'''. Before running the analysis, a different genome can be chosen for Organism 1 or for Organism 2 by typing a different scientific name in the corresponding ''Search'' box and then selecting the desired genome. Once you have selected the organisms to analyze, you can run the tool by clicking on '''Generate SynMap'''.
+
 
+
'''6.''' Once the analysis has been completed, '''SynMap''' will output a graphical depiction of the syntenic regions between the two genomes. There are currently two versions of the '''SynMap''' tool: '''SynMap2''', which is selected by default and allows you to interact with and dynamically alter the output (''e.g.'' zoom in on a particular region showing a pattern of interest); and '''SynMap Legacy''', which only provides static images. Either version can be used for the analysis of two genomes, and both require the same input.
+
 
+
'''7.''' Specific gene pairs of interest observed in '''SynMap''' can be analyzed in more detail in '''GEvo'''. The syntenic gene pair can be located by zooming in on the '''SynMap''' plot (this is done by clicking on the region of interest in '''SynMap Legacy''' or by dragging the mouse over the region in '''SynMap2'''). '''GEvo''' can then be run for a specific gene pair by double-clicking on its syntenic point ('''SynMap Legacy'''), or by selecting this point and clicking on ''Compare in GEvo >>>'' under ''Point Selection'' ('''SynMap2''').
+
 
+
 
+
==== ''Identifying chromosomal inversions, fusions, fissions and other events between two genomes'' ====
+
 
+
[[File:Synmaplegacypairwisecomparisons.png|thumb|200px|Independent rearrangement events are observed in these pairwise comparisons with '''SynMap Legacy'''. '''From top to bottom and left to right''': ''P. knowlesi'' vs. ''P. malariae''; ''P. coatneyi'' vs. ''P. malariae''; ''P. coatneyi'' vs. ''P. knowlesi''; ''P. ovale'' vs. ''P. malariae''; ''P. coatneyi'' vs. ''P. ovale''; ''P. ovale'' vs. ''P. knowlesi'']]
+
 
+
Another significant use of CoGe's '''SynMap''' tool is the identification of genome rearrangement events. Rearrangements arise when regions of the genome become duplicated or inverted, when a single region divides into two fragments, or when different regions fuse into a single fragment. Furthermore, '''SynMap''' can be used to identify indels between two genomes. Tracking these events can aid in pinpointing genome sections subject to more rapid change than others, as well as in identifying the evolutionary origin of certain genomic elements.
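To make the idea of an inversion concrete, the following Python sketch shows how an inverted segment appears when syntenic genes are listed in the order of one genome and labeled with their rank in the other. This is not '''SynMap''''s algorithm; it is a simplified illustration that assumes a one-to-one set of orthologs with no missing genes, and the function names are hypothetical. Runs in which the rank increases by one are collinear, and runs in which it decreases by one are inverted.

```python
def syntenic_blocks(b_ranks):
    """Group genes (listed in genome A order) into runs that are contiguous in genome B.

    b_ranks[i] is the rank in genome B of the ortholog of the i-th gene of genome A.
    Returns a list of (orientation, ranks), where orientation is "+" (collinear),
    "-" (inverted) or "." (a single gene with no oriented neighbor).
    """
    if not b_ranks:
        return []
    blocks = []
    current = [b_ranks[0]]
    orientation = "."
    for prev, nxt in zip(b_ranks, b_ranks[1:]):
        diff = nxt - prev
        step = "+" if diff == 1 else "-" if diff == -1 else None
        if step is not None and orientation in (".", step):
            # Same direction as the current run: extend it.
            orientation = step
            current.append(nxt)
        else:
            # Breakpoint: close the current run and start a new one.
            blocks.append((orientation, current))
            current = [nxt]
            orientation = "."
    blocks.append((orientation, current))
    return blocks


def inverted_blocks(b_ranks):
    """Return only the runs that are reversed in genome B relative to genome A."""
    return [ranks for orientation, ranks in syntenic_blocks(b_ranks) if orientation == "-"]


if __name__ == "__main__":
    # Genes 4-6 of genome A map to ranks 6, 5, 4 in genome B: an inversion.
    print(syntenic_blocks([1, 2, 3, 6, 5, 4, 7, 8]))
```

On a dotplot, the "+" runs correspond to diagonals with a positive slope and the "-" runs to diagonals with a negative slope. Real comparisons contain gaps, duplicates and unannotated genes, so a practical implementation would need to tolerate steps larger than one.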
+
 
+
Initial studies evaluating synteny conservation across species from the phylum ''Apicomplexa'' have shown that while synteny amongst genera is for the most part lost, gene order and position are highly maintained within the ''Plasmodium'' genus. Nonetheless, as a larger number of ''Plasmodium'' genomes has become available, it has become apparent that synteny patterns within the genus are far more complex. Closely related ''Plasmodium'' species have been shown to be largely syntenic with the exception of specific genomic regions; on the other hand, less closely related species from different ''Plasmodium'' clades are less likely to maintain synteny, with numerous rearrangement events being apparent <ref>Tachibana SI, Sullivan SA, Kawai S, Nakamura S, Kim HR, Goto N, Arisue N, Palacpac NM, Honma H, Yagi M, Tougan T, Katakai Y, Kaneko O, Mita T, Kita K, Yasutomi Y, Sutton PL, Shakhbatyan R, Horii T, Yasunaga T,  Barnwell JB, Escalante AA, Carlton JM, Tanabe K. 2012. Plasmodium cynomolgi genome sequences provide insight into Plasmodium vivax and the monkey malaria clade. Nat Genet. 44: 1051–1055. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3759362/</ref>. Thanks to the larger number of ''Plasmodium'' genomes currently available, it is possible to evaluate ''Plasmodium'' synteny in a more complex array of species and within three of the four major ''Plasmodium'' clades. It is now possible to estimate species-specific genomic rearrangement events and assess their significance for genome evolution, as well as to identify the potential evolutionary origins of the most prominent rearrangements by performing several pairwise comparisons across different species sets.
+
 
+
In the case of ''P. vivax'' and closely related species, loss of synteny events have been reported on chromosomes 3 and 6 between ''P. vivax'', ''P. cynomolgi'' and ''P. knowlesi''. An analysis of these species using '''SynMap Legacy''' shows inversion events between ''P. vivax'' and both ''P. knowlesi'' and ''P. cynomolgi''. Nonetheless, no inversion events are observed between ''P. cynomolgi'' and ''P. knowlesi''. This suggests that the chromosomal inversions reported for chromosomes 3 and 6 might have occurred after the split of ''P. cynomolgi'' and ''P. vivax'' (approximately 3.43-3.87 Mya) and might be unique to the ''P. vivax'' genome <ref>Pacheco MA, Reid MJ, Schillaci MA, Lowenberger CA, Galdikas BM, Jones-Engel L, Escalante AA. 2012. The origin of malarial parasites in orangutans. PLoS One. 7:e34990. https://www.ncbi.nlm.nih.gov/pubmed/22536346</ref>. Analyses can be regenerated by following these links: https://genomevolution.org/r/lj12 (''P. vivax'' vs. ''P. cynomolgi''), https://genomevolution.org/r/lj1x (''P. knowlesi'' vs. ''P. cynomolgi''), and https://genomevolution.org/r/lj1t (''P. knowlesi'' vs. ''P. vivax'').
+
 
+
It is also possible to identify sets of chromosome fusion/fission events unique to specific genomes. Pairwise comparisons between the genomes of four closely related ''Plasmodium'' parasites (''P. ovale curtisi'', ''P. malariae'', ''P. coatneyi'' and ''P. knowlesi'') show that at least two sets of inversions and fusions have occurred in the ''P. coatneyi'' and ''P. malariae'' genomes. '''SynMap Legacy''' results show two fusion events in chromosomes 5 and 9 unique to ''P. malariae'' (marked with red squares) and two additional fusion events in chromosomes 13 and 14 of ''P. coatneyi'' (marked with green squares). Moreover, an inversion event can be observed in the central region of chromosome 4 in ''P. malariae'' (marked with a red circle). Analyses can be regenerated using the following links: ''P. knowlesi'' vs. ''P. malariae'' (https://genomevolution.org/r/lq5x); ''P. coatneyi'' vs. ''P. knowlesi'' (https://genomevolution.org/r/lj2b); ''P. coatneyi'' vs. ''P. malariae'' (https://genomevolution.org/r/lq5y); ''P. ovale'' vs. ''P. malariae'' (https://genomevolution.org/r/lq5t); ''P. coatneyi'' vs. ''P. ovale'' (https://genomevolution.org/r/lq65); and ''P. ovale'' vs. ''P. knowlesi'' (https://genomevolution.org/r/lq5v).
+
 
+
 
+
=== ''Measuring Kn/Ks values between genomes (SynMap - CodeML analysis tool)'' ===
+
 
+
The relative rates of synonymous (Ks) and non-synonymous (Kn) substitutions are a measure of the amount of change between two genomes. Synonymous substitutions are largely neutral or under weak selective pressure; thus, Ks values can be used to measure mutation rates and to establish relative gene age. In contrast, Kn values are largely indicative of the effects of natural selection on any given gene. As a whole, the Kn/Ks ratio provides a picture of some of the evolutionary forces shaping gene evolution. Under neutrality, it is expected that Kn/Ks = 1, since synonymous and non-synonymous substitutions will occur at the same rate. Positive selection is indicated by an excess of non-synonymous substitutions (Kn/Ks > 1), while purifying selection is indicated by an excess of synonymous substitutions (Kn/Ks < 1). The CoGe platform has the unique capability of calculating the Kn/Ks ratio on syntenic gene pairs; this means that it can provide a measure of the role of natural selection on gene evolution that is informed by the relative position of genes in the genome. Therefore, synteny-based Kn/Ks analyses help to define genome regions evolving under selective regimes different from those predominant across the entire genome, to identify the relative age of genome rearrangement events (''e.g.'' duplications), and to establish genome-specific differences in genome evolution since the split from the common ancestor. All these elements are highly significant for the study of ''Plasmodium'' evolution, given that different species have been shown to present distinct evolutionary patterns. For instance, several studies have pointed out that ''Plasmodium'' subtelomeric regions tend to show higher recombination rates and overall more rapid evolution than other regions of the genome, and in comparison with other ''Apicomplexa'' parasites <ref>Lau AO. 2009. An overview of the Babesia, Plasmodium and Theileria genomes: A comparative perspective. Mol Biochem Parasitol. 164:1-8. http://www.sciencedirect.com/science/article/pii/S016668510800279X</ref>.
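The interpretation of the Kn/Ks ratio described above can be written as a short Python sketch. It only encodes the three textbook cases (Kn/Ks > 1, = 1 and < 1) and is not part of CoGe; the function names and the tolerance used to decide what counts as "approximately 1" are hypothetical, and a real analysis would rely on a statistical test rather than a fixed cutoff.

```python
def kn_ks_ratio(kn, ks):
    """Return Kn/Ks, or None when Ks is zero and the ratio is undefined."""
    if ks == 0:
        return None
    return kn / ks


def selection_regime(kn, ks, tolerance=0.05):
    """Classify a gene pair by its Kn/Ks ratio.

    tolerance is an arbitrary, illustrative margin around 1.0 for calling neutrality.
    """
    ratio = kn_ks_ratio(kn, ks)
    if ratio is None:
        return "undefined"
    if abs(ratio - 1.0) <= tolerance:
        return "neutral"
    return "positive" if ratio > 1.0 else "purifying"


if __name__ == "__main__":
    # Toy (Kn, Ks) values, not taken from any real genome comparison.
    for kn, ks in [(0.02, 0.2), (0.3, 0.1), (0.1, 0.1), (0.1, 0.0)]:
        print(kn, ks, selection_regime(kn, ks))
```

Pairs with Ks equal to zero are reported separately because the ratio cannot be computed for them; very small Ks values also give unstable ratios and are often filtered out.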
+
 
+
In the CoGe platform, Kn/Ks analyses can be performed for two annotated genomes after a [[SynMap]] analysis has been completed. The analysis is performed with one of the available SynMap tools and modifies the [[Syntenic_dotplot]] display to represent the distribution of Ks, Kn or Kn/Ks values.
+
 
+
[[File:Kslaverania.png|thumb|200px|Paired Ks analyses between ''Plasmodium'' species of the ''Laverania'' subgenus. '''From right to left''': ''P. gaboni'' vs. ''P. reichenowi''; ''P. falciparum'' vs. ''P. reichenowi''; ''P. gaboni'' vs. ''P. falciparum'']]
+
 
+
 
+
The following steps show how to perform '''Kn/Ks''' analyses using the CodeML tool available on '''SynMap''':
+
 
+
'''1.''' Go to: https://genomevolution.org/coge/ and log in to CoGe.
+
 
+
'''2.''' Follow the steps to perform a '''SynMap''' analysis between the two genomes of interest. Keep in mind that CoGe stores all analyses performed under a user's account, so you can use a previously generated '''SynMap''' analysis. Also note that the Kn/Ks ratio can only be calculated for genomes that include annotation on CoGe (.gff files have been imported), regardless of their level of assembly.
+
 
+
'''3.''' Once you have the '''SynMap''' output for the two sequences, find the '''CodeML tool''' under the '''Analysis Options''' tab at the bottom of the screen. Go to the ''Calculate syntenic CDS pairs and color dots:________ substitution rate(s)'' section and select ''Synonymous (Ks)'' from the dropdown menu. You can also perform other analyses by selecting the ''Non-synonymous (Kn)'' and ''(Kn/Ks)'' options. The display can be modified by choosing a different ''Color Scheme'' from the second dropdown menu, or by specifying the axis ''Min Val.'' or ''Max Val.'' and the ''Log10 Transform.'' of the data.
+
 
+
'''4.''' The resulting output will show the distribution of Ks values (or Kn or Kn/Ks) across the syntenic regions between the two evaluated genomes displayed on '''SynMap'''. In addition, the output will include a ''Histogram of Ks values'' (or Kn or Kn/Ks) below the updated SynMap. In '''SynMap2''', specific regions/chromosomes can be dynamically selected in order to view the Ks, Kn or Kn/Ks values across a particular set of syntenic genes.
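The log10 transformation and histogram mentioned in the steps above can be reproduced on exported values with a few lines of Python. This is a sketch of the general technique, not of CoGe's implementation; the function name, the bin width and the input values are hypothetical. Values of zero or below are skipped because their logarithm is undefined.

```python
import math
from collections import Counter


def log10_histogram(values, bin_width=0.5):
    """Bin log10-transformed values.

    Returns a dict mapping the lower edge of each bin (in log10 units) to a count.
    Zero or negative values are skipped because log10 is undefined for them.
    """
    counts = Counter()
    for value in values:
        if value <= 0:
            continue
        bin_start = math.floor(math.log10(value) / bin_width) * bin_width
        counts[bin_start] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Toy Ks values, not taken from any real genome comparison.
    toy_ks = [0.003, 0.02, 0.03, 0.2, 2.0, 0.0]
    for edge, count in log10_histogram(toy_ks).items():
        print(f"{edge:>5.1f} {'#' * count}")
```

Working on the log10 scale spreads out the many small values that would otherwise crowd together near zero, which is why peaks in the histogram are easier to see after the transformation.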
+
 
+
 
+
[[File:Knanalysislaverania.png|thumb|200px|Paired Kn analyses between ''Plasmodium'' species of the ''Laverania'' subgenus. '''From right to left''': ''P. gaboni'' vs. ''P. reichenowi''; ''P. falciparum'' vs. ''P. reichenowi''; ''P. gaboni'' vs. ''P. falciparum'']]
+
 
+
Smaller ''Log10( ) substitution per site values of ___'' are indicative of a lower number of synonymous (Ks) or non-synonymous (Kn) substitutions between the analyzed genomes. Since the effect of natural selection on synonymous substitutions is thought to be minimal, these types of substitutions are expected to accumulate in a largely constant manner. Pairwise Ks analyses performed between different genome sets provide information regarding their time of divergence and mutability. The Ks analysis between ''P. gaboni'' strain SY57 and ''P. reichenowi'' strain CDC shows a larger number of recent synonymous substitutions than the same analysis performed between ''P. gaboni'' and ''P. falciparum'' strain 3D7. This is an interesting result, since ''P. reichenowi'' and ''P. falciparum'' are thought to have split recently (approximately 5.28-5.93 Mya <ref>Pacheco MA, Reid MJ, Schillaci MA, Lowenberger CA, Galdikas BM, Jones-Engel L, Escalante AA. 2012. The origin of malarial parasites in orangutans. PLoS One. 7:e34990. https://www.ncbi.nlm.nih.gov/pubmed/22536346</ref>), while they share a more distant common ancestor with ''P. gaboni'' <ref>Rayner JC, Liu W, Peeters M, Sharp PM, Hahn BH. 2011. A plethora of Plasmodium species in wild apes: a source of human infection? Trends Parasitol. 27:222-9. https://www.ncbi.nlm.nih.gov/pubmed/21354860?dopt=Abstract&holding=npg</ref>. The dissimilarity between the Ks rates of ''P. falciparum'' and ''P. reichenowi'' with respect to ''P. gaboni'' suggests that a change in synonymous substitution rates occurred after the split of these sister taxa. If this change had occurred in the common ancestor of both species after its split from ''P. gaboni'', synonymous substitution rates would be expected to be similar when each species is compared to ''P. gaboni'', which is not the case. Furthermore, the Ks values between ''P. reichenowi'' and ''P. falciparum'' are slightly smaller than those observed between ''P. falciparum'' and ''P. gaboni'', supporting the observation that Ks rates have increased in ''P. reichenowi'' after its split from ''P. falciparum'', while there was little variation in the substitution rate after the split of the common ancestor of both species from ''P. gaboni''. This suggests that syntenic genes within ''P. reichenowi'' strain CDC are evolving at a more rapid rate than those of the other compared species within the ''Laverania'' subgenus. These analyses can be replicated using the following links: ''P. reichenowi'' vs. ''P. falciparum'' (https://genomevolution.org/r/ljhj), ''P. reichenowi'' vs. ''P. gaboni'' (https://genomevolution.org/r/ljhq), and ''P. falciparum'' vs. ''P. gaboni'' (https://genomevolution.org/r/ljhl).
+
 
+
In contrast, the patterns of non-synonymous (Kn) substitutions observed between ''P. gaboni'' and ''P. falciparum'' and between ''P. gaboni'' and ''P. reichenowi'' are largely similar, which suggests that a number of non-synonymous substitutions occurred after the split of the common ancestor of both species from ''P. gaboni''. Moreover, the smaller but more recent number of non-synonymous substitutions observed between ''P. falciparum'' and ''P. reichenowi'' indicates a number of non-synonymous substitutions unique to each species. Overall, these results indicate that natural selection has had a role in shaping the divergence between these three genomes, in a pattern likely associated with the colonization of different vertebrate hosts (''e.g.'' humans vs. chimpanzees). Previous studies have shown that the non-synonymous substitution rates between ''P. reichenowi'' and ''P. falciparum'' are particularly large in a significant number of proteins, and that selective pressure and gene gain/loss events are largely predominant in genes involved in erythrocyte invasion. These previous results suggest that stages associated with erythrocyte invasion have had a fundamental role in the expansion of the ''Laverania'' subgenus <ref>Otto TD, Rayner JC, Böhme U, Pain A, Spottiswoode N, Sanders M, Quail M, Ollomo B, Renaud F, Thomas AW, Prugnolle F, Conway DJ, Newbold C, Berriman M. 2014. Genome sequencing of chimpanzee malaria parasites reveals possible pathways of adaptation to human hosts. Nat Commun. 5:4754. https://www.ncbi.nlm.nih.gov/pubmed/25203297</ref>, and that the colonization of humans by ''P. falciparum'' might have been facilitated, at least in part, by the transfer of genes encoding several key erythrocyte invasion proteins <ref>Sundararaman SA, Plenderleith LJ, Liu W, Loy DE, Learn GH, Li Y, Shaw KS, Ayouba A, Peeters M, Speede S, Shaw GM, Bushman FD, Brisson D, Rayner JC, Sharp PM, Hahn BH. 2016. Genomes of cryptic chimpanzee Plasmodium species reveal key evolutionary events leading to human malaria. Nat Commun. 7:11078. https://www.ncbi.nlm.nih.gov/pubmed/27002652</ref>. While our results agree with a significant role for natural selection in the evolution of the ''Laverania'' subgenus, they also point to intrinsically different mutation patterns between ''P. reichenowi'' and ''P. falciparum''. Analyses can be run by following these links: ''P. reichenowi'' vs. ''P. falciparum'' (https://genomevolution.org/r/lsz2), ''P. reichenowi'' vs. ''P. gaboni'' (https://genomevolution.org/r/lsyy), and ''P. falciparum'' vs. ''P. gaboni'' (https://genomevolution.org/r/lsz5).
+
 
+
 
+
===''Identifying sets of syntenic genes amongst several genomes (SynFind)''===
+
 
+
[[File:GEvousingSynfind.png|thumb|200px|Screen capture of '''GEvo''' analysis using the output from '''Synfind'''. Lines connect syntenic regions between members of the SERA multigene family]]
+
 
+
We have observed that a significant level of genome rearrangement is prevalent between ''Plasmodium'' clades and even between species within a single clade. A large number of events leading to loss of synteny are associated with species-specific gene gain/loss events; moreover, high recombination rates can result in gene duplicates that appear to be located away from their point of origin, a pattern also consistent with horizontal gene transfer (HGT). In this regard, tools that allow the identification of syntenic regions across genomes, and in particular of those regions where genes of interest might be located, are of particular significance. Moreover, the identification of these regions, more than that of the gene of interest itself, can provide indispensable information regarding the gene's origin and trajectory. Within ''Plasmodium'', the characterization of syntenic regions where multigene family members are found can aid in the identification of gain/loss events, rearrangements in the order of family members, or even evolutionary relationships amongst non-coding sequences, which can allow the inference of the evolutionary events leading to the expansion, or reduction, of the family. These types of patterns are likely to be observed predominantly in multigene families with a tandem arrangement on the chromosome; a significant example of these patterns within the genus ''Plasmodium'' is the SERA multigene family.
+
 
+
Though the specific details of their function are largely unknown, members of the SERA (serine repeat antigen) multigene family are found across all sequenced ''Plasmodium'' species. Overall, SERA multigene family members are characterized by encoding proteins with a papain-like cysteine protease motif <ref>Arisue N, Kawai S, Hirai M, Palacpac NM, Jia M, Kaneko A, Tanabe K, Horii T. 2011. Clues to Evolution of the SERA Multigene Family in 18 Plasmodium Species. PLoS One. 6: e17775. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0017775</ref>, and are expressed during various stages of the ''Plasmodium'' life cycle. One member of this family (SERA-5), produced during late trophozoite and schizont stages, has been widely considered a promising target for malaria vaccine development and has reached phase Ib clinical trials (studies conducted in diagnosed patients) <ref>Arisue N, Hirai M, Arai M, Matsuoka H, Horii T. 2007. Phylogeny and evolution of the SERA multigene family in the genus Plasmodium. J Mol Evol. 65:82-91. http://link.springer.com/article/10.1007%2Fs00239-006-0253-1</ref>. While members of the SERA family have been described in all sequenced ''Plasmodium'' genomes, the number of significant contractions, expansions and rearrangements observed across species points to a highly dynamic evolutionary history that can be explored with adequate tools. The [[SynFind]] tool in CoGe allows the identification of syntenic regions across any set of genomes, given a specific query gene and reference genome.
+
 
+
[[File:GEvousingSynfind2.png|thumb|200px|Screen capture of '''GEvo''' analysis using '''Synfind''' output. Lines connect syntenic regions. Small syntenic fragments are found across intergenic regions]]
+
 
+
 
+
These steps show how to use '''SynFind''' to search for syntenic regions associated to particular sets of genes from a reference genome:
+
 
+
'''1.''' Go to: https://genomevolution.org/coge/ and log in to CoGe.
+
 
+
'''2.''' On the main CoGe page, click on '''SynFind''' under the Tools tile (alternatively, you can follow this link: https://genomevolution.org/CoGe/SynFind.pl).
+
 
+
'''3.''' Type the scientific name of your desired organism in the search bar. You will find this bar under the ''Search'' tab in the '''Select Target Genomes''' section. Organisms and genomes with names matching the search term will be displayed in the '''Matching Organisms''' menu.
+
 
+
'''4.''' Select all the genomes of interest by using <span style="color:gray">Ctrl+click</span> or <span style="color:gray">Command+click</span>. After you have selected all genomes of interest, click on the green '''+ Add''' button. Added genomes will appear in the '''Selected Genomes''' menu on the right.
+
 
+
'''5.''' Type the ''Name'', ''Annotation'' or ''Organisms'' of interest in the '''Specify Features''' section. It is recommended to make this query as specific as possible; nonetheless, you should also be able to perform the analysis with less specific terms. For example, it is possible to retrieve the sequences of interest just by typing "sera" in the box corresponding to ''Name''. Once you have specified your features, click on the green '''Search''' button.
+
 
+
'''6.''' All matches to the search term, and the genomes where they have been found, will appear in a dropdown menu within the same section. Select all relevant '''Matches''' (''e.g.'' all SERA genes) and your reference '''Genome''' (''e.g.'' ''P. falciparum'' strain 3D7 v5).
+
 
+
'''7.''' Once you have specified your feature, click the red '''Run SynFind''' button to start the analysis (you can regenerate this example using the following link: https://genomevolution.org/r/lszj).
+
 
+
'''8.''' '''SynFind''' will output all syntenic regions to the query sequence found on the reference genome and their [[Syntenic depth]]. Using this output, sequences can be further analyzed by using any of the numerous tools available on CoGe (generate '''SynMap''' dotplots for matches, perform a microsynteny analysis for these regions with '''GEvo''', etc.).
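As a rough illustration of how such an output could be summarized, the Python sketch below tallies, for each target genome, the number of distinct regions reported as syntenic to a query. Both the input format (pairs of genome name and region label) and the reading of depth as "number of distinct syntenic regions per genome" are assumptions made for this example; see the [[Syntenic depth]] page for how CoGe defines and reports it. The genome and region labels are invented.

```python
from collections import defaultdict


def regions_per_genome(hits):
    """Count distinct syntenic regions reported for each target genome.

    hits is an iterable of (genome_name, region_label) pairs; this input format
    is hypothetical and does not mirror any particular SynFind export.
    """
    regions = defaultdict(set)
    for genome, region in hits:
        regions[genome].add(region)
    return {genome: len(found) for genome, found in regions.items()}


if __name__ == "__main__":
    # Invented example hits for a single query gene.
    toy_hits = [
        ("genome A", "chr2:100-200"),
        ("genome A", "chr2:100-200"),  # duplicate report of the same region
        ("genome A", "chr9:50-80"),
        ("genome B", "chr4:10-90"),
    ]
    print(regions_per_genome(toy_hits))
```

A genome with more than one region for a single query is a natural starting point for a follow-up '''GEvo''' comparison, since it may reflect duplication or fragmentation of the region.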
+
 
+
 
+
The information provided by '''SynFind''' makes it possible to rapidly identify regions where multigene family paralogs can be found. Then, '''GEvo''' can be used to evaluate the identified syntenic regions in detail. We used '''SynFind''' to identify potential syntenic regions to SERA-5 across six ''P. vivax'' strains from different geographic regions (the analysis can be recreated by following this link: https://genomevolution.org/r/lszj). Our results show that all evaluated ''P. vivax'' strains share the 12 reported SERA paralogs <ref>Arisue N, Kawai S, Hirai M, Palacpac NM, Jia M, Kaneko A, Tanabe K, Horii T. 2011. Clues to Evolution of the SERA Multigene Family in 18 Plasmodium Species. PLoS One. 6: e17775. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0017775</ref>; however, there is some intraspecific variation between the syntenic regions where SERA paralogs are found. Specifically, synteny is lost for certain family members in the ''P. vivax'' Brazil-1 strain (shown second from the top of the screen). The regions where synteny is lost are associated with the location of paralogs found only in ''P. vivax'' and closely related species. Therefore, it is possible that recently duplicated paralogs have not yet been fixed at the intraspecific level, or that there are certain evolutionary advantages associated with a variable number of paralogs within the same species, as has been previously discussed for other ''Plasmodium'' multigene families <ref>Rice BL, Acosta MM, Pacheco MA, Carlton JM, Barnwell JW, Escalante AA. 2014. The origin and diversification of the merozoite surface protein 3 (msp3) multi-gene family in Plasmodium vivax and related parasites. Mol Phylogenet Evol. 78:172-84. https://www.ncbi.nlm.nih.gov/pubmed/24862221</ref>. Nonetheless, it is important to note that while multigene family members are characterized by motifs common to the family, such motifs can occasionally be found in genes unrelated to the family that evolve under different patterns and mechanisms. Thus, motifs and domains identified by '''SynFind''' can be conserved across different types of genes or even intergenic regions, and should therefore be carefully evaluated.
+
 
+
 
+
=== ''Identifying codon and amino acid substitution frequencies (CodeOn)'' ===
+
 
+
[[File:AAusagetables.png|thumb|200px|Amino acid usage tables in ''Plasmodium'' species from the simian clade]]
+
 
+
The evolutionary significance of compositionally biased mutational pressure on codon and amino acid usage within the genus ''Plasmodium'' has been previously highlighted. The compositional bias observed in ''P. falciparum'' has been associated with variations in codon usage and gene expression; in particular, a preference for C-ending codons has been observed in many highly expressed genes despite this parasite's AT-rich genome. Moreover, expression patterns have also been associated with the usage of less energetically expensive amino acids, which could suggest that translational selection creates an evolutionary advantage by decreasing energetic costs during infection <ref>Peixoto L, Fernández V, Musto H. 2004. The effect of expression levels on codon usage in Plasmodium falciparum. Parasitology. 128:245-51. https://www.ncbi.nlm.nih.gov/pubmed/15074874</ref>. The significance of compositional bias and translational selection has also been shown to be largely variable in other ''Plasmodium'' species; in particular, translational selection has been shown to have a small role in codon usage bias for ''P. vivax'', although a larger one than for ''P. falciparum'' <ref>Yadav MK, Swati D. 2012. Comparative genome analysis of six malarial parasites using codon usage bias based tools. Bioinformation. 8:1230-9. https://www.ncbi.nlm.nih.gov/pubmed/23275725</ref>.
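The preference for C-ending codons mentioned above can be quantified for any coding sequence with a short Python sketch. It is an illustration only and not part of any CoGe tool; the function name is hypothetical. It reports the fraction of codons whose third (wobble) base is G or C, and the fraction whose third base is C.

```python
def third_position_stats(cds):
    """Return (gc3, c_ending) for a coding sequence.

    gc3 is the fraction of codons whose third base is G or C;
    c_ending is the fraction of codons whose third base is C.
    Incomplete trailing codons are ignored. Stop codons are not excluded.
    """
    third_bases = cds.upper()[2::3]  # bases at positions 3, 6, 9, ...
    if not third_bases:
        return 0.0, 0.0
    gc3 = sum(base in "GC" for base in third_bases) / len(third_bases)
    c_ending = third_bases.count("C") / len(third_bases)
    return gc3, c_ending


if __name__ == "__main__":
    # Toy CDS: ATG AAC GGC TAA
    print(third_position_stats("ATGAACGGCTAA"))
```

Comparing these fractions between highly expressed genes and the rest of the genome is one simple way to look for the kind of codon preference described in the paragraph above.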
+
 
+
The role of compositional bias has been evaluated in only 6 ''Plasmodium'' species representing three of the four major ''Plasmodium'' clades. The large number of ''Plasmodium'' genomes now sequenced allows us to assess the role of compositional bias in closely related species that also share similar nucleotide composition. In order to assess differences in codon and amino acid usage potentially associated with GC content across ''Plasmodium'' species, we will use one of CoGe's analysis tools, [[CodeOn]], which calculates amino acid usage across various levels of %GC for any given genome, as well as the number of CDS in each of the computed %GC tiers.
+
+
[[File:AAcodeonlaverania.png|thumb|200px|Amino acid usage tables in ''Plasmodium'' species from the ''Laveranian'' subgenus]]
+
 
+
 
+
The following steps indicate how to build amino acid usage tables for any given genome:
+
 
+
'''1.''' Go to: https://genomevolution.org/coge/ and log in to CoGe.
+
 
+
'''2.''' Find your organism and genome of interest in Organism View (https://genomevolution.org/coge/OrganismView.pl).
+
 
+
'''3.''' Find the '''Genome Information''' section on the right side of the screen. Under the listed '''Tools''' you will find '''CodeOn'''. Click on it to start the analysis; once completed (after a couple of minutes), the output will be shown in a different tab.
+
 
+
 
+
As expected, similarities in %GC were more prevalent among closely related species than among species from different ''Plasmodium'' clades. Within the simian clade, ''P. vivax'' showed a large number of CDS with 45-55% GC, while other species presented a slightly more skewed 40-45% GC in most CDS. In contrast, ''Plasmodium'' species of the ''Laveranian'' subgenus show a larger number of CDS with a reduced 20-30% GC. Nonetheless, '''CodeOn''' results show that the patterns of amino acid usage in relation to variations in GC content are still unique to each ''Plasmodium'' species. Interestingly, ''P. vivax'' and ''P. coatneyi'' were more similar to each other in their amino acid usage trends than to their sister taxa (''P. cynomolgi'' and ''P. knowlesi'', respectively). Moreover, these differences did not appear to be solely related to genome compositional bias, since in both cases GC content was more similar among sister taxa. These results suggest that amino acid usage is likely influenced by elements other than compositional bias in other ''Plasmodium'' species from the simian clade. Taking into account previously reported associations between codon usage and translational selection in ''P. vivax'', it would be relevant to explore whether similar relations are observed in other newly sequenced ''Plasmodium'' genomes.
+
 
+
In the case of ''Plasmodium'' species from the ''Laveranian'' subgenus, the sister species ''P. falciparum'' and ''P. reichenowi'' showed both similar amino acid usage and a similar number of CDS under low %GC tiers. On the other hand, the earlier-diverging species ''P. gaboni'' showed similar %GC patterns but dissimilar trends in amino acid composition. The likeness of the patterns observed among ''Laveranian'' species suggests that compositional bias is a significant factor in determining amino acid usage within the subgenus; however, as with species in the simian clade, additional elements also appear to play a role in determining amino acid composition. While difficult to assess using only three representative genomes, it is possible that these changes in amino acid usage originated at specific points during the diversification of the ''Laveranian'' subgenus; specifically, the skewed amino acid usage observed in ''P. reichenowi'', and more predominantly in ''P. falciparum'', could represent a recently derived trait associated with the infection of a different host type and might have arisen in their common ancestor after the split from ''P. gaboni'' and other ''Laveranian'' species.
+
 
+
 
+
=== ''Using Syntenic Path Assembly (SPA) to make analysis of poor or early genome assemblies easier (SynMap - SPA tool)'' ===
+
 
+
[[File:Spacapture.png|thumb|200px|'''Syntenic Path Assembly (SPA)''' window analysis]]
+
 
+
While the ''Plasmodium'' genome panorama has become more complete in recent years, there are still a large number of incomplete ''Plasmodium'' genomes. These types of genome data originate from different sources: poorly sequenced or assembled genomes, sequencing projects that publish genomic information in the early stages of assembly, partially sequenced genomes, and unassembled private genomes. A challenge for the sequencing and assembly of ''Plasmodium'' genomes is the number of repetitive elements, low-complexity sequences, and multigene families, which can vary widely between ''Plasmodium'' species and even among chromosome regions. Therefore, even with the use of reference genomes and the widespread usage of novel sequencing techniques, the assembly of ''Plasmodium'' genomes can be a complex task <ref>Chien JT, Pakala SB, Geraldo JA, Lapp SA, Humphrey JC, Barnwell JW, Kissinger JC, Galinski MR. 2016. High-Quality Genome Assembly and Annotation for Plasmodium coatneyi, Generated Using Single-Molecule Real-Time PacBio Technology.  Genome Announc. 4: e00883-16. https://www.ncbi.nlm.nih.gov/pubmed/27587810</ref>. While unassembled genomes can be used in multiple types of studies (''e.g.'' calculating the polymorphism of specific genes or genome regions), the information that they provide in more complex comparative genomic analyses can be limited.
+
 
+
[[File:SyntenicPathAssembly.png|thumb|200px|'''Syntenic Path Assembly (SPA)''' of ''P. inui'' contigs using ''P. coatneyi'' genome as a reference]]
+
 
+
In this regard, tools capable of identifying syntenic orthologs to a reference genome can be used to provide preliminary genome assemblies and allow the identification of genome elements of interest. One of CoGe's tools, the [[Syntenic_path_assembly]] or '''SPA''', provides a quick genome assembly based on any selected reference genome. This tool can be used with any incomplete assembly in order to provide information about the syntenic regions between two genomes as illustrated by '''SynMap'''. In addition, '''SPA''' can be used to correctly orient syntenic regions that have been annotated on the reverse DNA strand (this functionality is fundamental for the accurate identification of inversion events and the prevention of data misinterpretation). We will use the '''SPA''' tool to assemble the ''P. inui'' genome (currently at the scaffold level) against the complete ''P. coatneyi'' genome. 
+
 
+
 
+
The following steps show how to use the '''SPA''' tool found in '''SynMap''':
+
 
+
'''1.''' Go to: https://genomevolution.org/coge/ and log in to CoGe.
+
 
+
'''2.''' Run a '''SynMap''' analysis between a completely sequenced genome and an incomplete genome assembly. You can review previous sections of this guide for instructions on how to run '''SynMap'''.
+
 
+
'''3.''' Once the '''SynMap''' output has been generated, open the ''Display Options'' tab and find the '''SPA''' tool at the bottom of the screen. Select the tool by clicking on the check box next to: <span style="color:blue">The Syntenic Path Assembly (SPA)?</span>
+
 
+
'''4.''' After a few minutes (depending on the number of contigs), the incomplete genome will be assembled using the second genome as a reference.
+
 
+
 
+
Note that while '''SPA''' allows you to observe syntenic regions between the two genomes to a certain degree, there are some significant limitations regarding the interpretation of its assembly. For one, the incomplete genome will be assembled using a reference provided by the user. This means that contigs will be arranged on '''SynMap''' in the way that maximizes synteny between the incomplete genome and the selected reference. Thus, the assembly of contigs will not be the same when different reference genomes are used. For instance, the ''P. inui'' genome can be assembled using ''P. coatneyi'' (a closely related species) or ''P. falciparum'' (a species from the ''Laveranian'' subgenus). In both cases, the synteny of the incomplete genome displayed on '''SynMap''' will be maximized, even though significant rearrangement events are evident when these two complete genomes are compared. Therefore, '''SPA''' reference genomes should be selected after consideration of the biological and evolutionary relationships between species.
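The reference dependence described above can be illustrated with a deliberately simplified sketch. This is not CoGe's SPA algorithm: the input format (one tuple per syntenic gene pair), the function name, and the placement rule (majority reference chromosome, median reference position, orientation from the sign of the positional covariance) are all assumptions chosen for illustration. It only shows why contig order and orientation are entirely determined by the chosen reference.

```python
from collections import Counter, defaultdict
from statistics import median


def place_contigs(hits):
    """Order and orient contigs along a reference.

    hits: iterable of (contig, contig_pos, ref_chr, ref_pos) tuples, one per
    syntenic gene pair (a hypothetical input format for this sketch).
    Returns a list of (contig, ref_chr, orientation) in reference order.
    """
    by_contig = defaultdict(list)
    for contig, contig_pos, ref_chr, ref_pos in hits:
        by_contig[contig].append((contig_pos, ref_chr, ref_pos))

    placements = []
    for contig, rows in by_contig.items():
        # Assign the contig to the reference chromosome with the most hits.
        best_chr = Counter(r[1] for r in rows).most_common(1)[0][0]
        pairs = [(c, r) for c, chrom, r in rows if chrom == best_chr]
        # Anchor the contig at the median reference position of its hits.
        anchor = median(r for _, r in pairs)
        # A negative covariance between contig and reference coordinates
        # suggests the contig runs in the reverse direction.
        mean_c = sum(c for c, _ in pairs) / len(pairs)
        mean_r = sum(r for _, r in pairs) / len(pairs)
        cov = sum((c - mean_c) * (r - mean_r) for c, r in pairs)
        orientation = "-" if cov < 0 else "+"
        placements.append((best_chr, anchor, contig, orientation))

    placements.sort()
    return [(contig, chrom, strand) for chrom, _, contig, strand in placements]


if __name__ == "__main__":
    # Made-up hits for three contigs against a two-chromosome reference.
    toy_hits = [
        ("c2", 100, "chr1", 9000), ("c2", 200, "chr1", 8000),
        ("c1", 100, "chr1", 1000), ("c1", 300, "chr1", 2000),
        ("c3", 50, "chr2", 500),
    ]
    print(place_contigs(toy_hits))
```

Because every placement is derived from the reference coordinates alone, running the same contigs against a different reference genome would generally yield a different order, which is why the choice of reference matters.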
+
 
+
Care should also be taken when identifying rearrangement events such as inversions or duplications. Several contigs can be syntenic to the same region and be incorrectly identified as a duplication event; on the other hand, contigs could have been annotated on the reverse DNA strand, showing a pattern that can be incorrectly identified as an inversion. Both potentially misinterpreted events are marked with black circles in the '''SPA''' assembly of ''P. inui'' using the ''P. coatneyi'' genome as a reference. The analysis can be replicated using the following link:  https://genomevolution.org/r/ljen
+
 
+
 
+
=='''Overall conclusion'''==
+
 
+
By comparatively analyzing genomes with different degrees of relatedness within ''Plasmodium'', it is possible to understand the origins of significant genome elements and the evolutionary forces shaping them. The number of available ''Plasmodium'' genomes has increased markedly during recent years, providing an unprecedented opportunity to understand evolution in this genus. Furthermore, the unique qualities of the different ''Plasmodium'' genomes can be explored in detail.
+
 
+
Thanks to worldwide efforts, there has been a large reduction in the number of malaria cases and deaths between 2000 and 2015. By 2015, it was estimated that the number of malaria cases had decreased from 262 million to 214 million, and the number of malaria-related deaths from 839,000 to 438,000 <ref>World Health Organization. (2015). World Malaria Report 2015. Retrieved from http://www.who.int/malaria/publications/world-malaria-report-2015/report/en/</ref>. While this is an enormous achievement for malaria treatment and control strategies, human infections with ''P. cynomolgi''  <ref>Ta TH, Hisam S, Lanza M, Jiram AI, Ismail N, Rubio JM. 2014. First case of a naturally acquired human infection with Plasmodium cynomolgi. Malar J. 13: 68. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3937822/</ref> and ''P. knowlesi'' <ref>Singh B, Daneshvar C. 2013. Human infections and detection of Plasmodium knowlesi. Clin Microbiol Rev. 26:165-84.  https://www.ncbi.nlm.nih.gov/pubmed/23554413</ref> have been reported in Southeast Asia. In addition, various ''Plasmodium'' species from the ''Laveranian'' subgenus, including ''P. falciparum'' strains, have been found in African primates <ref>Prugnolle F, Durand P, Neel C, Ollomo B, Ayala FJ, Arnathau C, Etienne L, Mpoudi-Ngole E, Nkoghe D, Leroy E, Delaporte E, Peeters M, Renaud F. 2010. African great apes are natural hosts of multiple related malaria species, including ''Plasmodium falciparum''. Proc Natl Acad Sci U S A. 107:1458-63. https://www.ncbi.nlm.nih.gov/pubmed/20133889</ref><ref>Duval L, Fourment M, Nerrienet E, Rousset D, Sadeuh SA, Goodman SM, Andriaholinirina NV, Randrianarivelojosia M, Paul RE, Robert V, Ayala FJ, Ariey F. 2010. African apes as reservoirs of ''Plasmodium falciparum'' and the origin and diversification of the ''Laverania'' subgenus. Proc Natl Acad Sci U S A. 107:10561-6. https://www.ncbi.nlm.nih.gov/pubmed/20498054</ref>, suggesting a potential role of wild primates as malaria reservoirs. 
Both examples illustrate the plasticity of the ''Plasmodium'' genome, where species barriers are more likely to be breached than we would desire. In this regard, ''Plasmodium''-related studies should not only focus on those species of major human interest, but should also be partially devoted to gaining a better understanding of evolution in the genus. Thus, the use of platforms like CoGe, where genomes can be easily imported, analyzed, visualized and made public, represents an essential step in furthering comparative genomics in the ''Plasmodium'' genus.
+
 
+
We have used the different tools available on CoGe to test various hypotheses significant for understanding ''Plasmodium'' evolution. In addition, we have used this platform to further characterize both general and specific genome elements in sequenced ''Plasmodium'' species and strains. In order to attain an even more complete panorama of the complex evolutionary history of this genus, genomes from ''Plasmodium'' species ancestral to the ''Laveranian'' subgenus are required. Evolutionary questions such as the origins of the AT richness observed in the ''Laveranian'' subgenus, the potential changes in synteny between mammal-infecting and non-mammal-infecting ''Plasmodium'' species, and the expansion, contraction, and origin of multigene families can be more clearly evaluated once these genomes become publicly available and are incorporated into the CoGe platform. Overall, our results show that the complexities of the ''Plasmodium'' genome can be effectively analyzed in CoGe, and that doing so opens opportunities for furthering our understanding of malaria evolution and developing novel hypotheses.
+
  
  
Line 438: Line 70:
 
===Sample data===
 
===Sample data===
  
:'''Gene sequence used on ''CoGeBlast'' analysis (obtained from PlasmoDB):'''
+
*'''Gene sequences used on ''CoGeBLAST'' analysis (obtained from PlasmoDB):'''
 +
:PVX_113230.1  | Plasmodium vivax Sal-1 | variable surface protein Vir14-related (http://plasmodb.org/plasmo/app/record/gene/PVX_113230)
 +
:PVX_096004.1  | Plasmodium vivax Sal-1 | VIR protein  (http://plasmodb.org/plasmo/app/record/gene/PVX_096004)
 +
 
 +
*'''Gene sequence used on ''SynFind'' to inform ''GEvo'' analysis (obtained from PlasmoDB):'''
 
:PVX_003830.1  | Plasmodium vivax Sal-1 | serine-repeat antigen 5 (SERA) (http://plasmodb.org/plasmo/app/record/gene/PVX_003830)
 
:PVX_003830.1  | Plasmodium vivax Sal-1 | serine-repeat antigen 5 (SERA) (http://plasmodb.org/plasmo/app/record/gene/PVX_003830)
  
:'''Gene sequences used on ''CoGeBlast'' used to inform ''GEvo'' analysis (obtained from PlasmoDB):'''
+
*'''Gene sequences used on ''CoGeBLAST'' to inform ''GEvo'' analysis (obtained from PlasmoDB):'''
 
:PF3D7_0424100.1  | Plasmodium falciparum 3D7 | reticulocyte binding protein homologue 5 (http://plasmodb.org/plasmo/app/record/gene/PF3D7_0424100)
 
:PF3D7_0424100.1  | Plasmodium falciparum 3D7 | reticulocyte binding protein homologue 5 (http://plasmodb.org/plasmo/app/record/gene/PF3D7_0424100)
 
:PVX_096410.1  | Plasmodium vivax Sal-1 | cysteine repeat modular protein 2, putative (http://plasmodb.org/plasmo/app/record/gene/PVX_096410)
 
:PVX_096410.1  | Plasmodium vivax Sal-1 | cysteine repeat modular protein 2, putative (http://plasmodb.org/plasmo/app/record/gene/PVX_096410)

Latest revision as of 14:29, 14 February 2017

About this guide

This 'cookbook' style document is meant to provide an introduction to many of our tools and services and is structured around a case study of investigating genome evolution of the malaria-causing Plasmodium spp. The small size and unique features of this pathogen's genome make it ideal for beginning to understand how our tools can be used to conduct comparative genomic analyses and uncover meaningful discoveries.

Through a number of example analyses, this guide will teach users about the following tools:

  • LoadGenome: Add a new genome to CoGe.
  • LoadAnnotation: Add structural and/or functional annotations to a genome.
  • GenomeInfo: Get information about a genome.
  • GenomeList: Get information about several genomes in a table.
  • CoGeBLAST: BLAST against any set of genomes.
  • GEvo: Microsynteny analysis.
  • SynMap: Whole genome syntenic analysis.
- SynMap#Calculating_and_displaying_synonymous.2Fnon-synonymous_.28Ks.2C_Kn.29_data: Characterize the evolution of populations of genes.
- SPA tool: Syntenic Path Assembly to assist in genome analysis.
  • SynFind: Identify syntenic genes across multiple genomes.
  • CodeOn: Characterize patterns of codon and amino acid evolution in coding sequence.


FOLLOW THIS LINK FOR A QUICK OVERVIEW OF Plasmodia comparative genomics WITH COGE.


A brief introduction to Plasmodium genome evolution

The genus Plasmodium emerged ~40 million years ago and harbors roughly 200 species of parasitic protozoa better known as malaria parasites. All Plasmodium species have a complex life cycle involving some kind of vertebrate host and a mosquito vector. In addition, Plasmodium species share similar life cycle characteristics, albeit with a few exceptions (e.g. hypnozoites). Plasmodium genomes are tiny (17-28 Mb) in comparison to those of their vertebrate (1 Gb for birds; 2-3 Gb for mammals) and mosquito (230–284 Mb) hosts [1]. All Plasmodium genomes consist of fourteen chromosomes (nuclear genome), as well as a mitochondrial and an apicoplast genome. Despite these shared genomic characteristics, the structural organization, gene content, and sequence of Plasmodium genomes are highly variable within the genus [2]. The exact origins and mechanisms of these differences remain largely unexplored; however, they are generally hypothesized to stem from host shift events [3][4].

An increase in funding devoted to malaria research has coincided with a dramatic increase in publicly available genomic information for Plasmodium [5]. The most prominent repository is NCBI/GenBank [6], while additional and unique sequences can also be found in other databases: PlasmoDB [7], GeneDB [8], and MalAvi [9]. This wealth of genomic data facilitates detailed comparative genomic approaches, opening the possibility to:

  • Infer origins of certain traits, specialized phenotypes, and genomic features.
  • Track the maintenance of conserved genes across the genus, as well as the gain or loss of genes unique to a single species or a group of closely related species.
  • Identify the potential historical interactions that might have led to the development of genomic adaptations.


Finding and integrating Plasmodium genomes in CoGe

You can find the details of Plasmodium spp. genome integration at the following link: Finding and intregating Plasmodium genomes to CoGe


Comparative analyses workflows

The following links direct to specific tools for the comparative analysis of Plasmodium genomes:

Plasmodium analysis workflow 1: Tools that evaluate genomic properties and amino acid usage

Plasmodium analysis workflow 2: Tools for the syntenic analysis of whole genomes and microsyntenic regions

Plasmodium analysis workflow 3: Tools useful on the study of multigene families


Overall conclusions

Insights into the unique patterns of Plasmodium biology, epidemiology, ecology, and genetics can be obtained from molecular and comparative genomic studies. The rapid growth of genomic information makes implementing tools that facilitate assessing genome evolutionary trends an imperative task. The services and tools provided by the CoGe platform are of considerable use in advancing Plasmodium comparative genomics. Here, we showed how various CoGe tools could be used to assess evolutionary patterns unique to Plasmodium. We also showed how to use this platform to further characterize sequenced Plasmodium genomes. Overall, we have demonstrated that CoGe’s tools can be used to address evolutionary questions such as:

  • The evolutionary origins of Laveranian AT-rich genomes.
  • The location and nature of genome rearrangements between Plasmodium.
  • The evolutionary patterns of genes crucial in cell invasion.
  • The evolutionary trends of multigene families.


Useful links

Plasmodium Notebooks in CoGe

Link to Notebook for published Plasmodium genome data: https://genomevolution.org/coge/NotebookView.pl?lid=1753
Link to Notebook for published P. falciparum strains: https://genomevolution.org/coge/NotebookView.pl?lid=1758
Link to Notebook for published P. vivax strains: https://genomevolution.org/coge/NotebookView.pl?lid=1760
Link to Notebook for published Plasmodium apicoplast data: https://genomevolution.org/coge/NotebookView.pl?lid=1754
Link to Notebook for published Plasmodium mitochondrion data: https://genomevolution.org/coge/NotebookView.pl?lid=1756

Sample data

  • Gene sequences used on CoGeBLAST analysis (obtained from PlasmoDB):
PVX_113230.1 | Plasmodium vivax Sal-1 | variable surface protein Vir14-related (http://plasmodb.org/plasmo/app/record/gene/PVX_113230)
PVX_096004.1 | Plasmodium vivax Sal-1 | VIR protein (http://plasmodb.org/plasmo/app/record/gene/PVX_096004)
  • Gene sequence used on SynFind to inform GEvo analysis (obtained from PlasmoDB):
PVX_003830.1 | Plasmodium vivax Sal-1 | serine-repeat antigen 5 (SERA) (http://plasmodb.org/plasmo/app/record/gene/PVX_003830)
  • Gene sequences used on CoGeBLAST to inform GEvo analysis (obtained from PlasmoDB):
PF3D7_0424100.1 | Plasmodium falciparum 3D7 | reticulocyte binding protein homologue 5 (http://plasmodb.org/plasmo/app/record/gene/PF3D7_0424100)
PVX_096410.1 | Plasmodium vivax Sal-1 | cysteine repeat modular protein 2, putative (http://plasmodb.org/plasmo/app/record/gene/PVX_096410)


References

  1. DeBarry JD, Kissinger JC. 2011. Jumbled Genomes: Missing Apicomplexan Synteny. Mol Biol Evol. 2011 Oct; 28(10): 2855–2871. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3176833/
  2. Carlton JM, Perkins SL, Deitsch KW. 2013. Malaria Parasites. Caister Academic Press
  3. Prugnolle F, Durand P, Ollomo B, Duval L, Ariey F, Arnathau C, Gonzalez JP, Leroy E, Renaud F. 2011. A Fresh Look at the Origin of Plasmodium falciparum, the Most Malignant Malaria Agent. PLoS Pathog. 7: e1001283. http://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1001283
  4. Prugnolle F, Rougeron V, Becquart P, Berry A, Makanga B, Rahola N, Arnathau C, Ngoubangoye B, Menard S, Willaume E, Ayala FJ, Fontenille D, Ollomo B, Durand P, Paupy C, Renaud F. 2013. Diversity, host switching and evolution of Plasmodium vivax infecting African great apes. Proc Natl Acad Sci U S A. 110:8123-8. https://www.ncbi.nlm.nih.gov/pubmed/23637341
  5. Buscaglia CA, Kissinger JC, Agüero F. 2015. Neglected Tropical Diseases in the Post-Genomic Era. Trends Genet. 31:539-55. https://www.ncbi.nlm.nih.gov/pubmed/26450337
  6. Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. 2016. GenBank. Nucleic Acids Res. 44: D67–D72. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702903/
  7. Aurrecoechea C, Brestelli J, Brunk BP, Dommer J, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M, Innamorato F, Iodice J, Kissinger JC, Kraemer E, Li W, Miller JA, Nayak V, Pennington C, Pinney DF, Roos DS, Ross C, Stoeckert CJ Jr, Treatman C, Wang H. 2009. PlasmoDB: a functional genomic database for malaria parasites. Nucleic Acids Res. 37:D539-43. https://www.ncbi.nlm.nih.gov/pubmed/18957442
  8. Logan-Klumpler FJ, De Silva N, Boehme U, Rogers MB, Velarde G, McQuillan JA, Carver T, Aslett M, Olsen C, Subramanian S, Phan I, Farris C, Mitra S, Ramasamy G, Wang H, Tivey A, Jackson A, Houston R, Parkhill J, Holden M, Harb OS, Brunk BP, Myler PJ, Roos D, Carrington M, Smith DF, Hertz-Fowler C, Berriman M. 2012. GeneDB--an annotation database for pathogens. Nucleic Acids Res. 40:D98-108. https://www.ncbi.nlm.nih.gov/pubmed/22116062
  9. Bensch S, Hellgren O, Pérez-Tris J. 2009. MalAvi: a public database of malaria parasites and related haemosporidian in avian hosts based on mitochondrial cytochrome b lineages. Mol Ecol Resour. 9:1353-8. https://www.ncbi.nlm.nih.gov/pubmed/21564906