Difference between revisions of "Using CoGe for the analysis of Plasmodium spp"

Revision as of 14:46, 29 September 2016

Finding and importing data into CoGe

Finding out about the Plasmodium spp. genomes present in CoGe

The number of Plasmodium genomes available to the public increases yearly. Numerous research groups are working on completing the Plasmodium genome panorama, leading to the deposition of genome sequences at different levels of completion in a variety of databases. A large number of Plasmodium genomes have been deposited at the National Center for Biotechnology Information (NCBI); however, other databases such as PlasmoDB ([1]), GeneDB ([2]) and MalAvi ([3]) also carry additional Plasmodium genome sequences.

To search for Plasmodium genomes in CoGe, type "plasmod" into the search bar at the top of most pages. This will retrieve all organisms and genomes with names matching your search term.


To obtain a better picture of Plasmodium spp. genome evolution, the CoGe platform can be used to perform diverse comparative analyses. Currently, a number of Plasmodium genomes are available in the CoGe database. You can find out more about them by following these steps:


1. Go to: https://genomevolution.org/coge/

2. Create an account or log in to CoGe (see: How to get a CoGe account)

3. On the main CoGe page, find the Tools tile and click on Organism View ([4])

4. Organism View allows the researcher to find all publicly available genomes uploaded into CoGe and browse any corresponding information. You can find any published genome by typing a scientific name into the Search box. For each organism uploaded to CoGe you will find the following information:

Organisms: In the case of Plasmodium spp., the different parasite strains currently uploaded. Any organelle genomes uploaded independently (mitochondrial and apicoplast) can also be found here.
Organism Information: Provides an outline of the organism's taxonomy (following that published on NCBI), quick links to some of the main CoGe analysis tools, and the search engines where further information can be found.
Genomes: All the genome versions for this species. Selecting a different genome version modifies all other output shown on this page; in addition, it allows the user to access previous versions of a published genome (e.g. access scaffolds from a previous version of a genome that is currently at the chromosome assembly level).
Genome information: Shows the genome IDs, the type of sequences uploaded and the length of the whole genome. In addition, this tab allows the user to directly perform analyses using the CoGe platform.
Datasets: Shows the number of datasets included for this genome. In the case of completely sequenced Plasmodium genomes, this will indicate the code numbers for the datasets of each individual chromosome.
Dataset information: Provides specific information for each individually selected dataset, including the accession numbers (if available), source of the upload, chromosome length and GC%.
Chromosomes: Shows the number of available chromosomes for the selected genome. However, depending on the methodology used to upload the data into CoGe and the nature of the dataset itself, the count and length of chromosomes shown may be larger than expected (e.g. the number of contigs may be shown in lieu of the number of chromosomes). For whole sequenced genomes, specific IDs under the Dataset section will show the chromosome number and length.
Chromosome information: Shows the chromosome ID and the number of base pairs for that chromosome.

5. Under Genome Information, clicking on the Genome Info section gives the user access to a more detailed genome description. It also provides quick links to other comparative analysis tools available on CoGe.

Importing Plasmodium spp. genomes into CoGe

While data can be uploaded into CoGe using a variety of methods, we will focus on the two most relevant for the incorporation of Plasmodium spp. genomes. We will follow each method with an example. For additional information, please check the following link: [[5]]

Importing genomes using the "Upload" method

Depending on the researcher's interests, analyses may be performed using complete Plasmodium genomes or may focus only on specific organelles and chromosomes. To upload a complete Plasmodium genome, follow these steps:
Screen capture of Plasmodium vivax genome's webpage on NCBI


1. In the upper part of the screen, find the Representative Genome section. Below, the Download Sequences in FASTA format and Download Genome Annotation sections can be found.
- To download the complete Plasmodium vivax genome, click on Genome under Download Sequences in FASTA
- To download the complete annotation for the Plasmodium vivax genome, click on GFF under Download Genome Annotation
2. Both files will be downloaded to the chosen folder on your local computer.
Step 7: Screen capture of researcher's CoGe MyData tab
3. Go to: [[6]]
4. Log in to CoGe.
5. Click on the MyData section on the upper left part of the screen.
6. This will lead to the Data section of your personal CoGe page. This section will fill up as genomes of interest are uploaded into CoGe.
7. On the upper left section of the screen, click the NEW button and select New Genome from the dropdown menu.
Step 8: Screen capture of Create New Organism window at CoGe. Notice the different name of the selected strain and the one written under "Name"
8. Once on the page to Create a New Genome in CoGe, information about the organism's taxonomy and the genome's origin must be entered. Depending on the type of organism being uploaded, its taxonomic information might not yet have been included in CoGe. If this is the case, a new organism should be created. To do this, follow these steps:
a. Click on NEW on the "Organism:" section
b. In the Search NCBI box, type the scientific name of the organism to be uploaded. If the organism of interest is not on NCBI, select the closest taxonomic relative. In the case of Plasmodium, several strains might be available for a single species; make sure to select the correct strain or, if a new strain is being uploaded, to add the new strain name.
c. Click Create
9. Once the new strain/genome has been added, additional information should be included as well. The number typed under Version depends on how many versions of the selected genome are already available in CoGe. Thus, it is important to check the number of genome versions already available in CoGe before entering a new version. Under the section named Type, select the appropriate sequence type from the dropdown menu (most sequences can be identified as unmasked, [[7]]). Select the Source from the next dropdown menu (in this case NCBI, but there are many other sources available, including Private sources). Check whether you want your genome to be Restricted or not.
- Restricted genomes can only be seen and analyzed by the user and those with whom they have been shared.
- Unrestricted genomes are available to the general public.
10. Once done, click Next.
11. Genome files themselves can be uploaded in this window using four different strategies: first, retrieving the data directly from the CyVerse Data Store (if the data is not in the Data Store, it can easily be exported there once it has been included in CoGe); second, creating an HTTP/FTP link directly to the data; third, uploading the data from a private computer; and fourth, retrieving the data using GenBank accession numbers. In the following example, the data will be uploaded using the Upload option.
12. Select the downloaded file and wait for the file to be read by CoGe. Once the file is read select Next.
13. Click Start on the next screen to begin upload.
14. Once the genome has been uploaded, all information included by the user, as well as any specifics regarding the genome FASTA file itself will be visible in the Genome Information page. Note that genomes in earlier stages of assembly (e.g. Scaffolds) can be uploaded into CoGe using these steps.
Step 16: Complete genome and annotation upload into CoGe
15. At this point, genome annotation files can also be uploaded into CoGe for the specified genome. These files can be included by clicking on the green Load Sequence Annotation button under the Sequence & Gene Annotation menu. Note that some limited analyses can be performed in CoGe even when genome annotation data is not yet available. Also, any specific upload can be updated at any point in time in CoGe. Thus, genome annotation data, metadata or experimental data can be included for the same genome in CoGe as soon as they become available.
16. The process to upload an annotation is similar to that of uploading a genome. On the Describe your annotation page, the user can select the version and source of the annotation data. After clicking Next, the data can be retrieved directly from the CyVerse Data Store, by creating an HTTP/FTP link directly to the data, or using the Upload option. Both GFF and GTF files are accepted when the genome annotation data is uploaded from a private computer. By clicking Next, the annotation data associated with the genome is uploaded and stored in CoGe. The information should now be visible on the Genome Information page under the Sequence & Gene Annotation menu. For more details about uploading genome annotation follow this link: [[8]]
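
Before uploading a genome and its annotation, it can be useful to confirm offline that the two downloaded files refer to the same sequences. A GFF file names the sequence each feature belongs to in its first column, so one quick check is to compare those names with the FASTA headers. The following Python sketch illustrates this; it is not part of CoGe, and the function names and toy data are hypothetical.

```python
import io

def fasta_ids(handle):
    """Collect sequence IDs (first word after '>') from a FASTA stream."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                ids.add(header.split()[0])
    return ids

def gff_seqids(handle):
    """Collect the values of column 1 (seqid) from a GFF/GTF stream."""
    ids = set()
    for line in handle:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        ids.add(line.split("\t", 1)[0].strip())
    return ids

def unmatched_seqids(fasta_handle, gff_handle):
    """Return annotation seqids that have no matching FASTA record."""
    return sorted(gff_seqids(gff_handle) - fasta_ids(fasta_handle))

# Toy data standing in for the downloaded files (hypothetical names).
fasta = io.StringIO(">chr1 toy sequence\nATGC\n>chr2\nGGCC\n")
gff = io.StringIO(
    "##gff-version 3\n"
    "chr1\tsrc\tgene\t1\t4\t.\t+\t.\tID=g1\n"
    "chr3\tsrc\tgene\t1\t4\t.\t+\t.\tID=g2\n"
)
print(unmatched_seqids(fasta, gff))  # prints ['chr3']
```

With real files, the two `io.StringIO` objects would be replaced by `open(path)` handles for the FASTA and GFF files. An empty list means every annotated sequence name was found in the FASTA file.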


Step 1: Screen capture of NCBI chromosome section under the Plasmodium chabaudi genome tab on NCBI

Importing genomes using the "NCBI/GenBank" method

It is also possible to upload specific chromosomes and organelle genomes into CoGe. The following steps show how to upload individual chromosomes into CoGe:


1. In the lower part of the screen, find the Reference Genome section. The RefSeq and INSDC numbers for each chromosome can be found here.
2. Follow steps 3-10 from the previous section.
Step 3: Screen capture of genome upload to CoGe using GenBank ID numbers
3. Select the GenBank accession numbers option. Type or Copy/Paste the INSDC numbers for each Plasmodium chromosome or for specific Plasmodium organelle genomes. After typing each number click the Get button. The uploaded genome should appear under Selected file(s). Once all the desired genomes have been uploaded select Next to begin the upload.
4. Once the genome has been uploaded, all information included by the user, as well as any specifics regarding the genome FASTA file itself, will be visible on the Genome Information page. Note that uploading chromosomes/genomes using this method also imports the genome annotation already included in GenBank. Also, notice that genomes uploaded using this method become public and are visible to all users of CoGe.

Exporting genomes from CoGe to CyVerse

Once data has been uploaded into CoGe, it can be exported to CyVerse for easy sharing and storage. This is highly recommended for complete and certified data. Using CoGe to export data into the CyVerse Data Store is remarkably simple:
1. While logged into CoGe, go to the Genome Information page of the genome you want to add.
2. Under the Tools menu, find the Export to CyVerse Data Store option. Click on FASTA and GFF to upload the genome and annotation, respectively. Make sure to provide any specifics when uploading the annotation data (GFF file).
3. Wait until the upload is completed. From this point forward, the data will also be found in the CyVerse Data Store. Note that the uploaded genomes cannot currently be modified, so it is recommended to keep a list of the uploaded genome codes provided by CyVerse and their corresponding organisms.

Using CoGe tools to perform comparative analyses

Analyzing GC content and other genomic properties

Genome's %GC as seen in the Genome Information page of CoGe

It is possible to calculate the GC content for each Plasmodium genome via the GenomeInfo section under Genome Information. For genomes uploaded using GenBank, this information will already be displayed on the page. Genomes uploaded from private computers or using other methods, as well as genomes in earlier stages of assembly, will not have this information on display from the start. However, clicking on %GC on the Length and/or Noncoding sequence lines under the Statistics tab makes CoGe calculate these measures promptly.

A simpler way to compare GC content variation across genomes is to use GenomeList. This tool allows the user to add one or more genomes of interest to a list and perform genome-specific calculations for a variety of features: amino acid usage, codon usage, and GC content of genomic features and CDS. In addition, this table also summarizes the genome information included by the user: sequence type, sequence origin, taxonomy, provenance, version uploaded to CoGe, etc. Moreover, GenomeList can be used on genomes at earlier levels of assembly.
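
To cross-check a %GC value offline, the same quantity can be computed directly from a genome FASTA file. The Python sketch below is a minimal illustration, not CoGe's own implementation; the function name and the toy sequences are hypothetical, and it assumes that only unambiguous bases (A, C, G, T) should be counted, which may differ from how CoGe treats N or masked positions.

```python
import io

def gc_percent_by_sequence(handle):
    """Return {sequence_id: GC%} for each record in a FASTA stream.

    Assumption: only A, C, G and T are counted; N and other
    ambiguity codes are ignored in both numerator and denominator.
    """
    counts = {}   # sequence_id -> [gc_count, acgt_count]
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else "seq%d" % (len(counts) + 1)
            counts[current] = [0, 0]
        elif current is not None:
            seq = line.upper()
            gc = seq.count("G") + seq.count("C")
            at = seq.count("A") + seq.count("T")
            counts[current][0] += gc
            counts[current][1] += gc + at
    return {
        name: (100.0 * gc / total if total else 0.0)
        for name, (gc, total) in counts.items()
    }

# Toy input standing in for a genome FASTA file (hypothetical data).
toy = io.StringIO(">chrA\nATAT\nGC\n>chrB\nNNGG\n")
for name, gc in gc_percent_by_sequence(toy).items():
    print(name, round(gc, 2))
```

For a real genome, replace the `io.StringIO` object with `open(path)` on the downloaded FASTA file. Small differences from the value shown in CoGe are to be expected if ambiguous bases are handled differently.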


The following steps indicate how to perform comparative analyses using the GenomeList in CoGe:

Step 5: Upload of eight Plasmodium genomes to Genome List

1. Go to: [[9]] and log in to CoGe

2. On the main page of CoGe, find the Tools tile and click on Organism View ([10])

3. Type the scientific name of the organism of interest in the Search box and select the desired version of the uploaded genome.

Step 7: Genome List used to compare 8 Plasmodium species. Link to this analysis: https://genomevolution.org/r/lmzp

4. Find the Genome Information tile on the right side of the screen. Under the Tools line find Add to GenomeList and click. This will automatically generate a new window where the selected genome has been added.

5. Without closing the window from step 4, type the scientific name of another organism of interest in the same Search box used before. Once the second organism's genome has been selected, click on Add to GenomeList. The second selected organism should appear in the small window. You can add as many organisms as desired.

6. Once all genomes have been selected, click on the green Send to Genome list button.

7. After a couple of seconds, features and information for all included genomes will be available for comparison in GenomeList. While some columns hold information related to the nature of the upload itself, several others provide links to perform genome-specific calculations. Note that by clicking on the green Change Viewable Columns button on the upper right part of the screen, it is possible to select which columns are displayed.

8. It is possible to download information from the selected genomes under a variety of formats using "Send Selected Genomes to". Note that the information downloaded will correspond to the genomes themselves and not to the calculations and analyses performed on GenomeList.

Comparing genomes to identify chromosomal inversions, fusions, fissions and other events

Closely related Plasmodium species tend to show highly conserved synteny blocks. Nonetheless, thanks to the increasing number of sequenced Plasmodium genomes publicly available, it is possible to estimate the lineage in which events leading to loss of synteny have occurred. A previously discussed example involves the loss of synteny on chromosomes 3 and 6 between P. vivax, P. cynomolgi and P. knowlesi [[11]]. These three species will be used as an example to demonstrate how interspecific SynMap analyses are performed and to determine how the results obtained from the CoGe SynMap tool compare with previously published ones.

Step 7: SynMap input screen. The synteny of Plasmodium cynomolgi B strain (Organism 1) will be analyzed with respect to that of Plasmodium vivax Salvador 1 strain (Organism 2)

These steps show how to perform comparative analyses between two Plasmodium species using the SynMap tool at CoGe:

1. Go to: [[12]]

2. Log in to CoGe

3. On the main CoGe page, find the Tools tile and click on Organism View ([13])

5. Type the scientific name of the desired species in the Search box, and click on the GenomeInfo link under the Genome Information tile

6. Find the SynMap link in the Analyze section of the Tools tile

Step 8: SynMap Legacy output screen. From left to right: P. vivax vs. P. cynomolgi SynMap output, P. vivax vs. P. knowlesi SynMap output and P. knowlesi vs. P. cynomolgi SynMap output

7. By default, SynMap allows the user to compare the synteny of a genome with itself. This can be of great use to characterize a genome and perform rapid comparisons to detect and putatively time certain duplication events [14]. In this example, however, the genomes of two different organisms will be analyzed. Different genomes can be selected for Organism 1 or 2 by typing the scientific name of the desired organism in either search box and then selecting the intended genome. Here, a P. vivax genome has been selected to be analyzed with P. cynomolgi. Once the organisms have been selected, click on Generate SynMap

8. Once the analysis has been completed, SynMap will output a graphical depiction of the syntenic regions between the two genomes. In the following example the output shows two chromosomal regions, on chromosomes 3 and 6, where an inversion has occurred (Regenerate this analysis: https://genomevolution.org/r/lj12). The same analysis can be performed for the other two species pairs in order to identify their syntenic relationship and to make inferences regarding the origin of the chromosomal inversions (Regenerate P. knowlesi vs. P. cynomolgi analysis: https://genomevolution.org/r/lj1x and P. knowlesi vs. P. vivax analysis: https://genomevolution.org/r/lj1t)

The output shows that the chromosomal inversion events are observed in both comparisons involving P. vivax (against P. knowlesi and against P. cynomolgi); nonetheless, when P. cynomolgi and P. knowlesi are compared, no inversion events are observed. This suggests that the chromosomal inversions reported for chromosomes 3 and 6 occurred in the P. vivax lineage, after the split of P. cynomolgi and P. vivax approximately 3.43-3.87 Mya [15]
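
On a syntenic dotplot, a collinear region appears as a diagonal with positive slope, whereas an inversion appears as a segment with negative slope. The Python sketch below illustrates that reading on a list of syntenic gene pairs for a single chromosome pair. It is only an illustration of how to interpret the plot, not the method SynMap uses to build syntenic blocks; the function name, the `min_len` threshold and the toy coordinates are all hypothetical.

```python
def inverted_runs(anchors, min_len=3):
    """Find runs of syntenic anchors that run in opposite directions.

    anchors: list of (pos_a, pos_b) tuples, the position of each
             syntenic gene pair on the chromosome of genome A and B.
    min_len: minimum number of consecutive anchors for a run to be
             reported (hypothetical threshold to ignore single outliers).
    Returns a list of (start_a, end_a) intervals on genome A in which
    pos_b decreases while pos_a increases.
    """
    pts = sorted(anchors)  # order anchors along genome A
    runs = []
    start = 0
    for i in range(1, len(pts) + 1):
        # The current run continues while genome B positions keep decreasing.
        if i < len(pts) and pts[i][1] < pts[i - 1][1]:
            continue
        if i - start >= min_len:
            runs.append((pts[start][0], pts[i - 1][0]))
        start = i
    return runs

# Toy anchors: collinear at both ends, inverted in the middle.
toy = [(1, 1), (2, 2), (3, 3), (4, 7), (5, 6), (6, 5), (7, 4), (8, 8), (9, 9)]
print(inverted_runs(toy))  # prints [(4, 7)]
```

Running the same check on each species pair mirrors the reasoning above: an interval reported in both comparisons involving one species, but not in the comparison between the other two, points to that species' lineage as the one in which the inversion arose.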

Different sets of events leading to loss of synteny are identified by performing pairwise comparisons in SynMap Legacy. Upper row from left to right: P. knowlesi vs. P. malariae; P. coatneyi vs. P. knowlesi; P. coatneyi vs. P. malariae. Lower row from left to right: P. ovale vs. P. malariae; P. coatneyi vs. P. ovale; P. ovale vs. P. knowlesi


In a more complex example, it is possible to identify sets of chromosome fusion/division events unique to different genomes. In the following example, pairwise comparisons between the genomes of four closely related Plasmodium parasites (P. ovale curtisi, P. malariae, P. coatneyi and P. knowlesi) show that at least two sets of inversions and fusions have occurred in the P. coatneyi and P. malariae genomes. SynMap results show two fusion events in chromosomes 5 and 9 unique to P. malariae (marked with red squares) and two additional fusion events in chromosomes 13 and 14 of P. coatneyi (marked with green squares). Moreover, an inversion event can be observed in the central region of chromosome 4 in P. malariae (marked with a red circle). This example will also be used to illustrate the use of Ks analyses in the following section.

Measuring Ks/Kn values between genomes

laverania

Using Syntenic Path Assembly (SPA)

fragille


Using CodeOn

tables