Difference between revisions of "Fish Comparative Genomics"

From CoGepedia
(Whole genome syntenic analysis)
 
(45 intermediate revisions by 2 users not shown)
Line 14: Line 14:
 
<br>
 
<br>
  
 +
== Sample Datasets ==
 +
To access sample datasets
 +
 +
*''Takifugu rubripes'' genome (http://de.iplantcollaborative.org/dl/d/05199A88-82ED-4D09-94AA-99CF3D8B64FE/Takifugu_rubripes_genome_NCBIv1.faa)
 +
 +
*''Takifugu rubripes'' genome annotation (http://de.iplantcollaborative.org/dl/d/989C12FD-D62C-48B5-8FB8-80706BCB028D/T_rubripes_annotation_GCF_000180615.1.gff.gz)
  
 
== Setting the hook: getting data loaded into CoGe  ==
 
== Setting the hook: getting data loaded into CoGe  ==
  
 
*Loading new genomes into CoGe  
 
*Loading new genomes into CoGe  
**Loading from the iPlant Data Store When loading a new genome  
+
**Loading genomes into the Bio.ci (iPlant Data Store) for the first time
**Using FTP/HTTP links for uploading genomes  
+
***[http://user.iplantcollaborative.org/ Sign up for an account or log in]
 +
***Access the [https://de.iplantcollaborative.org/de/ Data Store]
 +
***Direct upload from your computer
 +
****Press the upload tab on the top left and select 'Simple upload from Desktop'
 +
****Find the genome(s) FASTA file and select
 +
****Enjoy a picnic outside while they upload
 +
***Using FTP/HTTP links for uploading genomes in the Data Store
 +
****1) In the top left of the screen click on the 'Upload' button
 +
*****a) Select 'Import using URL'
 +
*****b) This will create several spaces to paste URLs into so that the Data Store will retrieve files (in this case genome FASTA files/.fa files). Once the blank areas come up, open a new tab in your browser
 +
****2) Copy the link location of the genome FASTA file from an FTP site or webpage
 +
****3) Paste the URL into a blank field on the Data Store page
 +
****Enjoy some tea or coffee while they upload
 +
**Once the genomes have been uploaded into the Data Store
 +
***Select them and move them into the folder named 'coge_data'
 +
***The genome should now be visible in the CoGe platform
 +
 
 
**Direct upload  
 
**Direct upload  
**NCBI Loader <br>
 
  
<br>  
+
**[[CoGe NCBI Loader]] <br>
 +
 
 +
 
 +
*'''''OR''''' Use genomes already available in CoGe
 +
**Select your favorite fish genome from the table below
 +
**Or [https://genomevolution.org/CoGe/OrganismView.pl search CoGe] for other organisms using [[OrganismView]]
  
 
<br>  
 
<br>  
  
 +
<br>
 +
===Sequenced fish genomes===
 
{| width="1203" cellspacing="1" cellpadding="3" border="1"
 
{| width="1203" cellspacing="1" cellpadding="3" border="1"
 
|+ Table of fish genomes publicly available in CoGe. Bold denotes species with annotations.  
 
|+ Table of fish genomes publicly available in CoGe. Bold denotes species with annotations.  
Line 37: Line 65:
 
! scope="col" | '''CoGe Genome Links'''
 
! scope="col" | '''CoGe Genome Links'''
 
|-
 
|-
! scope="col" rowspan="2" |  
+
! rowspan="2" scope="col" |  
 
Hyperoartia  
 
Hyperoartia  
  
(jawless fish)
+
(jawless fish)  
  
 
| ''Lethenteron camtschaticum''  
 
| ''Lethenteron camtschaticum''  
Line 72: Line 100:
 
| [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=25005 unmasked] [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=25041 masked]
 
| [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=25005 unmasked] [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=25041 masked]
 
|-
 
|-
! scope="col" rowspan="34" |  
+
! rowspan="34" scope="col" |  
Actinopterygii
+
Actinopterygii  
  
(ray-fin fish)
+
(ray-fin fish)  
  
 
| ''Anguilla japonica''  
 
| ''Anguilla japonica''  
Line 111: Line 139:
 
| [http://www.ncbi.nlm.nih.gov/genome/50 50]  
 
| [http://www.ncbi.nlm.nih.gov/genome/50 50]  
 
| Model species  
 
| Model species  
| [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=25050 unmasked] [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=25064 masked]
+
| [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=25001 unmasked] [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=25037 masked]
 
|-
 
|-
 
| ''Dicentrarchus labrax''  
 
| ''Dicentrarchus labrax''  
Line 135: Line 163:
 
| [http://www.ncbi.nlm.nih.gov/genome/146 146]  
 
| [http://www.ncbi.nlm.nih.gov/genome/146 146]  
 
| Model species  
 
| Model species  
| [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=23854 unmasked]
+
| [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=23854 unmasked] [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=25191 masked]
 
|-
 
|-
 
| '''''Haplochromis burtoni'''''  
 
| '''''Haplochromis burtoni'''''  
Line 146: Line 174:
 
| Blue mbuna  
 
| Blue mbuna  
 
| [http://www.ncbi.nlm.nih.gov/genome/2638 2638]  
 
| [http://www.ncbi.nlm.nih.gov/genome/2638 2638]  
| <br>
+
| <br>  
 
| [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24912 unmasked] [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24961 masked]
 
| [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24912 unmasked] [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24961 masked]
 
|-
 
|-
Line 162: Line 190:
 
|-
 
|-
 
| ''Mchenga conophoros''  
 
| ''Mchenga conophoros''  
| <br>
+
| <br>  
 
| [http://www.ncbi.nlm.nih.gov/genome/2585 2585]  
 
| [http://www.ncbi.nlm.nih.gov/genome/2585 2585]  
| <br>
+
| <br>  
 
| [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24914 unmasked] [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24919 masked]
 
| [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24914 unmasked] [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24919 masked]
 
|-
 
|-
Line 170: Line 198:
 
| Auratus Cichlid  
 
| Auratus Cichlid  
 
| [http://www.ncbi.nlm.nih.gov/genome/2639 2639]  
 
| [http://www.ncbi.nlm.nih.gov/genome/2639 2639]  
| <br>
+
| <br>  
 
| [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24913 unmasked] [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24964 masked]
 
| [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24913 unmasked] [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24964 masked]
 
|-
 
|-
Line 195: Line 223:
 
| [http://www.genoscope.cns.fr/trout-ggb/data/ Genoscope Salmon Database]  
 
| [http://www.genoscope.cns.fr/trout-ggb/data/ Genoscope Salmon Database]  
 
| Angling and industrial species  
 
| Angling and industrial species  
| [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24831 unmasked] [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24950 masked]
+
| [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=25073 unmasked] [https://genomevolution.org/coge/GenomeInfo.pl?gid=25267 masked]
 
|-
 
|-
 
| '''''Oreochromis niloticus'''''  
 
| '''''Oreochromis niloticus'''''  
Line 228: Line 256:
 
|-
 
|-
 
| '''''Pundamilia nyererei'''''  
 
| '''''Pundamilia nyererei'''''  
| <br>
+
| <br>  
 
| [http://www.ncbi.nlm.nih.gov/genome/3330 3330]  
 
| [http://www.ncbi.nlm.nih.gov/genome/3330 3330]  
| <br>
+
| <br>  
 
| [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=25011 unmasked] [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=25055 masked]
 
| [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=25011 unmasked] [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=25055 masked]
 +
|-
 +
| '''''Salmo salar'''''
 +
| Atlantic salmon
 +
| [ NCBI Bioproject]
 +
| Angling species, Industrial species
 +
| [https://genomevolution.org/coge/GenomeInfo.pl?gid=28938 unmasked] [https://genomevolution.org/coge/GenomeInfo.pl?gid=29012 masked]
 +
|-
 +
| '''''Sebastes aleutianus'''''
 +
| Rougheye Rockfish
 +
| [http://www.ncbi.nlm.nih.gov/bioproject?term=txid214485 NCBI Bioproject]
 +
| Angling species, Industrial species, Does not age/senesce
 +
| [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24053 soft masked]
 
|-
 
|-
 
| '''''Sebastes nigrocinctus'''''  
 
| '''''Sebastes nigrocinctus'''''  
Line 243: Line 283:
 
| [http://www.ncbi.nlm.nih.gov/genome/11458 11458]  
 
| [http://www.ncbi.nlm.nih.gov/genome/11458 11458]  
 
| Angling species, Industrial species  
 
| Angling species, Industrial species  
| [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24886 unmasked]
+
| [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24886 unmasked] [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=25162 masked]
 
|-
 
|-
 
| '''''Stegastes partitus'''''  
 
| '''''Stegastes partitus'''''  
Line 254: Line 294:
 
| Sansaifugu  
 
| Sansaifugu  
 
| [http://www.ncbi.nlm.nih.gov/genome/14185 14185]  
 
| [http://www.ncbi.nlm.nih.gov/genome/14185 14185]  
| <br>
+
| <br>  
 
| [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24910 unmasked] [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24924 masked]
 
| [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24910 unmasked] [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24924 masked]
 
|-
 
|-
Line 275: Line 315:
 
| [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24952 unmasked] [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24968 masked]
 
| [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24952 unmasked] [https://genomevolution.org/CoGe/GenomeInfo.pl?gid=24968 masked]
 
|-
 
|-
 +
| <br>
 
| '''''Xiphophorus maculatus'''''  
 
| '''''Xiphophorus maculatus'''''  
 
| Southern Platyfish  
 
| Southern Platyfish  
Line 282: Line 323:
 
|}
 
|}
  
<br><br>  
+
<br><br>
  
*Links to CoGe [[Notebooks]] containing these genomes<br>
+
== Links to CoGe [[Notebooks]] containing fish genomes ==
 
**[https://genomevolution.org/CoGe/NotebookView.pl?nid=890 All unmasked fish genomes available in CoGe]<br>  
 
**[https://genomevolution.org/CoGe/NotebookView.pl?nid=890 All unmasked fish genomes available in CoGe]<br>  
 
**[ All masked fish genomes available in CoGe]  
 
**[ All masked fish genomes available in CoGe]  
Line 304: Line 345:
 
*[[In-Paralog]]
 
*[[In-Paralog]]
  
== Analyses ==
+
== Casting the Line: analyses for comparing genomes ==
  
 
=== Whole genome syntenic analysis ===
 
=== Whole genome syntenic analysis ===
  
*[[SynMap]]  
+
*[[SynMap]] [https://genomevolution.org/wiki/index.php/Maize_Sorghum_Syntenic_dotplot (tutorial)]
 
**Identify Whole Genome Duplications  
 
**Identify Whole Genome Duplications  
Whole genome synteny analysis of [https://genomevolution.org/r/eola ''Oncorhynchus mykiss'' (rainbow trout) compared to ''Takifugu rubripes''].  
+
***'''Example 1:''' Whole genome synteny analysis of [https://genomevolution.org/r/eola ''Oncorhynchus mykiss'' (rainbow trout) compared to ''Takifugu rubripes'']. In this analysis, the [[synonymous mutation]] rate (Ks) has been calculated to estimate the relative age of each syntenic gene pair (represented as a dot) in the dotplot. Blue/green dots indicate a more recent whole genome duplication, whereas red/orange dots are noise in the dataset. The histogram below the dotplot shows the distribution of the synonymous mutation rates.
https://genomevolution.org/r/eola
+
 
  
 
**Identify Synteny  
 
**Identify Synteny  
 
**Synonymous/nonsynonymous gene pair evolution
 
**Synonymous/nonsynonymous gene pair evolution
 +
***[https://genomevolution.org/r/eq6b Non-synonymous mutation rates for syntenic gene pairs between ''O. mykiss'' and ''T. rubripes'']
 +
***[https://genomevolution.org/r/eq69 Ka/Ks rates for syntenic gene pairs between ''O. mykiss'' and ''T. rubripes'']
  
 
=== Microsyntenic analysis ===
 
=== Microsyntenic analysis ===
Line 320: Line 363:
 
*[[GEvo]]  
 
*[[GEvo]]  
 
**Validate microsynteny  
 
**Validate microsynteny  
 +
***'''Example 1:''' Microsynteny analysis using GEvo to compare [https://genomevolution.org/r/eoln ''T. rubripes'' to ''O. mykiss'']. This analysis shows evidence of whole genome duplication in ''O. mykiss'' (the Salmonid WGD) when compared to another Teleost fish that has not had a WGD since the Teleost WGD.
 +
 
**Identify Conserved non-coding sequences (regulatory function)
 
**Identify Conserved non-coding sequences (regulatory function)
  
Line 326: Line 371:
 
*[[SynFind]]  
 
*[[SynFind]]  
 
**Identify orthologous regions across many species
 
**Identify orthologous regions across many species
 +
 +
== Sinkers to Cast Further and Deeper: adding weight to genomes with additional data types ==
 +
 +
=== Adding new data (genomes, RNASeq, SNPs) to CoGe ===
 +
 +
*[[LoadGenome]]
 +
*[[LoadExperiment]]
 +
*Keeping data private and sharing with collaborators
  
 
=== Gene family analysis ===
 
=== Gene family analysis ===
Line 338: Line 391:
 
*Adding/visualizing RNAseq data
 
*Adding/visualizing RNAseq data
  
=== Adding new data (genomes, RNASeq, SNPs) to CoGe ===
+
== Discussion/Conclusions ==
 
+
*[[LoadGenome]]
+
*[[LoadExperiment]]
+
*Keeping data private and sharing with collaborators
+
  
== Discussion/Conclusions ==
 
  
 
<br>
 
<br>

Latest revision as of 11:26, 6 May 2016

Summary/Abstract

Introduction

  • Phylogeny of genomes with polyploidy events marked



Sample Datasets

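To access sample datasets:

  • Takifugu rubripes genome (http://de.iplantcollaborative.org/dl/d/05199A88-82ED-4D09-94AA-99CF3D8B64FE/Takifugu_rubripes_genome_NCBIv1.faa)
  • Takifugu rubripes genome annotation (http://de.iplantcollaborative.org/dl/d/989C12FD-D62C-48B5-8FB8-80706BCB028D/T_rubripes_annotation_GCF_000180615.1.gff.gz)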
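
For readers who prefer to fetch the sample files from a script rather than a browser, the following is a minimal Python sketch. The two URLs are the ones listed for the sample datasets in this page's revision; the helper names (`local_name`, `fetch`, `open_text`) are hypothetical and not part of CoGe or the Data Store, and the links may no longer resolve.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# Sample dataset links as given on this page (they may have moved since 2016).
SAMPLE_URLS = [
    "http://de.iplantcollaborative.org/dl/d/05199A88-82ED-4D09-94AA-99CF3D8B64FE/"
    "Takifugu_rubripes_genome_NCBIv1.faa",
    "http://de.iplantcollaborative.org/dl/d/989C12FD-D62C-48B5-8FB8-80706BCB028D/"
    "T_rubripes_annotation_GCF_000180615.1.gff.gz",
]


def local_name(url: str) -> str:
    """Return the file name at the end of a download URL."""
    return Path(urlparse(url).path).name


def fetch(url: str, dest_dir: str = ".") -> Path:
    """Download one URL into dest_dir and return the local path."""
    dest = Path(dest_dir) / local_name(url)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


# Example use (requires network access):
#   for url in SAMPLE_URLS:
#       print(fetch(url))
```

The annotation file is gzip-compressed, so `open_text` lets the same code read either file without unpacking it first.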

Setting the hook: getting data loaded into CoGe

  • Loading new genomes into CoGe
    • Loading genomes into the Bio.ci (iPlant Data Store) for the first time
      • Sign up for an account or log in
      • Access the Data Store
      • Direct upload from your computer
        • Press the upload tab on the top left and select 'Simple upload from Desktop'
        • Find the genome(s) FASTA file and select
        • Enjoy a picnic outside while they upload
      • Using FTP/HTTP links for uploading genomes in the Data Store
        • 1) In the top left of the screen click on the 'Upload' button
          • a) Select 'Import using URL'
          • b) This will create several spaces to paste URLs into so that the Data Store will retrieve files (in this case genome FASTA files/.fa files). Once the blank areas come up, open a new tab in your browser
        • 2) Copy the link location of the genome FASTA file from an FTP site or webpage
        • 3) Paste the URL into a blank field on the Data Store page
        • Enjoy some tea or coffee while they upload
    • Once the genomes have been uploaded into the Data Store
      • Select them and move them into the folder named 'coge_data'
      • The genome should now be visible in the CoGe platform
    • Direct upload


  • OR Use genomes already available in CoGe
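
Because large uploads take a while, it can save time to sanity-check a genome FASTA file before sending it to the Data Store. The sketch below is one possible pre-upload check in Python. The rules it applies (unique sequence names, non-empty sequences, IUPAC nucleotide characters only) are assumptions chosen for illustration and are not CoGe's own validation; the function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

# IUPAC nucleotide codes, including ambiguity codes and N.
NUCLEOTIDES = set("ACGTUNRYKMSWBDHV")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_genome_fasta(lines: Iterable[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []
    seen = set()
    count = 0
    for header, seq in read_fasta(lines):
        count += 1
        name = header.split()[0] if header else ""
        label = name if name else "record " + str(count)
        if not name:
            problems.append(label + ": empty header")
        elif name in seen:
            problems.append("duplicate sequence name: " + name)
        seen.add(name)
        if not seq:
            problems.append(label + ": no sequence")
        unexpected = set(seq.upper()) - NUCLEOTIDES
        if unexpected:
            problems.append(label + ": unexpected characters " + "".join(sorted(unexpected)))
    if count == 0:
        problems.append("no FASTA records found")
    return problems


# Example use on a local file:
#   with open("my_genome.fa") as handle:
#       for problem in check_genome_fasta(handle):
#           print(problem)
```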



Sequenced fish genomes

Table of fish genomes publicly available in CoGe. Bold denotes species with annotations.
Class | Species | Common name | Genome ID | Species notes | CoGe Genome Links
Hyperoartia (jawless fish) | Lethenteron camtschaticum | Arctic lamprey | 16905 | One of few extant jawless fish | unmasked masked
Hyperoartia (jawless fish) | Petromyzon marinus | Sea Lamprey | 287 | One of few extant jawless fish | unmasked masked
Chondrichthyes (cartilaginous fish) | Callorhinchus milii | Australian Ghostshark | 689 | Proposed model cartilaginous fish | unmasked masked
Sarcopterygii (lobe-fin fish) | Latimeria chalumnae | Coelacanth | 3262 | Oldest extant Sarcopterygii | unmasked masked
Actinopterygii (ray-fin fish) | Anguilla japonica | Japanese Eel | 13349 | Ecological model, Industry species | unmasked masked
Actinopterygii (ray-fin fish) | Anoplopoma fimbria | Sablefish | 12760 | Industrial species | unmasked masked
Actinopterygii (ray-fin fish) | Astyanax mexicanus | Mexican tetra | 13073 | Two forms: seeing, and cave-dwelling blind | unmasked masked
Actinopterygii (ray-fin fish) | Cynoglossus semilaevis | Tongue Sole | 11788 | Industrial species | unmasked masked
Actinopterygii (ray-fin fish) | Cyprinodon variegatus | Sheepshead minnow | 13078 | Toxicology model | unmasked masked
Actinopterygii (ray-fin fish) | Danio rerio | Zebrafish | 50 | Model species | unmasked masked
Actinopterygii (ray-fin fish) | Dicentrarchus labrax | European Seabass | 2659 | Industrial species | unmasked masked
Actinopterygii (ray-fin fish) | Esox lucius | Northern Pike | 22932 | Angling species | unmasked masked
Actinopterygii (ray-fin fish) | Gadus morhua | Atlantic cod | 2661 | Industrial species | unmasked masked
Actinopterygii (ray-fin fish) | Gasterosteus aculeatus | Three-spined stickleback | 146 | Model species | unmasked masked
Actinopterygii (ray-fin fish) | Haplochromis burtoni | Burton’s mouthbrooder | 3328 | Ecological model species, Aquarium species | unmasked masked
Actinopterygii (ray-fin fish) | Labeotropheus fuelleborni | Blue mbuna | 2638 | | unmasked masked
Actinopterygii (ray-fin fish) | Lepisosteus oculatus | Spotted Gar | 10597 | Lineage before Teleost whole genome duplication | unmasked masked
Actinopterygii (ray-fin fish) | Maylandia zebra | Zebra Mbuna | 2640 | Ecological model species, aquarium fish | unmasked masked
Actinopterygii (ray-fin fish) | Mchenga conophoros | | 2585 | | unmasked masked
Actinopterygii (ray-fin fish) | Melanochromis auratus | Auratus Cichlid | 2639 | | unmasked masked
Actinopterygii (ray-fin fish) | Neolamprologus brichardi | Princess Cichlid | 3329 | Aquarium fish | unmasked masked
Actinopterygii (ray-fin fish) | Nothobranchius furzeri | Turquoise Killifish | 2642 | Short life span model species, metabolic diapause | unmasked masked
Actinopterygii (ray-fin fish) | Nothobranchius kuhntae | Beira Killifish | 2643 | Short life model species | unmasked masked
Actinopterygii (ray-fin fish) | Oncorhynchus mykiss | Rainbow Trout | Genoscope Salmon Database | Angling and industrial species | unmasked masked
Actinopterygii (ray-fin fish) | Oreochromis niloticus | Nile Tilapia | 197 | Industrial species | unmasked masked
Actinopterygii (ray-fin fish) | Oryzias latipes | Japanese Medaka | 542 | Model fish, Aquarium species | unmasked masked
Actinopterygii (ray-fin fish) | Pimephales promelas | Fathead Minnow | 13167 | Industrial baitfish | unmasked masked
Actinopterygii (ray-fin fish) | Poecilia formosa | Amazon Molly | 13072 | Gynogenesis: all female populations | unmasked masked
Actinopterygii (ray-fin fish) | Poecilia reticulata | Guppy | 23338 | Model species, Aquarium fish | unmasked masked
Actinopterygii (ray-fin fish) | Pundamilia nyererei | | 3330 | | unmasked masked
Actinopterygii (ray-fin fish) | Salmo salar | Atlantic salmon | [ NCBI Bioproject] | Angling species, Industrial species | unmasked masked
Actinopterygii (ray-fin fish) | Sebastes aleutianus | Rougheye Rockfish | NCBI Bioproject | Angling species, Industrial species, Does not age/senesce | soft masked
Actinopterygii (ray-fin fish) | Sebastes nigrocinctus | Tiger Rockfish | 14568 | Angling species, long lived model, live bearer | masked
Actinopterygii (ray-fin fish) | Sebastes rubrivinctus | Flag Rockfish | 11458 | Angling species, Industrial species | unmasked masked
Actinopterygii (ray-fin fish) | Stegastes partitus | Bicolor damselfish | 13077 | Medical model species | unmasked masked
Actinopterygii (ray-fin fish) | Takifugu flavidus | Sansaifugu | 14185 | | unmasked masked
Actinopterygii (ray-fin fish) | Takifugu rubripes | Torafugu | 63 | Shortest vertebrate genome known | unmasked masked
Actinopterygii (ray-fin fish) | Tetraodon nigroviridis | Spotted Green Pufferfish | 191 | Model species, low amount of repetitive sequence | unmasked masked
Actinopterygii (ray-fin fish) | Thunnus orientalis | Pacific Bluefin Tuna | 13314 | Industrial species | unmasked masked
Actinopterygii (ray-fin fish) | Xiphophorus maculatus | Southern Platyfish | 10764 | Live bearer | unmasked masked



Links to CoGe Notebooks containing fish genomes

All annotated unmasked fish genomes available in CoGe

All unmasked fish genomes available in CoGe

Glossary

Casting the Line: analyses for comparing genomes

Whole genome syntenic analysis

  • SynMap (tutorial)
    • Identify Whole Genome Duplications
      • Example 1: Whole genome synteny analysis of Oncorhynchus mykiss (rainbow trout) compared to Takifugu rubripes. In this analysis, the synonymous mutation rate (Ks) has been calculated to estimate the relative age of each syntenic gene pair (represented as a dot) in the dotplot. Blue/green dots indicate a more recent whole genome duplication, whereas red/orange dots are noise in the dataset. The histogram below the dotplot shows the distribution of the synonymous mutation rates.
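
To make the idea of a Ks histogram concrete, here is a small Python sketch that bins Ks values for syntenic gene pairs and reports the most populated bin, which is where a peak from a shared duplication event would be expected to show up. It is an illustration only and not SynMap's implementation: the bin width, the upper cutoff, the function names, and the example values are all assumptions.

```python
from typing import Iterable, List, Tuple


def ks_histogram(ks_values: Iterable[float],
                 bin_width: float = 0.1,
                 max_ks: float = 2.0) -> List[Tuple[float, int]]:
    """Count Ks values in fixed-width bins covering [0, max_ks).

    Returns (bin_start, count) pairs. Values outside the range are ignored,
    on the assumption that very high Ks values are saturated or noisy.
    """
    n_bins = int(round(max_ks / bin_width))
    counts = [0] * n_bins
    for ks in ks_values:
        if ks < 0 or ks >= max_ks:
            continue
        # min() guards against floating-point rounding at the top edge.
        index = min(int(ks / bin_width), n_bins - 1)
        counts[index] += 1
    return [(round(i * bin_width, 10), count) for i, count in enumerate(counts)]


def peak_bin(histogram: List[Tuple[float, int]]) -> Tuple[float, int]:
    """Return the (bin_start, count) pair with the highest count."""
    return max(histogram, key=lambda pair: pair[1])


# Made-up Ks values for a handful of syntenic gene pairs.
example_ks = [0.05, 0.12, 0.15, 0.18, 0.95, 1.45, 2.5]
histogram = ks_histogram(example_ks)
start, count = peak_bin(histogram)
print(f"most populated bin starts at Ks={start} with {count} gene pairs")
```

In a real analysis the Ks values would come from the SynMap results rather than a hand-written list, and a genome with more than one duplication event would typically show more than one peak.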


Microsyntenic analysis

  • GEvo
    • Validate microsynteny
      • Example 1: Microsynteny analysis using GEvo to compare T. rubripes to O. mykiss. This analysis shows evidence of whole genome duplication in O. mykiss (the Salmonid WGD) when compared to another Teleost fish that has not had a WGD since the Teleost WGD.
    • Identify Conserved non-coding sequences (regulatory function)

Ortholog/paralog finding with synteny

  • SynFind
    • Identify orthologous regions across many species

Sinkers to Cast Further and Deeper: adding weight to genomes with additional data types

Adding new data (genomes, RNASeq, SNPs) to CoGe

Gene family analysis

  • CoGeBlast
    • Identify many gene family members within/across species
    • Extract sequences (nucleotide/protein)
    • Phylogenetic tree reconstruction using iPlant/iAnimal for multiple sequence alignment and tree building

Functional genomics

  • Adding/visualizing RNAseq data

Discussion/Conclusions