2011 BSA Workshop

From CoGepedia

Latest revision as of 14:13, 9 July 2011

Welcome to CoGe: http://genomevolution.org

Introduction

  1. Who has used CoGe?
  2. Preamble:
    1. Store any version of any genome from all of life
    2. Interconnected tools to analyze genomes at multiple levels of resolution: Open-ended Analyses!
      1. Your questions drive where you go and which analyses you perform; the tools should not dictate the questions you can ask.
    3. Emphasis on exploring genomes as a biologist would an organism
  3. General types of research questions:
    1. I am interested in a group of organisms. . .
    2. I am interested in a group of genes . . .
    3. CoGe has tools to help answer questions in light of genome structure, dynamics, and evolution
  4. Who are you?
    1. Name
    2. Genes and genomes of interest
    3. Anything particular you'd like to know by the end of the workshop
  5. Workshop Organization
    1. Overview of CoGe's tools using example analyses to understand how they are linked together
    2. Open Q&A

Always ask questions

  1. What you need to run CoGe:
    1. Firefox
    2. Flash
    3. Enable Javascript
    4. Enable Popups (for CoGe only)
    5. Enable Cookies (if you have a CoGe user account)
  2. First, is anyone interested in comparing large genomes?
    1. CoGe can do large analyses, but depending on the size and complexity of the genomes, some analyses may take a while to run. However, CoGe caches the results of large analyses.
  3. CoGe Organization: Each tool is designed to do one thing and do it well.
    1. Tools are linked to one another through URLs (web links) that open in additional tabs.
    2. This creates an implicit record of each step of your analysis.
    3. Want to save where you are in an analysis? Copy the link and paste it into your notes.
    4. Most analyses generate URLs that you can save to regenerate the analysis.
    5. Data and analysis results are meant to be easily exported from CoGe.
      1. Download genomes and annotations
      2. Download whole genome comparison blast files
      3. Download syntenic gene-pairs between genomes
    6. Home Page
    7. Entrance Tools
    8. Connected Tools
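
Because each analysis is captured in a link, the settings behind a saved link can be read back out of its query string. The sketch below shows one way to do that in Python. The helper name `coge_params` is hypothetical, the two example links are copied from this page, and treating `start` and `stop` as inclusive coordinates is an assumption. Short links of the form http://genomevolution.org/r/... carry no visible parameters, so this only applies to the long-form links.

```python
import re
from urllib.parse import urlsplit, unquote


def coge_params(url):
    """Return the query parameters of a long-form CoGe link as a dict.

    Accepts both '&' and ';' as separators, since both appear in CoGe links.
    """
    query = urlsplit(url).query
    params = {}
    for pair in re.split(r"[;&]", query):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        params[unquote(key)] = unquote(value)
    return params


genomeview = "http://genomevolution.org/CoGe/GenomeView.pl?z=6&x=10000&dsgid=4242&chr=1"
featlist = ("http://genomevolution.org/CoGe/FeatList.pl"
            "?dsid=36725&chr=1&start=252944&stop=305680&gstid=1")

gv = coge_params(genomeview)
fl = coge_params(featlist)

# Assumption: start and stop are inclusive positions on the chromosome.
region_length = int(fl["stop"]) - int(fl["start"]) + 1
print(gv)
print(region_length)
```

Keeping the parsed parameters next to the link in your notes makes it easier to see later which genome and region an analysis referred to.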

Starting with CoGe

  1. Home Page: http://genomevolution.org/CoGe/
    1. Info on the system: How many
      1. Organisms
      2. Genomes
      3. Nucleotides of genomic sequence
      4. Genome features (e.g. genes, mRNA, CDS, transposon, repeat region)
      5. Annotations
    2. Entrance tools
      1. OrganismView: Search for an organism by name or taxonomic description; get information about that genome. Links to downstream analyses.
      2. CoGeBlast: Blast sequences against any set of genomes in the system. You search for genomes and add them to a list. Great visuals for evaluating hits; automatic links to matching genomic features for additional downstream data retrieval and analyses
      3. FeatView: Search for genomic features by name; get information about that genomic feature (annotations, sequences, AT/GC content). Links to downstream analyses.
      4. SynMap: Pairwise whole genome comparisons with interactive and customizable syntenic dotplot visualizations. Links to downstream analyses.
      5. GEvo: High-resolution analysis of multiple genomic regions. Dynamic and interactive visualizations. Links to downstream analyses.
    3. Where to get more help
      1. CoGePedia

Genomes and Whole Genome Comparisons

Syntenic dotplot between two substrains of E. coli to show various patterns of genome evolution. Results may be regenerated at: http://genomevolution.org/r/2vde
Syntenic dotplot of human (x-axis) versus chimp (y-axis). Results may be regenerated at: http://genomevolution.org/r/2vik
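Figure 1a: Syntenic dotplot between Arabidopsis thaliana and Arabidopsis lyrata. Syntenic gene pairs identified by DAGChainer have been colored based on their synonymous rate change as calculated by CODEML. Results can be regenerated at: http://genomevolution.org/CoGe/SynMap.pl?dsgid1=3068;dsgid2=8;D=20;g=10;A=5;w=0;b=1;ft1=1;ft2=1;dt=geneorder;ks=1;autogo=1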
  1. OrganismView: Finding genomes and getting an overview of genomic data
    1. Start with bacterial genomes: small and fast to process, easier to visualize comparisons: http://genomevolution.org/CoGe/OrganismView.pl?org_name=mg1655
    2. Same methods work on any genome (though larger genomes may take longer to process some information): http://genomevolution.org/CoGe/OrganismView.pl?org_name=brachypodium
  2. GenomeView: Visually inspecting genomes: http://genomevolution.org/CoGe/GenomeView.pl?z=6&x=10000&dsgid=4242&chr=1
    1. MG1655 and horizontal gene transfer (phage insertion at position 280,000)
    2. Use the browser layer "Wobble GC usage" to visualize it
    3. Extract sequence and features
    4. Get annotations for feature list: http://genomevolution.org/CoGe/FeatList.pl?dsid=36725&chr=1&start=252944&stop=305680&gstid=1
  3. SynMap: Pair-wise whole genome comparison; syntenic dotplots
    1. E. coli DH10B and W3110: http://genomevolution.org/r/2vde
    2. What is the dotplot?
    3. Inversion
    4. Segmental duplications
    5. Insertions/deletions
    6. Saving analyses using links: "Regenerate this analysis . . ."
    7. Uncovering evolution: What happened at the central insertion?
      1. High resolution analysis with GEvo
      2. Extracting inserted region (SeqView)
      3. Extracting genomic features and annotations (FeatList)
      4. Adding another sequence to the region from NCBI: http://genomevolution.org/r/2ved
  4. Pick a sequence from around that region to explore a question that uses CoGeBlast:
    1. Number and location of transposons in the genome?
    2. Other genomes with insertion?
  5. Bacterial inversion
    1. Sequences involved with inversion
      1. http://genomevolution.org/r/2vmm
      2. http://genomevolution.org/r/2vml
      3. Merging GEvo analysis: http://genomevolution.org/r/2vmn
    2. X-alignments
    3. Crazy bacterial genome evolution or poor genome assembly: http://genomevolution.org/r/2vmo
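
To make the dotplot patterns above concrete: if each syntenic gene pair is plotted as a point (position in genome 1, position in genome 2), collinear regions run along a rising diagonal and an inversion shows up as a falling diagonal. The following is a toy Python sketch of that idea. It is not how SynMap or DAGChainer work internally; the function name and the gene-order coordinates are made up for illustration.

```python
def orientation_runs(pairs):
    """Group gene pairs (sorted by x) into runs of rising ('+') or falling ('-') y."""
    pairs = sorted(pairs)
    runs = []
    for prev, cur in zip(pairs, pairs[1:]):
        orient = "+" if cur[1] > prev[1] else "-"
        if runs and runs[-1][0] == orient:
            runs[-1][1].append(cur)
        else:
            # A new run starts at the previous point, so runs share boundary points.
            runs.append((orient, [prev, cur]))
    return runs


# Hypothetical gene-order positions: genes 4 to 7 are inverted in the second genome.
toy_pairs = [(1, 1), (2, 2), (3, 3), (4, 7), (5, 6), (6, 5), (7, 4), (8, 8), (9, 9)]

runs = orientation_runs(toy_pairs)
orientations = [orient for orient, _ in runs]
inverted_x = [x for orient, pts in runs if orient == "-" for x, _ in pts]
print(orientations)
print(inverted_x)
```

On a real dotplot the same reading applies by eye: a line of dots with the opposite slope to its neighbors marks an inverted segment.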

Analyzing larger genomes with SynMap:

  1. Human Chimp: http://genomevolution.org/r/2vik
    1. Changing chromosome order on axis: http://genomevolution.org/r/2vin
    2. Showing all matches versus just syntenic matches: http://genomevolution.org/r/2vis
  2. Human Mouse: http://genomevolution.org/r/2vip
    1. What are those other "dots"? Measuring evolutionary distance with synonymous mutation rates (Ks): http://genomevolution.org/r/2viw (note the importance of changing the Ks color scheme)
    2. Merging GEvo analyses; Human-Chimp-Mouse
      1. Human-Chimp GEvo: http://genomevolution.org/r/2vj3
      2. Human-Mouse GEvo: http://genomevolution.org/r/2vj1
        1. Reverse complementing and masking non-CDS sequences!
      3. Merge: http://genomevolution.org/r/2vjc
  3. Arabidopsis thaliana v Arabidopsis lyrata: http://genomevolution.org/r/2veh
    1. Note: For an in-depth overview of this comparison please see this page
    2. Axis metrics: genes versus nucleotides
    3. Multiple coverage and whole genome duplication events
    4. Quota-align and setting coverage limits
    5. Synonymous mutations
  4. Sorghum versus Maize: http://genomevolution.org/r/2vej
    1. Shared versus independent whole genome duplications
  5. Rice versus Brachypodium: http://genomevolution.org/r/2vii
    1. Nested chromosome insertions
  6. Other large genomes?
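
The human-mouse GEvo step above mentions reverse complementing and masking non-CDS sequence. A minimal Python sketch of both operations on a plain string follows. It is not GEvo's implementation; the function names, the 0-based half-open CDS intervals, and the use of N as the masking character are assumptions made for illustration.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement; characters other than A/C/G/T are kept as-is."""
    return seq.translate(_COMPLEMENT)[::-1]


def mask_non_cds(seq, cds_intervals, mask_char="N"):
    """Replace every base outside the given CDS intervals with mask_char.

    cds_intervals: iterable of (start, end) pairs, 0-based, end exclusive (assumed).
    """
    keep = [False] * len(seq)
    for start, end in cds_intervals:
        for i in range(max(start, 0), min(end, len(seq))):
            keep[i] = True
    return "".join(base if k else mask_char for base, k in zip(seq, keep))


toy = "AAACCCGGGTTT"                   # hypothetical sequence
masked = mask_non_cds(toy, [(3, 6)])   # keep only positions 3..5
flipped = reverse_complement("ATGCCN")
print(masked)
print(flipped)
```

Masking everything but coding sequence keeps an alignment focused on genes, and reverse complementing one region puts both regions in the same orientation before they are compared.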

Gene families

  1. CoGeBlast and identifying gene families
    1. Find a gene of interest in CoGe or get a sequence from elsewhere
    2. CoGeBlast to various plant genomes
      1. Use CoGeBlast to evaluate hits
      2. Select matching genome features to:
        1. Get sequences
        2. Send to http://phylogeny.fr for phylogenetic tree reconstruction
        3. Send to FeatList to manage list of genomic features.
    3. Use phylogenetic tree and FeatList to select and send genomic features to GEvo to analyze regions for evidence of synteny and classify according to:
      1. Orthologs
      2. Homeologs
      3. Transposition duplications
      4. Tandem duplications

So you have sequenced a genome and you have a pile of contigs . . .

  1. Syntenic path assembly in SynMap
    1. WGS sequencing and de novo assembly of E. coli K12, compared to the reference genome:
      1. Unsorted SynMap: http://genomevolution.org/r/2vjm
      2. Syntenic Path Assembly: http://genomevolution.org/r/2vjp
    2. Print out assembled sequence. Reload into CoGe. Gene model predictions. Lift-over annotations
  2. Something went wrong: When a sequencing sample was mixed up.
    1. OrganismView: http://genomevolution.org/CoGe/OrganismView.pl?dsgid=11499
    2. No SynMap to reference genome: http://genomevolution.org/CoGe/SynMap.pl?dsgid1=11499;dsgid2=782
    3. What is it? Link to NCBI BLAST through CoGeBlast: contig00001 hits E. coli
    4. SynMap with E. coli: http://genomevolution.org/r/2v88
      1. Lower syntenic region identification threshold: http://genomevolution.org/r/2vjz
      2. Use syntenic path assembly to see it is an E. coli genome: http://genomevolution.org/r/2vk1
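
The idea behind a syntenic path assembly can be sketched as ordering and orienting contigs by where their syntenic matches fall on the reference genome. The Python below is a simplified illustration under that assumption, not CoGe's actual algorithm. The function name and all positions are invented, and only the contig name contig00001 comes from the example above.

```python
from statistics import median


def syntenic_path(hits):
    """Order and orient contigs along a reference.

    hits: list of (contig, contig_pos, ref_pos) syntenic matches.
    Returns a list of (contig, orientation) sorted by median reference position.
    """
    by_contig = {}
    for contig, contig_pos, ref_pos in hits:
        by_contig.setdefault(contig, []).append((contig_pos, ref_pos))

    placed = []
    for contig, points in by_contig.items():
        points.sort()  # walk along the contig
        ref_mid = median(ref for _, ref in points)
        # '+' if reference positions rise along the contig, '-' if they fall.
        orient = "+" if points[-1][1] >= points[0][1] else "-"
        placed.append((ref_mid, contig, orient))

    placed.sort()
    return [(contig, orient) for _, contig, orient in placed]


# Hypothetical matches for three contigs.
toy_hits = [
    ("contig00003", 10, 5000), ("contig00003", 200, 5200),
    ("contig00001", 10, 900), ("contig00001", 300, 600),
    ("contig00002", 50, 2500), ("contig00002", 400, 2900),
]

path = syntenic_path(toy_hits)
print(path)
```

In this toy case contig00001 is placed first and flagged for reverse complementing, which is the kind of reordering that turns a scattered dotplot into a mostly continuous diagonal.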

Advanced functionality

  1. GC content shifts using SynMap and Amino Acid/Codon log odds scoring matrices
    1. Human-mouse example:
      1. SynMap: http://genomevolution.org/r/2viw
      2. Substitution Matrix:
    2. Plasmodia example:
      1. SynMap: http://genomevolution.org/r/2vk9
      2. Substitution Matrix: http://genomevolution.org/CoGe/SynSub.pl?dsgid1=9636;dsgid2=2465 (Takes a while to load)
  2. Detecting mitochondrial insertion in Arabidopsis thaliana:
    1. SynMap: http://genomevolution.org/r/2vk8
    2. GEvo: http://genomevolution.org/r/2vke
  3. Auto-finding syntenic regions with SynFind:
    1. http://genomevolution.org/r/2v2b
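
As background for the GC content examples above and the "Wobble GC usage" layer mentioned earlier, GC content at the wobble (third) codon position can be computed directly from a coding sequence. This is a minimal Python sketch, assuming the sequence starts in frame. The function names and the toy sequence are hypothetical, and this is not the code CoGe uses.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(seq.count(base) for base in "GC") / len(seq)


def wobble_gc(cds):
    """GC fraction at third codon positions, ignoring a trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return gc_fraction(cds[2:usable:3])


toy_cds = "ATGGCCAAGTAA"   # codons: ATG GCC AAG TAA
overall = gc_fraction(toy_cds)
third = wobble_gc(toy_cds)
print(round(overall, 3), third)
```

Third positions are often free to change without altering the encoded amino acid, so their GC content can shift more than the overall GC content when compositional pressure on a genome changes.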