2011 BSA Workshop

From CoGepedia

genomevolution.org

Introduction

  1. Who has used CoGe?
  2. Preamble:
    1. Store any version of any genome from all of life
    2. Interconnected tools to analyze genomes at multiple levels of resolution: Open-ended Analyses!
      1. Your questions drive where you go and which analyses you perform; the tools should not dictate the questions you can ask.
    3. Emphasis on exploring genomes as a biologist would an organism
  3. General types of research questions:
    1. I am interested in a group of organisms. . .
    2. I am interested in a group of genes . . .
    3. CoGe has tools to help answer questions in light of genome structure, dynamics, and evolution
  4. Who are you?
    1. Name
    2. Genes and genomes of interest
    3. Anything particular you'd like to know by the end of the workshop
  5. Workshop Organization
    1. Overview of CoGe's tools using example analyses to understand how they are linked together
    2. Open Q&A

Always ask questions

  1. What you need to run CoGe:
    1. Firefox
    2. Flash
    3. Enable Javascript
    4. Enable Popups (for CoGe only)
    5. Enable Cookies (if you have a CoGe user account)
  2. First, is anyone interested in comparing large genomes?
    1. CoGe can do large analyses, but depending on the size and complexity of the genomes, some analyses may take a while to run. However, CoGe caches the results of large analyses.
  3. CoGe Organization: Each tool is designed to do one thing and one thing well.
    1. Tools are linked to one another through URLs (web links) that open additional tabs.
    2. This creates an implicit record of each step of your analysis.
    3. Want to save where you are in an analysis? Copy the link and paste it into your notes.
    4. Most analyses generate URLs that you can save to regenerate the analysis.
    5. Data and analysis results are meant to be easily exported from CoGe.
      1. Download genomes and annotations
      2. Download whole genome comparison blast files
      3. Download syntenic gene-pairs between genomes
    6. Home Page
    7. Entrance Tools
    8. Connected Tools
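
Because each analysis is captured by its link, a saved link can be broken back into the tool name and its parameters when you review your notes. The following is a minimal sketch, not part of CoGe: the helper name `describe_link` is hypothetical, and the example URL is the GenomeView link used later in this workshop. It assumes only that CoGe links separate parameters with either `&` or `;`, as the links on this page do.

```python
from urllib.parse import urlsplit, parse_qsl


def describe_link(url):
    """Split a saved link into the tool name and its query parameters."""
    parts = urlsplit(url)
    # The last path component names the tool, e.g. "GenomeView.pl".
    tool = parts.path.rsplit("/", 1)[-1]
    # Some links on this page separate parameters with ';' instead of '&';
    # normalize so both styles parse the same way.
    query = parts.query.replace(";", "&")
    params = dict(parse_qsl(query))
    return tool, params


link = "http://genomevolution.org/CoGe/GenomeView.pl?z=6&x=10000&dsgid=4242&chr=1"
tool, params = describe_link(link)
print(tool)
for key, value in params.items():
    print(f"  {key} = {value}")
```

Shortened links of the form http://genomevolution.org/r/... carry no parameters in the URL itself, so this sketch only returns the short code for them.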

Starting with CoGe

  1. Home Page: http://genomevolution.org/CoGe/
    1. Info on the system: How many
      1. Organisms
      2. Genomes
      3. Nucleotides of genomic sequence
      4. [Genome features] (e.g. genes, mRNA, CDS, transposon, repeat region)
      5. Annotations
    2. Entrance tools
      1. OrganismView: Search for an organism by name or taxonomic description; get information about that genome. Links to downstream analyses.
      2. CoGeBlast: Blast sequences against any set of genomes in the system. You search for genomes and add them to a list. Great visuals for evaluating hits; automatic links to matching genomic features for additional downstream data retrieval and analyses
      3. FeatView: Search for genomic features by name; get information about that genomic feature (annotations, sequences, AT/GC content). Links to downstream analyses.
      4. SynMap: Pairwise whole genome comparisons with interactive and customizable syntenic dotplot visualizations. Links to downstream analyses.
      5. GEvo: High-resolution analysis of multiple genomic regions. Dynamic and interactive visualizations. Links to downstream analyses.
    3. Where to get more help
      1. CoGePedia

Genomes and Whole Genome Comparisons

  1. OrganismView: Finding genomes and getting an overview of genomic data
    1. Start with bacteria genomes: small and fast to process, easier to visualize comparisons.
    2. Same methods work on any genome (though larger genomes may take longer to process).
  2. GenomeView: Visually inspecting genomes: http://genomevolution.org/CoGe/GenomeView.pl?z=6&x=10000&dsgid=4242&chr=1
    1. MG1655 and horizontal gene transfer (phage insertion at position 280,000)
    2. Use browser layer "Wobble GC usage" to visualize
    3. Extract sequence and features
    4. Get annotations for feature list: http://genomevolution.org/CoGe/FeatList.pl?dsid=36725&chr=1&start=252944&stop=305680&gstid=1
  3. SynMap: Pair-wise whole genome comparison; syntenic dotplots
    1. E. coli DH10B and W3110: http://genomevolution.org/r/2vde
    2. What is the dotplot?
    3. Inversion
    4. Segmental duplications
    5. Insertions/deletions
    6. Saving analyses using links: "Regenerate this analysis . . ."
    7. Uncovering evolution: What happened at the central insertion?
      1. High resolution analysis with GEvo
      2. Extracting inserted region (SeqView)
      3. Extracting genomic features and annotations (FeatList)
      4. Adding another sequence to the region from NCBI: http://genomevolution.org/r/2ved
  4. Pick a sequence from around that region to explore a question that uses CoGeBlast:
    1. Number and location of transposons in the genome?
    2. Other genomes with insertion?
  5. Bacterial inversion
    1. Sequences involved with inversion
      1. http://genomevolution.org/r/2vmm
      2. http://genomevolution.org/r/2vml
      3. Merging GEvo analysis: http://genomevolution.org/r/2vmn
    2. X-alignments
    3. Crazy bacterial genome evolution or poor genome assembly: http://genomevolution.org/r/2vmo
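
To make the question "What is the dotplot?" concrete, here is a toy sketch of the idea. It places a dot wherever two sequences share a short word (k-mer) on either strand. This is an illustration only: SynMap builds its plots from whole genome comparison results, not from this k-mer matching, and the function names and the word size `k` are assumptions made for the example.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def kmer_dots(seq_x, seq_y, k=4):
    """Return (x, y, strand) for every k-mer shared between two sequences."""
    # Index every k-mer start position in the x-axis sequence.
    index = {}
    for i in range(len(seq_x) - k + 1):
        index.setdefault(seq_x[i:i + k], []).append(i)

    dots = []
    for j in range(len(seq_y) - k + 1):
        word = seq_y[j:j + k]
        # Same-strand matches.
        for i in index.get(word, []):
            dots.append((i, j, "+"))
        # Opposite-strand matches, which reveal inversions.
        for i in index.get(reverse_complement(word), []):
            dots.append((i, j, "-"))
    return dots


seq = "ACGGTCAT"
print(kmer_dots(seq, seq))                      # collinear: dots on the diagonal
print(kmer_dots(seq, reverse_complement(seq)))  # inverted: dots on the anti-diagonal
```

In this sketch, collinear sequence gives "+" dots where x and y increase together, while an inverted segment gives "-" dots where x decreases as y increases. Insertions and deletions show up as breaks or offsets in the diagonal.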

Analyzing larger genomes with SynMap:

  1. Human Chimp: http://genomevolution.org/r/2vik
    1. Changing chromosome order on axis: http://genomevolution.org/r/2vin
    2. Showing all matches versus just syntenic matches: http://genomevolution.org/r/2vis
  2. Human Mouse: http://genomevolution.org/r/2vip
    1. What are those other "dots": measuring evolutionary distance with synonymous mutation rates (Ks): http://genomevolution.org/r/2viw (Importance of changing Ks color scheme)
    2. Merging GEvo analyses; Human-Chimp-Mouse
      1. Human-Chimp GEvo: http://genomevolution.org/r/2vj3
      2. Human-Mouse GEvo: http://genomevolution.org/r/2vj1
        1. Reverse complementing and masking non-CDS sequences!
      3. Merge: http://genomevolution.org/r/2vjc
  3. Arabidopsis thaliana v Arabidopsis lyrata: http://genomevolution.org/r/2veh
    1. Axis metrics: genes versus nucleotides
    2. Multiple coverage and whole genome duplication events
    3. Quota-align and setting coverage limits
    4. Synonymous mutations
  4. Sorghum versus Maize: http://genomevolution.org/r/2vej
    1. Shared versus independent Whole genome duplications
  5. Rice versus Brachypodium: http://genomevolution.org/r/2vii
    1. Nested chromosome insertions
  6. Other large genomes?
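
The Human-Mouse GEvo step above relies on reverse complementing a region and masking non-CDS sequence. The sketch below shows what those two operations do to a sequence. It is not GEvo's code: the function names, the 1-based inclusive coordinate convention, and the use of `N` as the mask character are assumptions made for the example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (case preserved)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def mask_non_cds(seq, cds_ranges, mask_char="N"):
    """Replace every base outside the CDS ranges with mask_char.

    cds_ranges is a list of (start, stop) pairs, 1-based and inclusive
    (an assumed convention for this example).
    """
    keep = [False] * len(seq)
    for start, stop in cds_ranges:
        # Clamp each range to the sequence before marking it as kept.
        for pos in range(max(start, 1) - 1, min(stop, len(seq))):
            keep[pos] = True
    return "".join(base if kept else mask_char for base, kept in zip(seq, keep))


region = "AACCGGTTAA"
print(mask_non_cds(region, [(3, 6)]))
print(reverse_complement("AACGT"))
```

Masking leaves only coding sequence available for alignment, which reduces matches from repeats and other non-coding sequence when regions from distant genomes are compared.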

Gene families

  1. CoGeBlast and identifying families
    1. Find a gene of interest in CoGe or get a sequence from elsewhere
    2. CoGeBlast to various plant genomes
      1. Use CoGeBlast to evaluate hits
      2. Select matching genome features to:
        1. Get sequences
        2. Send to http://phylogeny.fr for phylogenetic tree reconstruction
        3. Send to FeatList to manage list of genomic features.
    3. Use phylogenetic tree and FeatList to select and send genomic features to GEvo to analyze regions for evidence of synteny and classify according to:
      1. Orthologs
      2. Homeologs
      3. Transposition duplications
      4. Tandem duplications

So you have sequenced a genome and you have a pile of contigs. . .

  1. Syntenic path assembly in SynMap
    1. WGS sequence and de novo assembly of E. coli K12 to reference genome:
      1. Unsorted SynMap: http://genomevolution.org/r/2vjm
      2. Syntenic Path Assembly: http://genomevolution.org/r/2vjp
    2. Print out assembled sequence. Reload into CoGe. Gene model predictions. Lift-over annotations
  2. Something went wrong: When a sequencing sample was mixed up.
    1. OrganismView: http://genomevolution.org/CoGe/OrganismView.pl?dsgid=11499
    2. No SynMap to reference genome: http://genomevolution.org/CoGe/SynMap.pl?dsgid1=11499;dsgid2=782
    3. What is it: Link to NCBI blast through CoGeBlast: contig00001 hits E. coli
    4. SynMap with E. coli: http://genomevolution.org/r/2v88
      1. Lower syntenic region identification threshold: http://genomevolution.org/r/2vjz
      2. Use syntenic path assembly to see it is an E. coli genome: http://genomevolution.org/r/2vk1
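
The idea behind a syntenic path assembly, ordering and orienting contigs by where they match a reference genome, can be sketched in a few lines. This is a simplified illustration, not SynMap's algorithm: the input format (contig name, reference position, strand as +1 or -1) and the function name `order_contigs` are hypothetical, and the median position and majority strand are stand-ins for whatever SynMap actually computes.

```python
from collections import defaultdict
from statistics import median


def order_contigs(hits):
    """Order and orient contigs along a reference.

    hits: iterable of (contig_name, ref_position, strand), strand +1 or -1.
    Returns a list of (contig_name, orientation) sorted by reference position.
    """
    by_contig = defaultdict(list)
    for name, ref_pos, strand in hits:
        by_contig[name].append((ref_pos, strand))

    layout = []
    for name, entries in by_contig.items():
        # Place each contig at the median reference position of its matches.
        mid = median(pos for pos, _ in entries)
        # Orient each contig by the majority strand of its matches.
        orientation = "+" if sum(s for _, s in entries) >= 0 else "-"
        layout.append((mid, name, orientation))

    layout.sort()
    return [(name, orientation) for _, name, orientation in layout]


example_hits = [
    ("contig3", 5000, 1), ("contig3", 5200, 1),
    ("contig1", 100, -1), ("contig1", 300, -1),
    ("contig2", 2500, 1),
]
print(order_contigs(example_hits))
```

Contigs with no matches to the reference get no position in this sketch, which mirrors the mixed-up sample above: when the reference is the wrong organism, there is little or nothing to order.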

Advanced functionality

  1. GC content shifts using SynMap and Amino Acid/Codon log odds scoring matrices
    1. Human-mouse example:
      1. SynMap: http://genomevolution.org/r/2viw
      2. Substitution Matrix:
    2. Plasmodia example:
      1. SynMap: http://genomevolution.org/r/2vk9
      2. Substitution Matrix: http://genomevolution.org/CoGe/SynSub.pl?dsgid1=9636;dsgid2=2465 (Takes a while to load)
  2. Detecting mitochondria insertion in Arabidopsis thaliana:
    1. SynMap: http://genomevolution.org/r/2vk8
    2. GEvo: http://genomevolution.org/r/2vke
  3. Auto-finding syntenic regions with SynFind:
    1. http://genomevolution.org/r/2v2b
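
The GC content shifts in item 1 above are often easiest to see at the third (wobble) codon position, where many changes are synonymous. The sketch below computes wobble GC content for a single coding sequence. It is an illustration only: the function name `wobble_gc` is hypothetical, and the input is assumed to be an in-frame CDS starting at the first codon position.

```python
def wobble_gc(cds):
    """Fraction of G or C bases at the third position of each complete codon."""
    cds = cds.upper()
    # Ignore any trailing partial codon.
    usable = len(cds) - len(cds) % 3
    third_positions = cds[2:usable:3]
    if not third_positions:
        return 0.0
    return sum(base in "GC" for base in third_positions) / len(third_positions)


for seq in ("ATGGCC", "ATGGCA", "ATGGCAT"):
    print(seq, wobble_gc(seq))
```

Comparing this value for syntenic gene pairs from two genomes is one simple way to see whether one lineage has shifted toward higher or lower GC content.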