Tutorials

From CoGepedia
Revision as of 11:44, 22 January 2020 by Elyons (Talk | contribs) (How to download a genome and its annotations)


Here you can find tutorials informing you on how to get the most out of CoGe's tools.

Original tutorials

You can find a list of CoGe's old tutorials here.

Tutorials

How to search and assemble genes

Contributed by David Nelson

Gene assembly

How to assemble a contig-level de novo assembly against a reference genome using synteny


How to determine the structural changes between genome assemblies

  • This is easy using SynMap. Just find your organism of interest, and select the two versions of its genome you wish to compare. Here are examples from:
  1. Grape genome versions 1 and 2
  2. Medicago genome versions 2 and 3
  3. Maize B73 refgen version 1 (with gene annotation) and version 2 (genomic sequence only)

How to do phylogenetics with CoGe

  • You have a sequence of interest and want to find its homologs within and among various genomes in order to reconstruct phylogenetic trees. CoGe can help. CoGeBlast helps you identify and evaluate homologs from any number of genomes, and is linked to FeatList, which displays information about a list of genomic features. FeatList plays a central role in managing lists of genomic features in CoGe and lets you select and send features to other programs in CoGe. One of them is FastaView, which generates fasta-formatted sequence data. FastaView is linked to phylogeny.fr, a web resource for generating multiple sequence alignments and phylogenetic tree reconstructions. phylogeny.fr has a very nice pipeline for automatically generating a decent phylogeny for a set of sequences, and FastaView's link to phylogeny.fr will automatically submit your sequences to its 'one-click' phylogenetic pipeline.

Here is the full tutorial: phylogenetics in CoGe

How to download a genome and its annotations

  • This is easy using GenomeInfo. Just search for an organism and genome of interest using the Search Database field at the top of any page. In the Tools section you will find the following links:
  1. "FASTA" to download the entire genome's DNA sequence in fasta format
  2. "GFF" to download all the genomic features in the genome and their annotations in GFF format
  • If you want to view a specific section of the genome, launch the JBrowse genome browser

How to find syntenic regions between genomes

What to do with a genome in the early stages of assembly

Finding Inversions

Ortholog identification and conserved noncoding sequence (CNS) analysis

  • Ramosa2 orthologs and CNSs: ramosa2 Encodes a LATERAL ORGAN BOUNDARY Domain Protein That Determines the Fate of Stem Cells in Branch Meristems of Maize [1] Special thanks to Devin O’Connor for writing this tutorial!
  1. Esteban Bortiri, George Chuck, Erik Vollbrecht, Torbert Rocheford, Rob Martienssen, and Sarah Hake. 2006 ramosa2 Encodes a LATERAL ORGAN BOUNDARY Domain Protein That Determines the Fate of Stem Cells in Branch Meristems of Maize. Plant Cell 18:574–585

How to find rarely and frequently used codons in a genome

How to generate an amino acid usage table for an organism

How to determine the GC content of a genome or chromosome

  • This is easy using OrganismView. Just search for an organism and genome of interest and press the link "Click for percent GC content" located next to the length of the genome in the "Genome information" section. For small genomes, this is automatically calculated when the "Genome information" section is loaded.
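If you have already downloaded a genome in fasta format, the same number can be checked offline. The following is a minimal sketch, not CoGe's own implementation: the function names `read_fasta` and `gc_percent` and the toy sequences are assumptions made for illustration, and ambiguous bases such as N are simply ignored.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (name, sequence) pairs from a fasta-formatted text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header as the sequence name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_percent(seq):
    """Percent G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


# Toy data standing in for a downloaded genome file (hypothetical sequences).
example = StringIO(">chr1 toy\nGGCC\nAATT\n>chr2\nGCGCNNAT\n")
per_chrom = {name: gc_percent(seq) for name, seq in read_fasta(example)}
for name, pct in per_chrom.items():
    print(f"{name}\t{pct:.1f}% GC")
```

To run this on a real download, replace the `StringIO` object with `open("your_genome.fasta")`; the file name here is a placeholder.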

Whole Genome Comparison and Analysis using SynMap and GEvo

  • Maize Sorghum Syntenic dotplot: Maize has had a whole genome duplication event since these lineages diverged ~11 MYA, and their shared ancestral lineage had an earlier whole genome duplication event before the divergence. Using SynMap's synonymous mutation overlay, it is easy to determine which syntenic regions derive from the pre-grass whole genome duplication event and which from the one specific to maize.

How to extract all the gene sequences from a genomic region for export from CoGe

  • There are times when you want to export sequences from CoGe to another informatics tool. CoGe makes it easy to find the sequences you want and format them for export: How to extract genomic features.

Identifying putative horizontal gene transfer events

How to find the size of a genome and the genomic features it contains

  • Want to know how big a genome is and get a breakdown of the number of genes it contains? Use OrganismView and just search for your organism of interest. It will automatically return these sets of information, and a whole lot more.
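The same summary can be approximated offline from a fasta file and a GFF file once you have downloaded them. This is a hedged sketch rather than what OrganismView does internally: `genome_length` and `feature_counts` are hypothetical helper names, the toy records are invented for illustration, and the only format assumption is the standard nine tab-separated GFF columns with the feature type in column 3.

```python
from collections import Counter
from io import StringIO


def genome_length(fasta_handle):
    """Total number of sequence characters across all fasta records."""
    total = 0
    for line in fasta_handle:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def feature_counts(gff_handle):
    """Count GFF records per feature type (column 3), skipping comments."""
    counts = Counter()
    for line in gff_handle:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 9:
            counts[cols[2]] += 1
    return counts


# Toy inputs standing in for downloaded files (hypothetical content).
fasta = StringIO(">chr1\nACGTACGT\nACGT\n")
gff = StringIO(
    "##gff-version 3\n"
    "chr1\ttoy\tgene\t1\t8\t.\t+\t.\tID=g1\n"
    "chr1\ttoy\tCDS\t1\t8\t.\t+\t0\tID=c1\n"
    "chr1\ttoy\tgene\t9\t12\t.\t-\t.\tID=g2\n"
)

size = genome_length(fasta)
counts = feature_counts(gff)
print(f"genome size: {size} bp")
for feature_type, n in counts.most_common():
    print(f"{feature_type}\t{n}")
```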

How to annotate a genome using CoGe's tools and links to other bioinformatic resources

This tutorial uses CoGe to annotate a Baculovirus genome. Here, CoGe is primarily used to retrieve and organize genomic sequence data, and its built-in links are used to:

  • Build an Excel spreadsheet of all genomic features in a genome including their annotations and sequences.
  • Link to NCBI's Blast resource for sequence searching
  • Send protein sequences to ProSite for domain identification

The steps outlined here apply to any genome and work very well for a class project!

Thanks to Dr. Eric Haas-Stapleton for creating this tutorial.

How to load genomes into CoGe

More people are asking to install a local version of CoGe. For those of you who have one, or who have access to the main CoGe server, here are the directions on How to load a genome into CoGe.

How to perform a genomic rearrangement analysis

When configured to enforce a syntenic coverage depth of 1:1 (a one to one mapping of syntenic regions between two genomes), SynMap will generate a link to the genomic rearrangement analysis tool, GRIMM, and auto-populate its submission boxes with the analyzed genomes appropriately formatted.

Please see this tutorial: Genomic Rearrangement Analysis

How to share genomes

This is a tutorial on how to share genomes with other users in CoGe

Sharing data in iPlant's Data Store

This tutorial shows how to use the iPlant Data Store to share data. This is useful if you want to share data with the CoGe Team to help get your genome loaded into CoGe.

Sending genomes to the iPlant Data Store

This tutorial shows how to send a genome from CoGe to the iPlant Data Store (including marking up those data with all the metadata CoGe has on the genome).

How to add a genome from Phytozome or JGI

This tutorial walks through all the steps to integrate a genome from Phytozome/JGI into CoGe

How to favorite things (genomes, experiments) in CoGe so they appear at the top of lists in tools

CoGe often has many versions of a genome, and this shows you how to mark your favorite so it always shows up on top.

Search for lncRNAs with CoGeBlast

Created by CoGe user, Andrew Nelson.

Tutorials and lessons for high-school students

High-school student tutorials: These tutorials were designed in conjunction with Michael Nakashima.

Published Papers (focused on using CoGe)

These papers are put forth as complete tutorials with background information as to how to use CoGe to perform various tasks:

Maize genome analysis: Maydica

Download Open Access Article From Maydica: http://www.maydica.org/articles/56_183.pdf

Alternative Download from CoGe: http://genomevolution.org/r/4stu

Comparative genomics with maize and other grasses: from genes to genomes!

James C. Schnable and Eric Lyons

Download: http://www.maydica.org/articles/56_183.pdf

Abstract: Of all the major plant groups, the grasses, with the complete genomes of five species, are the best positioned to take advantage of comparative genomics to obtain insight into functional genetic elements. Of all the grasses, maize is the best characterized in terms of genetics, development, and evolution. We provide several examples of how the web-based comparative genomics system CoGe may be used to aid in the interpretation of the maize genome sequence. These examples include verifying gene models, identifying differences between genome assemblies, identifying conserved non-coding sequences, identifying syntenic regions between species and polyploidies, and identifying homeologs within maize and orthologs between maize and other grass genomes. In addition, a comprehensive list of orthologous gene sets is provided between maize and Sorghum, foxtail millet, rice, and Brachypodium.

Brassica genome analysis: Frontiers in Plant Genetics and Genomics

Download Open Access Article: http://www.frontiersin.org/plant_genetics_and_genomics/10.3389/fpls.2012.00172/abstract

Unleashing the genome of Brassica rapa

Haibao Tang and Eric Lyons

Abstract: The completion and release of the Brassica rapa genome is of great benefit to researchers of the Brassicas, Arabidopsis, and genome evolution. While its lineage is closely related to the model organism Arabidopsis thaliana, the Brassicas experienced a whole genome triplication subsequent to their divergence. This event contemporaneously created three copies of its ancestral genome, which had diploidized through the process of homeologous gene loss known as fractionation. By the fractionation of homeologous gene content and genetic regulatory binding sites, Brassica’s genome is well placed to use comparative genomic techniques to identify syntenic regions, homeologous gene duplications, and putative regulatory sequences. Here, we use the comparative genomics platform CoGe to perform several different genomic analyses with which to study structural changes of its genome and dynamics of various genetic elements. Starting with whole genome comparisons, the Brassica paleohexaploidy is characterized, syntenic regions with A. thaliana are identified, and the TOC1 gene in the circadian rhythm pathway from A. thaliana is used to find duplicated orthologs in B. rapa. These TOC1 genes are further analyzed to identify conserved non-coding sequences that contain cis-acting regulatory elements and promoter sequences previously implicated in circadian rhythmicity. Each “cookbook style” analysis includes a step-by-step walk-through with links to CoGe to quickly reproduce each step of the analytical process.

Workshops

SIP2010

2012 JCVI CoGe Plant Bioinformatics Workshop

2012 USDA Maricopa CoGe Plant Bioinformatics Workshop

2014 JCVI Summer Genomics Workshop

2015 PAG Computer Demo

2015 Plant Genome Evolution Workshop

2015 University of Chile

2016 Plant Reproduction

2016 Arizona State University

2016 Intro to Genome Management and Analysis with CoGe

2018 LANGEBIO CINVESTAV: Intro to genome management and analysis with CoGe

Genomic Web Resources Linking to CoGe

MaizeGDB

MaizeGDB links to CoGe through its genome browser to help researchers find syntenic gene sets between maize and sorghum.

Video Tutorials

OrganismView

OrganismView is CoGe's tool for finding genomes for your organism of interest.

FeatList

FeatList is CoGe's tool for managing lists of genomic features.

SeqView

SeqView is CoGe's tool for generating primary sequence data in fasta format.

MaizeGDB and CoGe's Maize-Sorghum Orthologies

Researchers can now go directly from MaizeGDB's genome browser to view the same region within CoGe's GenomeView and quickly compare pre-called syntenic orthologous genes between maize and sorghum, as well as the homeologous gene in maize or, when no homeolog was found, the homeologous region in which we would have expected to find it.

Using CoGe and phylogeny.fr to quickly find homologs and build phylogenetic trees

CoGe's tools make it easy to search through genomes to find homologs to a sequence of interest. Once identified, these sequences can be manipulated in FastaView and sent to phylogeny.fr for multiple sequence alignment, phylogenetic tree reconstruction, and tree visualization.

You can download a high-resolution version of the video from http://genomevolution.com/CoGe/docs/video/CoGe-Phylogenetics.mov

Human-Chimp Whole Genome Comparison

This tutorial walks through using SynMap to do a whole genome comparison between human and chimp.

You can download a high-resolution version of the video from http://genomevolution.com/CoGe/docs/video/SynMap-human-chimp.mov

Using GEvo to find a genomic region of interest, and extracting its sequence and genomic features

You can download a high-resolution version of the video from http://genomevolution.com/CoGe/docs/video/GEvo-to-extract-sequence-features.mov

Using SynMap to compare two strains of Bacillus thuringiensis and characterize the breakpoints of an inversion (which turns out to be possibly due to a genome assembly error)

You can download a high-resolution version of the video from http://genomevolution.com/CoGe/docs/video/Bacillus_thuringiensis_SynMap-dotplot.mov

Using OrganismView and SynMap to find and compare closely related genomes

Comparing the chloroplast genomes of maize and sorghum

This video walks through comparing the genomes of maize and sorghum chloroplasts to identify individual polymorphisms/character states.

How to verify and compare maize gene models

Using walkthrough 1 from the Maydica CoGe article

How to visualize assembly and annotation changes between versions of the maize genome

Using walkthrough 2 from the Maydica CoGe article

How to compare different assembly versions of a genome

Using walkthrough 5 from the Maydica CoGe article

This example uses SynMap to compare the assembly differences between version 1 and version 2 of the maize genome.

How to generate orthology gene lists between maize and another grass (e.g. Sorghum)

Using walkthrough 6 from the Maydica CoGe article

A difficulty in identifying orthologous genes among the grasses is the grass-specific whole genome duplication event that happened prior to the radiation of all the grass lineages. The problem is compounded for maize by its lineage-specific whole genome duplication event. When comparing the genomes of maize and sorghum, each region of the sorghum genome is orthologously syntenic to two regions of the maize genome and paralogously syntenic to two additional regions. This video walks through comparing the genomes of maize and sorghum using SynMap and its advanced analytical tools. Orthologous syntenic regions are identified by the relative evolutionary distance of syntenic gene pairs, measured with synonymous mutation rates, and the Quota Align algorithm screens syntenic regions to enforce a specific mapping of syntenic regions between genomes.
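The idea of separating orthologous from paralogous syntenic regions by synonymous mutation rate can be illustrated with a small sketch. This is not SynMap's or Quota Align's code: the function `classify_blocks`, the block names, the Ks values, and the cutoff of 0.5 are all hypothetical placeholders chosen for illustration. In practice a cutoff would be read off the Ks histogram for the genome pair being compared.

```python
from statistics import median


def classify_blocks(blocks, ks_cutoff):
    """Label each syntenic block by the median Ks of its gene pairs.

    blocks: dict mapping a block id to a list of Ks values (None = not calculated).
    ks_cutoff: assumed threshold separating the younger (orthologous) Ks peak
               from the older (paralogous) one.
    """
    labels = {}
    for block_id, ks_values in blocks.items():
        usable = [ks for ks in ks_values if ks is not None]
        if not usable:
            labels[block_id] = "unknown"
        elif median(usable) < ks_cutoff:
            # Lower Ks: regions separated by the more recent divergence.
            labels[block_id] = "putative orthologous"
        else:
            # Higher Ks: regions derived from the older duplication event.
            labels[block_id] = "putative paralogous"
    return labels


# Hypothetical Ks values for gene pairs in three syntenic blocks.
toy_blocks = {
    "block_A": [0.12, 0.15, 0.20],
    "block_B": [0.90, 1.10, None, 0.80],
    "block_C": [None],
}
labels = classify_blocks(toy_blocks, ks_cutoff=0.5)
for block_id, label in labels.items():
    print(block_id, label)
```

Using the median rather than the mean keeps a few poorly estimated gene pairs from dominating the label for a block.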

For more information on the evolutionary history of the maize and sorghum genomes: Maize Sorghum Syntenic dotplot

How to use CoGe's polymorphism tables for validating identified polymorphisms

How to use the iPlant Data Store to generate a quick-share link to share a genome file

How to use CoGe and SSWAP (iPlant's Semantic Web Portal: http://sswap.iplantcollaborative.org/)

Using CoGe's tools SynMap and GEvo to compare the genomes of two Phytophthora species

Using CoGe's tool SynFind to identify syntenic regions across multiple genomes. This example uses Phytophthora species.


Using CoGe's tool SynMap to compare two genomes of E. coli and analyze a syntenic discontinuity in more detail with GEvo

Using CoGe's tools SynFind and SynMap to find and mark a syntenic gene pair in a syntenic dotplot

Using the user-data management system to share data

Using the iPlant Datastore to add your data to CoGe

How to load experimental data into CoGe

This short video shows how to add SNP data from the 1001 Arabidopsis genome project to the Arabidopsis genome.

How to load a private genome with annotations and experimental data, and view in JBrowse in under three minutes

This short video shows how to:

  1. Load a genome from a fasta file
  2. Load structural annotations from a GFF file
  3. Load variation (SNP) data from a vcf file
  4. Load mapped reads from a BAM file
  5. Load expression data from a CSV file

How to run RNA-Seq in CoGe in three minutes

This short video shows how to load, process, and visualize RNA-Seq data in CoGe:

  • RNA-Seq Pipeline: Expression Analysis Pipeline
  • Quantitation of reads per position along a genome
  • Quantitation of FPKM for genomes with structural gene annotations
  • Individually mapped reads
  • All visualized in the EPIC-CoGe Browser (Based on JBrowse)

EPIC-CoGe/GenomeView videos

EPIC-CoGe Videos