Phylogenetics in CoGe

From CoGepedia
Revision as of 17:30, 6 April 2010 by Elyons

Find a sequence of interest.

Searching for At1g02120 using FeatView. Search can be regenerated at:

To begin, you'll need at least one sequence:

If you use CoGe to get a sequence, you can send it directly to CoGeBlast to find homologs.

Search for potential homologs in other genomes using CoGeBlast

Searching for homologs of At1g02120 using CoGeBlast against plant genomes. Search can be regenerated at:;gstid=1

CoGeBlast lets you search your sequence(s) against any number of genomes in CoGe. In this case, organisms were identified that have "planta" in the organism description, which will find all plant genomes in CoGe. CoGe's organism descriptions usually follow NCBI's taxonomic description:

Eukaryota;Viridiplantae;Streptophyta;Embryophyta;Tracheophyta;Spermatophyta;Magnoliophyta;eudicotyledons;core eudicotyledons;rosids;eurosids II;Brassicales;Brassicaceae;Arabidopsis;

and a search for "planta" matches anything with "Viridiplantae".
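The substring behaviour described above can be sketched in a few lines of Python. This is only an illustration of the matching idea, not CoGe's actual search code: the `ORGANISMS` dictionary, the shortened human lineage, and the case-insensitive comparison are all assumptions made for the example.

```python
# Illustrative organism descriptions keyed by organism name.
# The Arabidopsis lineage is the one quoted above; the human lineage is a
# shortened, hypothetical entry added only to show a non-match.
ORGANISMS = {
    "Arabidopsis thaliana": (
        "Eukaryota;Viridiplantae;Streptophyta;Embryophyta;Tracheophyta;"
        "Spermatophyta;Magnoliophyta;eudicotyledons;core eudicotyledons;"
        "rosids;eurosids II;Brassicales;Brassicaceae;Arabidopsis;"
    ),
    "Homo sapiens": "Eukaryota;Metazoa;Chordata;Mammalia;Primates;Hominidae;Homo;",
}


def matching_organisms(organisms, term):
    """Return names whose description contains `term` as a substring.

    Case-insensitive matching is an assumption of this sketch.
    """
    needle = term.lower()
    return [name for name, desc in organisms.items() if needle in desc.lower()]


# "planta" is a substring of "Viridiplantae", so only the plant entry matches.
plant_hits = matching_organisms(ORGANISMS, "planta")
```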

Evaluate potential homologs in CoGeBlast and select the "good" ones

Evaluating a blast hit using CoGeBlast in Arabidopsis lyrata to Arabidopsis thaliana's At1g02120.

After you select genomes and run the search, CoGeBlast's visualization tools make it easy to identify hits that correspond to homologs. Clicking on an HSP displays a popup graphic showing the regions matched between the query sequence and a genome. The blast hit you clicked on is colored yellow, and all other blast hits between that query sequence and the genomic region are colored red. This makes it easy to see whether the query sequence has full coverage in a genomic region, even if there are several blast hits between those sequences.
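If you export HSP coordinates and want to check coverage numerically rather than by eye, one way is to merge the HSP intervals on the query and divide by the query length. The sketch below assumes HSPs are given as 1-based, inclusive `(start, end)` pairs on the query with `start <= end`; the function name and input format are hypothetical and not part of CoGeBlast.

```python
def query_coverage(hsps, query_length):
    """Fraction of the query covered by the union of HSP intervals.

    `hsps` is a list of (start, end) query coordinates, 1-based and
    inclusive (an assumption of this sketch). Overlapping or adjacent
    HSPs are merged so that no position is counted twice.
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(hsps):
        if current_end is None or start > current_end + 1:
            # A gap before this HSP: close the interval being built.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlaps or touches the current interval: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1
    return covered / query_length


# Three HSPs that together span a hypothetical 1200-residue query.
full = query_coverage([(1, 400), (350, 900), (901, 1200)], 1200)
# Two HSPs separated by a gap cover only half of a 400-residue query.
partial = query_coverage([(1, 100), (201, 300)], 400)
```

A value of 1.0 corresponds to the situation described above, where several separate blast hits jointly cover the whole query.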

Send the identified homologs to FeatView

Selecting homologs in CoGeBlast to send to FastaView.

CoGeBlast identifies genomic features that overlap a blast hit, so you can check a sequence and mark it to be sent to other tools in CoGe. In this case, we are going to send all overlapping genomic features to FastaView. Don't worry if you check the same overlapping sequence more than once; all duplicate submissions will be collapsed to a single entry.
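The effect of collapsing duplicates can be pictured with a short Python sketch. How CoGe identifies a duplicate internally is not described here; this example assumes features are compared by name, and `feature_B` is a placeholder identifier.

```python
def collapse_duplicates(selected):
    """Drop repeated selections, keeping the first occurrence of each
    in the order it was checked (dicts preserve insertion order)."""
    return list(dict.fromkeys(selected))


# The same feature checked under two different HSPs is kept only once.
checked = ["At1g02120", "feature_B", "At1g02120"]
unique = collapse_duplicates(checked)
```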

Send the selected homologs to FastaView

Viewing identified homologs in FastaView. Protein sequences have been requested.

FastaView is CoGe's tool for viewing many fasta sequences. It has a button you can press to translate DNA sequence to protein sequence.

Modify sequences in FastaView

Some protein translations in FastaView do not have an "M" start and a "*" stop. FastaView returns all reading frame translations in those cases, and you will have to delete the incorrect ones from the text-box before submission.

FastaView will try to identify the correct reading frame for a DNA sequence by looking for a start methionine (M), a stop codon (*), and no intragenic stop codons. If it can't meet these three criteria, it will return all 6 reading frame translations. If this happens, as is shown here, you will need to remove any translation that is not correct. You can do this by highlighting the bad fasta sequence in the text-box and pressing the delete key.
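To make the three criteria concrete, here is a minimal Python sketch of six-frame translation with the same checks: starts with M, ends with *, and has no internal stop. It is not FastaView's implementation. The function names, the frame labels, the use of the standard genetic code, and the choice to return every frame that passes are assumptions of this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Reverse-complement an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna):
    """Return translations for frames +1..+3 and -1..-3."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def looks_complete(protein):
    """Start methionine, terminal stop, and no internal stop codon."""
    return (
        protein.startswith("M")
        and protein.endswith("*")
        and "*" not in protein[:-1]
    )


def pick_translation(dna):
    """Return the frames meeting all three criteria, or all six if none do."""
    frames = six_frames(dna)
    complete = {name: p for name, p in frames.items() if looks_complete(p)}
    return complete if complete else frames


# A toy coding sequence: only frame +1 reads M ... * without internal stops.
result = pick_translation("ATGGCCTAA")
```

When no frame passes, as with a partial gene model, the function falls back to all six translations, which mirrors the case where you have to delete the wrong frames by hand.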

Likewise, if you have additional sequences you'd like to add, just copy and paste them into the text-box in fasta format. Remember to add back your original sequence if you need to!

To send your sequences for phylogenetic analysis, just press the "" button. All the sequences in the text-box will be submitted automatically, and the analysis will start automatically.

Send sequences for phylogenetic analysis

MUSCLE runs automatically to generate a multiple sequence alignment.

PhyML runs automatically to construct a maximum likelihood tree.

TreeDyn automatically renders the phylogenetic tree.

Final phylogenetic tree generated by the pipeline.

The pipeline consists of:

  1. Multiple sequence alignment using MUSCLE
  2. Phylogenetic tree reconstruction using PhyML
  3. Phylogenetic tree visualization using TreeDyn

Video Tutorial

This short video walks through the example detailed above.