CoGeBlast

From CoGepedia
Revision as of 15:55, 6 April 2009 by Admin

Software company: CoGe Team
Analysis type: BLAST query sequences against genomes stored in the CoGe database
Working state: Released
Tools utilized: blastn, tblastn, tblastx, blastz
Website: http://synteny.cnr.berkeley.edu/CoGe/CoGeBlast.pl

CoGeBlast is CoGe's interface to BLAST (Basic Local Alignment Search Tool) and related algorithms. With CoGeBlast, you can take any query sequence, whether you submit it yourself or retrieve it from the CoGe database, and compare it against any number of genomes in the CoGe database.

Overview

Background

CoGeBlast is a web-based interface to BLAST that lets you quickly:

  1. Add query sequences
  2. Find and select any number of organisms and genomes in CoGe to search against
  3. Configure a blast analysis
  4. Visualize an overview of blast hits (high-scoring segment pairs, HSPs) in relation to their genomic locations
  5. Interact with a sortable list of blast hits that includes
    1. which query sequence matched which organism
    2. their genomic locations
    3. query sequence coverage
    4. a variety of blast hit metrics (length, e-value, score, percent identity, quality)
    5. a way to find the closest genomic feature in the searched organisms
  6. Visualize individual hits in their genomic context to determine how much of your query matched
  7. Get sequence, alignment and positional information for a given hit
  8. See an overview of the number of times your query sequences matched a given organism
  9. Follow links to data files, including
    1. a table of blast hit metrics and sequences
    2. a fasta file of query hit sequences
    3. a fasta file of subject hit sequences
    4. a file of blast hit alignments
    5. raw blast reports for each organism searched
  10. Select identified genomic regions and
    1. send them to other programs in CoGe
    2. generate a fasta file of nearby genomic features
    3. export results to a tab-delimited file or Microsoft Excel
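Finding "the closest genomic feature" to a hit can be pictured as an interval-distance search. The sketch below is a minimal illustration, not CoGe's code: the `Feature` record, its 1-based inclusive coordinates, and the `interval_gap` and `closest_feature` helpers are all hypothetical, and ties go to whichever feature comes first in the list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feature:
    """Hypothetical annotated feature (1-based, inclusive coordinates)."""
    name: str
    chromosome: str
    start: int
    stop: int


def interval_gap(a_start: int, a_stop: int, b_start: int, b_stop: int) -> int:
    """Distance in bases between two intervals; 0 if they overlap."""
    if a_stop < b_start:
        return b_start - a_stop
    if b_stop < a_start:
        return a_start - b_stop
    return 0


def closest_feature(chromosome: str, hit_start: int, hit_stop: int,
                    features: List[Feature]) -> Optional[Feature]:
    """Return the feature on the hit's chromosome nearest to the hit, or None."""
    candidates = [f for f in features if f.chromosome == chromosome]
    if not candidates:
        return None
    # Reverse-strand hits may report start > stop, so normalize the interval.
    lo, hi = min(hit_start, hit_stop), max(hit_start, hit_stop)
    # min() keeps the first feature in the list when distances tie.
    return min(candidates, key=lambda f: interval_gap(lo, hi, f.start, f.stop))


features = [
    Feature("gene_a", "1", 100, 500),
    Feature("gene_b", "1", 2000, 2500),
    Feature("gene_c", "2", 150, 300),
]
print(closest_feature("1", 1800, 1900, features).name)
```

Here the hit at 1800..1900 is 100 bp from gene_b and 1300 bp from gene_a, so gene_b is reported; gene_c is ignored because it sits on another chromosome.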

Alignment Algorithms

CoGeBlast uses several variants of the BLAST algorithm originally developed by Altschul et al. [1], along with the related alignment program blastz.

Running an Analysis

Configure

Quick Run

To quickly run an analysis:

  1. Add query sequences
  2. Find and add organisms to search against
  3. Press the "CoGeBlast" button to start your analysis.

Detailed View

1. Adding Query Sequences:

Paste your sequences into this box. If you are searching with more than one sequence, make sure they are in fasta format, with each sequence preceded by a header line that starts with ">":

>sequence name
TAATATATCTGATGATGCTGACTGCATGCA

>sequence 2 name
TATGATCGTACGTACGTACGATCGTACGATCGT
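To make the format concrete, here is a minimal sketch of how pasted text could be split into named sequences. It is an illustration under simple assumptions (a header line starts with ">", the rest of that line is the name, blank lines are ignored), not CoGeBlast's actual parser; the `parse_fasta` function and the fallback name `query` for header-less input are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Split pasted fasta text into (name, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines between records are ignored
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    elif chunks:
        # A single sequence pasted without a header (hypothetical default name).
        records.append(("query", "".join(chunks)))
    return records


example = """>sequence name
TAATATATCTGATGATGCTGACTGCATGCA

>sequence 2 name
TATGATCGTACGTACGTACGATCGTACGATCGT
"""
for seq_name, seq in parse_fasta(example):
    print(seq_name, len(seq))
```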

Many tools in CoGe link to CoGeBlast and automatically deposit sequences in this box. You can always replace the deposited sequences or add more.

2. Select Blast analysis type:

If you have added your own sequences, make sure to select whether they are protein or DNA sequences. If sequences were added automatically, changing the sequence type also changes the sequence in the box. For each sequence type, you can then select an appropriate blast algorithm: blastn, tblastx, or blastz for nucleotide sequences; tblastn for protein sequences.
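The pairing of sequence types and programs above can be written down as a lookup table. As general BLAST background, blastn compares a nucleotide query with nucleotide sequences, tblastx compares the translations of both, and tblastn compares a protein query with translated nucleotide sequences. CoGeBlast asks you to choose the sequence type yourself; the `guess_sequence_type` heuristic and its 90% threshold below are hypothetical additions for illustration only.

```python
from typing import Dict, List

# Programs offered for each query sequence type, as listed in this section.
PROGRAMS_BY_TYPE: Dict[str, List[str]] = {
    "DNA": ["blastn", "tblastx", "blastz"],
    "protein": ["tblastn"],
}

# Letters treated as nucleotide codes (assumption: N for unknown bases).
NUCLEOTIDE_CHARS = set("ACGTUN")


def guess_sequence_type(seq: str) -> str:
    """Hypothetical heuristic: call it DNA if at least 90% of letters are nucleotides."""
    residues = [c for c in seq.upper() if c.isalpha()]
    if not residues:
        raise ValueError("empty sequence")
    nucleotide_fraction = sum(c in NUCLEOTIDE_CHARS for c in residues) / len(residues)
    return "DNA" if nucleotide_fraction >= 0.9 else "protein"


def available_programs(seq: str) -> List[str]:
    """Programs that fit the (guessed) type of this query sequence."""
    return PROGRAMS_BY_TYPE[guess_sequence_type(seq)]


print(available_programs("TAATATATCTGATGATGCTGACTGCATGCA"))
```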

3. Configure blast parameters

Different blast algorithms have different parameters, and the options shown in this area change depending on the algorithm selected. An explanation of each parameter is beyond the scope of this document, but the information is easy to find elsewhere online. One setting specific to CoGeBlast is "Limit results to:", which sets the maximum number of blast hits displayed for each organism, regardless of how blast itself is configured. This limit keeps a search with a highly repetitive sequence from overloading your web browser with results. You can change the limit as you see fit; if more hits were generated than were returned to your browser, the results will tell you so. The complete blast results file is also available for download.
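The per-organism cap set by "Limit results to:" can be sketched as follows. This is an assumed model, not CoGe's implementation: hits are hypothetical dictionaries with `organism` and `evalue` keys, and keeping the lowest e-values first is an assumption about which hits survive the cut.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def limit_hits(hits: List[dict], limit: int) -> Tuple[Dict[str, List[dict]], Dict[str, int]]:
    """Keep at most `limit` hits per organism.

    Returns the kept hits grouped by organism and, for organisms that
    exceeded the limit, the total number of hits found (for a notification).
    """
    by_organism: Dict[str, List[dict]] = defaultdict(list)
    for hit in hits:
        by_organism[hit["organism"]].append(hit)

    kept: Dict[str, List[dict]] = {}
    overflow: Dict[str, int] = {}
    for organism, org_hits in by_organism.items():
        # Assumption: the best hits (lowest e-value) are the ones displayed.
        org_hits.sort(key=lambda h: h["evalue"])
        kept[organism] = org_hits[:limit]
        if len(org_hits) > limit:
            overflow[organism] = len(org_hits)
    return kept, overflow


hits = [
    {"organism": "A", "evalue": 1e-5},
    {"organism": "A", "evalue": 1e-30},
    {"organism": "A", "evalue": 1e-10},
    {"organism": "B", "evalue": 0.01},
]
kept, overflow = limit_hits(hits, 2)
for organism, total in overflow.items():
    print(f"{organism}: showing {len(kept[organism])} of {total} hits")
```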

4. Select Organisms to Blast

There are many thousands of organisms in CoGe. To find the ones you are interested in, type their name (or part of it) in the "Name" box, or type a description in the "Description" box. Most organisms have a description that follows NCBI's organism naming convention. For example:

Escherichia coli str. K12 substr. DH10B Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Escherichia

This means that searching descriptions for "gamma" finds all gammaproteobacteria (along with a few other matches). When the list of organisms comes back, add them to the search list by selecting them and pressing the "add" button, or by double-clicking an organism's name. To add every organism in the list, press "Add all listed".
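The name and description boxes behave like substring filters, which is why "gamma" can also catch unrelated entries that happen to contain those letters. A minimal sketch, assuming case-insensitive matching and that both boxes must match when both are filled in; the `Organism` type and `search_organisms` function are hypothetical.

```python
from typing import List, NamedTuple


class Organism(NamedTuple):
    name: str
    description: str


def search_organisms(organisms: List[Organism], name: str = "",
                     description: str = "") -> List[Organism]:
    """Case-insensitive substring match; an empty box matches everything."""
    name, description = name.lower(), description.lower()
    return [
        org for org in organisms
        if name in org.name.lower() and description in org.description.lower()
    ]


organisms = [
    Organism("Escherichia coli str. K12 substr. DH10B",
             "Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; "
             "Enterobacteriaceae; Escherichia"),
    Organism("Bacillus subtilis",
             "Bacteria; Firmicutes; Bacilli; Bacillales; Bacillaceae; Bacillus"),
]
for org in search_organisms(organisms, description="gamma"):
    print(org.name)
```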

5. Color Blast Hits According to:

You can color the blast hits displayed on the genomic overview according to one of several criteria:

  1. None: All hits are colored green
  2. Query Sequence: Hits from each query sequence are given a distinct color. This is useful for looking at the genomic distribution of a few different sequences.
  3. Log Quality: Each blast hit is colored by its log-normalized "quality" score. The quality score is the hit's percent identity multiplied by its percent coverage of the query sequence (PERCENT_ID * PERCENT_QUERY_COVERAGE). Colors follow a green-yellow-red gradient, with green for the top score and red for the bottom score.
  4. Percent Identity: Each blast hit is colored by its percent identity. Colors follow a green-yellow-red gradient, with green for the top score and red for the bottom score.

An example of these colors is shown for Log Quality. Note that hit colors are normalized separately for each organism.
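The Log Quality coloring can be sketched in a few lines. The quality formula is the one given above; everything else is an assumption for illustration: taking the natural log, min-max scaling within one organism's hits, and the exact red-yellow-green RGB ramp. The function names are hypothetical.

```python
import math
from typing import List, Tuple

RGB = Tuple[int, int, int]


def quality(percent_id: float, percent_query_coverage: float) -> float:
    """Quality score as described above: PERCENT_ID * PERCENT_QUERY_COVERAGE."""
    return percent_id * percent_query_coverage


def gradient(t: float) -> RGB:
    """Map t in [0, 1] onto red (0) -> yellow (0.5) -> green (1)."""
    t = min(max(t, 0.0), 1.0)
    if t < 0.5:
        return (255, round(255 * t / 0.5), 0)      # red toward yellow
    return (round(255 * (1 - t) / 0.5), 255, 0)    # yellow toward green


def log_quality_colors(qualities: List[float]) -> List[RGB]:
    """Color one organism's hits by log quality, normalized to that organism only."""
    logs = [math.log(q) for q in qualities]  # assumes every quality is > 0
    lo, hi = min(logs), max(logs)
    if hi == lo:
        return [gradient(1.0) for _ in logs]  # all hits equal: all get the top color
    return [gradient((x - lo) / (hi - lo)) for x in logs]


organism_hits = [quality(100, 100), quality(50, 20), quality(90, 80)]
print(log_quality_colors(organism_hits))
```

Because the scaling is done per organism, the best hit in each organism is always green, even if it is a weak match in absolute terms.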

Running

Run CoGeBlast

When your analysis is configured, just press the "CoGeBlast" button!

While CoGeBlast is Running

While CoGeBlast is running, you'll see a spinning DNA double helix at the top of the web page, along with a status report of what is happening behind the scenes: finding or creating a blastable database for each organism and blasting its genome.

Results

CoGeBlast Results

When the analysis finishes, your results appear above the section where you configured it:

  1. A graphical overview of the locations of your blast hits in the genomes searched. Genomes with no hits are listed at the bottom of this table.
  2. An interactive table with detailed information for each blast hit. Every column is sortable and can be hidden if it is not needed for interpreting your analysis.
  3. An overview table of the number of times each query sequence hit each organism. If the number of hits exceeded the limit set when configuring your analysis, this table shows a notification.
  4. Links to data files.
  5. (Not shown above.) A detailed view of a blast hit in the context of the query sequence and the genomic region it matched.
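The overview table in item 3 is essentially a count of hits per query and organism. A minimal sketch, assuming each hit has been reduced to a hypothetical `(query_name, organism_name)` pair:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def hit_counts(hits: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Count hits for each (query, organism) pair.

    Returns {query_name: {organism_name: count}}.
    """
    counts = Counter(hits)
    table: Dict[str, Dict[str, int]] = {}
    for (query, organism), n in counts.items():
        table.setdefault(query, {})[organism] = n
    return table


hits = [("q1", "A"), ("q1", "A"), ("q1", "B"), ("q2", "A")]
for query, row in hit_counts(hits).items():
    print(query, row)
```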

Documentation

Consult the documentation if anything on this page is unclear.

References

  1. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990). "Basic local alignment search tool". J Mol Biol 215 (3): 403–410. doi:10.1006/jmbi.1990.9999. PMID 2231712. http://www-math.mit.edu/~lippert/18.417/papers/altschuletal1990.pdf

Tutorials

Tutorials for CoGeBlast are available.