MotifView

From CoGepedia

Latest revision as of 13:02, 20 November 2013

MotifView - A motif viewing tool

Screenshot: MotifView at work

Software company: CoGe Team
Analysis Type: Compare multiple genomic regions for motifs
Working state: Testing
Tools Utilized: blastn, LAGAN

MotifView is a tool that visualizes motifs in compared genomic regions.


Introduction

MotifView combines visual and algorithmic tools to display motifs within multiple genomic regions. Because it shares much of its functionality with GEvo, it can compare sequences from any number of organisms using a variety of sequence comparison algorithms.

This page gives only brief descriptions of the options MotifView shares with GEvo. If a description or direction is unclear, follow the links to the corresponding sections of the GEvo instructions.

Embedded videos demonstrate the sections that follow. You can read the text alongside the videos or use either one on its own.

MotifView basics

Screenshot showing where a MotifView analysis is configured. Four genomic regions have been specified by gene name, dataset, and the amount of additional upstream/downstream sequence.

  1. Select genomic regions to analyze
  2. Select a sequence alignment algorithm appropriate for the sequences and area of interest
  3. Select motifs to visualize
  4. Press "Find Motifs!" button

To switch between these configuration options, select the appropriate tab.

Sequence Submission

Manual Submission

Select the "Sequence Submission" tab to open these options. Here you create a sequence submission box for each sequence that will be submitted for a MotifView analysis. This is also where you can adjust the amount of sequence analyzed, select which sequences are analyzed, reverse complement a sequence, mask a sequence according to the genomic features it contains, and change the display order of sequences.
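Two of the sequence operations named above, reverse complementing and masking, can be sketched in a few lines of Python. This is a minimal illustration and not MotifView's code. `reverse_complement` and `mask_features` are hypothetical helpers. The 1-based inclusive feature coordinates and the use of "N" as the masking character are assumptions.

```python
# Complement table for plain DNA (upper and lower case); N maps to N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mask_features(seq: str, features: list[tuple[int, int]], mask_char: str = "N") -> str:
    """Replace every base inside the given features with mask_char.

    Features are (start, end) pairs in 1-based, inclusive coordinates
    (an assumption for this sketch). Coordinates outside the sequence are clipped.
    """
    bases = list(seq)
    for start, end in features:
        for i in range(max(start, 1) - 1, min(end, len(bases))):
            bases[i] = mask_char
    return "".join(bases)


print(reverse_complement("ATGCCG"))                 # CGGCAT
print(mask_features("ATGCCGTTA", [(2, 4)]))        # ANNNCGTTA
```

Masking replaces bases rather than deleting them, so the sequence length is unchanged and the positions of the remaining features still line up.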

Merge analysis

Users often have links from previous GEvo analyses that they want to screen for motifs. Pasting such a link into the "Merge Previous GEvo Analysis" box allows the panels created in GEvo to be analyzed in MotifView.

The options for submitting and modifying sequences are described in the "Adding a sequence" section of the GEvo documentation.

Alignment Algorithms

Many alignment algorithms exist, but not all are suited to the analyses MotifView performs. MotifView compares genes at a scale for which BlastN and LAGAN are the best choices. The available algorithms and their suitability are discussed in the "Alignment Algorithms" section of the GEvo documentation.

Select Motifs

This tab defines which motifs are searched for and how. The "Select Motifs" tab offers four pull-down options for choosing motifs.


Choose TFBS Motif

Users can enter motifs manually in the "Search for User-Defined Motifs" section. Colors are assigned to motifs automatically, but you can specify your own color by appending it to the motif after a colon. For example:

CACGTG:Red
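The motif:color syntax is simple enough to parse line by line. This hypothetical Python helper assumes one entry per line, splits each entry at the first colon, upper-cases the motif, and returns None for the color when none is given (standing in for automatic color assignment). It is a sketch of the input format, not MotifView's own parser.

```python
from typing import List, Optional, Tuple


def parse_motif_entries(text: str) -> List[Tuple[str, Optional[str]]]:
    """Parse user-defined motifs written as MOTIF or MOTIF:Color, one per line.

    Hypothetical helper: returns (motif, color) pairs, with color None when
    no color is given (i.e. left for automatic assignment).
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split at the first colon only; everything after it is the color.
        motif, _, color = line.partition(":")
        color = color.strip()
        entries.append((motif.strip().upper(), color or None))
    return entries


print(parse_motif_entries("CACGTG:Red\nACGTG\n"))
# [('CACGTG', 'Red'), ('ACGTG', None)]
```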

Select from Comprehensive List of Motifs

It's also possible to browse the full list of motifs in our database and add them to the analysis. In the window presenting the full list, motifs appear by name, then sequence. Pressing the "Get Motif Info" button displays information on the highlighted motif.

Selected motifs appear in the "Selected Motifs" window, where they can be deselected individually, or the whole list can be cleared to start a new one.

Select Motifs from Categories

Motifs can also be chosen from two sets of predefined categories: Stress and Families. Toggling a category expands a list of the motifs linked to that stress or transcription factor family. A further range of motifs that are not confined to these categories is listed below them. Users can also select or deselect all motifs in a category at once.

Once motifs are chosen, press the "Find Motifs!" button above the tabs to begin analysis.
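This page does not describe how MotifView locates motifs. As a rough illustration only, the Python sketch below finds exact matches of a motif on both strands of a sequence; exact matching, the coordinate convention (0-based, half-open, forward strand), and the function name are all assumptions.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_motif(seq: str, motif: str) -> List[Tuple[int, int, str]]:
    """Find exact occurrences of motif on both strands of seq.

    Returns (start, end, strand) tuples with 0-based, half-open coordinates
    on the forward strand. Overlapping hits are reported.
    """
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    if k == 0:
        raise ValueError("motif must not be empty")
    rc = motif.translate(_COMPLEMENT)[::-1]
    hits = []
    # Slide a window of the motif's length along the forward strand.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits.append((i, i + k, "+"))
        elif window == rc:
            # Reverse-strand hit. For palindromic motifs such as CACGTG,
            # rc == motif, so each site is reported once, as "+".
            hits.append((i, i + k, "-"))
    return hits


print(find_motif("GATACCTATC", "GATA"))  # [(0, 4, '+'), (6, 10, '-')]
```

Because TFBS motifs such as CACGTG are often palindromic, the sketch reports each palindromic site once rather than once per strand.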

Demo MotifView Analysis

Below is a basic demonstration of a MotifView analysis. It illustrates how to submit regions for analysis, choose an algorithm, make relevant changes to the graphics, and choose motifs.

Fig. 1 Sequence submission

Sequence Submission (Fig. 1)

Enter the genomic region:

  1. Enter a gene accession number in the box labeled "Name:". In this case, we've chosen AT3G11580 and AT5G06250, two homeologs whose annotations will appear later in the results. A list of which datasets contain which annotations can be found here.
  2. Choose the datasets to be analyzed. When you enter an accession number, the pull-down menus are populated with the datasets that contain that gene, including genomic datasets, type of DNA, etc. This example uses the repeat-masked Arabidopsis TAIR V8 dataset.

You may also define how many base pairs of flanking sequence to include on each side of each genomic region. This becomes more relevant when refining an analysis.
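Adding flanking sequence is plain coordinate arithmetic. The sketch below assumes 1-based inclusive coordinates and clips the extended window to the ends of the chromosome; the function name and the clipping behaviour are illustrative assumptions. It uses "left" and "right" rather than upstream and downstream, since which side is upstream depends on the gene's strand.

```python
from typing import Tuple


def flanked_region(start: int, end: int, left: int, right: int,
                   chrom_length: int) -> Tuple[int, int]:
    """Extend a region by `left` bp on the left and `right` bp on the right.

    Hypothetical helper: coordinates are 1-based and inclusive, and the
    result is clipped to the chromosome (positions 1..chrom_length).
    """
    if not 1 <= start <= end <= chrom_length:
        raise ValueError("region must lie within the chromosome")
    if left < 0 or right < 0:
        raise ValueError("flank sizes must be non-negative")
    return max(1, start - left), min(chrom_length, end + right)


region = flanked_region(1000, 2000, 500, 300, chrom_length=10_000)
print(region, region[1] - region[0] + 1)  # (500, 2300) 1801
```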

Algorithm (Fig. 2)

Fig. 2 Algorithm
  1. Next, choose the alignment algorithm from the pull-down menu next to "Alignment Algorithm:". Because MotifView analyzes DNA within a very small, defined region, this example uses "BlastN", which works best when analyzing small regions.

Results Parameters (Fig. 3)

Many options are available to make the results easier to view. In this example, the most relevant options concern annotations.

  1. The gene pair we've selected has annotations for CNSs, gene space, and PIL5 sites. We want to see these annotations in the final results, so all three boxes are checked.
  2. If a wide selection of motifs is chosen, the number of motifs in the image panel can be overwhelming, so we chose the option to show only motifs that overlap our annotations.
Fig. 3 Results Parameters

Select Motifs (Fig. 4)

Fig. 4 Select Motifs

Although users can define their own motifs or select from the full list of motifs as described above, this example illustrates how to analyze categories of motifs.

  1. Toggle "Select Motifs from Stress Categories". The expanding window allows the user to choose from stresses associated with motifs including Chemical/Oxidative/Pathogen, Cold, Drought/Heat, Hypoxia, Light, Nutrient, Salt, Water, and Unspecified stresses.
  2. In this case, toggle the Cold Stress category.
  3. To illustrate searching for a range of motifs, choose Select All for the motifs in the Cold Stress category.

MotifView Panel (Fig. 5)

Below is the image of the resulting analysis. The panel shows:

  1. HSPs: A high-scoring segment pair, or HSP, is a pair of aligned subsequences, one from each of two sequences, whose alignment score exceeds a threshold. Here, HSPs have been turned on to show the regions of similarity between the gene pair.
  2. Genomic features: The gene is shown with exons painted gold, introns painted grey, and non-coding regions painted blue. Notice that the gene space is also highlighted by the yellow background underlying the gene and other annotations.
  3. Motifs: Motifs are drawn as diamonds. Note that the diamonds do not represent the true size of the motifs; motifs are so short that, drawn to scale, they would not be visible at all. Notice how the green motif appears to have an HSP associated with a PIL5 site on its homeolog.
  4. CNSs: Conserved non-coding sequences (CNSs) are very prevalent in this gene pair and can be distinguished from the PIL5 sites by their half-purple coloring.
  5. PIL5 sites: PIL5 sites, another type of annotation, are transcription factor binding sites. Notice how some sites have associated HSPs that denote sequence similarity with sites on the homeolog.
Fig. 5 MotifView Panel
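A little arithmetic shows why motifs have to be drawn larger than life. The sketch computes the width a feature would occupy if drawn to scale; the 20 kb region, the 1,000-pixel panel, and the minimum glyph size MIN_GLYPH_PX are hypothetical example values, not MotifView's actual settings.

```python
MIN_GLYPH_PX = 8  # hypothetical minimum on-screen size for a motif glyph


def scaled_width_px(feature_bp: float, region_bp: float, panel_px: float) -> float:
    """Width in pixels of a feature drawn to scale across the panel."""
    return feature_bp * panel_px / region_bp


def drawn_width_px(feature_bp: float, region_bp: float, panel_px: float) -> float:
    """Width actually drawn: to scale, but never below the minimum glyph size."""
    return max(scaled_width_px(feature_bp, region_bp, panel_px), MIN_GLYPH_PX)


for name, bp in [("6 bp motif", 6), ("2 kb gene", 2000)]:
    print(f"{name}: {scaled_width_px(bp, 20_000, 1_000):g} px")
# 6 bp motif: 0.3 px
# 2 kb gene: 100 px
```

At well under one pixel, a to-scale motif would simply disappear, which is why an enlarged glyph is used; the trade-off is the apparent overlap discussed under "Show preloaded annotations".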

Modifying result graphics

Show preloaded annotations

An important feature of MotifView is the ability to view other features, such as CNSs. The "Results Parameters" section has options to show preloaded annotations in the panel, including CNSs, gene space, and PIL5 sites.

You can also restrict the display to motifs that overlap a preloaded annotation. This is especially important because motifs are drawn larger in the panel than their true size. Drawing them to scale would make them invisible, but the enlarged representation can make a motif appear to overlap another feature when it does not. Restricting the display to motifs that overlap annotations eliminates this kind of error.
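Restricting the display to motifs that overlap annotations amounts to an interval-overlap test on true base-pair coordinates rather than on the enlarged glyphs. A minimal Python sketch, assuming 0-based half-open (start, end) intervals and hypothetical function names:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based, half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def motifs_overlapping(motifs: List[Interval], annotations: List[Interval]) -> List[Interval]:
    """Keep only motifs whose true coordinates overlap some annotation.

    The test uses real base-pair positions, not the enlarged glyph drawn
    in the panel, so a motif that merely looks adjacent is dropped.
    """
    return [m for m in motifs if any(overlaps(m, a) for a in annotations)]


print(motifs_overlapping([(10, 16), (50, 56), (95, 101)], [(0, 12), (100, 200)]))
# [(10, 16), (95, 101)]
```

The nested scan is quadratic; for many motifs and annotations, sorting both lists and sweeping through them once would be faster, but the simple form keeps the logic easy to follow.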

Other useful graphics modifications

Example MotifView result with HSPs, genomic features, CNSs, PIL5 sites, gene space, and motifs drawn. Note that the display is restricted to motifs that overlap gene space and PIL5 sites.
Other useful graphic modifications in the Results Parameters tab:

  • Showing contigs
  • Turning on labels for HSPs
  • Drawing feature names on features
  • Expanding Overlapping Features and Regions of Sequence Similarity

Refining an analysis

Once a MotifView analysis has run, any of its parameters can be changed and the analysis re-run after pressing the "Clear all previous analysis" button. Some parameters are retained, while others have to be selected again.

Commonly changed parameters include:

  • The extent of the genomic region analyzed. How far the panel extends beyond the gene in question can be changed in the "Left sequence" and "Right sequence" boxes on the "Sequence Submission" tab. Values in these boxes are retained in the new analysis.
  • Reverse complementing sequences. This setting is also retained after the previous analysis is cleared.
  • The datasets for the sequence submissions, the algorithm, and the selected motifs. Unlike the settings above, these are reset when the previous analysis is cleared, so they have to be defined again.

Linking to GEvo

Linking to GEvo is easy! Please see this page for instructions.

Tutorials

References/Downloads

For a list of all datasets with annotations, click here

For a list of all TFBS motifs used in Spangler et al. (2011), "Evidence for Conserved Noncoding Sequence Functions in Arabidopsis thaliana," New Phytologist, click here

For a list of all TFBS motifs used on this site, click here

Frequently Asked Questions

Bug Report

Progress on bugs can be found here.