From CoGepedia

Latest revision as of 21:52, 1 January 2010

A CoGe Walkthrough – ramosa2 ortholog identification and CNS analysis

In this exercise you will use genomic synteny to identify putative ramosa2 orthologs across several cereal genomes. In addition you will identify Conserved Non-Coding Sequences (CNSs) in the ramosa2 promoter region.

GenBank Accession NM_001138446 from NCBI

First we need to get some ra2 sequence to use in CoGe Blast. Search the NCBI nucleotide database for: NM_001138446. Copy the Fasta sequence of the mRNA.
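If you would rather pull the sequence from a script than copy it out of the browser, here is a minimal sketch that uses NCBI's E-utilities `efetch` endpoint. The helper names `efetch_fasta_url` and `fetch_fasta` are my own, and the download step needs network access, so treat this as a starting point rather than part of the CoGe workflow.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build an efetch URL that returns a nucleotide record as FASTA text."""
    params = {
        "db": "nuccore",      # nucleotide database
        "id": accession,      # e.g. the ra2 mRNA accession
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def fetch_fasta(accession):
    """Download the FASTA record for an accession (requires network access)."""
    with urlopen(efetch_fasta_url(accession)) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_fasta("NM_001138446")` should return the same Fasta text you would otherwise copy from the NCBI page.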

Now go to the CoGe website: http://synteny.cnr.berkeley.edu/CoGe/ (you can also just google CoGe and you’ll see it a few pages down). Click on “CoGe Blast” and paste your sequence into the “Sequences” field. A Fasta header is not necessary, but it helps keep track of things.
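If your sequence is a bare string, a Fasta header is easy to add before pasting. The sketch below is only an illustration; `to_fasta` is a hypothetical helper and the 60-character line width is a common convention, not something CoGe requires as far as I know.

```python
def to_fasta(name, sequence, width=60):
    """Return a Fasta record: a '>' header line followed by wrapped sequence lines."""
    # Drop any whitespace or line breaks that came along with a copy and paste.
    cleaned = "".join(sequence.split())
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"
```

For example, `to_fasta("ra2_mRNA", my_sequence)` gives a record whose header makes the query easy to recognize later in the results.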

Now we need to decide which organisms we will Blast. Only (mostly) completed genome sequences are available from this window (more on how to add other sequences later on). In the “Organism Name” field search for Sorghum, Rice, Zea mays, and Brachypodium and add them in turn by hitting the “+add” button. I find that it is best to add both the pseudomolecule maize sequence and the sequenced BACs, as shown in the picture below. Make sure you don’t accidentally add the chloroplast genomes instead of the nuclear ones!

Configuring CoGeBlast to search multiple grass genomes

When you’re ready, hit “Run CoGe Blast” to start your analysis.

Note, this is just like Blast, only you are blasting several whole genome sequences at the same time. You can enter protein sequences, change the Blast algorithm and adjust parameters just as you would in any Blast analysis.

Your results will show both the genomic locations and a table of the “blast hits”, or high-scoring segment pairs (HSPs). The HSPs are initially ordered by organism, but you can order them by any of the criteria at the top. To sort HSPs by multiple criteria, simply hit one criterion, then hold shift while selecting another. I often sort by HSP# then organism. This will give you the top hits in all blasted genomes. You can evaluate the results much like any blast analysis. At this point we will pick which genomic locations to examine for gene synteny. Select the top hit from each genome.
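The same two-key sort is easy to reproduce if you copy the HSP table into a script. This is a minimal sketch with made-up rows; the field names and scores are hypothetical, and it assumes that HSP# 1 marks the best hit within each organism.

```python
from operator import itemgetter

# Hypothetical rows standing in for an HSP table; the values are illustrative only.
hsps = [
    {"hsp": 2, "organism": "Sorghum", "score": 310},
    {"hsp": 1, "organism": "Rice", "score": 420},
    {"hsp": 1, "organism": "Sorghum", "score": 655},
    {"hsp": 2, "organism": "Rice", "score": 150},
]

# Sort by HSP number first, then by organism, like clicking one column
# and shift-clicking a second one.
by_rank_then_organism = sorted(hsps, key=itemgetter("hsp", "organism"))


def top_hit_per_organism(rows):
    """Keep the row with the lowest HSP number for each organism."""
    best = {}
    for row in rows:
        current = best.get(row["organism"])
        if current is None or row["hsp"] < current["hsp"]:
            best[row["organism"]] = row
    return best
```

`top_hit_per_organism(hsps)` then holds one candidate per genome, which mirrors ticking the top hit for each organism in the table.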

HSP Table from CoGeBlast. HSPs are commonly known as "Blast Hits"

Note that for now we will select top hits from both maize genomes. Even though they are on the same BAC, it is often better to see the pseudomolecule assembly as well as the BAC annotations.

Once you have selected the HSPs you would like to analyze, hit “Send to GEvo”. In short, GEvo uses blast to compare the whole genomic regions adjacent to the HSPs you selected.

Manually adding a sequence to a GEvo analysis by pasting it into the sequence submission box

You’ll notice that you now have a small box for each sequence that you will be comparing. I find that it is helpful to add the original query mRNA sequence to your analysis. To do this, hit the “Add Sequence” button at the top left. This will add a blank sequence box. From here you can add any sequence either by direct submission (copy the sequence in!) or by entering the GenBank ID. In this picture I have added the original ra2 mRNA by direct submission.

This function is also helpful when you want to add data from non-sequenced genomes or corrected gene models.

After adding your query sequence, hit “Run GEvo Analysis!” to start.

Your GEvo results (http://tinyurl.com/yga5ozu; hit “Run GEvo Analysis!” to recapitulate) will show a small chromosomal region of each genome being compared. Gene models are displayed in the center of each sequence, while color-coded HSPs are shown on the top and bottom (depending on strand orientation). Gene models are colored in grey (gene), green (CDS), and blue (mRNA). Models that correspond to your query are colored yellow. Each HSP is connected between two genomes. In order to see the connector, simply click on the HSP. For example, this image shows the HSP connector between the coding regions of the putative ramosa2 genes in Rice and Brachypodium.

GEvo result of ra2 in syntenic region between Brachypodium and Rice

In order to see all connectors between two different genomes, hold down shift while clicking on an HSP. If you want to clear the HSP connectors, simply hit the “Clear Connectors” button. In addition, when you select an HSP connector a window will pop up showing you more details about the HSP. In order to see the exact alignment, simply hit “full annotation” in the pop-up window.

One of the first things I like to check is that I actually found the genomic region in maize for ramosa2. In order to do this, select an HSP in the mRNA sequence you added by direct submission that corresponds to the maize genomic sequence.

Checking a direct sequence submission in GEvo

You can now hit “full annotation” in the HSP pop-up window and you will see that the alignment is exact. We found ra2 in the maize genome! Unfortunately it looks like the ra2 gene in the 40x masked pseudomolecule sequence may have been masked out of existence! Maize is a bit of a pain in the shape it’s in.

As you poke around you may notice that the orientation of a gene may be wrong. For example, in our case Brachypodium and rice are in the opposite orientation to the rest of the sequences.

Identifying a sequence that needs to be reverse complemented in GEvo
Selecting the reverse complement option for a sequence submission in GEvo

In order to fix this, simply hit the “Reverse Complement” button in the boxes corresponding to Brachypodium and rice. Hit “Run GEvo Analysis!” to redo the analysis. (http://tinyurl.com/yg9g22q)
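For a sequence you are about to paste in by direct submission, you can also flip the strand yourself beforehand. This is a minimal sketch of reverse complementing a plain DNA string; it only handles A, C, G, T and N, and it is not GEvo's own code.

```python
# Translation table mapping each base to its complement (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    # Complement every base, then reverse the string so it reads 5' to 3'
    # on the opposite strand.
    return seq.translate(_COMPLEMENT)[::-1]
```

Applying the function twice gives back the original sequence, which is a quick sanity check before submitting.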

Now the stage is set to compare synteny between the putative ra2 regions. In order to do this you need to increase the size of the region you are examining. The easiest way to do this is to use the “Apply distance to all CoGe submissions?” option. By default CoGe compares 10000 bp. Let’s increase the regions to 100000. (As an alternative, you can increase the region for each individual sequence in the sequence box.)
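To get a feel for what changing the distance does, the window arithmetic can be sketched as below. I am assuming the distance is added on each side of the hit and that coordinates are 1-based and clipped at the chromosome ends; `region_window` and its parameters are hypothetical names, not CoGe's.

```python
def region_window(hit_start, hit_stop, flank, chromosome_length):
    """Extend a hit by `flank` bp on each side, staying inside the chromosome."""
    start = max(1, hit_start - flank)                # do not run off the left end
    stop = min(chromosome_length, hit_stop + flank)  # do not run off the right end
    return start, stop
```

With a hit at 250000 to 251200 on a 1000000 bp chromosome, a flank of 100000 gives the window 150000 to 351200, compared with 240000 to 261200 for a flank of 10000.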

Option to increase upstream and downstream sequence for all sequence submissions

Now play around with the HSP connectors and you should find that we have found syntenic regions for ramosa2 across 4 different species! Notice how much more repeat space is in maize. You may want to expand the regions for maize just a bit to pull in more syntenic genes. I changed the “Left Sequence” and “Right Sequence” of both maize sequences to 5000 and I got a better view. (http://tinyurl.com/yk5rko7)

Changing the amount of up and downstream sequence for a sequence submission in GEvo

Now that we’re confident we’ve found orthologous chromosomal regions, let’s find some CNSs. To do this you’ll need to zoom back down to find the ra2 promoter. The best way to do this is to grab the brackets on either side of each sequence and flank the region you want. It’s best to connect the ra2 HSPs so that you can zoom down to the right region. Then look for conserved regions around ra2 and try to zoom down to include only those regions.


Identifying conserved sequence around ramosa2 in GEvo

Once you’ve zoomed down to the region you want to explore you need to change the alignment algorithm to look for small regions of highly similar sequences. You can do this in the Algorithm pane at the bottom. Simply change the “Alignment Algorithm” to “BlastN: small regions” and use the defaults.

Selecting blastn in GEvo

Then hit “Run GEvo Analysis”. There are your CNSs! (http://tinyurl.com/yzrlthn)
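If you export HSP and CDS coordinates, one simple way to separate candidate CNSs from coding hits is to keep only the HSPs that do not overlap any CDS interval. This is a sketch of that idea with hypothetical coordinates; it is my own simplification, not the way GEvo itself defines or reports CNSs.

```python
def overlaps(a, b):
    """True if two closed intervals (start, stop) share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def noncoding_hsps(hsps, cds_intervals):
    """Return the HSPs that do not overlap any coding (CDS) interval."""
    return [h for h in hsps if not any(overlaps(h, c) for c in cds_intervals)]


# Hypothetical coordinates on one sequence, for illustration only.
cds = [(1000, 1500), (2000, 2600)]
hits = [(200, 260), (1400, 1450), (1600, 1650), (2590, 2700)]
candidates = noncoding_hsps(hits, cds)  # (200, 260) and (1600, 1650) remain
```

Hits upstream of the first CDS, such as (200, 260) here, are the ones to inspect as possible promoter CNSs.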


Using blastn in GEvo to identify conserved noncoding sequences for ramosa2 in grasses

Aside: I find it helpful to mess with some settings in order to make a pretty picture. Look in the “Results Parameters” tab at the bottom. I usually change “Don’t Show HSPs with X overlaps” to 3-5. This will eliminate mobilized transposons, but it may also eliminate recent gene duplications. I also tend to change the top left three settings in order to make the image fit on the page. Then you can print and publish your image using a screen capture. Some Apple screen capture commands: hold command, shift, and control, then hit 4, and this will give you some crosshairs. Select the part of the screen you want a picture of, then paste where you want the picture. If you want to save the picture to the desktop, use just command, shift and 4.
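I have not checked how GEvo counts overlaps internally, so the following is only one plausible reading of the overlap setting: count, for each HSP, how many other HSPs on the same sequence overlap it, and hide those that reach the limit. The function name and the exact cutoff rule are assumptions of mine.

```python
def filter_by_overlap(hsps, max_overlaps):
    """Drop HSPs that are overlapped by `max_overlaps` or more other HSPs."""
    kept = []
    for i, (start, stop) in enumerate(hsps):
        # Count the other HSPs sharing at least one position with this one.
        count = sum(
            1
            for j, (other_start, other_stop) in enumerate(hsps)
            if j != i and other_start <= stop and start <= other_stop
        )
        if count < max_overlaps:
            kept.append((start, stop))
    return kept
```

A pile of hits stacked on one spot, as a repeated transposon would produce, gets removed, while an isolated hit is kept. The same rule would also hide a stack of hits from recently duplicated genes, which is the trade-off mentioned above.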

Setting HSP limits in GEvo

Thanks!!

Devin Lee O'Connor

University of California, Berkeley

devin_oconnor@berkeley.edu