Expression Analysis Pipeline

From CoGepedia

Revision as of 11:26, 7 May 2014

Overview of various types of mapping of RNA-Seq data
Comparison of results from data mapped with GSNAP versus TopHat/Bowtie2

CoGe can generate gene/transcript expression measurements given a FASTQ input and an annotated genome. Thanks to James Schnable, creator of qTeller, for help developing this pipeline!

When a FASTQ file of sequence reads is loaded in LoadExperiment and associated with an annotated genome, the following analysis steps are performed:

  1. The FASTQ file is verified for correct format.
  2. CutAdapt is run to trim adapter sequence from the reads (parameters: -q 25 -m 17).
  3. GMAP or Bowtie2 is run to index the reference genome sequence, depending on which aligner you select.
  4. GSNAP or TopHat is run to align the reads to the reference sequence (GSNAP parameters: -n 5 --format=sam -Q --gmap-mode=none --nofails, TopHat parameters: -g 1).
  5. SAMtools is run to compute per-position read depth of the resulting alignment (mpileup -D -Q 20).
  6. Cufflinks is run to compute per-transcript FPKM (parameters: -p 24).
  7. The per-position read depth and per-transcript FPKM values are log-transformed and normalized to the range 0 to 1 for loading.
  8. The three results (raw alignment, per-position read depth, and per-transcript FPKM) are loaded as separate Experiments into a Notebook.
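
Step 7 can be illustrated with a short sketch. The page does not state the log base, whether a pseudocount is added, or how the values are rescaled, so the base-10 logarithm, the pseudocount of 1, and the min-max rescaling below are all assumptions, and the function name log_normalize is hypothetical. This is not the pipeline's actual code.

```python
import math


def log_normalize(values, pseudocount=1.0):
    """Log-transform values, then rescale the results to the range [0, 1].

    Assumptions (not stated on this page): base-10 logarithm, a pseudocount
    so that zero depth/FPKM is defined, and min-max rescaling.
    """
    if not values:
        return []
    logged = [math.log10(v + pseudocount) for v in values]
    lo, hi = min(logged), max(logged)
    if hi == lo:
        # All values identical: no spread to rescale, so map everything to 0.
        return [0.0 for _ in logged]
    return [(x - lo) / (hi - lo) for x in logged]


# Example with made-up FPKM values for three transcripts.
scaled = log_normalize([0.0, 9.0, 99.0])
```

With a transform of this kind, the smallest input maps to 0 and the largest to 1, and the log step keeps a few highly expressed transcripts from compressing the rest of the scale.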

Genomes for which this analysis has been performed can have features imported into qTeller. TBD: how to do this ...

Note:

  • The "--quality-base=64" argument to cutadapt, used in the original qTeller pipeline, has been removed.
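
For reference, the parameters listed in the steps above, including the cutadapt change in this note, can be collected in one place. The sketch below records only the options documented on this page. The executable names, the helper build_command, and the file name in the example are assumptions for illustration. Input and output arguments are deliberately left to the caller because this page does not specify them.

```python
# Options exactly as documented on this page; nothing else is added.
DOCUMENTED_OPTIONS = {
    "cutadapt": ["-q", "25", "-m", "17"],
    "gsnap": ["-n", "5", "--format=sam", "-Q", "--gmap-mode=none", "--nofails"],
    "tophat": ["-g", "1"],
    "samtools": ["mpileup", "-D", "-Q", "20"],
    "cufflinks": ["-p", "24"],
}


def build_command(tool, extra_args=()):
    """Return an argument list: the tool name, its documented options,
    then any caller-supplied arguments (e.g. input files; hypothetical)."""
    return [tool, *DOCUMENTED_OPTIONS[tool], *extra_args]


# Example: the file name is a placeholder, not something the pipeline defines.
trim_cmd = build_command("cutadapt", ["reads.fastq"])
```

Keeping the options in a single table makes a change such as dropping "--quality-base=64" a one-line edit.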

Video Tutorial