Expression Analysis Pipeline

From CoGepedia

Revision as of 15:35, 7 April 2014

[Figure: Overview of various types of mapping of data from RNA-Seq]
[Figure: Comparing the results of data mapped with GSNAP versus TopHat/Bowtie2]

CoGe can generate gene and transcript expression measurements from a FASTQ file of sequence reads and an annotated genome. Thanks to James Schnable, creator of qTeller, for help developing this pipeline!

When a FASTQ file of sequence reads is loaded in LoadExperiment and associated with an annotated genome, the following analysis steps are performed:

  1. The FASTQ file is verified for correct format.
  2. CutAdapt is run to trim adapter sequence and low-quality bases from the reads (parameters: -q 25 --quality-base=64 -m 17; -q 25 trims low-quality 3' ends at a quality cutoff of 25, --quality-base=64 reads quality scores as Phred+64, and -m 17 discards reads shorter than 17 bases after trimming).
  3. GMAP is run to index the reference genome sequence.
  4. GSNAP is run to align the reads to the reference sequence (parameters: -n 5 --format=sam -Q --gmap-mode=none --nofails).
  5. SAMtools is run to compute the per-position read depth of the resulting alignment (parameters: mpileup -D -Q 20; -Q 20 skips bases with a base quality below 20).
  6. Cufflinks is run to compute per-transcript FPKM (parameters: -p 24, which runs Cufflinks with 24 threads).
  7. The per-position read depth and per-transcript FPKM values are log-transformed and normalized to the range 0 to 1 before loading.
  8. The three results (raw alignment, per-position read depth, and per-transcript FPKM) are loaded as separate Experiments into a Notebook.
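Step 5 produces per-position read depth from the SAMtools pileup. The sketch below shows one way such depth values could be pulled out of `samtools mpileup` text output; it assumes single-sample text output, whose tab-separated columns are chromosome, 1-based position, reference base, read count, read bases, and base qualities. The function name `depth_from_mpileup` is hypothetical and is not part of CoGe.

```python
from typing import Iterable, Iterator, Tuple


def depth_from_mpileup(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (chromosome, position, read depth) tuples from mpileup text lines.

    Hypothetical helper; assumes single-sample `samtools mpileup` text output,
    where column 1 is the chromosome, column 2 the 1-based position and
    column 4 the number of reads covering that position.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            # Skip blank lines rather than failing on them.
            continue
        fields = line.split("\t")
        yield fields[0], int(fields[1]), int(fields[3])


if __name__ == "__main__":
    # Illustrative pileup line, not real data.
    sample = ["chr1\t100\tA\t3\t..,\tIII\n"]
    print(list(depth_from_mpileup(sample)))  # [('chr1', 100, 3)]
```

Streaming the lines through a generator keeps memory use flat, which matters because a pileup can list millions of covered positions.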
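Step 7 log-transforms the depth and FPKM values and rescales them to the range 0 to 1. The page does not say which logarithm or offset CoGe uses, so the sketch below assumes log10(x + 1) (the +1 keeps zero values defined) followed by min-max scaling across all values in one experiment. The function name `log_normalize` is hypothetical.

```python
import math
from typing import Dict


def log_normalize(values: Dict[str, float]) -> Dict[str, float]:
    """Log-transform non-negative values, then min-max scale them into [0, 1].

    Hypothetical helper. Assumptions not stated by the pipeline description:
    log10(x + 1) is the transform and scaling uses the experiment-wide
    minimum and maximum.
    """
    if not values:
        return {}
    if any(v < 0 for v in values.values()):
        raise ValueError("depth and FPKM values must be non-negative")
    # Log transform; the +1 offset keeps zero depth or FPKM finite.
    logged = {key: math.log10(v + 1.0) for key, v in values.items()}
    lo, hi = min(logged.values()), max(logged.values())
    span = hi - lo
    if span == 0:
        # All values identical: map everything to 0 instead of dividing by zero.
        return {key: 0.0 for key in logged}
    # Min-max scaling: smallest value becomes 0, largest becomes 1.
    return {key: (v - lo) / span for key, v in logged.items()}


if __name__ == "__main__":
    # Illustrative FPKM values, not real data.
    fpkm = {"transcript_A": 0.0, "transcript_B": 9.0, "transcript_C": 99.0}
    print(log_normalize(fpkm))
    # {'transcript_A': 0.0, 'transcript_B': 0.5, 'transcript_C': 1.0}
```

The log step compresses the very wide dynamic range of expression values, so a handful of highly expressed transcripts do not push every other value toward 0 after rescaling.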

Genomes for which this analysis has been performed can have their features imported into qTeller. TBD: how to do this ...


Video Tutorial