Expression Analysis Pipeline: Difference between revisions

Revision as of 18:15, 27 February 2014

CoGe can generate gene/transcript expression measurements given a FASTQ input and an annotated genome.

When a FASTQ file of sequence reads is loaded in LoadExperiment and associated with an annotated genome, the following analysis steps are performed:

The FASTQ file is verified for correct format.
CutAdapt is run to trim adapter sequence from the reads (parameters: -q 25 --quality-base=64 -m 17).
GMAP is run to index the reference genome sequence.
GSNAP is run to align the reads to the reference sequence (parameters: --nthreads=32 -n 5 --format=sam -Q --gmap-mode=none --nofails).
SAMtools is run to compute per-position read depth of the resulting alignment (mpileup -D -Q 20).
Cufflinks is run to compute per-transcript FPKM (parameters: -p 24).
The three results (raw alignment, per-position read depth, and per-transcript FPKM) are loaded as separate Experiments into a Notebook.
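
As a rough illustration of the steps above, the following sketch assembles the documented parameters into argument lists. It is not CoGe's actual implementation: the executable names (cutadapt, gsnap, samtools, cufflinks), the helper build_command, and the file names are assumptions made for this example. The argument lists are also not complete invocations, since options this page does not mention (such as the GSNAP genome database or output locations) are deliberately left out. The GMAP indexing step is omitted because no parameters are given for it.

```python
import shlex

# Parameter strings copied verbatim from the step list above.
# The dictionary keys are assumed executable names.
STEP_PARAMETERS = {
    "cutadapt": "-q 25 --quality-base=64 -m 17",
    "gsnap": "--nthreads=32 -n 5 --format=sam -Q --gmap-mode=none --nofails",
    "samtools": "mpileup -D -Q 20",
    "cufflinks": "-p 24",
}


def build_command(program, inputs):
    """Return an argv list: program, its documented parameters, then inputs.

    This hypothetical helper only builds the list; it does not run anything.
    """
    return [program] + shlex.split(STEP_PARAMETERS[program]) + list(inputs)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    steps = [
        ("cutadapt", ["reads.fastq"]),
        ("gsnap", ["trimmed.fastq"]),
        ("samtools", ["alignment.bam"]),
        ("cufflinks", ["alignment.bam"]),
    ]
    for program, inputs in steps:
        print(" ".join(build_command(program, inputs)))
```

Keeping each command as a list rather than a single string means it could later be passed to subprocess.run without shell quoting concerns, once the missing options have been filled in.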

@@ Line 7: / Line 7: @@
 # [http://research-pub.gene.com/gmap/ GSNAP] is run to align the reads to the reference sequence (parameters: --nthreads=32  -n 5 --format=sam  -Q  --gmap-mode=none  --nofails).
 # [http://samtools.sourceforge.net/ SAMtools] is run to compute per-position read depth of the resulting alignment (mpileup -D  -Q 20).
-# [http://cufflinks.cbcb.umd.edu/ Cufflinks] is run to compute per-transcript FPKM.
+# [http://cufflinks.cbcb.umd.edu/ Cufflinks] is run to compute per-transcript FPKM (parameters: -p 24).
 # The three results (raw alignment, per-position read depth, and per-transcript FPKM) are loaded as separate [[Experiments]] into a [[Notebook]].