Expression Analysis Pipeline: Difference between revisions

Revision as of 17:26, 7 May 2014

CoGe can generate gene/transcript expression measurements given a FASTQ input and an annotated genome. Thanks to James Schnable, creator of qTeller, for help developing this pipeline!

When a FASTQ file of sequence reads is loaded in LoadExperiment and associated with an annotated genome, the following analysis steps are performed:

The FASTQ file is verified for correct format.
CutAdapt is run to trim adapter sequence from the reads (parameters: -q 25 -m 17).
GMAP or Bowtie2 is run to index the reference genome sequence, depending on your choice.
GSNAP or TopHat is run to align the reads to the reference sequence (GSNAP parameters: -n 5 --format=sam -Q --gmap-mode=none --nofails, TopHat parameters: -g 1).
SAMtools is run to compute per-position read depth of the resulting alignment (mpileup -D -Q 20).
Cufflinks is run to compute per-transcript FPKM (parameters: -p 24).
The per-position read depth and per-transcript FPKM values are log-transformed and normalized to the range 0 to 1 for loading.
The three results (raw alignment, per-position read depth, and per-transcript FPKM) are loaded as separate Experiments into a Notebook.
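The log transform and 0-to-1 normalization in the steps above can be sketched as follows. This is only an illustration, not CoGe's actual code: the page does not state the log base, how zero values are handled, or the scaling method, so the base-10 log, the pseudocount of 1, the min-max scaling, and the function name `log_normalize` are all assumptions.

```python
import math


def log_normalize(values, pseudocount=1.0):
    """Log-transform values, then min-max scale them into [0, 1].

    Assumptions (not confirmed by the pipeline description):
    base-10 log, a pseudocount so that zeros are defined, and
    min-max scaling over the whole input.
    """
    logged = [math.log10(v + pseudocount) for v in values]
    lo, hi = min(logged), max(logged)
    if hi == lo:
        # All values identical: no spread to scale, so map everything to 0.
        return [0.0 for _ in logged]
    return [(x - lo) / (hi - lo) for x in logged]


# Hypothetical per-transcript FPKM values, for illustration only.
fpkm = [0.0, 9.0, 99.0]
print(log_normalize(fpkm))
```

With these made-up inputs the smallest value maps to 0, the largest to 1, and the middle one lands near 0.5, because the log transform compresses the gap between 9 and 99.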

Genomes for which this analysis has been performed can have features imported into qTeller. TBD: how to do this ...

Note:

Removed "--quality-base=64" as an argument for cutadapt from the original qteller pipeline
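The page does not say why the flag was removed. For context, without "--quality-base=64" cutadapt falls back to its default quality base of 33, so the input is treated as Phred+33. A rough way to check which encoding a FASTQ file appears to use is sketched below; the heuristic, its threshold, and the function name `guess_quality_base` are assumptions made for illustration and are not part of the pipeline.

```python
def guess_quality_base(quality_strings):
    """Guess the Phred offset (33 or 64) from FASTQ quality strings.

    Heuristic (an assumption, not a guarantee): characters below
    ASCII 59 (';') do not occur in the 64-offset encodings, so seeing
    one suggests Phred+33. Otherwise the data is guessed to be Phred+64,
    which can be wrong for uniformly high-quality Phred+33 reads.
    """
    lowest = min((ord(c) for q in quality_strings for c in q), default=None)
    if lowest is None:
        return None  # no quality characters to inspect
    return 33 if lowest < 59 else 64


# Hypothetical quality lines, for illustration only.
print(guess_quality_base(["IIII#5"]))  # contains '#', which is below ';'
print(guess_quality_base(["hhhh^b"]))  # every character is at or above ';'
```

Scanning more reads makes the guess more reliable, since a low-quality base call is more likely to appear somewhere in the sample.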

Video Tutorial

Demo fastq data for Arabidopsis Col-0:

@@ Line 5: / Line 5: @@
 When a FASTQ file of sequence reads is loaded in [[LoadExperiment]] and associated with an annotated genome, the following analysis steps are performed:
 # The FASTQ file is verified for correct format.
-# [http://code.google.com/p/cutadapt/ CutAdapt] is run to trim adapter sequence from the reads (parameters:  -q 25 --quality-base=64  -m 17).
+# [http://code.google.com/p/cutadapt/ CutAdapt] is run to trim adapter sequence from the reads (parameters:  -q 25 -m 17).
 # [http://research-pub.gene.com/gmap/ GMAP] or [http://bowtie-bio.sourceforge.net/bowtie2/index.shtml Bowtie2] is run to index the reference genome sequence, depending on your choice.
 # [http://research-pub.gene.com/gmap/ GSNAP] or [http://tophat.cbcb.umd.edu/ TopHat] is run to align the reads to the reference sequence (GSNAP parameters: -n 5 --format=sam  -Q  --gmap-mode=none  --nofails, TopHat parameters: -g 1).
@@ Line 16: / Line 16: @@
 TBD:  how to do this ...
+Note:
+* Removed "--quality-base=64" as an argument for cutadapt from the original qteller pipeline
 ===Video Tutorial===
