Difference between revisions of "Expression Analysis Pipeline"

Revision as of 11:34, 7 April 2014

CoGe can generate gene/transcript expression measurements given a FASTQ input and an annotated genome. Thanks to James Schnable, creator of qTeller, for help developing this pipeline!

When a FASTQ file of sequence reads is loaded in LoadExperiment and associated with an annotated genome, the following analysis steps are performed:

The FASTQ file is verified for correct format.
CutAdapt is run to trim adapter sequence from the reads (parameters: -q 25 --quality-base=64 -m 17).
GMAP is run to index the reference genome sequence.
GSNAP is run to align the reads to the reference sequence (parameters: -n 5 --format=sam -Q --gmap-mode=none --nofails).
SAMtools is run to compute per-position read depth of the resulting alignment (mpileup -D -Q 20).
Cufflinks is run to compute per-transcript FPKM (parameters: -p 24).
The per-position read depth and per-transcript FPKM values are log-transformed and then normalized to the range 0 to 1 for loading.
The three results (raw alignment, per-position read depth, and per-transcript FPKM) are loaded as separate Experiments into a Notebook.
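
The log transformation and 0-to-1 normalization step can be illustrated with a minimal sketch. The page does not state which log base, pseudocount, or scaling method CoGe uses, so the log10(x + 1) transform and the min-max scaling below are assumptions chosen for illustration, and the function name log_normalize is hypothetical rather than part of CoGe.

```python
import math


def log_normalize(values, pseudocount=1.0):
    """Log-transform non-negative values, then min-max scale them into [0, 1].

    Assumptions (not taken from the CoGe pipeline itself): log base 10,
    a pseudocount of 1 so that zero depth/FPKM stays defined, and
    min-max scaling over the whole input list.
    """
    if not values:
        return []
    logged = [math.log10(v + pseudocount) for v in values]
    lo, hi = min(logged), max(logged)
    if hi == lo:
        # All values identical: there is no spread to scale, so map to 0.0.
        return [0.0 for _ in logged]
    return [(x - lo) / (hi - lo) for x in logged]


# Example per-transcript FPKM values (made up for illustration).
fpkm = [0.0, 9.0, 99.0, 999.0]
scaled = log_normalize(fpkm)
print(scaled)
```

The pseudocount keeps transcripts or positions with a value of zero at the bottom of the scale instead of producing an undefined logarithm; a different pseudocount or log base would change the intermediate values but not the 0-to-1 output range.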

Genomes for which this analysis has been performed can have features imported into qTeller. TBD: how to do this ...

Video Tutorial

Demo fastq file for Arabidopsis Col-0: http://de.iplantcollaborative.org/dl/d/2F807292-34CC-4C8E-96E3-3E668A304D23/test_rna_seq_data_0.17M_reads.fastq
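
Before loading a file such as the demo FASTQ above, a quick structural check similar in spirit to the pipeline's first step can be run locally. The sketch below is not CoGe's own verification code; the function name check_fastq and the specific rules (four lines per record, a header starting with "@", a separator starting with "+", and equal sequence and quality lengths) are assumptions based on the common FASTQ layout.

```python
import io


def check_fastq(handle):
    """Return the number of records, raising ValueError on the first bad one.

    Assumes the common four-line FASTQ layout (no wrapped sequence lines).
    """
    count = 0
    while True:
        header = handle.readline()
        if not header:
            break  # clean end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        record_no = count + 1
        if not header.startswith("@"):
            raise ValueError(f"record {record_no}: header does not start with '@'")
        if not plus.startswith("+"):
            raise ValueError(f"record {record_no}: separator does not start with '+'")
        if not seq or len(seq) != len(qual):
            raise ValueError(f"record {record_no}: sequence/quality length mismatch")
        count += 1
    return count


# Small in-memory example; a real file would be opened with open(path).
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGG\n+read2\nII\n")
print(check_fastq(example))
```

Reading four lines at a time keeps memory use constant regardless of file size, at the cost of rejecting multi-line (wrapped) FASTQ records, which this sketch deliberately does not handle.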

@@ Line 16: / Line 16: @@
 [[File:Screen_Shot_2014-03-04_at_9.50.01_AM.png|thumb|center|300px]]
-Video:
+==Video Tutorial==
 {{#ev:youtube|3fNyHGB02dM}}
 *Demo fastq file for Arabidopsis Col-0: http://de.iplantcollaborative.org/dl/d/2F807292-34CC-4C8E-96E3-3E668A304D23/test_rna_seq_data_0.17M_reads.fastq
