Processing RNA seq data

From CoGepedia
Revision as of 12:12, 12 February 2014 by Jschnable (Quality Trimming)

CoGe's RNA-seq processing pipeline is a highly automated version of the qTeller RNA-seq processing pipeline.

For detailed instructions on how to use these same tools manually, see this PDF. Automated Python scripts for batch processing large numbers of datasets are available here.

Quality Trimming

When using RNA-seq data to quantify expression (rather than to call SNPs or assemble transcriptomes), quality trimming should be regarded as an optional step. However, it can substantially reduce total run-time, because aligners no longer spend significant amounts of processor time searching for valid alignment sites for low quality reads.

The version of the qTeller pipeline employed by CoGe uses "cutadapt" for quality trimming. By default, the pipeline runs cutadapt so that positions in a read with quality scores below 25 are treated as low quality (these are positions with more than roughly a 1 in 300 chance of being a miscall). Reads shorter than 17 base pairs after quality trimming are discarded, because 17 base pairs is the minimum length required by the software used to perform alignments.
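To make the two thresholds concrete, here is a minimal Python sketch. It is not the pipeline's code and not cutadapt's implementation. It assumes quality scores are Phred-scaled, and it trims the 3' end with a simplified running-sum approach modeled loosely on the one described in cutadapt's documentation. The function names and the constants QUALITY_CUTOFF and MIN_LENGTH are hypothetical and used only for illustration.

```python
QUALITY_CUTOFF = 25  # positions below this score are treated as low quality
MIN_LENGTH = 17      # reads shorter than this after trimming are discarded


def phred_error_probability(score):
    """Convert a Phred quality score to the probability of a miscall."""
    return 10 ** (-score / 10)


def trim_3prime(qualities, cutoff=QUALITY_CUTOFF):
    """Return how many bases to keep after trimming the 3' end.

    Subtract the cutoff from each score, accumulate the differences
    starting from the 3' end, and cut at the position where the
    running sum reaches its minimum. Stop scanning once the sum
    becomes positive.
    """
    running_sum = 0
    lowest_sum = 0
    keep = len(qualities)
    for i in reversed(range(len(qualities))):
        running_sum += qualities[i] - cutoff
        if running_sum > 0:
            break
        if running_sum < lowest_sum:
            lowest_sum = running_sum
            keep = i
    return keep


def passes_length_filter(qualities, cutoff=QUALITY_CUTOFF, min_length=MIN_LENGTH):
    """True if the read is still at least min_length bases long after trimming."""
    return trim_3prime(qualities, cutoff) >= min_length


if __name__ == "__main__":
    # A score of 25 corresponds to an error probability of about 0.0032.
    print(round(phred_error_probability(25), 4))

    # A 30-base read: 20 high-quality bases followed by a low-quality tail.
    read_qualities = [38] * 20 + [12] * 10
    print(trim_3prime(read_qualities))           # bases kept
    print(passes_length_filter(read_qualities))  # kept or discarded
```

On the cutadapt command line, the same two thresholds would normally be supplied through its quality cutoff option (-q) and its minimum length option (-m). The exact invocation CoGe uses is not given here, so any such command should be read as an assumption.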

Alignment

Format Conversion and Sorting

Expression Quantification