Difference between revisions of "Creosote"

From CoGepedia

Revision as of 17:00, 4 August 2011

Creosote genome sequencing and assembly notes:

  • Sample obtained from front yard of 4951 W. McElroy Dr.
  • Sequences obtained from one lane of Illumina HiSeq2000
  • Fastq files delivered from UAGC
    • 82 files
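    • Quality scores are Sanger-encoded (code 33, i.e. ASCII offset 33)
      • Description of Fastq file format with notes on specific decoding of header names generated by various technologies: http://en.wikipedia.org/wiki/FASTQ_format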
    • Paired-end reads, e.g.:
      • lane3_NoIndex_L003_R1_041.fastq
      • lane3_NoIndex_L003_R2_041.fastq
    • Need to get adapter sequences used in sequencing
      • TGACCA (Not present in sequence reads)
  • Check quality with fastqc: http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
    • Results are on the wiki page "Creosote First Run FastQC"
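
Since several steps below depend on whether qualities are code 33 or code 64, here is a rough sketch for checking the encoding of a fastq file. It is only a heuristic and not part of fastqc or jcvi; the function names and the 10000-read sample size are made up for illustration, and it assumes plain 4-line fastq records.

```python
from itertools import islice


def quality_lines(path, max_reads=10000):
    """Yield the quality string (4th line) of the first max_reads records."""
    with open(path) as handle:
        for i, line in enumerate(islice(handle, 4 * max_reads)):
            if i % 4 == 3:
                yield line.rstrip("\n")


def guess_phred_offset(qualities):
    """Guess the ASCII offset (33 or 64) from an iterable of quality strings.

    Returns None when the observed characters fit both encodings.
    """
    lowest, highest = 255, 0
    for qual in qualities:
        for ch in qual:
            code = ord(ch)
            lowest = min(lowest, code)
            highest = max(highest, code)
    if lowest < 59:   # characters below ';' cannot occur with offset 64
        return 33
    if highest > 74:  # characters above 'J' (Phred 41 at offset 33) suggest offset 64
        return 64
    return None
```

Usage would be something like guess_phred_offset(quality_lines("lane3_NoIndex_L003_R1_041.fastq")); a result of None means the sampled reads were not enough to decide.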

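Trimming reads

  • Trim paired ends with Trimmomatic: http://www.usadellab.org/cms/index.php?page=trimmomatic
  • Assumes Illumina encoding (code 64, not code 33)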

Steps:

  • Merge R1 files; merge R2 files
  • gzip them
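
A minimal sketch of the merge-and-gzip step in Python. It assumes the chunk numbers in the file names are zero-padded (e.g. _001 ... _041) so that sorting by name gives the same order for R1 and R2, which is what keeps the pairs in sync. The function name and the glob patterns in the usage note are illustrative.

```python
import glob
import gzip
import shutil


def merge_and_gzip(pattern, out_path):
    """Concatenate all files matching pattern, in name order, into one gzip file."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise ValueError("no files match %r" % pattern)
    with gzip.open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return paths
```

For example: merge_and_gzip("lane3_NoIndex_L003_R1_*.fastq", "R1.fastq.gz"), then the same with R2. Check that both calls return the same number of files.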

Trim sequences

  • Run this: python -m jcvi.apps.baseclean trim R1.fastq.gz R2.fastq.gz
    • Automatically trims and cleans sequences; also converts them to the appropriate fastq format
    • NOTE: This program should download Trimmomatic, but you may need to update the path to the Trimmomatic program in the script
  • If the trimming script fails, you can run Trimmomatic directly from the command line:
java -Xmx4g -cp Trimmomatic-0.13/trimmomatic-0.13.jar org.usadellab.trimmomatic.TrimmomaticPE lane3_NoIndex_L003_R1_001.b64.fastq.gz lane3_NoIndex_L003_R2_001.b64.fastq.gz lane3_NoIndex_L003_R1_001.pairs.fastq.gz lane3_NoIndex_L003_R1_001.frags.fastq.gz lane3_NoIndex_L003_R2_001.pairs.fastq.gz lane3_NoIndex_L003_R2_001.frags.fastq.gz ILLUMINACLIP:adapters.fasta:2:40:15 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
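
With many file pairs it is easier to generate this command than to type it. The sketch below rebuilds the exact command above from the two input names; the function name is made up, and the jar path, the adapters.fasta file and the .b64.fastq.gz suffix are simply copied from the command above and will need adjusting for other setups.

```python
TRIMMOMATIC_JAR = "Trimmomatic-0.13/trimmomatic-0.13.jar"
TRIM_STEPS = [
    "ILLUMINACLIP:adapters.fasta:2:40:15",
    "LEADING:3",
    "TRAILING:3",
    "SLIDINGWINDOW:4:15",
    "MINLEN:36",
]


def trimmomatic_pe_command(r1, r2, suffix=".b64.fastq.gz"):
    """Return the TrimmomaticPE argument list for one R1/R2 pair."""

    def outputs(path):
        # Each input gets a "pairs" file (mate survived) and a "frags" file (mate lost).
        if not path.endswith(suffix):
            raise ValueError("%s does not end with %s" % (path, suffix))
        stem = path[: -len(suffix)]
        return [stem + ".pairs.fastq.gz", stem + ".frags.fastq.gz"]

    return (
        ["java", "-Xmx4g", "-cp", TRIMMOMATIC_JAR,
         "org.usadellab.trimmomatic.TrimmomaticPE", r1, r2]
        + outputs(r1)
        + outputs(r2)
        + TRIM_STEPS
    )
```

The returned list can be passed to subprocess.run(cmd, check=True), or joined with spaces to print the command for inspection first.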

Genome Assembly

  • Note: Bao recommends CLC for genome assembly. It runs faster, uses less memory, and is less sensitive to bad data, but is still compute intensive. Note that CLC is commercial software.

Running SOAPdenovo

SOAPdenovo31mer all -s ../../soap.config.eric -o SoapAssem -K 25 -p 16 -R -d -D -F
    • Note: if SOAP crashes, try another XXmer binary (e.g. 63mer)

Running Velvet

    • Need to interleave reads (shuffleSequences_fastq.pl reads uncompressed fastq, so gunzip the .pairs files first):
~/src/velvet_1.1.04/shuffleSequences_fastq.pl lane3_NoIndex_L003_R1_001.pairs.fastq lane3_NoIndex_L003_R2_001.pairs.fastq merged_pairs.fastq
    • Set the number of Velvet threads with an environment variable:
export OMP_NUM_THREADS=32
    • Run velveth, then velvetg (velveth reads merged_pairs.fastq.gz here, so gzip the interleaved file first):
OMP_NUM_THREADS=32 velveth VelvetAssem 31 -shortPaired -fastq.gz merged_pairs.fastq.gz -short -fastq.gz lane3_NoIndex_L003_R1_001.frags.fastq.gz -short -fastq.gz lane3_NoIndex_L003_R2_001.frags.fastq.gz 
OMP_NUM_THREADS=32 velvetg VelvetAssem -scaffolding yes -exp_cov auto -cov_cutoff auto -min_contig_lgth 200 -ins_length 150
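
For reference, the interleaving step can also be done in Python, which avoids the gunzip/gzip round trip because it reads and writes .gz files directly. This is a sketch and not a port of shuffleSequences_fastq.pl: it assumes plain 4-line fastq records and that R1 and R2 list the reads in the same order. The function names are made up.

```python
import gzip
from itertools import islice


def open_maybe_gzip(path, mode="rt"):
    """Open a plain or gzip-compressed text file based on its extension."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def read_records(handle):
    """Yield fastq records as lists of 4 lines."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated fastq record")
        yield record


def interleave_fastq(r1_path, r2_path, out_path):
    """Write R1 and R2 records alternately to out_path; return the number of pairs."""
    pairs = 0
    with open_maybe_gzip(r1_path) as r1, \
            open_maybe_gzip(r2_path) as r2, \
            open_maybe_gzip(out_path, "wt") as out:
        recs2 = read_records(r2)
        for rec1 in read_records(r1):
            rec2 = next(recs2, None)
            if rec2 is None:
                raise ValueError("R2 has fewer reads than R1")
            out.writelines(rec1)
            out.writelines(rec2)
            pairs += 1
        if next(recs2, None) is not None:
            raise ValueError("R2 has more reads than R1")
    return pairs
```

For example: interleave_fastq("lane3_NoIndex_L003_R1_001.pairs.fastq.gz", "lane3_NoIndex_L003_R2_001.pairs.fastq.gz", "merged_pairs.fastq.gz").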



Other Stuff

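Cleaning Single Reads:

    • Sequences cleaned using trimReads by Haibao Tang: https://github.com/tanghaibao/trimReads/tree/
    • Note: only use on single reads
    • Ran with supplied adapter sequence file: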
>Adapter 4
TGACCA
>Adapter 4 rc
TGGTCA
    • Command-line run:
/home/elyons/bin/trimReads -Q 33 -f /home/elyons/projects/genome/data/creosote/Sample_lane3/adapter/adapter.faa ./lane3_NoIndex_L003_R2_015.fastq
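
The second entry in the adapter file is just the reverse complement of the first. A small sketch for deriving it and for counting how many reads contain the adapter in either orientation; the function names are made up, and this is an exact-match check only, not what trimReads does internally.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_reads_with_adapter(sequences, adapter):
    """Count sequences containing the adapter or its reverse complement."""
    rc = reverse_complement(adapter)
    return sum(1 for seq in sequences if adapter in seq or rc in seq)
```

reverse_complement("TGACCA") gives "TGGTCA", matching the "Adapter 4 rc" entry above.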


Converting sequences

  • python -m jcvi.formats.fastq convert (read the help file; the default conversion is Sanger (code 33) to Illumina (code 64))
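
What the conversion amounts to, as a sketch: each quality character is shifted from ASCII offset 33 to offset 64. This is not the jcvi implementation; the function name and the cap at Phred 62 (so the result stays within printable ASCII) are assumptions for illustration. Only the quality line (4th line of each record) should be converted.

```python
def sanger_to_illumina(quality, max_phred=62):
    """Convert a Sanger (offset 33) quality string to Illumina (offset 64)."""
    converted = []
    for ch in quality:
        phred = ord(ch) - 33
        if phred < 0:
            raise ValueError("not a Sanger quality character: %r" % ch)
        # Cap the score so that the shifted character stays printable.
        converted.append(chr(min(phred, max_phred) + 64))
    return "".join(converted)
```

For example, the Sanger string "!5I" (Phred 0, 20, 40) becomes "@Th".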

Other programs to clean sequences

  • python -m jcvi.apps.baseclean trim fastqfile (single-end)
  • python -m jcvi.apps.baseclean trim R1.fastq.gz R2.fastq.gz (paired-end)

Keep sequences in single files (or two files for paired reads)

  • Cat all the R1s together
  • Cat all the R2s together