Creosote: Difference between revisions


Revision as of 16:45, 7 August 2011

Assembly

Creosote Assembly

@@ Line 1: / Line 1: @@
-Creosote genome sequencing and assembly notes:
+==Assembly==
+[[Creosote Assembly]]
-*Sample obtained from front yard of 4951 W. McElroy Dr.
-*Sequences obtained from one lane of Illumina HiSeq2000
-*Fastq files delivered from UAGC
-**82 files
-**Headers are Sanger format (code 33)
-***Description of Fastq file format with notes on specific decoding of header names generated by various technologies: http://en.wikipedia.org/wiki/FASTQ_format
-**Paired-end reads
-***lane3_NoIndex_L003_R1_041.fastq
-***lane3_NoIndex_L003_R2_041.fastq
-**Need to get adapter sequences used in sequencing
-***TGACCA (Not present in sequence reads)
-*Check quality with fastqc: http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
-**[[Creosote First Run FastQC]]
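The "code 33" versus "code 64" distinction above can be checked directly on a sample of quality lines. The following is a minimal sketch, not part of the original notes: the function name `guess_phred_offset` is hypothetical, and the thresholds are the usual heuristic (characters below `;` cannot occur in offset-64 files, and characters above `J` are not expected in offset-33 Illumina files).

```python
def guess_phred_offset(quality_lines):
    """Guess the FASTQ quality offset (33 or 64) from quality strings.

    Heuristic only (an assumption, not a guarantee):
      * any character below ';' (ASCII 59) implies offset 33 (Sanger)
      * any character above 'J' (ASCII 74) implies offset 64 (older Illumina)
      * otherwise the sample is ambiguous and None is returned
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return None
    if min(codes) < 59:
        return 33
    if max(codes) > 74:
        return 64
    return None
```

Feeding it the fourth line of a few hundred records is normally enough to get an unambiguous answer.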
-'''Steps:'''
-*Merge R1 files; merge R2 files
-*gzip them
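The merge-and-gzip step above could be sketched as follows. This is an illustration under assumptions: the helper name `merge_fastq` and the output names are hypothetical, and files are concatenated in sorted name order so that the merged R1 and R2 files keep their reads in the same order.

```python
import glob
import gzip
import shutil


def merge_fastq(pattern, out_path):
    """Concatenate every file matching `pattern` into one gzip file.

    Files are processed in sorted name order; use the same pattern shape
    for R1 and R2 so the two merged files stay in sync.
    Returns the list of input paths in the order they were written.
    """
    paths = sorted(glob.glob(pattern))
    with gzip.open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)  # streams, so large files are fine
    return paths
```

For this data set the calls would look like `merge_fastq("lane3_NoIndex_L003_R1_*.fastq", "R1.fastq.gz")` and the same with `R2`.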
-'''Trim sequences'''
-*Get Haibao's jcvi package:
-** git clone git://github.com/tanghaibao/jcvi.git
-** Set PYTHONPATH: export PYTHONPATH=/lib/python (which is the dir above jcvi)
-** may need to install biopython: sudo easy_install biopython
-*Run this: python -m jcvi.apps.baseclean trim R1.fastq.gz R2.fastq.gz
-**Automatically trims and cleans sequences, and also converts them to the appropriate fastq format
-**NOTE: This program should download Trimmomatic itself, but you may need to update the path to the Trimmomatic program inside the script
-*If the trim script fails, you can run Trimmomatic directly from the command line:
- java -Xmx4g -cp Trimmomatic-0.13/trimmomatic-0.13.jar org.usadellab.trimmomatic.TrimmomaticPE lane3_NoIndex_L003_R1_001.b64.fastq.gz lane3_NoIndex_L003_R2_001.b64.fastq.gz lane3_NoIndex_L003_R1_001.pairs.fastq.gz lane3_NoIndex_L003_R1_001.frags.fastq.gz lane3_NoIndex_L003_R2_001.pairs.fastq.gz lane3_NoIndex_L003_R2_001.frags.fastq.gz ILLUMINACLIP:adapters.fasta:2:40:15 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
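With many numbered file pairs, the Trimmomatic command above has to be repeated per pair. The following sketch builds the same argument list for any pair; the function name `trimmomatic_pe_command` is hypothetical, and it assumes inputs follow the `*.b64.fastq.gz` naming used above.

```python
def trimmomatic_pe_command(r1, r2,
                           jar="Trimmomatic-0.13/trimmomatic-0.13.jar",
                           adapters="adapters.fasta"):
    """Return the TrimmomaticPE command (as an argv list) for one read pair."""
    suffix = ".b64.fastq.gz"

    def outputs(path):
        if not path.endswith(suffix):
            raise ValueError("expected a %s file: %s" % (suffix, path))
        stem = path[:-len(suffix)]
        # surviving pairs, then reads whose mate was dropped
        return [stem + ".pairs.fastq.gz", stem + ".frags.fastq.gz"]

    return (
        ["java", "-Xmx4g", "-cp", jar,
         "org.usadellab.trimmomatic.TrimmomaticPE", r1, r2]
        + outputs(r1) + outputs(r2)
        + ["ILLUMINACLIP:%s:2:40:15" % adapters,  # clip adapter sequences
           "LEADING:3",            # drop leading bases below quality 3
           "TRAILING:3",           # drop trailing bases below quality 3
           "SLIDINGWINDOW:4:15",   # cut when a 4-base window averages below 15
           "MINLEN:36"]            # discard reads shorter than 36 bases
    )
```

The returned list can be passed to `subprocess.check_call` as is, which avoids shell quoting problems.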
-'''Genome Assembly'''
-*Note: Bao recommends CLC for genome assembly. It runs faster, uses less memory, and is less sensitive to bad data, but is still compute intensive. CLC is commercial software.
-'''Running SOAPdenovo'''
- SOAPdenovo31mer all -s ../../soap.config.eric -o SoapAssem -K 25 -p 16 -R -d -D -F
-**Note: if SOAPdenovo crashes, try another XXmer binary (e.g. SOAPdenovo63mer)
-'''Running Velvet'''
-**Paired reads need to be interleaved first:
- ~/src/velvet_1.1.04/shuffleSequences_fastq.pl lane3_NoIndex_L003_R1_001.pairs.fastq lane3_NoIndex_L003_R2_001.pairs.fastq merged_pairs.fastq
-**Set the number of Velvet threads with an environment variable:
- export OMP_NUM_THREADS=32
-**Run velveth, then velvetg:
- OMP_NUM_THREADS=32 velveth VelvetAssem 31 -shortPaired -fastq.gz merged_pairs.fastq.gz -short -fastq.gz lane3_NoIndex_L003_R1_001.frags.fastq.gz -short -fastq.gz lane3_NoIndex_L003_R2_001.frags.fastq.gz
- OMP_NUM_THREADS=32 velvetg VelvetAssem -scaffolding yes -exp_cov auto -cov_cutoff auto -min_contig_lgth 200 -ins_length 150
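For reference, the interleaving that Velvet expects for -shortPaired input (each R1 record immediately followed by its R2 mate) can be sketched as below. This is not the shuffleSequences_fastq.pl script itself; the function name `interleave_fastq` is hypothetical, and it assumes plain four-line FASTQ records with no wrapped sequence lines.

```python
import itertools


def interleave_fastq(r1_lines, r2_lines):
    """Yield FASTQ lines alternating one record from R1, then its R2 mate."""

    def records(lines):
        it = iter(lines)
        while True:
            rec = list(itertools.islice(it, 4))  # one record = 4 lines
            if not rec:
                return
            if len(rec) != 4:
                raise ValueError("truncated FASTQ record")
            yield rec

    pairs = itertools.zip_longest(records(r1_lines), records(r2_lines))
    for rec1, rec2 in pairs:
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 have different numbers of reads")
        yield from rec1
        yield from rec2
```

Both arguments can be open file handles, so the output can be written with `out.writelines(interleave_fastq(fh1, fh2))` without loading the files into memory.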
-'''Other Stuff'''
-'''Trimming reads'''
-*Trim Paired ends with Trimmomatic: http://www.usadellab.org/cms/index.php?page=trimmomatic
-*Assumes Illumina encoding (code 64, not code 33)
-**Need to convert the HiSeq reads:
-** easy_install biopython
-** git clone git://github.com/tanghaibao/jcvi.git
-** export PYTHONPATH=/lib/python (which is the dir above jcvi)
-** python -m jcvi.formats.fastq  (install any missing packages it reports)
-'''Cleaning Single Reads:'''
-*Sequences cleaned using trimReads by Haibao Tang: https://github.com/tanghaibao/trimReads/tree/
-**Ran with supplied adapter sequence file:
- >Adapter 4
- TGACCA
- >Adapter 4 rc
- TGGTCA
-**Command-line run:
- Running /home/elyons/bin/trimReads  -Q 33 -f /home/elyons/projects/genome/data/creosote/Sample_lane3/adapter/adapter.faa ./lane3_NoIndex_L003_R2_015.fastq
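The adapter file above lists the adapter and its reverse complement. A small sketch for deriving the reverse complement and for counting how many reads contain either form is given below; the helper names are hypothetical and this is not how trimReads works internally. Keep in mind that a 6-base sequence will also match some reads purely by chance.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_reads_with_adapter(sequences, adapter):
    """Count sequences containing the adapter or its reverse complement."""
    rc = reverse_complement(adapter)
    return sum(1 for s in sequences if adapter in s or rc in s)
```

For example, `reverse_complement("TGACCA")` gives `"TGGTCA"`, matching the "Adapter 4 rc" entry.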
-'''Converting sequences'''
-*python -m jcvi.formats.fastq convert  (read the help file; the default conversion is Sanger (code 33) to Illumina (code 64))
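What the conversion does to each quality line can be illustrated with a short sketch. This is not the jcvi implementation; the function name `convert_quality` is hypothetical. Each character encodes a Phred score as score + offset, so converting from code 33 to code 64 shifts every character up by 31.

```python
def convert_quality(qual, from_offset=33, to_offset=64):
    """Re-encode a FASTQ quality string from one ASCII offset to another."""
    converted = []
    for ch in qual:
        score = ord(ch) - from_offset
        if score < 0:
            # The character cannot occur under the claimed source encoding.
            raise ValueError("character %r is below offset %d" % (ch, from_offset))
        converted.append(chr(score + to_offset))
    return "".join(converted)
```

Only the fourth line of each FASTQ record should be passed through this; the header and sequence lines stay unchanged.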
-'''Other programs to clean sequences'''
-*python -m jcvi.apps.baseclean trim fastqfile (single-end)
-*python -m jcvi.apps.baseclean trim R1.fastq.gz R2.fastq.gz (paired-end)
-'''Keep sequences in single files (or two files for a pair of reads)'''
-*Cat all the R1s together
-*Cat all the R2s together
