|
|
Line 1: |
Line 1: |
− | Creosote genome sequencing and assembly notes:
| + | ==Assembly== |
− | | + | [[Creosote Assembly]] |
− | *Sample obtained from front yard of 4951 W. McElroy Dr.
| + | |
− | *Sequences obtained from one lane of Illumina HiSeq2000
| + | |
− | *Fastq files delivered from UAGC
| + | |
− | **82 files
| + | |
− | **Headers are Sanger format (code 33)
| + | |
− | ***Description of Fastq file format with notes on specific decoding of header names generated by various technologies: http://en.wikipedia.org/wiki/FASTQ_format
| + | |
− | **Pairend reads
| + | |
− | ***lane3_NoIndex_L003_R1_041.fastq
| + | |
− | ***lane3_NoIndex_L003_R2_041.fastq
| + | |
− | **Need to get adapter sequences used in sequencing
| + | |
− | ***TGACCA (Not present in sequence reads)
| + | |
− | *Check quality with fastqc: http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
| + | |
− | **[[Creosote First Run FastQC]]
| + | |
− | | + | |
− | '''Steps:'''
| + | |
− | *Merge R1 files; merge R2 files
| + | |
− | *gzip them
| + | |
− | '''Trim sequences'''
| + | |
− | *Get this package of Haibaos:
| + | |
− | ** git clone git://github.com/tanghaibao/jcvi.git
| + | |
− | ** SET PATH: export PYTHONPATH=/lib/python (which is the dir above jcvi)
| + | |
− | ** may need to install biopython: sudo easy_install biopython
| + | |
− | *Run this: python -m jcvi.apps.baseclean trim R1.fastq.gz R2.fastq.gz
| + | |
− | **Automatically trims and cleans sequences, also does the conversion to appropriate fastq format
| + | |
− | **NOTE: This program should download trimmomatic, but may need to update the path of the timmomatic program in the program
| + | |
− | *If the Trimmer script fails for silly reasons, you can run it from the command-line:
| + | |
− | java -Xmx4g -cp Trimmomatic-0.13/trimmomatic-0.13.jar org.usadellab.trimmomatic.TrimmomaticPE lane3_NoIndex_L003_R1_001.b64.fastq.gz lane3_NoIndex_L003_R2_001.b64.fastq.gz lane3_NoIndex_L003_R1_001.pairs.fastq.gz lane3_NoIndex_L003_R1_001.frags.fastq.gz lane3_NoIndex_L003_R2_001.pairs.fastq.gz lane3_NoIndex_L003_R2_001.frags.fastq.gz ILLUMINACLIP:adapters.fasta:2:40:15 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
| + | |
− | '''Genome Assembly'''
| + | |
− | *Note: Bao recommends CLC for genome assembly. Runs faster, less memory, less sensitive to bad data. Compute intensive. THIS IS COMMERCIAL SOFTWARE
| + | |
− | | + | |
− | '''Running SOAPdenovo'''
| + | |
− | SOAPdenovo31mer all -s ../../soap.config.eric -o SoapAssem -K 25 -p 16 -R -d -D -F
| + | |
− | **Note: if SOAP crashes, try another XXmer binary (e.g. 63mer)
| + | |
− | '''Running Velvet'''
| + | |
− | **Need to interleave reads:
| + | |
− | ~/src/velvet_1.1.04/shuffleSequences_fastq.pl lane3_NoIndex_L003_R1_001.pairs.fastq lane3_NoIndex_L003_R2_001.pairs.fastq merged_pairs.fastq
| + | |
− | **set threading of velvet with env var
| + | |
− | export OMP_NUM_THREADS=32
| + | |
− | **running velveth
| + | |
− | OMP_NUM_THREADS=32 velveth VelvetAssem 31 -shortPaired -fastq.gz merged_pairs.fastq.gz -short -fastq.gz lane3_NoIndex_L003_R1_001.frags.fastq.gz -short -fastq.gz lane3_NoIndex_L003_R2_001.frags.fastq.gz
| + | |
− | OMP_NUM_THREADS=32 velvetg VelvetAssem -scaffolding yes -exp_cov auto -cov_cutoff auto -min_contig_lgth 200 -ins_length 150
| + | |
− | | + | |
− | | + | |
− | | + | |
− | | + | |
− | | + | |
− | '''Other Stuff'''
| + | |
− | '''Trimming reads'''
| + | |
− | *Trim Paired ends with Trimmomatic: http://www.usadellab.org/cms/index.php?page=trimmomatic
| + | |
− | *Assumes Illumina Encoding (code: 64, not code: 33)
| + | |
− | **Need to convert for the HighSeq Reads:
| + | |
− | ** easy_install biopython
| + | |
− | ** git clone git://github.com/tanghaibao/jcvi.git
| + | |
− | ** export PYTHONPATH=/lib/python (which is the dir above jcvi)
| + | |
− | ** python -m jcvi.formats.fastq (Install missing packages)
| + | |
− | | + | |
− | '''Cleaning Single Reads:'''
| + | |
− | *Sequences cleaned using trimReads by Haibao Tang: https://github.com/tanghaibao/trimReads/tree/
| + | |
− | **Ran with supplied adapter sequence file:
| + | |
− | >Adapter 4
| + | |
− | TGACCA
| + | |
− | >Adapter 4 rc
| + | |
− | TGGTCA
| + | |
− | **Command-line run:
| + | |
− | Running /home/elyons/bin/trimReads -Q 33 -f /home/elyons/projects/genome/data/creosote/Sample_lane3/adapter/adapter.faa ./lane3_NoIndex_L003_R2_015.fastq
| + | |
− | | + | |
− | | + | |
− | '''Converting sequences'''
| + | |
− | *python -m jcvi.formats.fastq convert (read help file, default conversion Sanger (code 33) to Illumina (code 64)
| + | |
− | | + | |
− | '''Other programs to clean sequences'''
| + | |
− | *python -m jcvi.apps.baseclean trim fastqfile (single ended)
| + | |
− | *python -m jcvi.apps.baseclean trim R1.fastq.gz R2.fastq.gz (paired ended)
| + | |
− | | + | |
− | '''keep sequences in single files (or two files for a pair of reads)'''
| + | |
− | *Cat all the R1s together
| + | |
− | *Cat all the R2s together
| + | |