Difference between revisions of "Creosote"
From CoGepedia
Revision as of 13:55, 4 August 2011
Creosote genome sequencing and assembly notes:
- Sample obtained from front yard of 4951 W. McElroy Dr.
- Sequences obtained from one lane of Illumina HiSeq2000
- Fastq files delivered from UAGC
- 82 files
- lane3_NoIndex_L003_R1_041.fastq
- lane3_NoIndex_L003_R2_041.fastq
- Need to understand if these are paired-end reads
- Need to get adapter sequences used in sequencing
- Description of Fastq file format with notes on specific decoding of header names generated by various technologies: http://en.wikipedia.org/wiki/FASTQ_format
- Check quality with fastqc: http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
- Sequences cleaned using trimReads by Haibao Tang: https://github.com/tanghaibao/trimReads/tree/
- Note: Only use on single-end reads
- Ran with supplied adapter sequence file:
>Adapter 4
TGACCA
>Adapter 4 rc
TGGTCA
- Command-line run:
Running /home/elyons/bin/trimReads -Q 33 -f /home/elyons/projects/genome/data/creosote/Sample_lane3/adapter/adapter.faa ./lane3_NoIndex_L003_R2_015.fastq
- Output of trimReads:
- Trim paired-end reads with Trimmomatic: http://www.usadellab.org/cms/index.php?page=trimmomatic
- Assumes Illumina quality encoding (Phred+64, not Phred+33)
- Need to convert for the HiSeq reads:
- easy_install biopython
- git clone git://github.com/tanghaibao/jcvi.git
- export PYTHONPATH=/lib/python (which is the dir above jcvi)
- python -m jcvi.formats.fastq (Install missing packages)
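To make the encoding conversion concrete, here is a minimal sketch of what re-encoding quality strings from offset 33 to offset 64 involves. This is not the jcvi implementation, and the function names `convert_quality` and `convert_record` are hypothetical. It assumes the input reads really are offset 33 (as the `-Q 33` flag in the trimReads run suggests) and that a record is the usual four FASTQ lines.

```python
def convert_quality(qual, from_offset=33, to_offset=64):
    """Re-encode a FASTQ quality string from one ASCII offset to another.

    Hypothetical helper for illustration; the Phred scores themselves
    are unchanged, only the ASCII characters that represent them move.
    """
    shift = to_offset - from_offset
    converted = []
    for ch in qual:
        code = ord(ch) + shift
        # Stay inside printable ASCII; anything else means the input
        # was probably not in the encoding we assumed.
        if not 33 <= code <= 126:
            raise ValueError("quality character %r out of range after shift" % ch)
        converted.append(chr(code))
    return "".join(converted)


def convert_record(record, from_offset=33, to_offset=64):
    """Convert one FASTQ record given as (header, sequence, plus, quality)."""
    header, seq, plus, qual = record
    return (header, seq, plus, convert_quality(qual, from_offset, to_offset))
```

For example, `convert_quality("II5!")` shifts every character up by 31, so a score of 40 written as `I` becomes `h`. For real data the `convert` subcommand of `jcvi.formats.fastq` listed in these notes is the tool to use; check its help text for the default direction.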
Steps:
- Merge R1 files; merge R2 files
- gzip them
- Run this: python -m jcvi.apps.baseclean trim R1.fastq.gz R2.fastq.gz
- NOTE: This program should download Trimmomatic, but you may need to update the path to the Trimmomatic program inside the script
- Note: Bao recommends CLC for genome assembly. It runs faster, uses less memory, and is less sensitive to bad data. Compute intensive.
- python -m jcvi.formats.fastq convert (read help file, default conversion)
- python -m jcvi.apps.baseclean trim fastqfile (single-end)
- python -m jcvi.apps.baseclean trim R1.fastq.gz R2.fastq.gz (paired-end)
- Cat all the R1s together
- Cat all the R2s together
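The merge-and-gzip steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `merge_reads` and `chunks_are_paired` are hypothetical, the chunk files are assumed to follow the `lane3_NoIndex_L003_R1_041.fastq` naming seen above, and the chunk numbers are assumed to be zero-padded so that a plain sort keeps R1 and R2 in the same order.

```python
import glob
import gzip
import os
import shutil


def merge_reads(directory, read_tag, out_path):
    """Concatenate all chunks for one read direction into a gzipped FASTQ.

    read_tag is "R1" or "R2". Returns the chunk paths in the order written.
    """
    pattern = os.path.join(directory, "*_%s_*.fastq" % read_tag)
    # Zero-padded chunk numbers (001, 002, ...) sort correctly as strings.
    chunks = sorted(glob.glob(pattern))
    with gzip.open(out_path, "wb") as out:
        for path in chunks:
            with open(path, "rb") as handle:
                shutil.copyfileobj(handle, out)
    return chunks


def chunks_are_paired(r1_chunks, r2_chunks):
    """Check that every R1 chunk has a matching R2 chunk in the same position."""
    expected = [os.path.basename(p).replace("_R1_", "_R2_") for p in r1_chunks]
    return expected == [os.path.basename(p) for p in r2_chunks]
```

A typical call would be `r1 = merge_reads(".", "R1", "R1.fastq.gz")` and `r2 = merge_reads(".", "R2", "R2.fastq.gz")`, followed by `chunks_are_paired(r1, r2)` before handing both files to the paired-end trim. The check only compares file names; it does not confirm that the reads inside the files are in matching order.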