Creosote: Difference between revisions

Revision as of 20:55, 4 August 2011

Creosote genome sequencing and assembly notes:

Sample obtained from front yard of 4951 W. McElroy Dr.
Sequences obtained from one lane of Illumina HiSeq2000
Fastq files delivered from UAGC
- 82 files
  - lane3_NoIndex_L003_R1_041.fastq
  - lane3_NoIndex_L003_R2_041.fastq
- Need to understand if these are paired-end reads
- Need to get adapter sequences used in sequencing
- Description of Fastq file format with notes on specific decoding of header names generated by various technologies: http://en.wikipedia.org/wiki/FASTQ_format
Check quality with fastqc: http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
- Creosote First Run FastQC
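
One way to approach the open paired-end question above is to check that every R1 chunk has a matching R2 chunk, and that the first headers of a matched pair name the same fragment. This is a minimal sketch, not a tested pipeline step. The filename pattern is an assumption inferred from the two example names listed above, and `pair_files` and `same_fragment` are hypothetical helper names.

```python
import re

# Assumed filename pattern, inferred from the two example file names above.
NAME_RE = re.compile(r"^(?P<prefix>.+)_R(?P<read>[12])_(?P<chunk>\d+)\.fastq$")


def pair_files(filenames):
    """Group file names by (prefix, chunk); return (pairs, unpaired)."""
    groups = {}
    for name in filenames:
        m = NAME_RE.match(name)
        if m is None:
            continue  # not a chunk file matching the assumed pattern
        key = (m.group("prefix"), m.group("chunk"))
        groups.setdefault(key, {})[m.group("read")] = name
    pairs, unpaired = [], []
    for key in sorted(groups):
        reads = groups[key]
        if "1" in reads and "2" in reads:
            pairs.append((reads["1"], reads["2"]))
        else:
            unpaired.extend(reads.values())
    return pairs, unpaired


def same_fragment(header1, header2):
    """True if two FASTQ header lines appear to name the same fragment.

    Covers the 'id/1' vs 'id/2' style and the 'id 1:...' vs 'id 2:...' style.
    """
    def core(header):
        header = header.strip().split()[0]  # drop anything after the first space
        if header.endswith("/1") or header.endswith("/2"):
            header = header[:-2]
        return header
    return core(header1) == core(header2)
```

If `pair_files` reports no unpaired files and `same_fragment` holds for the first records of each pair, the data are very likely paired-end, though only the sequencing facility can confirm it.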

Sequences cleaned using trimReads by Haibao Tang: https://github.com/tanghaibao/trimReads/tree/
- Note: Only use on single-end reads
- Ran with supplied adapter sequence file:

>Adapter 4
TGACCA
>Adapter 4 rc
TGGTCA

- Command-line run:

Running /home/elyons/bin/trimReads  -Q 33 -f /home/elyons/projects/genome/data/creosote/Sample_lane3/adapter/adapter.faa ./lane3_NoIndex_L003_R2_015.fastq

- Output of trimReads:

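As an aside on the adapter file shown above (this is not trimReads output): the second entry is the reverse complement of the first. A small sketch for generating or double-checking such entries follows. `reverse_complement` is a hypothetical helper written for illustration and is not part of trimReads.

```python
# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("TGACCA"))  # prints TGGTCA
```
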
Trim Paired ends with Trimmomatic: http://www.usadellab.org/cms/index.php?page=trimmomatic
Assumes Illumina quality encoding (Phred offset 64, not offset 33)
- Need to convert for the HiSeq reads:
- easy_install biopython
- git clone git://github.com/tanghaibao/jcvi.git
- export PYTHONPATH=/lib/python (which is the dir above jcvi)
- python -m jcvi.formats.fastq (Install missing packages)
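
To make the encoding conversion concrete: each quality character stores a Phred score plus an offset (33 or 64), so converting means shifting every character on the quality line by the difference. The sketch below illustrates the idea only and is not the jcvi implementation. `convert_quality` and `convert_fastq` are hypothetical names, and the code assumes strict four-line FASTQ records.

```python
def convert_quality(qual, from_offset=33, to_offset=64):
    """Shift every quality character from one Phred offset to another."""
    shift = to_offset - from_offset
    return "".join(chr(ord(c) + shift) for c in qual)


def convert_fastq(in_handle, out_handle, from_offset=33, to_offset=64):
    """Copy a FASTQ stream, re-encoding only the quality line of each record.

    Assumes each record is exactly four lines: header, sequence, '+', quality.
    """
    for i, line in enumerate(in_handle):
        if i % 4 == 3:  # fourth line of each record holds the qualities
            line = convert_quality(line.rstrip("\n"), from_offset, to_offset) + "\n"
        out_handle.write(line)
```

For example, a Phred score of 40 is written as `I` at offset 33 and as `h` at offset 64.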

Steps:

Merge R1 files; merge R2 files
gzip them
Run this: python -m jcvi.apps.baseclean trim R1.fastq.gz R2.fastq.gz
- NOTE: This program should download Trimmomatic automatically, but the path to the Trimmomatic program may need to be updated in the script
Note: Bao recommends CLC for genome assembly. It runs faster, uses less memory, and is less sensitive to bad data. Compute intensive.

python -m jcvi.formats.fastq convert (read help file, default conversion)
python -m jcvi.apps.baseclean trim fastqfile (single ended)
python -m jcvi.apps.baseclean trim R1.fastq.gz R2.fastq.gz (paired ended)

Cat all the R1s together
Cat all the R2s together
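
The merge and gzip steps above can be sketched in Python as follows. Files are concatenated in sorted name order so that the merged R1 and R2 files keep their reads in the same order, which relies on the chunk numbers being zero-padded as in the example file names. `merge_and_gzip` is a hypothetical helper, and the glob patterns and output names in the comment are assumptions based on the names used in these notes.

```python
import glob
import gzip
import shutil


def merge_and_gzip(pattern, out_path):
    """Concatenate all files matching `pattern` (sorted) into one gzip file.

    Returns the list of input paths in the order they were merged.
    """
    paths = sorted(glob.glob(pattern))
    with gzip.open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)  # stream, so large files fit in memory
    return paths


# Example calls (patterns and output names are assumptions):
# merge_and_gzip("lane3_NoIndex_L003_R1_*.fastq", "R1.fastq.gz")
# merge_and_gzip("lane3_NoIndex_L003_R2_*.fastq", "R2.fastq.gz")
```

Comparing the two returned lists is a cheap check that both merges covered the same chunks in the same order.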

@@ Line 12: / Line 12: @@
 *Check quality with fastqc: http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
 **[[Creosote First Run FastQC]]
 *Sequences cleaned using trimReads by Haibao Tang: https://github.com/tanghaibao/trimReads/tree/
+**NOte:  Only use on single reads
 **Ran with supplied adapter sequence file:
   >Adapter 4
   TGACCA
+ >Adapter 4 rc
+ TGGTCA
 **Command-line run:
-  /home/elyons/bin/trimReads  -Q 33 -f /home/elyons/projects/genome/data/creosote/src/adapters.fasta ./lane3_NoIndex_L003_R1_033.fastq
+  Running /home/elyons/bin/trimReads  -Q 33 -f /home/elyons/projects/genome/data/creosote/Sample_lane3/adapter/adapter.faa ./lane3_NoIndex_L003_R2_015.fastq
 **Output of trimReads:
- [0] Illumina_PE-1 found 54 times
- [1] Illumina_PE-2 found 3 times
- [2] Illumina_PE-1rc found 2850 times
- [3] Illumina_PE-2rc found 12 times
+*Trim Paired ends with Trimmomatic: http://www.usadellab.org/cms/index.php?page=trimmomatic
+*Assumes Illumina Encoding (code: 64, not code: 33)
-  A total of 92003 too short (trimmed length < 30) reads removed.
+**Need to convert for the HighSeq Reads:
-  A total of 949092 trimmed reads are written to `./lane3_NoIndex_L003_R2_041.trimmed.fastq`.
+** easy_install biopython
- Processed 1041095 sequences took 1557.84 seconds.
+** git clone git://github.com/tanghaibao/jcvi.git
-***Appears to not have the correct linkers as I would assume to see more removed
+** export PYTHONPATH=/lib/python (which is the dir above jcvi)
+** python -m jcvi.formats.fastq  (Install missing packages)
+Steps:
+*Merge R1 files; merge R2 files
+*gzip them
+*Run this: python -m jcvi.apps.baseclean trim R1.fastq.gz R2.fastq.gz
+**NOTE: This program should download trimmomatic, but may need to update the path of the timmomatic program in the program
+*Note:  Bao recommends CLC for genome assembly.  Runs faster, less memory, less sensitive to bad data.  Compute intensive.
+*python -m jcvi.formats.fastq convert  (read help file, default converstion
+*python -m jcvi.apps.baseclean trim fastqfile (single ended)
+*python -m jcvi.apps.baseclean trim R1.fastq.gz R2.fastq.gz (paired ended)
+*Cat all the R1s together
+*Cat all the R2s together
