Difference between revisions of "Creosote"

From CoGepedia
 
(11 intermediate revisions by the same user not shown)
Creosote genome sequencing and assembly notes:
==Twig2Genome Notes==
[[Twig2Genome]]
  
*Sample obtained from front yard of 4951 W. McElroy Dr.
==Assembly==
*Sequences obtained from one lane of Illumina HiSeq2000
[[Creosote Assembly]]
*Fastq files delivered from UAGC
**82 files
**Quality scores are in Sanger format (code 33)
***Description of the Fastq file format, with notes on decoding the header names generated by various technologies: http://en.wikipedia.org/wiki/FASTQ_format
**Paired-end reads
***lane3_NoIndex_L003_R1_041.fastq
***lane3_NoIndex_L003_R2_041.fastq
**Need to get adapter sequences used in sequencing
***TGACCA (Not present in sequence reads)
*Check quality with fastqc: http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
**[[Creosote First Run FastQC]]
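
For reference, "code 33" means each quality character stores a Phred score as its ASCII value minus 33. Below is a minimal sketch of decoding a quality line and of guessing which offset a file uses. The function names are illustrative and are not part of any tool mentioned in these notes. The guess relies on the fact that characters below ';' (ASCII 59) cannot occur in the 64-offset encodings, so it is only a heuristic.

```python
def decode_phred(quality, offset=33):
    """Convert one FASTQ quality string into a list of Phred scores."""
    scores = [ord(ch) - offset for ch in quality]
    if any(score < 0 for score in scores):
        raise ValueError("quality string is not valid for offset %d" % offset)
    return scores


def guess_offset(quality_lines):
    """Heuristic: return 33 if any character is below ';' (ASCII 59), else 64.

    A result of 64 is only a guess, because a small sample of very
    high-quality Sanger-encoded reads may contain no low characters.
    """
    for line in quality_lines:
        if any(ord(ch) < 59 for ch in line.strip()):
            return 33
    return 64
```

Running guess_offset over the quality lines of a few thousand reads should be enough to confirm the encoding before choosing a conversion step.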

'''Steps:'''
==Loading into CoGe==
*Merge R1 files; merge R2 files
SoapDeNovo assembly: 1,570,116 contigs (370MB): http://genomevolution.org/CoGe/OrganismView.pl?dsgid=12183 ('''Note:'''  too many contigs to process and visualize in SynMap)
*gzip them
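
The merge and gzip steps can be sketched as follows, assuming the delivered files are uncompressed and numbered like lane3_NoIndex_L003_R1_041.fastq. The helper names are hypothetical. The R1 and R2 parts must be merged in the same order so that read pairs stay in sync, which is why the parts are sorted by name.

```python
import gzip
import shutil
from pathlib import Path


def numbered_parts(directory, read):
    """Return the sorted part files for one read, e.g. read='R1'."""
    pattern = "*_%s_[0-9][0-9][0-9].fastq" % read
    return sorted(Path(directory).glob(pattern))


def merge_and_gzip(fastq_paths, out_path):
    """Concatenate plain-text FASTQ files, in the given order, into one gzip file."""
    with gzip.open(out_path, "wb") as out:
        for path in fastq_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

For example, merge_and_gzip(numbered_parts(".", "R1"), "R1.fastq.gz") followed by the same call for "R2" gives the two files used in the trimming step.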
'''Trim sequences'''
*Get Haibao's jcvi package:
** git clone git://github.com/tanghaibao/jcvi.git
** Set PYTHONPATH: export PYTHONPATH=/lib/python (which is the dir above jcvi)
** may need to install biopython: sudo easy_install biopython
*Run this: python -m jcvi.apps.baseclean trim R1.fastq.gz R2.fastq.gz
**Automatically trims and cleans sequences, and also converts them to the appropriate fastq format
**NOTE: This program should download Trimmomatic, but you may need to update the path to the Trimmomatic program in the script
*If the Trimmer script fails for silly reasons, you can run Trimmomatic directly from the command line:
java -Xmx4g -cp Trimmomatic-0.13/trimmomatic-0.13.jar org.usadellab.trimmomatic.TrimmomaticPE lane3_NoIndex_L003_R1_001.b64.fastq.gz lane3_NoIndex_L003_R2_001.b64.fastq.gz lane3_NoIndex_L003_R1_001.pairs.fastq.gz lane3_NoIndex_L003_R1_001.frags.fastq.gz lane3_NoIndex_L003_R2_001.pairs.fastq.gz lane3_NoIndex_L003_R2_001.frags.fastq.gz ILLUMINACLIP:adapters.fasta:2:40:15 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
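
When the same Trimmomatic call has to be repeated for many file pairs, the command line can be built programmatically. This is a sketch only: the function name is hypothetical, the jar path and trimming steps are copied from the command above, and the output names follow the .pairs/.frags pattern used there.

```python
# Trimming steps copied verbatim from the command line in these notes.
TRIM_STEPS = [
    "ILLUMINACLIP:adapters.fasta:2:40:15",
    "LEADING:3",
    "TRAILING:3",
    "SLIDINGWINDOW:4:15",
    "MINLEN:36",
]


def trimmomatic_pe_command(r1, r2,
                           jar="Trimmomatic-0.13/trimmomatic-0.13.jar",
                           suffix=".b64.fastq.gz"):
    """Build the TrimmomaticPE argument list for one pair of input files."""

    def outputs(name):
        # foo.b64.fastq.gz -> foo.pairs.fastq.gz and foo.frags.fastq.gz
        if not name.endswith(suffix):
            raise ValueError("unexpected file name: %s" % name)
        stem = name[:-len(suffix)]
        return [stem + ".pairs.fastq.gz", stem + ".frags.fastq.gz"]

    return (["java", "-Xmx4g", "-cp", jar,
             "org.usadellab.trimmomatic.TrimmomaticPE", r1, r2]
            + outputs(r1) + outputs(r2) + TRIM_STEPS)
```

The returned list can be passed to subprocess.run(cmd, check=True), or joined with spaces to print the command for inspection first.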
'''Genome Assembly'''
*Note: Bao recommends CLC for genome assembly: it runs faster, uses less memory, and is less sensitive to bad data, but it is compute intensive.  THIS IS COMMERCIAL SOFTWARE

'''Running SOAPdenovo'''
SoapDeNovo assembly of contigs >= 2000nt: 6,976 contigs (20MB): http://genomevolution.org/CoGe/OrganismView.pl?dsgid=12185
SOAPdenovo31mer all -s ../../soap.config.eric -o SoapAssem -K 25 -p 16 -R -d -D -F
**Note: if SOAP crashes, try another XXmer binary (e.g. 63mer)
'''Running Velvet'''
**Need to interleave reads:
~/src/velvet_1.1.04/shuffleSequences_fastq.pl lane3_NoIndex_L003_R1_001.pairs.fastq lane3_NoIndex_L003_R2_001.pairs.fastq merged_pairs.fastq
**set threading of velvet with env var
export OMP_NUM_THREADS=32
**running velveth and velvetg
OMP_NUM_THREADS=32 velveth VelvetAssem 31 -shortPaired -fastq.gz merged_pairs.fastq.gz -short -fastq.gz lane3_NoIndex_L003_R1_001.frags.fastq.gz -short -fastq.gz lane3_NoIndex_L003_R2_001.frags.fastq.gz
OMP_NUM_THREADS=32 velvetg VelvetAssem -scaffolding yes -exp_cov auto -cov_cutoff auto -min_contig_lgth 200 -ins_length 150
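
The interleaving step above alternates one R1 record with its R2 mate. A minimal Python sketch of the same idea follows. It is not the Velvet script itself, the function names are illustrative, and it assumes plain 4-line FASTQ records with no wrapped sequence lines.

```python
import itertools


def read_records(handle):
    """Yield FASTQ records as lists of four lines each."""
    while True:
        record = list(itertools.islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(r1_handle, r2_handle, out_handle):
    """Write R1 and R2 records alternately; return the number of pairs."""
    pairs = 0
    mates = itertools.zip_longest(read_records(r1_handle),
                                  read_records(r2_handle))
    for rec1, rec2 in mates:
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        out_handle.writelines(rec1)
        out_handle.writelines(rec2)
        pairs += 1
    return pairs
```

The handles can come from open() or gzip.open(path, "rt"), so the same function works on compressed input.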

ABySS assembly (bpsize=64, 2kb minimum contig size): 122,972 contigs (392MB): http://genomevolution.org/CoGe/OrganismView.pl?dsgid=12275
*Syntenic Path Assembly to peach: http://genomevolution.org/r/3xmq

Velvet assembly: 515,190 contigs (241MB): http://genomevolution.org/CoGe/OrganismView.pl?dsgid=12245

CLC4 assembly: 685,475 contigs (508MB): http://genomevolution.org/CoGe/OrganismView.pl?dsgid=12244
*Syntenic Path Assembly to peach: http://genomevolution.org/r/3xpd

==[[Syntenic Path Assembly]]==
  
'''Other Stuff'''
[[File:Master 12185 8400.genomic-CDS.lastz.dag.go c20 D20 g10 A2.aligncoords.gcoords ct0.w1000.ass2.cs1.csoS.nsd.png|thumb|600px|left|Syntenic path assembly with SynMap of creosote (x-axis) and peach (y-axis).  Results may be regenerated at: http://genomevolution.org/r/3w95]]
'''Trimming reads'''
*Trim paired-end reads with Trimmomatic: http://www.usadellab.org/cms/index.php?page=trimmomatic
*Assumes Illumina Encoding (code: 64, not code: 33)
**Need to convert for the HiSeq reads:
** easy_install biopython
** git clone git://github.com/tanghaibao/jcvi.git
** export PYTHONPATH=/lib/python (which is the dir above jcvi)
** python -m jcvi.formats.fastq  (Install missing packages)
  
'''Cleaning Single Reads:'''
[[File:Master 11022 12185.CDS-genomic.lastz.dag.go c20 D20 g10 A2.aligncoords.gcoords ct0.w1000.ass2.cs1.csoS.nsd.png|thumb|600px|left|Syntenic path assembly with SynMap of creosote (y-axis) and Arabidopsis thaliana Col-0 (x-axis).  Results may be regenerated at: http://genomevolution.org/r/3w96]]
*Sequences cleaned using trimReads by Haibao Tang: https://github.com/tanghaibao/trimReads/tree/
**Ran with supplied adapter sequence file:
>Adapter 4
TGACCA
>Adapter 4 rc
TGGTCA
**Command-line run:
Running /home/elyons/bin/trimReads  -Q 33 -f /home/elyons/projects/genome/data/creosote/Sample_lane3/adapter/adapter.faa ./lane3_NoIndex_L003_R2_015.fastq
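
The adapter file above pairs the adapter with its reverse complement. A small sketch of how such a file could be generated, and of how to count reads containing either orientation, is given here. The helper names are hypothetical and this is not how trimReads works internally. Note that a 6-base sequence can also match by chance.

```python
# Translation table for complementing DNA, including N and lower case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def adapter_fasta(name, seq):
    """Format an adapter and its reverse complement as two FASTA records."""
    return ">%s\n%s\n>%s rc\n%s\n" % (name, seq, name, reverse_complement(seq))


def count_reads_with_adapter(sequences, adapter):
    """Count sequences containing the adapter in either orientation."""
    forward = adapter.upper()
    targets = (forward, reverse_complement(forward))
    return sum(1 for seq in sequences
               if any(target in seq.upper() for target in targets))
```

Calling adapter_fasta("Adapter 4", "TGACCA") reproduces the four lines of the adapter file shown above.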

==Pseudo-Assembly==
[[File:Pseudo-assembly-creosote.png|thumb|600px|left|Pseudo-assembly of creosote (x-axis) using the peach (y-axis) genome.  Syntenic comparison to the peach genome.]]

'''Converting sequences'''
[[File:Screen Shot 2014-05-14 at 8.33.27 AM.png|thumb|600px|left|Microsynteny analysis of the pseudo-assembled creosote genome to the peach genome. Orange bars are unsequenced Ns that represent contigs glued together in creosote by the [[syntenic path assembly]] method.  Note the concordance of gene model coding sequences.]]
*python -m jcvi.formats.fastq convert  (read the help file; the default conversion is Sanger (code 33) to Illumina (code 64))
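
The conversion itself is a fixed shift of every quality character. The sketch below shows the arithmetic only. It is not the jcvi implementation, and shift_quality is an illustrative name.

```python
def shift_quality(quality, from_offset=33, to_offset=64):
    """Re-encode a FASTQ quality string from one ASCII offset to another."""
    shifted = []
    for ch in quality:
        score = ord(ch) - from_offset
        if score < 0:
            raise ValueError("character %r is below offset %d" % (ch, from_offset))
        code = score + to_offset
        if code > 126:
            # '~' (126) is the last printable ASCII character.
            raise ValueError("score %d cannot be encoded at offset %d" % (score, to_offset))
        shifted.append(chr(code))
    return "".join(shifted)
```

Only the fourth line of each FASTQ record is changed; the header, sequence and '+' lines are copied through as they are.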

'''Other programs to clean sequences'''
*python -m jcvi.apps.baseclean trim fastqfile (single-end)
*python -m jcvi.apps.baseclean trim R1.fastq.gz R2.fastq.gz (paired-end)

'''Keep sequences in single files (or two files for a pair of reads)'''
*Cat all the R1s together
*Cat all the R2s together

Latest revision as of 07:35, 14 May 2014

Twig2Genome Notes

Twig2Genome

Assembly

Creosote Assembly

Loading into CoGe

SoapDeNovo assembly: 1,570,116 contigs (370MB): http://genomevolution.org/CoGe/OrganismView.pl?dsgid=12183 (Note: too many contigs to process and visualize in SynMap)

SoapDeNovo assembly of contigs >= 2000nt: 6,976 contigs (20MB): http://genomevolution.org/CoGe/OrganismView.pl?dsgid=12185

ABySS assembly (bpsize=64, 2kb minimum contig size): 122,972 contigs (392MB): http://genomevolution.org/CoGe/OrganismView.pl?dsgid=12275

Velvet assembly: 515,190 contigs (241MB): http://genomevolution.org/CoGe/OrganismView.pl?dsgid=12245

CLC4 assembly: 685,475 contigs (508MB): http://genomevolution.org/CoGe/OrganismView.pl?dsgid=12244
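
The 2000 nt cutoff used for the second SoapDeNovo entry above amounts to a length filter over the contig FASTA file. Below is a minimal sketch of such a filter. It is not necessarily how that subset was produced, and read_fasta and filter_contigs are illustrative names.

```python
def read_fasta(handle):
    """Yield (name, sequence) tuples from a FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_contigs(handle, min_length=2000):
    """Return the contigs whose sequence is at least min_length long."""
    return [(name, seq) for name, seq in read_fasta(handle)
            if len(seq) >= min_length]
```

The length of the returned list gives the contig count to compare against the numbers listed above.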

Syntenic Path Assembly

Syntenic path assembly with SynMap of creosote (x-axis) and peach (y-axis). Results may be regenerated at: http://genomevolution.org/r/3w95
Syntenic path assembly with SynMap of creosote (y-axis) and Arabidopsis thaliana Col-0 (x-axis). Results may be regenerated at: http://genomevolution.org/r/3w96

Pseudo-Assembly

Pseudo-assembly of creosote (x-axis) using the peach (y-axis) genome. Syntenic comparison to the peach genome.
Microsynteny analysis of the pseudo-assembled creosote genome to the peach genome. Orange bars are unsequenced Ns that represent contigs glued together in creosote by the syntenic path assembly method. Note the concordance of gene model coding sequences.