K12 assembly

From CoGepedia
Latest revision as of 19:39, 14 October 2009

Figure 1. Example syntenic dotplot comparison of a de novo assembled E. coli genome (x-axis) to a complete reference genome (y-axis).
Figure 2. Example syntenic dotplot comparison of a de novo assembled E. coli genome (x-axis) to a complete reference genome (y-axis), using SynMap's algorithm to order the de novo assembled contigs according to their best syntenic path along the reference genome.

Cheap sequencing brings cheap genomes! However, depending on the sequencing technology used, you may find that even with 50-100x coverage a genome cannot be completely assembled. Repeat regions that are longer than the reads produced by the sequencing technology prevent the assembled pieces of the genome from being ordered relative to one another unless BAC-end sequences are used to anchor them. What can CoGe do to help?
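To see why repeats longer than the reads block ordering, here is a toy Python sketch under simplifying assumptions: reads are error-free and one read starts at every position. The sequences and the `read_multiset` helper are hypothetical. Two genomes that differ only by swapping the segments between three copies of a 10 bp repeat yield exactly the same collection of 6 bp reads, so no assembler could tell their order apart from those reads alone.

```python
from collections import Counter


def read_multiset(genome: str, read_len: int) -> Counter:
    """Multiset of all error-free reads of length read_len, one per start position."""
    return Counter(genome[i:i + read_len] for i in range(len(genome) - read_len + 1))


# Hypothetical toy sequences; REPEAT occurs three times in each genome.
REPEAT = "ACGTTGCAAC"  # 10 bp repeat
a, b, c, d = "GGGAT", "TTCCA", "CATGG", "AAGTC"

genome_1 = a + REPEAT + b + REPEAT + c + REPEAT + d
genome_2 = a + REPEAT + c + REPEAT + b + REPEAT + d  # segments b and c swapped

read_len = 6  # reads shorter than the repeat
print(genome_1 != genome_2)  # True: the genomes really differ
print(read_multiset(genome_1, read_len) == read_multiset(genome_2, read_len))  # True: same reads
```

Raising `read_len` to 12, long enough to span a whole repeat copy plus one flanking base on each side, makes the two read collections differ.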

Fortunately, for many microbes there are already complete genomes that can be used as a reference, and SynMap is the perfect tool for comparing genomes. This page includes two syntenic dotplot comparisons. Figure 1 shows a syntenic comparison using SynMap's default display: contigs are ordered from largest to smallest, starting in the lower left corner, and each contig shows extensive synteny to the reference genome. Figure 2 shows the result of turning on SynMap's option to order contigs by their syntenic path through the reference genome, which also reverse complements any contigs that need to be flipped. When this option is turned on, the results include a link ("Generate Assembled Genomic Sequence") for downloading the sequence of the assembled genome, with 100 Ns inserted between adjacent contigs.
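The ordering step can be pictured with a simplified Python sketch. It is not SynMap's actual algorithm: it assumes you already have syntenic hits as hypothetical `(contig_id, ref_position, strand)` tuples, places each contig at the median reference position of its hits, flips a contig when most of its hits are on the reverse strand, and joins the contigs with 100 Ns as described above. The function names, the input format, and the handling of contigs without hits are all assumptions for illustration.

```python
from statistics import median

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def syntenic_path_assembly(contigs, hits, gap=100):
    """Hypothetical stand-in for syntenic path ordering.

    contigs: dict of contig_id -> sequence
    hits: list of (contig_id, ref_position, strand); strand is "-" when the
          syntenic match is inverted relative to the reference
    Returns one sequence with `gap` Ns between adjacent contigs.
    """
    by_contig = {}
    for cid, ref_pos, strand in hits:
        by_contig.setdefault(cid, []).append((ref_pos, strand))

    placed = []
    for cid, pairs in by_contig.items():
        anchor = median([pos for pos, _ in pairs])      # position along the reference
        minus = sum(1 for _, s in pairs if s == "-")
        flip = minus > len(pairs) / 2                   # majority of hits inverted
        placed.append((anchor, cid, flip))
    placed.sort()                                       # walk along the reference

    pieces = [reverse_complement(contigs[cid]) if flip else contigs[cid]
              for _, cid, flip in placed]
    # Assumption: contigs with no syntenic hits are appended in input order.
    pieces += [seq for cid, seq in contigs.items() if cid not in by_contig]
    return ("N" * gap).join(pieces)


contigs = {"ctg1": "AAAC", "ctg2": "GGTT", "ctg3": "CCCA"}
hits = [("ctg1", 5000, "+"), ("ctg1", 5200, "+"),
        ("ctg2", 100, "-"), ("ctg2", 300, "-")]
print(syntenic_path_assembly(contigs, hits, gap=3))  # AACCNNNAAACNNNCCCA
```

Here `ctg2` matches the start of the reference in reverse orientation, so it is flipped and placed first; a small `gap` is used only to keep the printed output short.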

While some genome assembly tools can assemble reads against a reference genome, this approach can miss sequence that is present in the sequenced genome but absent from the reference, such as an insertion. In bacterial genomes, such insertions may come from phages or insertion sequence (IS) elements. If you want to capture these kinds of genomic changes with short-read sequencing technology, it is best to first do a de novo assembly into contigs and then a syntenic path assembly against a reference genome.
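As a hedged illustration of why the de novo route preserves insertions: once contigs have been compared with the reference, an insertion such as a prophage or IS element shows up as a stretch of a contig with no alignment to the reference. The hypothetical `uncovered_regions` helper below (not a CoGe or SynMap feature) reports such stretches, assuming you have the aligned intervals on each contig as half-open `(start, end)` coordinates; the 1,000 bp default threshold is an arbitrary choice.

```python
def uncovered_regions(contig_len, aligned, min_len=1000):
    """Report candidate insertions on one de novo contig.

    contig_len: length of the contig in bp
    aligned: list of half-open (start, end) intervals on the contig that
             align to the reference (they may overlap)
    Returns (start, end) stretches of at least min_len bp with no alignment.
    """
    gaps = []
    pos = 0  # end of the covered region seen so far
    for start, end in sorted(aligned):
        if start - pos >= min_len:
            gaps.append((pos, start))
        pos = max(pos, end)
    if contig_len - pos >= min_len:
        gaps.append((pos, contig_len))
    return gaps


# Hypothetical contig: an 8 kb stretch between two aligned blocks has no
# counterpart in the reference, e.g. a candidate prophage.
print(uncovered_regions(50000, [(0, 20000), (19000, 30000), (38000, 50000)]))
# [(30000, 38000)]
```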