PolyMFind: Difference between revisions

Revision as of 22:58, 23 June 2011

Overview

PolyMFind is a program that scans through a multiple genome alignment to identify and classify all polymorphisms. The multiple sequence alignments are generated by Mauve. Since the alignment is generated using sequences stored in CoGe, PolyMFind can query the location of each polymorphism and identify any underlying genomic feature. If the genomic feature is a protein coding gene, the change in coding sequence is assessed to determine if it causes a frameshift, synonymous, or nonsynonymous mutation.
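
The synonymous/nonsynonymous assessment can be illustrated with a minimal sketch. This is not PolyMFind's actual code: the function name `classify_coding_snp`, the 0-based offset convention, the use of the standard genetic code, and the assumption that the coding sequence is supplied in frame on its coding strand are all illustrative choices.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (assumption:
# the organism uses the standard codon assignments).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(cds, offset, new_base):
    """Classify a SNP inside a coding sequence (hypothetical helper).

    cds      -- in-frame coding sequence on the coding strand
    offset   -- 0-based position of the SNP within cds
    new_base -- the alternative nucleotide at that position
    """
    codon_start = offset - offset % 3
    ref_codon = cds[codon_start:codon_start + 3].upper()
    alt_codon = list(ref_codon)
    alt_codon[offset % 3] = new_base.upper()
    alt_codon = "".join(alt_codon)
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "nonsynonymous"


# GCT -> GCC both encode alanine; GCT -> ACT changes alanine to threonine.
print(classify_coding_snp("ATGGCTTAA", 5, "C"))
print(classify_coding_snp("ATGGCTTAA", 3, "A"))
```

A real implementation would also need to handle genes on the reverse strand and ambiguous bases, which this sketch ignores.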

Note: Errors in these polymorphism calls come from three major sources:

Sequencing errors
Assembly errors
Multiple sequence alignment errors

PolyMFind not only identifies and classifies polymorphisms, but also provides:

A false positive score for each polymorphism to help prioritize which ones to investigate further
Views of data to assess whether a polymorphism is real
Tools to compare genomic regions to assess whether a polymorphism is real
Information about genes with polymorphisms
Links to extract additional information for genes
Summary tables of:
- All polymorphisms detected
- Polymorphisms detected in each genome
- Number of polymorphisms that may be the result of a homopolymer sequencing error
- Counts of all the false positive scores

List of information in polymorphism table by column header

Note: all columns may be sorted by clicking on the header for that column.

Position: The position of the polymorphism as referenced by the first genome. This is approximate in the other genomes due to deletions, insertions, and missing sequence.
Position Sort: The position without commas, so the column can be sorted numerically instead of alphanumerically.
Gene: The name(s) of the gene (if one is present) affected by the polymorphism. This contains a link to CoGe's FeatView program for getting more information about the gene.
False positive score: The combined false positive score of the polymorphism. This is the sum of two factors: whether the polymorphism is in a homopolymer and how many polymorphisms the gene has. Both of these metrics are shown at the end of the table and are described in more detail below.
Type: The type of polymorphism:
- SNP: Single Nucleotide Polymorphism
- indel: an insertion or deletion that occurs in more than one strain
- deletion: a single strain missing some sequence
- insertion: a single strain containing additional sequence
Subtype:
- Frameshift: an insertion, deletion, or indel in a protein coding sequence that causes a change in the reading frame for translation
- Large deletion/insertion: an insertion or deletion that is larger than 100 nt
- Nonsynonymous: a SNP in a protein coding sequence that results in the change of the encoded amino acid
- Synonymous: a SNP in a protein coding sequence that does not result in a change of the encoded amino acid
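
The Type and Subtype definitions above can be sketched as a small classifier. This is only an illustration under stated assumptions, not PolyMFind's implementation: the function names, the representation of a polymorphic region as one gapped string per strain, and the rule that a frameshift is a gap whose length is not a multiple of three are all assumptions made here. The 100 nt threshold comes from the definition above.

```python
LARGE_THRESHOLD_NT = 100  # "larger than 100 nt" in the Subtype list


def classify_type(column):
    """Classify one polymorphic alignment region (hypothetical helper).

    column -- one aligned string per strain covering the same region,
              with '-' marking gaps. Assumes three or more strains; with
              only two, a deletion and an insertion cannot be told apart.
    """
    # A strain counts as gapped only if it is missing the whole region.
    gapped = [all(ch == "-" for ch in seq) for seq in column]
    n_gapped = sum(gapped)
    if n_gapped == len(column):
        raise ValueError("every strain is gapped; not a polymorphism")
    if n_gapped == 0:
        is_single_base = len(column[0]) == 1
        differs = len({seq.upper() for seq in column}) > 1
        return "SNP" if is_single_base and differs else "unclassified"
    if n_gapped == 1:
        return "deletion"   # a single strain is missing sequence
    if n_gapped == len(column) - 1:
        return "insertion"  # a single strain has additional sequence
    return "indel"          # the gap is shared by more than one strain


def gap_subtypes(length_nt, in_coding_sequence):
    """Return the subtypes that apply to an insertion, deletion, or indel."""
    subtypes = []
    if in_coding_sequence and length_nt % 3 != 0:
        subtypes.append("frameshift")
    if length_nt > LARGE_THRESHOLD_NT:
        subtypes.append("large deletion/insertion")
    return subtypes


print(classify_type(["A", "A", "G"]))
print(classify_type(["ACG", "---", "ACG"]))
print(classify_type(["--", "TT", "--"]))
print(classify_type(["AC", "--", "--", "AC"]))
print(gap_subtypes(2, True))
```

Whether PolyMFind reports both subtypes for a large frameshifting gap, or only one, is not stated in this page; the sketch simply returns every subtype that applies.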

 [[File:PolyMFind-GEvo.001.png]]
