SynFind

From CoGepedia

Overview

SynFind identifies syntenic regions in any set of target genomes for a given gene in a query genome, even if that gene is not present in a target genome. In the process, SynFind identifies the syntenic regions for all genes in the query genome. Complete syntenic gene sets can be downloaded, and syntenic depth tables are generated to assess the polyploidy level between the query genome and each target genome. The primary algorithm behind this analysis is SyntenyScore, written by Haibao Tang.

Syntenic Depth

SynFind calculates the number and percentage of genes at each syntenic depth in each target genome.
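
As a rough illustration of what such a depth table contains, the following Python sketch tallies genes by depth. It assumes that the syntenic depth of a gene is the number of syntenic regions found for it in one target genome; the function name `depth_table` and the example data are hypothetical and are not part of SynFind.

```python
from collections import Counter


def depth_table(depths):
    """Return {depth: (gene_count, percentage)} for a list of per-gene depths.

    `depths` holds one integer per gene: the number of syntenic regions
    found for that gene in a single target genome (an assumed definition).
    """
    counts = Counter(depths)
    total = len(depths)
    return {d: (n, 100.0 * n / total) for d, n in sorted(counts.items())}


# Hypothetical depths for 8 genes against one target genome.
example = [0, 1, 1, 2, 2, 2, 2, 4]
for depth, (n, pct) in depth_table(example).items():
    print(f"depth {depth}: {n} genes ({pct:.1f}%)")
```

In this made-up example, half of the genes sit at depth 2, which is the kind of pattern one might look for when comparing against a target genome with a whole-genome duplication in its history.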

SynFind Syntenic Depth Examples

Options for SynFind

General parameters

  • Comparison algorithm: Select the algorithm for comparing genomes. We recommend Last, which is much faster than LastZ.

Synteny Finding: SyntenyScore

SynFind is powered by SyntenyScore (part of the BaoTools Package)

Parameters

  • Gene window Size: synteny window size in genes [default: 40]
    • The window size is the size of the genomic regions compared between two genomes, measured in genes. Given an anchor gene, the window size is divided by 2 and that many genes are searched upstream and downstream of the anchor.
    • For example, a window size of 40 means that a total of 41 genes are checked: the anchor gene, plus 20 upstream and 20 downstream.
  • Minimum number of genes: The minimum number of anchoring genes to call a region syntenic. [default: 4]
  • Scoring Function: scoring scheme, must be one of ('collinear', 'density') [default: collinear]
    • Collinear: a collinear arrangement of syntenic genes is enforced
    • Density: any arrangement of gene-pairs is tolerated
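
The gene window arithmetic described above can be sketched in Python as follows. This is a minimal illustration, not SyntenyScore's own code: the function name `gene_window` is hypothetical, and clipping the window at the ends of a chromosome is an assumption.

```python
def gene_window(genes, anchor_index, window_size=40):
    """Return the genes inside the window centred on the anchor gene.

    `genes` is the ordered list of genes on one chromosome or contig.
    Half of `window_size` is searched on each side of the anchor, so a
    full window holds window_size + 1 genes (anchor included).
    Clipping at the sequence ends is an assumption of this sketch.
    """
    half = window_size // 2
    start = max(0, anchor_index - half)
    end = min(len(genes), anchor_index + half + 1)
    return genes[start:end]


# Hypothetical chromosome with 100 ordered genes.
chromosome = [f"gene{i}" for i in range(100)]

full = gene_window(chromosome, anchor_index=50)
print(len(full))     # anchor + 20 upstream + 20 downstream

clipped = gene_window(chromosome, anchor_index=5)
print(len(clipped))  # fewer genes, because only 5 genes lie upstream
```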

Master syntenic pairs table

The header line will contain information as to which column contains what data, but the general format is:

  1. COUNTS: a condensed version of the results where each number represents the number of syntenic regions identified for the query gene.
    1. E.g. 1,3,2 means a query gene (1) matched three regions in the first target genome (3) and two regions in the second target genome (2)
  2. (2 or more) ORG (followed by organism name): Lists the gene matched in the target genome. If no gene is matched, the word 'proxy' is used. If multiple genes/proxies are matched, they will all be listed and delimited by ","
  3. (2 or more) CHR (followed by organism name): Lists the chromosomes/contigs/scaffolds matched in the target genome
    1. If a query gene matches multiple syntenic regions in a given target genome, each region's chromosome will be listed in this column and delimited by a ","
    2. E.g.: scaffold-5256,scaffold-9874,scaffold-9464
    3. Multiple matches are listed in order of decreasing synteny score
  4. GEvo link: a link to GEvo for analyzing those genomic regions.
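
Individual fields from this table can be unpacked with a few lines of Python. The sketch below works on single field values only and makes no assumption about how columns are separated in the downloaded file; the helper names `parse_counts` and `split_matches` and the gene names are hypothetical, while the COUNTS and scaffold values are taken from the examples above.

```python
def parse_counts(field):
    """Split a COUNTS field such as '1,3,2' into a list of integers."""
    return [int(x) for x in field.split(",")]


def split_matches(field):
    """Split a comma-delimited ORG or CHR field into its entries."""
    return [x for x in field.split(",") if x]


counts = parse_counts("1,3,2")
query, targets = counts[0], counts[1:]
print(query, targets)  # query gene count, then regions per target genome

chromosomes = split_matches("scaffold-5256,scaffold-9874,scaffold-9464")
print(chromosomes[0])  # first entry is the match with the highest synteny score

# Hypothetical ORG field: two matched genes and one region without a gene.
org_field = split_matches("geneA,proxy,geneB")
n_proxy = sum(1 for name in org_field if name == "proxy")
print(n_proxy)
```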

Reference on Scoring Method

http://gbe.oxfordjournals.org/content/early/2015/11/11/gbe.evv219

SynFind: compiling syntenic regions across any set of genomes on demand

Haibao Tang, Matthew D. Bomhoff, Evan Briones, Liangsheng Zhang, James C. Schnable and Eric Lyons

Download Synteny Score code