Difference between revisions of "FractBias"

From CoGepedia
==Example Output==
[[File:fractbiasfigure_slidingwindow.png|center|thumb|700px|'''Figure 3.''' A demonstration of how the sliding window analysis is carried out. The window size is set by the user, and the window begins at the first annotated gene on the target chromosome (x = 1 on the x-axis of the output subplot for that target chromosome). The percentage of genes within the window that have homologs on each query genome chromosome is then calculated (output as the y value for each query genome chromosome in the output subplot). The window then slides down a single gene and recalculates the percentage of corresponding homologs on each query genome chromosome (x = 2 and corresponding y values). This repeats until the window reaches the last gene on the target genome chromosome, at which point the analysis terminates. Analysis is only carried out on full windows; any chromosome or window containing fewer genes than the user-set window size is not analyzed. Results from this analysis are presented in '''Tables 1 and 2''' below.]]
[[File:fractbiasfigure_slidingwindow_syntenicgenes.png|center|thumb|700px|'''Figure 4.''' An example FractBias subplot generated from '''Table 1 "All Genes Example Data Table."''']]
To demonstrate how the FractBias tool works, an example of a single syntenic block with eight genes is presented. The FractBias tool can be run with either "all genes" or "only retained genes" included. The "all genes" option includes all the genes from the target genome.
If the "only retained genes" option is set, genes unique to either the target or the query genome are excluded from the fractionation bias analysis. This option can be used to reduce noise when comparing two genomes that have diverged over longer periods.
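The sliding window procedure described in the Figure 3 caption can be sketched in a few lines of Python. This is a minimal illustration and not the FractBias source code: the function name <code>sliding_window_retention</code>, the input layout (one set of query chromosome names per target gene, in chromosome order), and the toy data are all assumptions made for the example.

```python
from collections import Counter


def sliding_window_retention(homolog_chroms, query_chroms, window_size):
    """Percent of genes per window that have a homolog on each query chromosome.

    homolog_chroms: list ordered by target gene position; each item is the set
        of query chromosome names carrying a homolog of that target gene
        (an empty set means no homolog). This layout is hypothetical.
    Returns one dict per full window; index 0 corresponds to x = 1.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if len(homolog_chroms) < window_size:
        return []  # fewer genes than one full window: nothing is analyzed

    counts = Counter()
    results = []
    for i, chroms in enumerate(homolog_chroms):
        counts.update(chroms)  # gene entering the window
        if i >= window_size:
            counts.subtract(homolog_chroms[i - window_size])  # gene leaving
        if i >= window_size - 1:  # window is full
            results.append(
                {q: 100.0 * counts[q] / window_size for q in query_chroms}
            )
    return results


# Toy target chromosome with eight genes and two query chromosomes.
example = [
    {"q1", "q2"}, {"q1"}, set(), {"q2"},
    {"q1"}, {"q1", "q2"}, {"q2"}, set(),
]
windows = sliding_window_retention(example, ["q1", "q2"], window_size=4)
for x, percentages in enumerate(windows, start=1):
    print(x, percentages)
```

Updating running counts as the window moves avoids recounting every gene at each step, which matters when chromosomes carry thousands of genes.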
==Biological Examples==

Revision as of 13:26, 20 April 2016


Whole genome duplications (WGDs) and genome fractionation are covered more thoroughly in other CoGepedia entries. In short, WGDs create two or more copies of a genome, which are referred to as subgenomes. The duplicate subgenomes then undergo gene loss in a process called fractionation, which is part of the return to a diploid state (diploidization). All things being equal, one may assume that fractionation would occur randomly across the redundant genes created after a WGD; however, bias towards gene loss on one subgenome, called fractionation bias, has been observed in several species, including maize [1], Brassica rapa [2], and rainbow trout [3].

The FractBias code and an example data set can be found on GitHub: https://github.com/bjoyce3/SynMapFractBiasAnalysis


[[File:|right|thumb|900px|Figure 1. A demonstration of which genes are included in the FractBias analysis of retained genes when the "All genes" setting is selected. All genes that exist on target genome chromosomes will be used to determine the sliding window size and calculate the number of retained genes on query chromosomes.]]
[[File:|right|thumb|900px|Figure 2. A demonstration of which genes are included in the FractBias analysis of retained genes when the "Only retained genes" setting is selected. Genes unique to either the target or the query genome will not be considered in either the window size or in calculating the number of retained genes within the window.]]
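The difference between the two settings in Figures 1 and 2 can be illustrated with a small filter applied before any windowing. This is only a sketch under stated assumptions: <code>select_genes</code>, the <code>only_retained</code> flag, and the dictionary of homologs are hypothetical names, and "retained" is taken here to mean that a target gene has at least one syntenic homolog in the query genome.

```python
def select_genes(target_order, homologs, only_retained=False):
    """Choose which target genes take part in the window analysis.

    target_order: target gene ids in chromosome order.
    homologs: dict mapping a target gene id to the set of query chromosomes
        that carry a syntenic homolog (missing key means no homolog).
    only_retained: False mimics "All genes"; True mimics "Only retained
        genes" by dropping target genes without any homolog (an assumption
        about how the option behaves).
    """
    selected = []
    for gene in target_order:
        chroms = homologs.get(gene, set())
        if only_retained and not chroms:
            continue  # gene unique to the target genome is skipped
        selected.append((gene, chroms))
    return selected


target_order = ["t1", "t2", "t3", "t4"]
homologs = {"t1": {"q1"}, "t3": {"q1", "q2"}}

all_genes = select_genes(target_order, homologs)
retained = select_genes(target_order, homologs, only_retained=True)
print(len(all_genes), len(retained))
```

Because the window size is counted in genes, dropping unique genes means that a window of the same size spans a longer stretch of the target chromosome.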

What goes in

  1. Two assembled genomes that have annotated coding sequences (CDS)
  2. A syntenic depth ratio set by the user (determined by empirical tests outside of the FractBias tool)
    1. The genome with the lower value in the ratio will be the target genome
    2. The genome with the higher value in the ratio will be the query genome
  3. The full GFF of the target genome
  4. The syntenic blocks identified by SynMap
  5. Settings defined by the user
    1. What genes should be counted
      1. Count all genes present on the target genome (refer to Figure 1)
      2. Only count genes that are retained in both genomes (refer to Figure 2)
    2. Target chromosome number
    3. Query chromosome number
    4. Window size

Data files passed in

  1. SynMap DAGChainer output: comparison_name.aligncoords.gcoords
  2. GFF file for target genome
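
As an illustration of how the target genome GFF could supply the gene order that the windows slide along, the sketch below reads the nine standard tab-separated GFF columns and sorts genes by start position on each chromosome. The function name, the choice of the "gene" feature type, and the use of the GFF3-style <code>ID=</code> attribute are assumptions for the example, not a description of how FractBias parses its input.

```python
import io
from collections import defaultdict


def gene_order_from_gff(handle, feature_type="gene"):
    """Return {chromosome: [gene_id, ...]} with genes sorted by start position."""
    genes = defaultdict(list)
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        # Column 9 holds key=value pairs separated by semicolons (GFF3 style).
        attrs = dict(
            item.split("=", 1) for item in cols[8].split(";") if "=" in item
        )
        gene_id = attrs.get("ID")
        if gene_id is None:
            continue
        genes[cols[0]].append((int(cols[3]), gene_id))
    return {
        chrom: [gene_id for _, gene_id in sorted(entries)]
        for chrom, entries in genes.items()
    }


# Made-up records for demonstration only.
rows = [
    ["Chr1", "demo", "gene", "500", "900", ".", "+", ".", "ID=geneB;Name=B"],
    ["Chr1", "demo", "gene", "100", "300", ".", "-", ".", "ID=geneA"],
    ["Chr1", "demo", "mRNA", "100", "300", ".", "-", ".", "ID=geneA.1;Parent=geneA"],
    ["Chr2", "demo", "gene", "50", "80", ".", "+", ".", "ID=geneC"],
]
sample = "##gff-version 3\n" + "\n".join("\t".join(r) for r in rows) + "\n"
order = gene_order_from_gff(io.StringIO(sample))
print(order)
```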

What comes out

  1. A figure containing a subplot for every target genome chromosome
  2. Links to the raw data used to create the subplots

Example Output

Biological Examples

Sorghum and Maize Fractionation Bias

Figure 5. The fractionation bias that occurred after the maize whole genome duplication (WGD) has been studied previously[4] by comparing maize to sorghum. The WGD events are denoted by stars along the Poaceae lineage. This analysis used the 'only retained genes' option to remove any genes unique to maize or sorghum. Link to regenerate analysis:

The fractionation bias in the maize genome has previously been studied independently[4]. That analysis was rerun here using the FractBias tool.

Figure 6. Results from running the FractBias tool

Arabidopsis thaliana and Brassica rapa Fractionation Bias

Figure 6. Results from running the FractBias tool
{| class="wikitable"
|+ Table 1. FractBias examples available through CoGe’s SynMap. Syntenic depth ratios range from 1:1 to 1:6 using two species of plasmodia, two mammals, and six species of plants to highlight the flexibility and ease of use of FractBias.
! Target Species !! Query Species !! Syntenic Depth Ratio !! Link to 'All Genes' Analysis !! Link to 'Only Syntenic Genes' Analysis
|-
| Plasmodium falciparum || Plasmodium knowlesi || 1:1 || https://genomevolution.org/r/k7j6 || https://genomevolution.org/r/k7km
|-
| Homo sapiens || Pan troglodytes || 1:1 || https://genomevolution.org/r/k813 || https://genomevolution.org/r/k811
|-
| Sorghum bicolor || Zea mays || 1:2 || https://genomevolution.org/r/k7jx || https://genomevolution.org/r/k7j3
|-
| Brassica rapa || Brassica napus || 1:2 || https://genomevolution.org/r/k7mw || https://genomevolution.org/r/k7k3
|-
| Arabidopsis thaliana || Brassica rapa || 1:3 || https://genomevolution.org/r/k7jq || https://genomevolution.org/r/k7jg
|-
| Vitis vinifera || Arabidopsis thaliana || 1:4 || https://genomevolution.org/r/k7p1 || https://genomevolution.org/r/k7ov
|-
| Arabidopsis thaliana || Brassica napus || 1:6 || https://genomevolution.org/r/k7qz || https://genomevolution.org/r/k7r6
|}