Difference between revisions of "FractBias"

(Biological Examples)
 
(66 intermediate revisions by the same user not shown)
Line 2: Line 2:
 
[[Whole genome duplication]]s (WGDs) and genome [[fractionation]] are covered more thoroughly in other CoGepedia entries. In short, a WGD creates two or more copies of a genome, which are referred to as subgenomes. The duplicate subgenomes then undergo gene loss in a process called fractionation, which is part of the return to a diploid state ([[diploidization]]). All things being equal, one might assume that fractionation would occur randomly across the redundant genes created by a WGD. However, bias towards gene loss on one subgenome, called [[fractionation bias]], has been observed in several species, including maize <ref name="schnable_2011">[http://journal.frontiersin.org/article/10.3389/fpls.2011.00002/abstract Schnable, J.C. et al. Dose–sensitivity, conserved non-coding sequences, and duplicate gene retention through multiple tetraploidies in the grasses. Front. Plant Sci. http://dx.doi.org/10.3389/fpls.2011.00002 (2011)]</ref>, ''Brassica rapa'' <ref name="cheng_2012">[http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0036442 Cheng, F. et al. Biased gene fractionation and dominant gene expression among the subgenomes of ''Brassica rapa''. PLOS ONE DOI: 10.1371/journal.pone.0036442 (2012)]</ref>, and rainbow trout <ref name="berthelot 2014">[http://www.nature.com/ncomms/2014/140422/ncomms4657/full/ncomms4657.html Berthelot, C. et al. The rainbow trout genome provides novel insights into evolution after whole-genome duplication in vertebrates. Nature Communications 5: DOI:10.1038/ncomms4657 (2014)]</ref>.
 +
 +
The FractBias code and an example data set can be found on [https://github.com/bjoyce3/SynMapFractBiasAnalysis GitHub]
  
 
==Overview==
 
[[File:fractbiasfigure_allgenes.png|right|thumb|700px|'''Figure 1.''' A demonstration of which genes are included in the FractBias analysis of retained genes when the include "All genes" setting is selected. All genes that exist on target genome chromosomes will be used to determine the sliding window size and calculate the number of retained genes on query chromosomes.]]
+
[[File:Set_syntenicdepth.png|right|thumb|750px|'''Figure 1.''' Setting the SynMap syntenic depth and Run FractBias options (https://genomevolution.org/CoGe/SynMap.pl).]]
[[File:fractbiasfigure_oneretained.png|right|thumb|700px|'''Figure 2.''' A demonstration of which genes are included in the FractBias analysis of retained genes when the include "Only retained genes" is selected. Genes unique to either the target or to the query genome will not be considered in either the window size or in calculating the number of retained genes within in the window.]]
+
===Workflow===
 +
#SynMap comparison is carried out between two genomes with Syntenic Depth option set (Figure 1)
 +
#FractBias takes output and runs sliding window analysis
 +
#A figure with subplots for every target genome chromosome is created
 +
##x-axis: number of genes present on the chromosome in the target genome chromosome
 +
##y-axis: percentage of retained genes in each sliding window
 +
 
 +
 
 
===What goes in===
 
 
#Two assembled genomes that have annotated coding sequences (CDS)
 
Line 13: Line 22:
 
#The full GFF of the target genome
 
 
#The syntenic blocks identified by SynMap
 
#Setting defined by the user
+
#Parameters defined by the user
 +
##Window size in number of genes
 +
##How many total chromosomes should be used
 +
###The maximum number of query chromosome that should be considered
 +
###The maximum number of target chromosome that should be considered
 +
##Whether chromosomes containing the name 'unknown' or 'random' should be removed from consideration
 
##What genes should be counted
 
###Count all genes present on the target genome (refer to Figure 1)
+
###Count all genes present on the target genome
###Only count genes that are retained in both genomes (refer to Figure 2)
+
###Only count genes that are retained in both genomes
##Target chromosome number
+
 
##Query chromosome number
+
 
##Window size
+
====Data files passed in====
 +
#SynMap DAGChainer output: comparison_name.aligncoords.gcoords
 +
#GFF file for target genome
 +
 
  
 
===What comes out===
 
Line 25: Line 42:
 
#Links to the raw data used to create the subplots
 
  
==FractBias Methods==
+
==Fractionation Bias Examples==
 +
===Sorghum and Maize Fractionation Bias===
 +
The fractionation bias in the maize genome has previously been studied independently<ref name= schnable_2011B></ref>. That analysis was rerun here using the FractBias tool. Zea mays (maize) and Sorghum bicolor (sorghum) diverged recently. The maize lineage then experienced a whole genome duplication (Figure 2), so the syntenic depth ratio is sorghum 1 : maize 2. Each sorghum chromosome subplot is therefore expected to show up to two maize chromosomes (Figure 3). Fractionation bias between maize chromosomes is present across almost all sorghum chromosomes, except for sorghum chromosomes 8 and 10. Sorghum chromosomes 1, 7, 9, and 10 have more than two maize chromosomes represented, which indicates regions of the maize chromosomes that have recombined over maize evolutionary history.
 +
[[File:wgds_monocots.png|left|thumb|300px|'''Figure 2.''' The fractionation bias that occurred after the maize whole genome duplication (WGD) has been studied previously<ref name= schnable_2011B>[http://www.pnas.org/content/108/10/4069 Schnable, J. C. et al. Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss. PNAS 108:4069-4074]</ref> by comparing maize to sorghum. WGD events are denoted by stars along the Poaceae lineage. This analysis used the 'only retained genes' option to remove any genes unique to maize or sorghum. Link to regenerate analysis: ]]
  
 +
[[File:sorghum_maize_fractbias_onlyretained.png|right|thumb|750px|'''Figure 3.''' Results from running the FractBias tool. Areas of underfractionation (black arrows), intermediate fractionation (blue arrows), and overfractionation (red arrows) can be observed.]]
  
FractBias is a tool used to assess fractionation bias after whole genome duplications (WGDs). To investigate fractionation bias, select an organism that has experienced a WGD (''e.g.'' maize) which will become the 'query' genome, and an organism that diverged before the WGD (''e.g.'' sorghum recently diverged before the WGD in maize). The following is a list of all user inputs:
+
===''Arabidopsis thaliana'' and ''Brassica rapa'' Fractionation Bias===
  
===User Input===
+
[[File:Athaliana_Brapa_onlysyntenic.png|middle|thumb|750px|'''Figure 4.''' Results from running the FractBias tool comparing ''Arabidopsis thaliana'' (target genome) to ''Brassica rapa'' (query genome). Areas of underfractionation (black arrows), intermediate fractionation (blue arrows), and overfractionation (red arrows) can be observed. Results can be regenerated at https://genomevolution.org/r/k7jg.]]
#Select two genomes to compare in the [[SynMap]] tool.
+
#Select the SynMap 'Syntenic Depth' option under 'Analysis Options.'
+
#Set syntenic depth ratio between genomes (determined by empirically outside of this tool).
+
#Set how many target genome chromosomes should be included in the analysis. There is a maximum of 40 target chromosomes that can be included, and the longest chromosomes are selected first.
+
#Set how many query genome chromosomes should be included in the analysis. There is a maximum of 40 query chromosomes that can be included, and the longest chromosomes are selected first.
+
#Set the size of the sliding window during analysis.
+
  
===FractBias tool analysis===
+
===Additional Examples===
Once all of the user input options are filled and submitted, the FractBias tool then runs an analysis in the following steps:
+
More examples can be found in the table below. While FractBias was designed to investigate fractionation after whole genome duplications, it can also be used to investigate chromosome composition between species.
 +
 
 +
[[File:Human_Pan_synmap.png|middle|thumb|750px|'''Figure 5.''' A syntenic depth ratio 1:1 comparison between the first 12 chromosomes of Homo sapiens and all the chromosomes of Pan troglodytes using the 'all genes' setting.]]
  
#The coordinates for syntenic regions between the genomes are determined by the [[SynMap]] tool
 
#The syntenic genes are then parsed according to the 'target' and 'query' genomes. The genome with the lower syntenic depth ratio is set as the target genome; the genome with the higher ratio is set as the query genome.
 
#A list of genes present on every target genome chromosome is made and ordered according to start site (bp) in the annotation (gff/gtf file).
 
#The FractBias tool then goes through the list of each target genome gene, and determines if it has a retained homolog on one (or more) of the query chromosomes.
 
#Finally, the FractBias tool runs a sliding window analysis to calculate how many genes are retained for each query chromosome.
 
#A figure is generated that contains a subplot for every target genome chromosome
 
##The x-axis: target genome gene order number in sliding window analysis according to order of start site in genome annotation (gff/gtf). 
 
##The y-axis: percent of retained genes from the target genome present on each query chromosome within that window.
 
#The [[SynMap]] raw data, the FractBias data, genes identified using FractBias, and the images can be downloaded through links for further use.
 
  
==Example Output==
 
[[File:fractbiasfigure_slidingwindow.png|center|thumb|700px|'''Figure 3.''' A demonstration of how the sliding window analysis is carried out. The window size is set by the user, and begins at the first annotated gene on the target chromosome (x = 1 on the x-axis of the output subplot for that target chromosome). Then the window determines the percentage of homologs that exist on all query genome chromosomes within that window (output as the y value for each query genome chromosome in the output subplot). The window then slides down a single gene, and recalculates the percentage of corresponding homologs on each query genome chromosome (x = 2 and corresponding y values). This repeats until the window reaches the last gene on the target genome chromosome, and then the analysis terminates. Analysis is only carried out on full window sizes; any chromosome or window containing less genes than the user set window size is terminated. Results from this analysis are presented in '''Table 1 and 2''' below.]]
 
  
[[File:Fractbias_allgenegraph.png|right|thumb|400px|'''Figure 4.''' An example FractBias subplot generated from '''Table 1 "All Genes Example Data Table."''']]
 
  
To demonstrate how the FractBias tool works, an example of a single syntenic block with eight genes is presented. The FractBias tool can be run with either "all genes" included, or "only retained genes" included. The "all genes" option will include all the genes from the target genome
 
  
 
{| class="wikitable"
 
{| class="wikitable"
!colspan="6"|'''Table 1. All Genes Example Data Table'''
+
!colspan="6"|'''Table 1. FractBias examples available through CoGe’s SynMap.  Syntenic depth ratios range from 1:1 to 1:6 using two species of plasmodia, two mammals, and six species of plants to highlight the flexibility and ease of use of FractBias. '''
 
|-
 
|-
! scope="col"|Window Iteration
+
! scope="col"|Target Species
! scope="col"|X-axis Value
+
! scope="col"|Query Species
! scope="col"|Genes counted in Subgenome 1
+
! scope="col"|Syntenic Depth Ratio
! scope="col"|Y-axis Value Sub 1
+
! scope="col"|Link to 'All Genes' Analysis
! scope="col"|Genes counted in Subgenome 2
+
! scope="col"|Link to 'Only Syntenic Genes' Analysis
! scope="col"|Y-axis Value Sub 2
+
 
|-
 
|-
| style="text-align:center;" |1
+
| style="text-align:center;" |Plasmodium falciparum
| style="text-align:center;" |1
+
| style="text-align:center;" |Plasmodium knowlesi
| style="text-align:center;" |1, 2, 3, 4
+
| style="text-align:center;" |1:1
| style="text-align:center;" |100
+
| style="text-align:center;" |https://genomevolution.org/r/k7j6
| style="text-align:center;" |1, <span style="color: red">2</span>, 3, <span style="color: red">4</span>
+
| style="text-align:center;" |https://genomevolution.org/r/k7km
| style="text-align:center;" |50
+
 
|-
 
|-
| style="text-align:center;" |2
+
| style="text-align:center;" |Homo sapiens
| style="text-align:center;" |2
+
| style="text-align:center;" |Pan troglodytes
| style="text-align:center;" |2, 3, 4,<span style="color: red">5</span>
+
| style="text-align:center;" |1:1
| style="text-align:center;" |75
+
| style="text-align:center;" |https://genomevolution.org/r/k813
| style="text-align:center;" |<span style="color: red">2</span>, 3, <span style="color: red">4</span>, <span style="color: red">5</span>
+
| style="text-align:center;" |https://genomevolution.org/r/k811
| style="text-align:center;" |25
+
 
|-
 
|-
| style="text-align:center;" |3
+
| style="text-align:center;" |Sorghum bicolor
| style="text-align:center;" |3
+
| style="text-align:center;" |Zea mays
| style="text-align:center;" |3, 4, <span style="color: red">5</span>, 6
+
| style="text-align:center;" |1:2
| style="text-align:center;" |75
+
| style="text-align:center;" |https://genomevolution.org/r/k7jx
| style="text-align:center;" |3, <span style="color: red">4</span>, <span style="color: red">5</span>, 6
+
| style="text-align:center;" |https://genomevolution.org/r/k7j3
| style="text-align:center;" |50
+
 
|-
 
|-
| style="text-align:center;" |4
+
| style="text-align:center;" |Brassica rapa
| style="text-align:center;" |4
+
| style="text-align:center;" |Brassica napus
| style="text-align:center;" |4, <span style="color: red">5</span>, 6, 8
+
| style="text-align:center;" |1:2
| style="text-align:center;" |75
+
| style="text-align:center;" |https://genomevolution.org/r/k7mw
| style="text-align:center;" | <span style="color: red">4</span>, <span style="color: red">5</span>, 6, 8
+
| style="text-align:center;" |https://genomevolution.org/r/k7k3
| style="text-align:center;" |50
+
|}
+
'*' Red numbers are counted as zeros in sliding window analysis
+
 
+
 
+
[[File:Fractbias_oneretainedgraph.png|right|thumb|400px|'''Figure 5.''' An example FractBias subplot generated from '''Table 2 "Only Retained Genes Example Data Table."''']]
+
 
+
If the include "only retained genes" option is set, all unique genes from either the target or query genome are not considered for the fractionation bias analysis. This option can be used to remove variation from two genomes that have diverged over longer periods and clean up the analysis.
+
 
+
{| class="wikitable"
+
!colspan="6"|'''Table 2. Only Retained Genes Example Data Table'''
+
 
|-
 
|-
! scope="col"|Window Iteration
+
| style="text-align:center;" |Arabidopsis thaliana
! scope="col"|X-axis Value
+
| style="text-align:center;" |Brassica rapa
! scope="col"|Genes counted in Subgenome 1
+
| style="text-align:center;" |1:3
! scope="col"|Y-axis Value Sub 1
+
| style="text-align:center;" |https://genomevolution.org/r/k7jq
! scope="col"|Genes counted in Subgenome 2
+
| style="text-align:center;" |https://genomevolution.org/r/k7jg
! scope="col"|Y-axis Value Sub 2
+
 
|-
 
|-
| style="text-align:center;" |1
+
| style="text-align:center;" |Vitis vinifera
| style="text-align:center;" |1
+
| style="text-align:center;" |Arabidopsis thaliana
| style="text-align:center;" |1, 2, 3, 4
+
| style="text-align:center;" |1:4
| style="text-align:center;" |100
+
| style="text-align:center;" |https://genomevolution.org/r/k7p1
| style="text-align:center;" |1, <span style="color: red">2</span>, 3, <span style="color: red">4</span>
+
| style="text-align:center;" |https://genomevolution.org/r/k7ov
| style="text-align:center;" |50
+
 
|-
 
|-
| style="text-align:center;" |2
+
| style="text-align:center;" |Arabidopsis thaliana
| style="text-align:center;" |2
+
| style="text-align:center;" |Brassica napus
| style="text-align:center;" |2, 3, 4, 5
+
| style="text-align:center;" |1:6
| style="text-align:center;" |75
+
| style="text-align:center;" |https://genomevolution.org/r/k7qz
| style="text-align:center;" |<span style="color: red">2</span>, 3, <span style="color: red">4</span>, 6
+
| style="text-align:center;" |https://genomevolution.org/r/k7r6
| style="text-align:center;" |25
+
|-
+
| style="text-align:center;" |3
+
| style="text-align:center;" |3
+
| style="text-align:center;" |3, 4, 5, 6
+
| style="text-align:center;" |75
+
| style="text-align:center;" |3, <span style="color: red">4</span>, 6, 8
+
| style="text-align:center;" |50
+
|-
+
| style="text-align:center;" |4
+
| style="text-align:center;" |4
+
| style="text-align:center;" |4, 5, 6, 8
+
| style="text-align:center;" |75
+
| style="text-align:center;" | <span style="color: red">4</span>, 5, 6, 8
+
| style="text-align:center;" |50
+
 
|}
 
|}
'*' Red numbers are counted as zeros in sliding window analysis
 
 
==Biological Examples==
 
Sorghum and Maize Fractionation Bias
 
[[File:wgds_monocots.png|right|thumb|300px|'''Figure 6.''' The maize whole genome duplication (WGD) has been studied previously <ref name= schnable_2011B>[http://www.pnas.org/content/108/10/4069, Schnable, J. C. et al. Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss. PNAS 108:4069-4074]</ref>. The WGDs are denoted by stars.]]
 
 
 
 
[[File:sorghum_maize_fractbias_onlyretained.png|right|thumb|750px|caption]]
 
 
 
''Arabidopsis'' and ''Brassica'' Fractionation Bias
 
 
''Esox'' and ''Oncorhynchus'' Fractionation Bias
 
  
 
==References==
 
 
{{reflist}}
 

Latest revision as of 14:04, 1 September 2016

Background

Whole genome duplications (WGDs) and genome fractionation are covered more thoroughly in other CoGepedia entries. In short, a WGD creates two or more copies of a genome, which are referred to as subgenomes. The duplicate subgenomes then undergo gene loss in a process called fractionation, which is part of the return to a diploid state (diploidization). All things being equal, one might assume that fractionation would occur randomly across the redundant genes created by a WGD. However, bias towards gene loss on one subgenome, called fractionation bias, has been observed in several species, including maize [1], Brassica rapa [2], and rainbow trout [3].

The FractBias code and an example data set can be found on GitHub (https://github.com/bjoyce3/SynMapFractBiasAnalysis).

Overview

Figure 1. Setting the SynMap syntenic depth and Run FractBias options (https://genomevolution.org/CoGe/SynMap.pl).

Workflow

  1. A SynMap comparison is carried out between two genomes with the Syntenic Depth option set (Figure 1)
  2. FractBias takes the SynMap output and runs a sliding window analysis
  3. A figure with a subplot for every target genome chromosome is created
    1. x-axis: gene order number along the target genome chromosome
    2. y-axis: percentage of retained genes in each sliding window
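
The sliding window step in the workflow above can be sketched as follows. This is a minimal illustration, not the FractBias source: the function name `sliding_window_retention` and the boolean input representation are assumptions made for the example. It follows the behaviour described for the tool, in which the window advances one gene at a time and only full windows are analysed.

```python
from typing import List, Sequence


def sliding_window_retention(
    retained: Sequence[bool], window_size: int
) -> List[float]:
    """Return the percentage of retained genes in each full window.

    retained[i] is True when the i-th target gene (in chromosome order)
    has a syntenic homolog on the query chromosome being examined.
    (Hypothetical helper; names are illustrative only.)
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    n = len(retained)
    if n < window_size:
        return []  # chromosomes shorter than one window yield no points

    # Count retained genes in the first window.
    count = sum(retained[:window_size])
    percentages = [100.0 * count / window_size]

    # Slide by one gene: add the gene entering, drop the gene leaving.
    for start in range(1, n - window_size + 1):
        count += retained[start + window_size - 1] - retained[start - 1]
        percentages.append(100.0 * count / window_size)
    return percentages


if __name__ == "__main__":
    # Six target genes, window of four genes: three windows in total.
    example = [True, False, True, False, True, True]
    print(sliding_window_retention(example, 4))
```

Each returned value corresponds to one y-axis point for a single query chromosome; running the function once per query chromosome would give the separate lines drawn in a subplot.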


What goes in

  1. Two assembled genomes that have annotated coding sequences (CDS)
  2. A syntenic depth ratio set by the user (determined empirically outside of the FractBias tool)
    1. The genome with a lower ratio will be the target genome
    2. The genome with a higher ratio will be the query genome
  3. The full GFF of the target genome
  4. The syntenic blocks identified by SynMap
  5. Parameters defined by the user
    1. Window size in number of genes
    2. How many total chromosomes should be used
      1. The maximum number of query chromosomes that should be considered
      2. The maximum number of target chromosomes that should be considered
    3. Whether chromosomes containing the name 'unknown' or 'random' should be removed from consideration
    4. What genes should be counted
      1. Count all genes present on the target genome
      2. Only count genes that are retained in both genomes
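
The chromosome-related parameters above can be illustrated with a short sketch. It assumes, as an earlier revision of this page describes, that the longest chromosomes are selected first; the function name `select_chromosomes`, the `lengths` mapping, and the case-insensitive name matching are hypothetical choices for this example rather than the tool's actual code.

```python
from typing import Dict, List


def select_chromosomes(
    lengths: Dict[str, int],
    max_chromosomes: int,
    drop_unplaced: bool = True,
) -> List[str]:
    """Pick the chromosomes to analyse (hypothetical helper).

    lengths maps chromosome name to its size (for example, in genes or bp).
    When drop_unplaced is True, names containing 'unknown' or 'random'
    are removed before the longest chromosomes are chosen.
    """
    excluded_tags = ("unknown", "random")
    names = [
        name
        for name in lengths
        if not (drop_unplaced and any(tag in name.lower() for tag in excluded_tags))
    ]
    # Longest first (assumption), then keep at most max_chromosomes.
    names.sort(key=lambda name: lengths[name], reverse=True)
    return names[:max_chromosomes]


if __name__ == "__main__":
    sizes = {"chr1": 300, "chr2": 500, "chrUn_random": 900, "chr3": 100}
    print(select_chromosomes(sizes, max_chromosomes=2))
```

The same function could be applied separately to the target and query genomes, each with its own maximum.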


Data files passed in

  1. SynMap DAGChainer output: comparison_name.aligncoords.gcoords
  2. GFF file for target genome
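
As an illustration of how the target genome GFF can be used, the sketch below builds a per-chromosome gene list ordered by start position, which is the ordering the sliding window runs over. It assumes a standard nine-column, tab-separated GFF3 file with `ID=` attributes on `gene` features; the function name `ordered_genes_from_gff` is hypothetical and the SynMap DAGChainer file is not parsed here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def ordered_genes_from_gff(
    lines: Iterable[str], feature_type: str = "gene"
) -> Dict[str, List[str]]:
    """Return gene IDs per chromosome, ordered by start coordinate.

    Hypothetical helper: assumes GFF3 columns
    (seqid, source, type, start, end, score, strand, phase, attributes).
    """
    genes: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        fields = line.split("\t")
        if len(fields) != 9 or fields[2] != feature_type:
            continue
        seqid, start, attributes = fields[0], int(fields[3]), fields[8]
        gene_id = attributes  # fall back to the raw attribute string
        for pair in attributes.split(";"):
            key, _, value = pair.partition("=")
            if key.strip() == "ID":
                gene_id = value
                break
        genes[seqid].append((start, gene_id))
    # Sort each chromosome's genes by start position.
    return {
        chrom: [gene_id for _, gene_id in sorted(entries)]
        for chrom, entries in genes.items()
    }


if __name__ == "__main__":
    sample = [
        "##gff-version 3",
        "chr1\tsrc\tgene\t500\t900\t.\t+\t.\tID=geneB;Name=b",
        "chr1\tsrc\tgene\t100\t400\t.\t-\t.\tID=geneA",
    ]
    print(ordered_genes_from_gff(sample))
```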


What comes out

  1. A figure containing a subplot for every target genome chromosome
  2. Links to the raw data used to create the subplots

Fractionation Bias Examples

Sorghum and Maize Fractionation Bias

The fractionation bias in the maize genome has previously been studied independently [4]. That analysis was rerun here using the FractBias tool. Zea mays (maize) and Sorghum bicolor (sorghum) diverged recently. The maize lineage then experienced a whole genome duplication (Figure 2), so the syntenic depth ratio is sorghum 1 : maize 2. Each sorghum chromosome subplot is therefore expected to show up to two maize chromosomes (Figure 3). Fractionation bias between maize chromosomes is present across almost all sorghum chromosomes, except for sorghum chromosomes 8 and 10. Sorghum chromosomes 1, 7, 9, and 10 have more than two maize chromosomes represented, which indicates regions of the maize chromosomes that have recombined over maize evolutionary history.

Figure 2. The fractionation bias that occurred after the maize whole genome duplication (WGD) has been studied previously [4] by comparing maize to sorghum. WGD events are denoted by stars along the Poaceae lineage. This analysis used the 'only retained genes' option to remove any genes unique to maize or sorghum. Link to regenerate analysis:
Figure 3. Results from running the FractBias tool. Areas of underfractionation (black arrows), intermediate fractionation (blue arrows), and overfractionation (red arrows) can be observed.

Arabidopsis thaliana and Brassica rapa Fractionation Bias

Figure 4. Results from running the FractBias tool comparing Arabidopsis thaliana (target genome) to Brassica rapa (query genome). Areas of underfractionation (black arrows), intermediate fractionation (blue arrows), and overfractionation (red arrows) can be observed. Results can be regenerated at https://genomevolution.org/r/k7jg.

Additional Examples

More examples can be found in the table below. While FractBias was designed to investigate fractionation after whole genome duplications, it can also be used to investigate chromosome composition between species.

Figure 5. A syntenic depth ratio 1:1 comparison between the first 12 chromosomes of Homo sapiens and all the chromosomes of Pan troglodytes using the 'all genes' setting.



Table 1. FractBias examples available through CoGe’s SynMap. Syntenic depth ratios range from 1:1 to 1:6 using two species of Plasmodium, two mammals, and six species of plants to highlight the flexibility and ease of use of FractBias.

Target Species | Query Species | Syntenic Depth Ratio | Link to 'All Genes' Analysis | Link to 'Only Syntenic Genes' Analysis
Plasmodium falciparum | Plasmodium knowlesi | 1:1 | https://genomevolution.org/r/k7j6 | https://genomevolution.org/r/k7km
Homo sapiens | Pan troglodytes | 1:1 | https://genomevolution.org/r/k813 | https://genomevolution.org/r/k811
Sorghum bicolor | Zea mays | 1:2 | https://genomevolution.org/r/k7jx | https://genomevolution.org/r/k7j3
Brassica rapa | Brassica napus | 1:2 | https://genomevolution.org/r/k7mw | https://genomevolution.org/r/k7k3
Arabidopsis thaliana | Brassica rapa | 1:3 | https://genomevolution.org/r/k7jq | https://genomevolution.org/r/k7jg
Vitis vinifera | Arabidopsis thaliana | 1:4 | https://genomevolution.org/r/k7p1 | https://genomevolution.org/r/k7ov
Arabidopsis thaliana | Brassica napus | 1:6 | https://genomevolution.org/r/k7qz | https://genomevolution.org/r/k7r6

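References

  1. Schnable, J.C. et al. Dose–sensitivity, conserved non-coding sequences, and duplicate gene retention through multiple tetraploidies in the grasses. Front. Plant Sci. http://dx.doi.org/10.3389/fpls.2011.00002 (2011)
  2. Cheng, F. et al. Biased gene fractionation and dominant gene expression among the subgenomes of Brassica rapa. PLOS ONE DOI: 10.1371/journal.pone.0036442 (2012)
  3. Berthelot, C. et al. The rainbow trout genome provides novel insights into evolution after whole-genome duplication in vertebrates. Nature Communications 5: DOI:10.1038/ncomms4657 (2014)
  4. Schnable, J. C. et al. Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss. PNAS 108:4069-4074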