Classify gene pair expression
Classifying the expression of a gene pair.
Removing datasets where the genes are off
Highly regulated genes are often off in the majority of datasets. This can bias attempts to measure correlation by creating a large cluster of datapoints around (0, 0). To avoid this issue, any datapoints where gene1 is expressed at less than 1/10th of the maximum observed expression of gene1 and gene2 is also expressed at less than 1/10th of the maximum observed expression of gene2 are masked from downstream analysis.
The number of datasets analyzed for a given gene pair over the total number of datasets available is reported in the spreadsheet as "X".
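The masking rule above can be sketched as follows. This is a minimal illustration, not the actual analysis code: the function name `mask_low_expression`, the `fraction` parameter, and the example values are all hypothetical, and expression values are assumed to be plain lists with one entry per dataset.

```python
def mask_low_expression(gene1, gene2, fraction=0.1):
    """Return the indices of datasets kept for correlation analysis.

    A dataset is masked only when BOTH genes fall below `fraction`
    of their own maximum observed expression.
    """
    max1, max2 = max(gene1), max(gene2)
    return [
        i
        for i, (e1, e2) in enumerate(zip(gene1, gene2))
        if not (e1 < fraction * max1 and e2 < fraction * max2)
    ]


# Made-up expression values, one entry per dataset.
gene1 = [0.0, 0.5, 12.0, 40.0, 1.0]
gene2 = [0.2, 30.0, 1.0, 25.0, 0.1]

kept = mask_low_expression(gene1, gene2)
# The "analyzed over available" ratio described above.
print(f"{len(kept)}/{len(gene1)} datasets analyzed")
```

Note that a dataset where only one of the two genes is off is kept, since those points carry information about the relationship between the genes.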
Differential Gene Expression
The following criteria are NOT a test of statistical significance and do not correct for multiple testing. Ideally, any results from these analyses should be verified with qPCR prior to publication or labor-intensive downstream work.
Groups of datasets where plants were or were not exposed to a stress/stimulus were collected. Within these datasets, a gene was considered differentially expressed if there was a two-fold difference in average expression between the two conditions AND the highest expressed datapoint in the less expressed condition was less than the least expressed datapoint from the more expressed condition.
These criteria were picked heuristically and can be fine-tuned as we gain more experience with these data types.
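As a rough sketch of the two criteria, the check could look like the code below. The function name and the `fold` parameter are hypothetical, and "two-fold difference" is assumed here to mean "at least two-fold".

```python
def is_differentially_expressed(stimulus, control, fold=2.0):
    """Apply the two heuristic criteria to one gene.

    `stimulus` and `control` are lists of expression values, one per dataset.
    """
    mean_s = sum(stimulus) / len(stimulus)
    mean_c = sum(control) / len(control)

    # Work out which condition is the more expressed one.
    if mean_s >= mean_c:
        high, low, mean_high, mean_low = stimulus, control, mean_s, mean_c
    else:
        high, low, mean_high, mean_low = control, stimulus, mean_c, mean_s

    # Criterion 1: fold difference in average expression.
    # Written as a multiplication to avoid dividing by zero.
    fold_ok = mean_high >= fold * mean_low

    # Criterion 2: no overlap between the two conditions.
    separated = max(low) < min(high)

    return fold_ok and separated


print(is_differentially_expressed([10, 12, 14], [2, 3, 4]))
```

The second criterion is what makes the rule conservative: a single outlying datapoint in either condition is enough to reject the gene, even when the means differ by much more than two-fold.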
Here are the datasets currently used for testing:
Light Response
Stimulus: Circadian2, Circadian3, Control Shoot
Control: Circadian1, Circadian5, Circadian6 (I dropped Circadian4 because the lights would have just turned off and the plants wouldn't have had a chance to adapt yet)
Anerobic Shoot
Stimulus: Anerobic Shoot, Anerobic Shoot2
Control: Control Shoot, Circadian2, Circadian3
Anerobic Root
Stimulus: Anerobic Root
Control: Control Root
Dark Stress
Stimulus: Circadian1-6
Control: Constant Dark1-6
Overall Gene Pair Patterns
Automatically classified patterns of expression for a gene pair
db: Both Dead
Criteria: Neither gene is expressed >= 1 FPKM in any dataset in the analysis. These gene pairs are not necessarily dead, but they are either only turned on under conditions not studied in the analysis or are expressed at such a low level that pattern analysis is useless.
d1: Gene1 Dead
Criteria: gene1 is expressed < 1 FPKM in all datasets in the analysis AND is always expressed at least 10x less than the average expression of gene2.
d2: Gene2 Dead
Criteria: Same as above, with gene1 and gene2 switched.
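A minimal sketch of the three "dead" rules, assuming expression values are given in FPKM as plain lists with one entry per dataset. The function name and the `min_fpkm` and `ratio` parameters are hypothetical, and checking db before d1 and d2 is an assumption about the order of the tests.

```python
def classify_dead(gene1, gene2, min_fpkm=1.0, ratio=10.0):
    """Return 'db', 'd1', 'd2', or None if no dead rule applies."""
    on1 = any(e >= min_fpkm for e in gene1)
    on2 = any(e >= min_fpkm for e in gene2)

    # db: neither gene reaches the FPKM threshold in any dataset.
    if not on1 and not on2:
        return "db"

    mean1 = sum(gene1) / len(gene1)
    mean2 = sum(gene2) / len(gene2)

    # d1: gene1 never reaches the threshold AND every gene1 value is
    # at least `ratio` times lower than the average expression of gene2.
    if not on1 and all(e * ratio <= mean2 for e in gene1):
        return "d1"

    # d2: the same rule with the genes switched.
    if not on2 and all(e * ratio <= mean1 for e in gene2):
        return "d2"

    return None


print(classify_dead([0.1, 0.5], [20.0, 40.0]))
```

A pair such as `[0.9, 0.9]` versus `[2.0, 2.0]` returns `None` here: gene1 is below 1 FPKM everywhere, but it is not 10x below the average of gene2, so the pair falls through to the correlation-based classes.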
nc: No Correlation
Criteria: The p-value of the Spearman correlation between the expression of the two genes (after removing omitted conditions) is > .01 or the R value is less than .65 and greater than -.65.
Criteria: The gene pair is correlated (it was not classified as nc), and the average expression of gene1 is at least 2x that of gene2 (after removing omitted conditions).
Criteria: Same as above, with gene1 and gene2 switched.
Criteria: The gene pair is correlated (it was not classified as nc), there is less than a two-fold difference in mean expression between the two genes, and the direction of correlation is positive.
ic: Inverse Correlation
Criteria: Any correlated gene pair (p-value < .01, absolute value of R > .65) where the correlation is negative.
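The correlation-based classes can be sketched as a single decision function. This is only an illustration under several assumptions:

- The Spearman R and p-value are computed beforehand on the unmasked datapoints (for example with `scipy.stats.spearmanr`) and passed in.
- The labels `gene1_higher`, `gene2_higher` and `positive` are hypothetical placeholders for the three classes whose codes are not given above.
- The inverse-correlation check runs before the 2x checks.
- Values exactly at a threshold are treated as not correlated.

```python
def classify_correlated(rho, p_value, mean1, mean2,
                        max_p=0.01, min_abs_rho=0.65, fold=2.0):
    """Classify a gene pair that was not caught by the dead rules.

    rho, p_value: Spearman R and p-value, computed elsewhere.
    mean1, mean2: mean expression of each gene after masking.
    """
    # nc: the pair does not meet both correlation thresholds.
    correlated = p_value < max_p and abs(rho) > min_abs_rho
    if not correlated:
        return "nc"

    # ic: correlated, but in the negative direction (assumed to be
    # checked before the fold-difference rules).
    if rho < 0:
        return "ic"

    # Placeholder labels: the original class codes are not listed.
    if mean1 >= fold * mean2:
        return "gene1_higher"
    if mean2 >= fold * mean1:
        return "gene2_higher"
    return "positive"


print(classify_correlated(rho=0.8, p_value=0.001, mean1=10.0, mean2=12.0))
```

Keeping the correlation computation outside the function means the thresholds can be tuned and tested without recomputing the statistics for every gene pair.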