Ancestral Reconstruction Pipeline: Difference between revisions

From CoGepedia
Jump to navigation Jump to search
No edit summary
Line 72: Line 72:
#genome_ID    number_of_synteny_blocks  paleopolyploid_depth title_for_set
#genome_ID    number_of_synteny_blocks  paleopolyploid_depth title_for_set
10997 21 3 cacao
10997 21 3 cacao
#colorCode: means ancestral chromosome assignment
#colorCode: means ancestral chromosome assignment -- better term is bin.  For eudicots, this is thought to be 7 (could be other numbers for other reconstructions)
#subgenome: which subgenome to which a block belongs
#subgenome: which subgenome to which a block belongs
#chr start end:  position of block in extant genome
#chr start end:  position of block in extant genome

Revision as of 16:19, 24 April 2014

This page is to document the Ancestral Reconstruction Pipeline by Chunfang Zheng

Master control is from her batch script: batchFile.txt


#compile 
#gets gene pairs from SynMap output
javac TestGetGenomes.java
#run with config file
#config file:
#number of genomes and number
java TestGetGenomes data/inputInfoCoGe.txt


javac TestGetContigInput.java
java TestGetContigInput data/inputInfoAGRP.txt
cd outputFiles
python contigInput_8400_9050_10997_19515.py> contigOutput.txt
cd ..
javac TestGetContigOutputAndScaffoldInput.java
java TestGetContigOutputAndScaffoldInput data/inputInfoAGRP.txt
cd outputFiles
python scaffoldInput1.py > scaffoldOutput1.txt
python scaffoldInput2.py > scaffoldOutput2.txt
python scaffoldInput3.py > scaffoldOutput3.txt
python scaffoldInput4.py > scaffoldOutput4.txt
python scaffoldInput5.py > scaffoldOutput5.txt
python scaffoldInput6.py > scaffoldOutput6.txt
python scaffoldInput7.py > scaffoldOutput7.txt
cd ..
javac TestScaffoldOutput.java
java TestScaffoldOutput


inputInfo example file (describes input from CoGe)

#obvious
numberOfGenomes	4
numberOfGenomePairs	9

#synmap output with correct syntenic depth
8400	9050	data/8400_9050.CDS-CDS.last.tdd10.cs0.filtered.dag.all.go_D20_g10_A5.aligncoords.Dm0.ma1.qac3.3.40.gcoords
10997	8400	data/10997_8400.CDS-CDS.last.tdd10.cs0.filtered.dag.all.go_D20_g10_A5.aligncoords.Dm0.ma1.qac3.3.40.gcoords
10997	9050	data/10997_9050.CDS-CDS.last.tdd10.cs0.filtered.dag.all.go_D20_g10_A5.aligncoords.Dm0.ma1.qac3.3.40.gcoords
10997	19515	data/10997_19515.CDS-CDS.last.tdd10.cs0.filtered.dag.all.go_D20_g10_A5.aligncoords.Dm0.ma1.qac3.1.40.gcoords
19515	8400	data/19515_8400.CDS-CDS.last.tdd10.cs0.filtered.dag.all.go_D20_g10_A5.aligncoords.Dm0.ma1.qac1.3.40.gcoords
19515	9050	data/19515_9050.CDS-CDS.last.tdd10.cs0.filtered.dag.all.go_D20_g10_A5.aligncoords.Dm0.ma1.qac1.3.40.gcoords
8400	8400	data/8400_8400.CDS-CDS.last.tdd10.filtered.dag.all.go_D20_g10_A5.aligncoords.Dm0.ma1.qac3.3.40.gcoords
9050	9050	data/9050_9050.CDS-CDS.last.tdd10.filtered.dag.all.go_D20_g10_A5.aligncoords.Dm0.ma1.qac3.3.40.gcoords
10997	10997	data/10997_10997.CDS-CDS.last.tdd10.filtered.dag.all.go_D20_g10_A5.aligncoords.Dm0.ma1.qac3.3.40.gcoords

#syntenic depth among the genomes
#peach
8400	3
#grape
9050	3
Cacao
10997	3
#amborella
19515	1

#subgenome information
data/subGenomeRegions.txt

SubGenomeRegions.txt

This file contains infromation about subgenomes (parental genomes) making up an extant genome. Chunfang often creates these by hand, but does have a program to generate this. Practical_Aliquoting

#genome_ID    number_of_synteny_blocks   paleopolyploid_depth title_for_set
10997	21	3	cacao		
#colorCode: means ancestral chromosome assignment -- better term is bin.  For eudicots, this is thought to be 7 (could be other numbers for other reconstructions)
#subgenome: which subgenome to which a block belongs
#chr start end:  position of block in extant genome
colorCode	subgenome	chr	start	end
1	1	2	12716774	27462648
1	2	4	349021	14314443
1	3	3	208385	16091087
2	1	3	19982135	24212437
2	2	1	27207631	30674661
2	3	3	16741484	19970692
3	1	2	1350572	7237080
3	2	1	315357	7988483
3	3	8	43353	6712481
4	1	9	739576	3437504
4	2	6	1071819	9467758
4	3	9	3437504	9589803
5	1	4	18966341	23343107
5	2	1	21083375	26683534
5	3	5	23329957	25395907
6	1	9	23851693	28019603
6	2	5	541440	5362779
6	3	10	333882	12953021
7	1	6	10864133	14795052
7	2	1	8722499	15224371
7	3	7	511932	6542889
19515	0	1	amborella
colorCode	subgenome	chr	start	end