Ancestral Reconstruction Pipeline: Difference between revisions

From CoGepedia

Revision as of 15:47, 24 April 2014

This page documents the Ancestral Reconstruction Pipeline by Chunfang Zheng.

The pipeline is driven by her batch script, batchFile.txt:


#compile 
#gets gene pairs from SynMap output
javac TestGetGenomes.java
#run with config file
#config file:
#number of genomes and number
java TestGetGenomes data/inputInfoCoGe.txt


javac TestGetContigInput.java
java TestGetContigInput data/inputInfoAGRP.txt
cd outputFiles
python contigInput_8400_9050_10997_19515.py > contigOutput.txt
cd ..
javac TestGetContigOutputAndScaffoldInput.java
java TestGetContigOutputAndScaffoldInput data/inputInfoAGRP.txt
cd outputFiles
python scaffoldInput1.py > scaffoldOutput1.txt
python scaffoldInput2.py > scaffoldOutput2.txt
python scaffoldInput3.py > scaffoldOutput3.txt
python scaffoldInput4.py > scaffoldOutput4.txt
python scaffoldInput5.py > scaffoldOutput5.txt
python scaffoldInput6.py > scaffoldOutput6.txt
python scaffoldInput7.py > scaffoldOutput7.txt
cd ..
javac TestScaffoldOutput.java
java TestScaffoldOutput
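
The batch steps above could also be driven from a single Python script, which makes the step order explicit and stops at the first failure. This is only a sketch and not part of Chunfang's pipeline: the function names `pipeline_steps` and `run_pipeline` are hypothetical, and the rule that the contig script name is the genome IDs joined by underscores is an assumption inferred from the one file name shown above.

```python
import subprocess
from pathlib import Path


def pipeline_steps(genome_ids, n_scaffold_jobs=7, out_dir="outputFiles"):
    """Return the batch steps as (working_dir, argv, stdout_file) tuples."""
    # Assumption: the contig script is named after the genome IDs joined by "_".
    contig_script = "contigInput_" + "_".join(map(str, genome_ids)) + ".py"
    steps = [
        (".", ["javac", "TestGetGenomes.java"], None),
        (".", ["java", "TestGetGenomes", "data/inputInfoCoGe.txt"], None),
        (".", ["javac", "TestGetContigInput.java"], None),
        (".", ["java", "TestGetContigInput", "data/inputInfoAGRP.txt"], None),
        (out_dir, ["python", contig_script], "contigOutput.txt"),
        (".", ["javac", "TestGetContigOutputAndScaffoldInput.java"], None),
        (".", ["java", "TestGetContigOutputAndScaffoldInput",
               "data/inputInfoAGRP.txt"], None),
    ]
    # One scaffold job per scaffoldInput<i>.py, each redirected to its own file.
    for i in range(1, n_scaffold_jobs + 1):
        steps.append((out_dir, ["python", f"scaffoldInput{i}.py"],
                      f"scaffoldOutput{i}.txt"))
    steps += [
        (".", ["javac", "TestScaffoldOutput.java"], None),
        (".", ["java", "TestScaffoldOutput"], None),
    ]
    return steps


def run_pipeline(steps):
    """Run each step in order; check=True aborts on the first non-zero exit."""
    for cwd, argv, stdout_file in steps:
        if stdout_file is None:
            subprocess.run(argv, cwd=cwd, check=True)
        else:
            # Equivalent of the shell redirect "python x.py > y.txt".
            with open(Path(cwd) / stdout_file, "w") as out:
                subprocess.run(argv, cwd=cwd, stdout=out, check=True)
```

Calling `run_pipeline(pipeline_steps([8400, 9050, 10997, 19515]))` from the directory that holds the Java sources would then replay the batch file; `pipeline_steps` alone can be used to inspect the plan without running anything.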


inputInfo example file (describes input from CoGe)

#obvious
numberOfGenomes	4
numberOfGenomePairs	9

#synmap output with correct syntenic depth
8400	9050	data/8400_9050.CDS-CDS.last.tdd10.cs0.filtered.dag.all.go_D20_g10_A5.aligncoords.Dm0.ma1.qac3.3.40.gcoords
10997	8400	data/10997_8400.CDS-CDS.last.tdd10.cs0.filtered.dag.all.go_D20_g10_A5.aligncoords.Dm0.ma1.qac3.3.40.gcoords
10997	9050	data/10997_9050.CDS-CDS.last.tdd10.cs0.filtered.dag.all.go_D20_g10_A5.aligncoords.Dm0.ma1.qac3.3.40.gcoords
10997	19515	data/10997_19515.CDS-CDS.last.tdd10.cs0.filtered.dag.all.go_D20_g10_A5.aligncoords.Dm0.ma1.qac3.1.40.gcoords
19515	8400	data/19515_8400.CDS-CDS.last.tdd10.cs0.filtered.dag.all.go_D20_g10_A5.aligncoords.Dm0.ma1.qac1.3.40.gcoords
19515	9050	data/19515_9050.CDS-CDS.last.tdd10.cs0.filtered.dag.all.go_D20_g10_A5.aligncoords.Dm0.ma1.qac1.3.40.gcoords
8400	8400	data/8400_8400.CDS-CDS.last.tdd10.filtered.dag.all.go_D20_g10_A5.aligncoords.Dm0.ma1.qac3.3.40.gcoords
9050	9050	data/9050_9050.CDS-CDS.last.tdd10.filtered.dag.all.go_D20_g10_A5.aligncoords.Dm0.ma1.qac3.3.40.gcoords
10997	10997	data/10997_10997.CDS-CDS.last.tdd10.filtered.dag.all.go_D20_g10_A5.aligncoords.Dm0.ma1.qac3.3.40.gcoords

#syntenic depth among the genomes
#peach
8400	3
#grape
9050	3
#cacao
10997	3
#amborella
19515	1

#subgenome information
data/subGenomeRegions.txt
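
Before running the Java steps it can help to sanity-check an inputInfo file. The sketch below is not part of the pipeline; it assumes whitespace-separated fields, `#` comment lines, and the layout shown in the example (two count lines, three-field genome-pair lines, two-field depth lines, and a one-field path to the subgenome file). It also assumes that the two numbers after `qac` in a SynMap file name are the syntenic depths of the two genomes, which is consistent with every file name in the example but is not stated on this page. The names `parse_input_info` and `check_input_info` are hypothetical.

```python
import re

# Two-genome sample cut down from the example above.
SAMPLE = (
    "numberOfGenomes\t2\n"
    "numberOfGenomePairs\t1\n"
    "10997\t19515\tdata/10997_19515.CDS-CDS.last.tdd10.cs0.filtered.dag.all."
    "go_D20_g10_A5.aligncoords.Dm0.ma1.qac3.1.40.gcoords\n"
    "#cacao\n"
    "10997\t3\n"
    "#amborella\n"
    "19515\t1\n"
    "data/subGenomeRegions.txt\n"
)


def parse_input_info(text):
    """Split an inputInfo file into counts, genome pairs, depths and a path."""
    info = {"counts": {}, "pairs": [], "depth": {}, "subgenome_file": None}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if fields[0] in ("numberOfGenomes", "numberOfGenomePairs"):
            info["counts"][fields[0]] = int(fields[1])
        elif len(fields) == 3:      # genomeA genomeB synmap_file
            info["pairs"].append((int(fields[0]), int(fields[1]), fields[2]))
        elif len(fields) == 2:      # genome_ID syntenic_depth
            info["depth"][int(fields[0])] = int(fields[1])
        elif len(fields) == 1:      # path to the subgenome regions file
            info["subgenome_file"] = fields[0]
        else:
            raise ValueError(f"unrecognised line: {raw!r}")
    return info


def check_input_info(info):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if len(info["depth"]) != info["counts"].get("numberOfGenomes"):
        problems.append("numberOfGenomes does not match the depth entries")
    if len(info["pairs"]) != info["counts"].get("numberOfGenomePairs"):
        problems.append("numberOfGenomePairs does not match the pair entries")
    for a, b, path in info["pairs"]:
        # Assumption: "...qacA.B.40..." encodes the depths of genomes a and b.
        m = re.search(r"\.qac(\d+)\.(\d+)\.\d+\.", path)
        expected = (info["depth"].get(a), info["depth"].get(b))
        if m and (int(m.group(1)), int(m.group(2))) != expected:
            problems.append(f"{path}: qac ratio does not match depths {expected}")
    return problems


info = parse_input_info(SAMPLE)
problems = check_input_info(info)
```

An empty `problems` list means the counts and the file-name ratios agree with the declared depths; it says nothing about whether the SynMap files themselves exist or are well formed.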

SubGenomeRegions.txt

This file contains information about the subgenomes (parental genomes) that make up an extant genome. Chunfang often creates these files by hand, but she also has a program that generates them; see Practical_Aliquoting.

#genome_ID    number_of_synteny_blocks   paleopolyploid_depth title_for_set
10997	21	3	cacao		
#colorCode: means ancestral chromosome assignment
#subgenome: which subgenome to which a block belongs
#chr start end:  position of block in extant genome
colorCode	subgenome	chr	start	end
1	1	2	12716774	27462648
1	2	4	349021	14314443
1	3	3	208385	16091087
2	1	3	19982135	24212437
2	2	1	27207631	30674661
2	3	3	16741484	19970692
3	1	2	1350572	7237080
3	2	1	315357	7988483
3	3	8	43353	6712481
4	1	9	739576	3437504
4	2	6	1071819	9467758
4	3	9	3437504	9589803
5	1	4	18966341	23343107
5	2	1	21083375	26683534
5	3	5	23329957	25395907
6	1	9	23851693	28019603
6	2	5	541440	5362779
6	3	10	333882	12953021
7	1	6	10864133	14795052
7	2	1	8722499	15224371
7	3	7	511932	6542889
19515	0	1	amborella
colorCode	subgenome	chr	start	end
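
A file in this layout can be read back with a short parser, which is handy for checking a hand-edited file before it goes into the pipeline. The sketch below is an illustration only, not Chunfang's reader: it assumes whitespace-separated fields, that `#` lines are comments, that each genome header line is followed by a `colorCode` column-header line, and that exactly `number_of_synteny_blocks` rows follow. `Block` and `parse_subgenome_regions` are hypothetical names, and the sample is shortened to three cacao blocks.

```python
from collections import namedtuple

Block = namedtuple("Block", "color_code subgenome chrom start end")

# Shortened sample: the block count is 3 here instead of 21.
SAMPLE = (
    "#genome_ID number_of_synteny_blocks paleopolyploid_depth title_for_set\n"
    "10997\t3\t3\tcacao\n"
    "colorCode\tsubgenome\tchr\tstart\tend\n"
    "1\t1\t2\t12716774\t27462648\n"
    "1\t2\t4\t349021\t14314443\n"
    "1\t3\t3\t208385\t16091087\n"
    "19515\t0\t1\tamborella\n"
    "colorCode\tsubgenome\tchr\tstart\tend\n"
)


def parse_subgenome_regions(text):
    """Return {genome_id: {"title", "depth", "blocks"}} from the file text."""
    rows = [line.split() for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]
    genomes = {}
    i = 0
    while i < len(rows):
        genome_id, n_blocks, depth, title = rows[i]
        n_blocks = int(n_blocks)
        i += 1
        # Skip the column-header line that follows each genome header.
        if i < len(rows) and rows[i][0] == "colorCode":
            i += 1
        block_rows = rows[i:i + n_blocks]
        if len(block_rows) != n_blocks:
            raise ValueError(f"genome {genome_id}: expected {n_blocks} blocks")
        # The chromosome is kept as a string in case names are not numeric.
        blocks = [Block(int(c), int(s), chrom, int(start), int(end))
                  for c, s, chrom, start, end in block_rows]
        i += n_blocks
        genomes[int(genome_id)] = {"title": title, "depth": int(depth),
                                   "blocks": blocks}
    return genomes


genomes = parse_subgenome_regions(SAMPLE)
cacao = genomes[10997]
# Every block's subgenome index should lie within the declared depth.
depth_ok = all(1 <= b.subgenome <= cacao["depth"] for b in cacao["blocks"])
```

A genome with zero blocks, such as amborella in the example, parses to an empty block list.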