Ancestral Reconstruction Pipeline: Difference between revisions
Revision as of 22:10, 25 April 2014

Plan for refactoring
GetGenomes: remove config file, add option to specify output dir for output files
This program gets data ready for downstream processing
- -d <directory of input synmap files>
- -g gid1,gid2,gid3,gid4... <list of comma separated coge genome ids>
- -p p1,p2,p3,p4 <list of comma separated ploidy levels for genomes -- note these are paired ordered data with the -g option?>
- -s <subgenome_file>
- -o <output directory for SubGenomeInGeneOrder OrthologSets GenomeInString files>
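The paired -g and -p lists above could be parsed along these lines. This is only a sketch under stated assumptions: the function name `parse_args`, the `dest` names, integer ploidy values, and the idea that -p is matched positionally to -g are all hypothetical (the pairing is still an open question in the plan).

```python
import argparse


def parse_args(argv=None):
    """Parse a hypothetical GetGenomes command line (all names are assumptions)."""
    parser = argparse.ArgumentParser(prog="GetGenomes")
    parser.add_argument("-d", dest="synmap_dir", required=True,
                        help="directory of input synmap files")
    parser.add_argument("-g", dest="genome_ids", required=True,
                        help="comma separated coge genome ids")
    parser.add_argument("-p", dest="ploidies", required=True,
                        help="comma separated ploidy levels, same order as -g")
    parser.add_argument("-s", dest="subgenome_file",
                        help="subgenome file")
    parser.add_argument("-o", dest="out_dir", default=".",
                        help="output directory")
    args = parser.parse_args(argv)

    gids = args.genome_ids.split(",")
    # Assumption: ploidy levels are whole numbers.
    ploidies = [int(p) for p in args.ploidies.split(",")]
    if len(gids) != len(ploidies):
        # Paired lists must line up one-to-one; exits with a usage message.
        parser.error("-g and -p must list the same number of entries")

    # Keep the pairing explicit so later steps do not rely on list order.
    args.ploidy_by_genome = dict(zip(gids, ploidies))
    return args
```

Storing the pairing as a dictionary keyed by genome id means later steps never depend on the two lists staying in the same order.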
GetContigInput
This program gets data ready for ancestral ordering of genes by MWM
- Remove config file dependency
- -g gid1,gid2,gid3,gid4... <list of comma separated coge genome ids>
- -w w1,w2,w3,w4 <list of comma separated weights for genomes -- note these are paired ordered data with the -g option?>
- -wa <threshold minimum adjacency score for keeping a contig. Called 'weightOfAdjacent' in original config file>
- -i <input file: GenomesInString from program GetGenomes>
- -o <output file name>
Note: Output is now a tab delimited file with each line containing: vertex vertex weight. These data will be used by the MWMPython program below
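A minimal sketch of writing and reading the tab delimited "vertex vertex weight" format described in the note above. The helper names `write_edges` and `read_edges` are hypothetical, and treating the weight as a floating point number is an assumption; vertex labels are kept as plain strings.

```python
import csv


def write_edges(path, edges):
    """Write (vertex, vertex, weight) tuples, one tab delimited line each."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for u, v, weight in edges:
            writer.writerow([u, v, weight])


def read_edges(path):
    """Read the same format back; weights are parsed as floats (assumption)."""
    edges = []
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row:
                continue  # tolerate blank lines
            u, v, weight = row
            edges.append((u, v, float(weight)))
    return edges
```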
MWMPython: http://jorisvr.nl/maximummatching.html needs command line options for
This program is a general tool for Maximum Weight Matching. First run is for ancestral gene ordering. Second run is for ancestral contig ordering
- -i <input file or directory>
- File type is a set of vertex vertex weight
- note: if directory, will batch process all files
- -o <outfile or directory> If no option is specified, the results go to STDOUT
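The options above might be handled by a thin wrapper like the following sketch. The helper names (`input_files`, `index_edges`, `matched_pairs`, `open_output`) are hypothetical. The sketch assumes the matching routine wants integer vertex ids and returns a list `mate` in which `mate[i]` is the partner of vertex `i`, or -1 if unmatched; that is my understanding of the `maxWeightMatching` function on the linked page and should be checked against the actual module. The matching call itself is deliberately left out.

```python
import os
import sys


def input_files(path):
    """Return [path] for a file, or every regular file inside a directory."""
    if os.path.isdir(path):
        candidates = (os.path.join(path, name) for name in os.listdir(path))
        return sorted(p for p in candidates if os.path.isfile(p))
    return [path]


def index_edges(labelled_edges):
    """Map vertex labels to consecutive integers, in order of first appearance."""
    index = {}
    labels = []
    edges = []
    for u, v, weight in labelled_edges:
        for label in (u, v):
            if label not in index:
                index[label] = len(labels)
                labels.append(label)
        edges.append((index[u], index[v], weight))
    return edges, labels


def matched_pairs(mate, labels):
    """Turn a mate list (assumed convention: -1 means unmatched) into label pairs."""
    pairs = []
    for i, j in enumerate(mate):
        if j > i:  # report each matched pair once; also skips -1
            pairs.append((labels[i], labels[j]))
    return pairs


def open_output(path=None):
    """Return STDOUT when no -o value is given, otherwise a file opened for writing."""
    return sys.stdout if path is None else open(path, "w")
```

Keeping the label-to-integer mapping outside the matcher lets the same wrapper serve both the gene ordering run and the contig ordering run.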
GetContigOutputAndScaffoldInput
This program maps ancestral contigs back to various genomes, gets their positions, and gets data formatted for a second MWM to generate ancestral ordered contigs
Note: goal is to get everything onto the command line. Currently, several of the files are hardcoded
- -cl <threshold minimum contig length. Called 'minimumGeneGroupLength' in original config file>
- -b <Number of bins for assignment to ancestral chromosomes. Called 'AncChrNumber' in original config file> (Note: this may be removed if this info can be derived from the input, specifically the SubGenomesInGeneOrder file.)
Note: genomeInContigIndex is specified in the config file. This is the weighting for each subgenome. Need to discuss how best to deal with these.
- -co <contig output file generated by the MWMPython program>
- -sg <subGenomesInGeneOrder file>
- -gf <genomeInString file>
- -o <output directory name>
Note: Output is a directory of tab delimited files with each line containing: vertex vertex weight. These data will be used by the MWMPython program
Note: Multiple output files, one for each ancestral chromosome bin
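Writing one output file per ancestral chromosome bin could look roughly like this. It is a sketch only: the function name `write_bin_files`, the `bin_<id>.txt` file naming, and the in-memory layout (a dictionary from bin id to a list of edges) are all assumptions, not part of the existing program.

```python
import os


def write_bin_files(out_dir, edges_by_bin):
    """Write one tab delimited 'vertex vertex weight' file per bin.

    edges_by_bin maps a bin id to a list of (vertex, vertex, weight) tuples.
    Returns the list of file paths written, in bin order.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for bin_id in sorted(edges_by_bin):
        # Hypothetical naming scheme: bin_1.txt, bin_2.txt, ...
        path = os.path.join(out_dir, "bin_{}.txt".format(bin_id))
        with open(path, "w") as handle:
            for u, v, weight in edges_by_bin[bin_id]:
                handle.write("{}\t{}\t{}\n".format(u, v, weight))
        paths.append(path)
    return paths
```

Because MWMPython accepts a directory for -i, the resulting directory could be passed to the second run as-is.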
MWMPython
Reuse of the same program listed above for ancestral ordering of contigs
ScaffoldOutput
Putting all the output data back together for the final ancestral genome
- -i <directory of input files>
- -o <output file of reconstructed genome>