Ancestral Reconstruction Pipeline

Plan for refactoring

GetGenomes: remove config file, add option to specify output dir for output files

This program gets data ready for downstream processing

-d <directory of input synmap files>
-g gid1,gid2,gid3,gid4... <list of comma separated coge genome ids>
-p p1,p2,p3,p4 <list of comma separated ploidy levels for genomes -- note these are paired ordered data with the -g option?>
-s <subgenome_file>
-o <output directory for SubGenomeInGeneOrder OrthologSets GenomeInString files>
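The option list above could be parsed roughly as follows. This is only a sketch of the planned interface, not the existing GetGenomes code: the flag letters come from the plan, while the destination names, the integer types for genome ids and ploidy levels, the optional -s, and the default output directory are all assumptions. It treats -g and -p as paired by position, which the plan still marks as an open question.

```python
import argparse


def comma_list(cast):
    """Return an argparse type that splits 'a,b,c' and casts each item."""
    def parse(text):
        return [cast(item) for item in text.split(",") if item]
    return parse


def build_parser():
    # Flag letters follow the plan; dest names, types and defaults are assumptions.
    parser = argparse.ArgumentParser(prog="GetGenomes")
    parser.add_argument("-d", dest="synmap_dir", required=True,
                        help="directory of input synmap files")
    parser.add_argument("-g", dest="gids", type=comma_list(int), required=True,
                        help="comma separated coge genome ids")
    parser.add_argument("-p", dest="ploidies", type=comma_list(int), required=True,
                        help="comma separated ploidy levels, paired with -g")
    parser.add_argument("-s", dest="subgenome_file",
                        help="subgenome file (assumed optional here)")
    parser.add_argument("-o", dest="out_dir", default=".",
                        help="output directory (default is an assumption)")
    return parser


def parse_args(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    # -g and -p are treated as paired, ordered lists, so lengths must match.
    if len(args.gids) != len(args.ploidies):
        parser.error("-g and -p must list the same number of values")
    # Pair each genome id with its ploidy level by position.
    args.ploidy_by_gid = dict(zip(args.gids, args.ploidies))
    return args
```

The same pattern would carry over to the -g and -w pairing used by the later programs.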

GetContigInput

This program gets data ready for ancestral ordering of genes by MWM

Remove config file dependency
-g gid1,gid2,gid3,gid4... <list of comma separated coge genome ids>
-w w1,w2,w3,w4 <list of comma separated weights for genomes -- note these are paired ordered data with the -g option?>
-wa <threshold minimum adjacency score for keeping a contig. Called 'weightOfAdjacent' in original config file>
-i <input file: GenomesInString from program GetGenomes>
-o <output file name>

Note: Output is now a tab delimited file with each line containing: vertex vertex weight. These data will be used by the MWMPython program below
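As an illustration of the tab delimited "vertex vertex weight" format, the sketch below writes and reads such a file. The function names are hypothetical, and treating vertices as strings and weights as floats is an assumption; the plan does not say how either is encoded.

```python
import io


def write_edges(edges, handle):
    """Write (vertex, vertex, weight) tuples as tab delimited lines."""
    for u, v, weight in edges:
        handle.write(f"{u}\t{v}\t{weight}\n")


def read_edges(handle):
    """Parse tab delimited 'vertex vertex weight' lines back into tuples."""
    edges = []
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 fields, got {len(fields)}")
        u, v, weight = fields
        # Assumption: vertices stay as strings, weights are numeric.
        edges.append((u, v, float(weight)))
    return edges


# Round trip through an in-memory file.
buffer = io.StringIO()
write_edges([("g1", "g2", 2.5), ("g2", "g3", 1.0)], buffer)
buffer.seek(0)
round_trip = read_edges(buffer)
```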

MWMPython: http://jorisvr.nl/maximummatching.html needs command line options for

This program is a general tool for Maximum Weight Matching. First run is for ancestral gene ordering. Second run is for ancestral contig ordering

-i <input file or directory>
- File format: tab delimited lines of vertex vertex weight
- Note: if a directory is given, all files in it are batch processed
-o <outfile or directory> If no option is specified, the results go to STDOUT
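The -i and -o behaviour described above (single file or batch directory, STDOUT when -o is omitted) might be wired up along these lines. This is a sketch under assumptions: the function names are hypothetical, the matching step is passed in as a callable `match` that stands in for the linked implementation, and writing the matched edges back out as tab delimited lines is an assumed output format.

```python
import os
import sys


def input_files(path):
    """Return the file itself, or every regular file in a directory (sorted)."""
    if os.path.isdir(path):
        names = sorted(os.listdir(path))
        return [os.path.join(path, name) for name in names
                if os.path.isfile(os.path.join(path, name))]
    return [path]


def output_handle(in_file, out_path, batch):
    """Return (handle, should_close); STDOUT is used when out_path is None."""
    if out_path is None:
        return sys.stdout, False
    if batch:
        # In batch mode -o is a directory; reuse each input file name.
        os.makedirs(out_path, exist_ok=True)
        target = os.path.join(out_path, os.path.basename(in_file))
    else:
        target = out_path
    return open(target, "w"), True


def run(in_path, out_path, match):
    """Apply `match` (edge list -> matched edge list) to each input file."""
    batch = os.path.isdir(in_path)
    for in_file in input_files(in_path):
        with open(in_file) as fh:
            edges = [tuple(line.split("\t"))
                     for line in fh.read().splitlines() if line]
        handle, should_close = output_handle(in_file, out_path, batch)
        try:
            for u, v, weight in match(edges):
                handle.write(f"{u}\t{v}\t{weight}\n")
        finally:
            if should_close:
                handle.close()
```

Keeping the matcher as a parameter means the same wrapper serves both runs, gene ordering and contig ordering, without change.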

GetContigOutputAndScaffoldInput

This program maps ancestral contigs back to various genomes, gets their positions, and gets data formatted for a second MWM to generate ancestral ordered contigs

Note: goal is to get everything onto the command line. Currently, several of the files are hardcoded

-mml <threshold minimum mapping length. The minimum number of "genes" mapping to a subgenome. Called 'minimumGeneGroupLength' in original config file>

Note: genomeInContigIndex is specified in the config file. This is the weighting for each subgenome. These values will be removed and the weights for genomes used instead:

-g gid1,gid2,gid3,gid4... <list of comma separated coge genome ids>
-w w1,w2,w3,w4 <list of comma separated weights for genomes -- note these are paired ordered data with the -g option>
-co <configoutput file generated by the MWMPython program>
-s <subGenomesInGeneOrder file>
-gf <genomeInString file>
-o <output directory name>

Note: Multiple output files, one for each ancestral chromosome bin

Note: There are three sets of data that are needed for the subsequent steps:

Directory named "scaffolds". These are tab delimited files with each line containing: vertex vertex weight. These data will be used by the MWMPython program
Directory named "binfiles". These files are used in the final program ScaffoldOutput to assign the correct contigs to the bins
File named "contig2genes.txt"
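The three outputs listed above could be laid out as in the sketch below. It assumes that all three live directly under the directory given with -o, which the plan does not state explicitly; the function name and the dictionary keys are hypothetical.

```python
from pathlib import Path


def prepare_output_layout(out_dir):
    """Create the two output directories and return the expected paths."""
    root = Path(out_dir)
    layout = {
        # Edge files (vertex vertex weight) for the second MWM run.
        "scaffolds": root / "scaffolds",
        # Bin files consumed later by ScaffoldOutput.
        "binfiles": root / "binfiles",
        # Lookup file; only its path is returned, it is not created here.
        "contig2genes": root / "contig2genes.txt",
    }
    layout["scaffolds"].mkdir(parents=True, exist_ok=True)
    layout["binfiles"].mkdir(parents=True, exist_ok=True)
    return layout
```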

MWMPython

Reuse of the same program listed above for ancestral ordering of contigs

ScaffoldOutput

Putting all the output data back together for the final ancestral genome

-im <directory of MWM files>
-ib <directory of bin files>
-cg <file containing the lookup of gene families comprising contigs>
-o <output file of reconstructed genome>
