Ancestral Reconstruction Pipeline: Difference between revisions


Revision as of 21:55, 25 April 2014

Plan for refactoring

GetGenomes: remove config file, add option to specify output dir for output files

  • -d <directory of input synmap files>
  • -g gid1,gid2,gid3,gid4... <list of comma separated CoGe genome ids>
  • -p p1,p2,p3,p4 <list of comma separated ploidy levels for genomes -- note these are paired ordered data with the -g option?>
  • -s <subgenome_file>
  • -o <output directory for SubGenomeInGeneOrder OrthologSets GenomeInString files>
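A minimal sketch of how the GetGenomes options above could be parsed in Python with argparse. The flag letters come from the list above; everything else (the `dest` names, treating genome ids as strings and ploidy levels as integers, making `-s` optional, and defaulting `-o` to the current directory) is an assumption for illustration, not the pipeline's actual behaviour. The check at the end enforces the "paired ordered data" relationship between -g and -p.

```python
import argparse


def comma_list(cast):
    """Return an argparse type that splits 'a,b,c' and casts each item."""
    def parse(text):
        return [cast(item) for item in text.split(",") if item]
    return parse


def build_parser():
    # dest names and defaults are hypothetical; only the flags are from the plan.
    parser = argparse.ArgumentParser(prog="GetGenomes")
    parser.add_argument("-d", dest="synmap_dir", required=True,
                        help="directory of input synmap files")
    parser.add_argument("-g", dest="genome_ids", type=comma_list(str),
                        required=True, help="comma separated genome ids")
    parser.add_argument("-p", dest="ploidies", type=comma_list(int),
                        required=True, help="comma separated ploidy levels")
    parser.add_argument("-s", dest="subgenome_file", help="subgenome file")
    parser.add_argument("-o", dest="output_dir", default=".",
                        help="output directory")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # -g and -p are paired by position, so their lengths must agree.
    if len(args.genome_ids) != len(args.ploidies):
        parser.error("-g and -p must list the same number of entries")
    return args
```

With this shape, `dict(zip(args.genome_ids, args.ploidies))` gives a per-genome ploidy lookup. The same positional pairing check would apply to the -g and -w options of GetContigInput.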

GetContigInput

  • Remove config file dependency
  • -g gid1,gid2,gid3,gid4... <list of comma separated CoGe genome ids>
  • -w w1,w2,w3,w4 <list of comma separated weights for genomes -- note these are paired ordered data with the -g option?>
  • -wa <threshold minimum adjacency score for keeping a contig. Called 'weightOfAdjacent' in original config file>
  • -i <input file: GenomesInString from program GetGenomes>
  • -o <output file name>

Note: Output is now a tab delimited file with each line containing: vertex vertex weight. These data will be used by the MWMPython program below
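The "vertex vertex weight" format described in the note can be sketched as a pair of helpers, one writing the file and one reading it back. This is an illustration only: the function names are hypothetical, vertices are kept as opaque strings, and parsing the weight as a float is an assumption about the data.

```python
import csv
import io


def write_edges(edges, handle):
    """Write (vertex, vertex, weight) tuples as tab-delimited lines."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for u, v, weight in edges:
        writer.writerow([u, v, weight])


def read_edges(handle):
    """Read tab-delimited 'vertex vertex weight' lines into tuples."""
    edges = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        u, v, weight = row
        # Assumption: weights are numeric; vertices stay as strings.
        edges.append((u, v, float(weight)))
    return edges


# Round trip through an in-memory buffer.
buffer = io.StringIO()
write_edges([("a", "b", 2.5), ("b", "c", 1)], buffer)
edges = read_edges(io.StringIO(buffer.getvalue()))
```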

MWMPython: http://jorisvr.nl/maximummatching.html needs command line options for

  • -i <input file or directory>
      • File format: tab delimited lines of vertex vertex weight
      • note: if directory, will batch process all files
  • -o <outfile or directory> If no option is specified, the results go to STDOUT
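The file-or-directory behaviour of -i and -o could be handled along these lines. This is a sketch under assumptions: `plan_jobs`, `run`, and the `.matching` output suffix are hypothetical names, and `match_file` stands in for whatever callable wraps the matching routine from the page linked above (it takes an open input file and returns the result as text).

```python
import os
import sys


def plan_jobs(in_path, out_path=None):
    """Pair each input file with an output target (None means STDOUT)."""
    batch = os.path.isdir(in_path)
    if batch:
        names = sorted(n for n in os.listdir(in_path)
                       if os.path.isfile(os.path.join(in_path, n)))
        inputs = [os.path.join(in_path, n) for n in names]
    else:
        inputs = [in_path]
    jobs = []
    for path in inputs:
        if out_path is None:
            target = None
        elif batch:
            # Hypothetical naming: one output per input inside the -o directory.
            target = os.path.join(out_path, os.path.basename(path) + ".matching")
        else:
            target = out_path
        jobs.append((path, target))
    return jobs


def run(in_path, out_path, match_file):
    """Apply match_file to every planned job and write each result."""
    for src, dst in plan_jobs(in_path, out_path):
        with open(src) as fin:
            result = match_file(fin)
        if dst is None:
            sys.stdout.write(result)
        else:
            os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
            with open(dst, "w") as fout:
                fout.write(result)
```

Keeping the path planning separate from the matching call makes the batch logic testable without running the matcher.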

GetContigOutputAndScaffoldInput

Note: goal is to get everything onto the command line. Currently, several of the files are hardcoded

  • -cl <threshold minimum contig length. Called 'minimumGeneGroupLength' in original config file>
  • -b <Number of bins for assignment to ancestral chromosomes. Called 'AncChrNumber' in original config file> (Note: this may be removed if this info can be derived from the input, specifically the SubGenomesInGeneOrder file.)

Note: genomeInContigIndex is specified in the config file. This is the weighting for each subgenome. Need to discuss how best to deal with these.

  • -co <configoutput file generated by the MWMPython program>
  • -sg <subGenomesInGeneOrder file>
  • -gf <genomeInString file>
  • -o <output file name>

Note: Output is now a tab delimited file with each line containing: vertex vertex weight. These data will be used by the MWMPython program below
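To help with the move from the config file to command-line options, the renamed settings mentioned in this plan can be collected in one lookup table. The key and flag names below are taken from the notes above; the helper `config_to_argv` and the idea of passing the old config as a plain dict are assumptions made for illustration.

```python
# Old config key -> new command-line flag, grouped by program,
# as named in the refactoring plan.
CONFIG_TO_FLAG = {
    "GetContigInput": {
        "weightOfAdjacent": "-wa",
    },
    "GetContigOutputAndScaffoldInput": {
        "minimumGeneGroupLength": "-cl",
        "AncChrNumber": "-b",
    },
}


def config_to_argv(program, config):
    """Translate old config values into an argument list for one program."""
    argv = []
    for key, flag in CONFIG_TO_FLAG[program].items():
        if key in config:
            argv.extend([flag, str(config[key])])
    return argv


# Hypothetical values, only to show the call shape.
old_config = {"weightOfAdjacent": 0.5, "minimumGeneGroupLength": 5, "AncChrNumber": 7}
scaffold_args = config_to_argv("GetContigOutputAndScaffoldInput", old_config)
```

Settings without a flag yet, such as genomeInContigIndex, are deliberately left out of the table until their handling is decided.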