Load Genome Script

From CoGepedia

Latest revision as of 10:57, 16 February 2015

The load genome script, scripts/load_genome.pl, allows genomes to be created from FASTA files via the backend.

The following records must already exist in the database before you run this script:

  • an organism, whose ID is passed as organism_id
  • a user, whose ID is passed as user_id
  • a genomic_sequence_type, whose ID is passed as type_id

Usage:

perl load_genome.pl -name <string> -desc <string> -fasta_files <file1>,<file2>,...<fileN> ...
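Because fasta_files takes a single comma-separated string, a small helper can build that string from a set of files. The bash sketch below is illustrative only: join_fasta is a hypothetical helper, not part of CoGe. Rejecting commas inside file names is an assumption based on the list format, since such a name would be split in two.

```shell
#!/usr/bin/env bash
# join_fasta (hypothetical helper): print the given FASTA file paths as one
# comma-separated string for the -fasta_files parameter.
join_fasta() {
  (( $# > 0 )) || { echo "join_fasta: no files given" >&2; return 1; }
  local f
  for f in "$@"; do
    # A comma inside a path would be read as a separator, so refuse it.
    if [[ $f == *,* ]]; then
      echo "join_fasta: comma in file name: $f" >&2
      return 1
    fi
  done
  local IFS=,          # "$*" joins the arguments with the first character of IFS
  printf '%s\n' "$*"
}
```

For example, `fasta_files=$(join_fasta genome/*.fa)` collects every .fa file in a directory, and the result can then be passed as `-fasta_files "$fasta_files"`.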

Required parameters:

  • fasta_files
    • comma-separated list of FASTA files
  • staging_dir
    • temporary staging directory for processing files; use "." (the current working directory)
  • install_dir
    • permanent installation directory for genome files
    • should match SEQDIR in the configuration file
    • example: /opt/apache2/coge/data/genomic_sequence/
  • user_id
    • ID of the user to associate with the genome
  • organism_id
    • ID of the organism the genome belongs to
  • source_name
    • Name of data source, e.g. the lab that generated the sequence data
  • config
    • CoGe configuration file (web/coge.conf)
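A complete invocation needs every required parameter listed above. The bash sketch below assumes that each parameter is passed as -<name> <value>, following the style of the usage line, and that it is run from the directory containing load_genome.pl. The function run_load_genome, the DRY_RUN switch, and all IDs, names, and paths in the example are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around load_genome.pl's required parameters.
# Assumption: parameters use the "-<name> <value>" form from the usage line.
run_load_genome() {
  # $1 FASTA list  $2 user_id  $3 organism_id  $4 source_name
  # $5 config file  $6 install_dir (should match SEQDIR in the config)
  local cmd=(perl load_genome.pl
    -fasta_files "$1"
    -staging_dir .
    -install_dir "$6"
    -user_id "$2"
    -organism_id "$3"
    -source_name "$4"
    -config "$5")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[@]}"    # show the arguments, one per line, without running
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 run_load_genome chr1.fa,chr2.fa 7 42 "Example Lab" web/coge.conf /opt/apache2/coge/data/genomic_sequence/` prints the argument list without loading anything. Dropping `DRY_RUN=1` runs the script, which requires user 7 and organism 42 to already exist in the database.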

Optional parameters:

  • name
    • String name of the genome
  • desc
    • String description of the genome
  • link
    • URL to the data source or publication
  • version
    • Version of the genome data
  • type_id
    • Sequence type ID, defaults to 1 for "unmasked"
  • source_desc
    • Description of the data source
  • restricted
    • Flag to make the genome private (1) or public (0); defaults to public
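Optional parameters can be passed only when a value is actually set, so that omitted ones keep the script's defaults (type_id 1, public genome). The bash sketch below assumes the same -<name> <value> form as the usage line. It reads values from hypothetical LG_<PARAM> shell variables such as LG_NAME or LG_TYPE_ID, a naming convention invented for this example.

```shell
#!/usr/bin/env bash
# optional_args (hypothetical helper): print "-<param>" and "<value>" lines for
# each optional parameter whose LG_<PARAM> variable is non-empty.
optional_args() {
  local param var
  for param in name desc link version type_id source_desc restricted; do
    var="LG_${param^^}"              # e.g. name -> LG_NAME, type_id -> LG_TYPE_ID
    if [[ -n "${!var:-}" ]]; then    # indirect expansion: the value of $var
      printf '%s\n' "-$param" "${!var}"
    fi
  done
}
```

For example, after `LG_NAME="Example assembly"; LG_RESTRICTED=1`, running `mapfile -t opts < <(optional_args)` fills opts with `-name`, `Example assembly`, `-restricted`, `1`. You can append `"${opts[@]}"` after the required parameters on the command line. Because values are passed one per line, they must not contain newlines.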