Load Genome Script

From CoGepedia

Latest revision as of 10:57, 16 February 2015

The load genome script, scripts/load_genome.pl, creates genomes from FASTA files via the backend.

The following records must already exist in the database before you run this script:

  • an organism to specify in the organism_id parameter
  • a user to specify in the user_id parameter
  • a genomic_sequence_type to specify in the type_id parameter

Usage:

perl load_genome.pl -name <string> -desc <string> -fasta_files <file1>,<file2>,...<fileN> ...

Required parameters:

  • fasta_files
    • comma-separated list of FASTA files
  • staging_dir
    • temporary staging directory for processing files; use "." for the current directory
  • install_dir
    • permanent installation directory for genome files
    • should match SEQDIR in the configuration file
    • example: /opt/apache2/coge/data/genomic_sequence/
  • user_id
    • ID of the user to associate with the genome
  • organism_id
    • ID of the organism, which must already exist in the database
  • source_name
    • Name of data source, e.g. the lab that generated the sequence data
  • config
    • CoGe configuration file (web/coge.conf)
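
As a rough illustration of how the required parameters fit together, the Python sketch below assembles a load_genome.pl command line and refuses to build one if a required parameter is missing. The helper name build_load_genome_cmd, the IDs, and the file names are hypothetical placeholders; only the parameter names come from the list above, and the single-dash flag style follows the usage line.

```python
import shlex

# Required parameters, as listed above.
REQUIRED = ("fasta_files", "staging_dir", "install_dir",
            "user_id", "organism_id", "source_name", "config")


def build_load_genome_cmd(params, script="scripts/load_genome.pl"):
    """Return an argv list for load_genome.pl (hypothetical helper)."""
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))

    cmd = ["perl", script]
    for name, value in params.items():
        if name == "fasta_files" and not isinstance(value, str):
            # The script expects one comma-separated string of file paths.
            value = ",".join(value)
        cmd += ["-" + name, str(value)]
    return cmd


# All values below are placeholders, not real IDs or paths.
example = build_load_genome_cmd({
    "fasta_files": ["chr1.fasta", "chr2.fasta"],
    "staging_dir": ".",
    "install_dir": "/opt/apache2/coge/data/genomic_sequence/",
    "user_id": 1,
    "organism_id": 1,
    "source_name": "Example Lab",
    "config": "web/coge.conf",
})

# shlex.join quotes values that contain spaces, so the line can be pasted into a shell.
print(shlex.join(example))
```

To actually run the script, the returned list could be handed to subprocess.run(example, check=True). Building a list instead of a single string avoids quoting problems with values such as a source name that contains spaces.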

Optional parameters:

  • name
    • String name of the genome
  • desc
    • String description of the genome
  • link
    • URL to the data source or publication
  • version
    • Version of the genome data
  • type_id
    • Sequence type ID; defaults to 1 ("unmasked")
  • source_desc
    • Description of the data source
  • restricted
    • Flag to make the genome private (1) or public (0); defaults to 0 (public)
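
The optional parameters can be handled in the same spirit. The Python sketch below is an illustration only: optional_args is a hypothetical helper, and the example values are placeholders. It rejects names that are not in the list above, checks that restricted is 0 or 1, and fills in the two defaults this page documents (type_id 1 and restricted 0).

```python
# Optional parameters, as listed above.
OPTIONAL = ("name", "desc", "link", "version",
            "type_id", "source_desc", "restricted")

# Only these two defaults are documented on this page; the other optional
# parameters are simply omitted when not supplied.
DOCUMENTED_DEFAULTS = {"type_id": 1, "restricted": 0}


def optional_args(**options):
    """Return extra argv entries for the optional parameters (hypothetical helper)."""
    unknown = sorted(set(options) - set(OPTIONAL))
    if unknown:
        raise ValueError("not an optional parameter: " + ", ".join(unknown))
    if options.get("restricted", 0) not in (0, 1):
        raise ValueError("restricted must be 1 (private) or 0 (public)")

    merged = {**DOCUMENTED_DEFAULTS, **options}
    args = []
    for name in OPTIONAL:  # fixed order keeps the output predictable
        if name in merged:
            args += ["-" + name, str(merged[name])]
    return args


# Placeholder values: a private genome with a name and a version.
print(optional_args(name="Example genome", version="1.0", restricted=1))
```

Spelling out the documented defaults is a design choice made here for readability. According to the list above, the script applies them itself when the flags are omitted.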