Load Genome Script

The load genome script, scripts/load_genome.pl, creates genomes from FASTA files via the backend.

The following records must already exist in the database before this script is run:

  • an organism to specify in the organism_id parameter
  • a user to specify in the user_id parameter
  • a genomic_sequence_type to specify in the type_id parameter

Usage:

perl load_genome.pl -name <string> -desc <string> -fasta_files <file1>,<file2>,...<fileN> ...

Required parameters:

  • fasta_files
    • comma-separated list of FASTA files
  • staging_dir
    • temporary staging directory for processing files; use "." for the current directory
  • install_dir
    • permanent installation directory for genome files
    • should match SEQDIR in the configuration file
    • example: /opt/apache2/coge/data/genomic_sequence/
  • user_id
    • ID of the user to associate with the genome
  • organism_id
    • Organism ID
  • source_name
    • Name of data source, e.g. the lab that generated the sequence data
  • config
    • CoGe configuration file (web/coge.conf)
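
A wrapper that launches the script may want to check the required parameters first. The following is a minimal Python sketch and not part of CoGe: the helper name missing_required and all example values (IDs, file names, source name) are hypothetical, and only the parameter names are taken from the list above.

```python
# Required parameter names, taken from the list above.
REQUIRED = (
    "fasta_files", "staging_dir", "install_dir",
    "user_id", "organism_id", "source_name", "config",
)


def missing_required(params):
    """Return the required parameter names that are absent or empty in params."""
    return [name for name in REQUIRED if params.get(name) in (None, "", [])]


# Hypothetical example values; the IDs and file names are placeholders.
params = {
    "fasta_files": ["chr1.fasta", "chr2.fasta"],
    "staging_dir": ".",
    "install_dir": "/opt/apache2/coge/data/genomic_sequence/",
    "user_id": 1,
    "organism_id": 1,
    "source_name": "Example Lab",
}

# "config" was left out on purpose, so it is reported as missing.
print(missing_required(params))  # prints ['config']
```

A check like this only confirms that values were supplied. It does not verify that the organism, user, or sequence type actually exist in the database.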

Optional parameters:

  • name
    • String name of the genome
  • desc
    • String description of the genome
  • link
    • URL to the data source or publication
  • version
    • Version of the genome data
  • type_id
    • Sequence type ID; defaults to 1 ("unmasked")
  • source_desc
    • Description of the data source
  • restricted
    • Flag to make the genome private (1) or public (0); defaults to 0 (public)
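
The parameters above can be assembled into a full command line. The sketch below is a hypothetical Python helper, not part of CoGe. It assumes every parameter is passed in the same single-dash "-name value" form that the usage line shows for -name, -desc, and -fasta_files; all values in the example call are placeholders. It only builds and prints the command and does not run it.

```python
import shlex

# Optional parameter names, taken from the list above.
OPTIONAL = ("name", "desc", "link", "version", "type_id", "source_desc", "restricted")


def build_command(fasta_files, staging_dir, install_dir, user_id,
                  organism_id, source_name, config, **optional):
    """Assemble an argument list for load_genome.pl; nothing is executed."""
    unknown = sorted(set(optional) - set(OPTIONAL))
    if unknown:
        raise ValueError("unknown parameter(s): " + ", ".join(unknown))
    args = [
        "perl", "load_genome.pl",
        # fasta_files is a single comma-separated value
        "-fasta_files", ",".join(fasta_files),
        "-staging_dir", staging_dir,
        "-install_dir", install_dir,
        "-user_id", str(user_id),
        "-organism_id", str(organism_id),
        "-source_name", source_name,
        "-config", config,
    ]
    # Append only the optional parameters the caller supplied, in list order.
    for name in OPTIONAL:
        if name in optional:
            args += ["-" + name, str(optional[name])]
    return args


# Placeholder values; the IDs must refer to existing database records.
cmd = build_command(
    fasta_files=["chr1.fasta", "chr2.fasta"],
    staging_dir=".",
    install_dir="/opt/apache2/coge/data/genomic_sequence/",
    user_id=1,
    organism_id=1,
    source_name="Example Lab",
    config="web/coge.conf",
    name="Example genome",
    restricted=1,
)
print(shlex.join(cmd))  # shell-quoted form, suitable for review or logging
```

Keeping the command as a list means values with spaces, such as the name or description, need no manual quoting if the list is later handed to subprocess.run.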