Difference between revisions of "Metadata"

From CoGepedia

Revision as of 11:28, 22 September 2014

Metadata is data about data. For example, the name of a genome is metadata about that genome.

CoGe Metadata File Format

The LoadBatch tool enables users to load batches of genome and experiment data sets. A metadata file, e.g. metadata.txt or any other name ending in .txt, is required along with any number of FASTA, GFF (note: still under development), and experiment data files.

The metadata file is a tab-separated file ending in .txt that contains a header line followed by a metadata line for each genome or experiment. Genomes and experiments cannot be mixed in the same file. There are some required columns and any number of free-form optional columns.

Genome Metadata Required Columns

  1. Filename: the name of the file containing the genome's data (FASTA or GFF; GFF support is still under development).
  2. Name: the name of the genome
  3. Organism: the CoGe ID of the organism to assign the genome to (note: only required for genome FASTA/GFF files)

Experiment Metadata Required Columns

  1. Filename: the name of the file containing the experiment's data. Supported file types: .csv, .bam, .bed, .gff, .vcf. See LoadExperiment for details.
  2. Name: the name of the experiment
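
As an illustration of the layout described above, the following minimal Python sketch builds a tab-separated experiment metadata file: a header line, then one line per experiment, with the required columns first. The exact header spelling that LoadBatch expects is an assumption taken from the column names listed here, and the helper `write_metadata` and the file and experiment names are hypothetical.

```python
import csv
import io

# Column names follow the lists above; the exact header spelling LoadBatch
# expects is an assumption.
EXPERIMENT_REQUIRED = ["Filename", "Name"]


def write_metadata(rows, required=EXPERIMENT_REQUIRED):
    """Return tab-separated metadata text: a header line, then one line per row."""
    # Fail early if any row lacks a required value.
    for i, row in enumerate(rows, start=1):
        missing = [col for col in required if not row.get(col)]
        if missing:
            raise ValueError(f"row {i} is missing required column(s): {missing}")

    # Required columns come first, then optional columns in first-seen order.
    optional = []
    for row in rows:
        for key in row:
            if key not in required and key not in optional:
                optional.append(key)

    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=list(required) + optional,
        delimiter="\t",
        lineterminator="\n",
        restval="",  # rows without an optional column get an empty cell
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical file and experiment names, for illustration only.
text = write_metadata([
    {"Filename": "sample1.bam", "Name": "Sample 1 alignments"},
    {"Filename": "sample2.vcf", "Name": "Sample 2 variants", "Source": "JGI"},
])
```

The returned text can then be saved under any name ending in .txt. For a genome metadata file, the same helper could be called with `required=["Filename", "Name", "Organism"]`.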

Genome/Experiment Metadata Optional Columns (in any order)

  • Description: a description of the genome or experiment
  • Source: the source of the data file (e.g., JGI)
  • Version: the version number
  • Restricted: restrict the data from public access ("yes" or "no"; default is "no")
  • Add your own unique column names

Note: a column whose name ends in "_link" supplies a link for the column with the same base name (e.g., columns "citation" and "citation_link").
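
The optional-column conventions above can be checked before upload with a small Python sketch like the one below. It is not part of CoGe: the function `check_optional_columns` is hypothetical, and whether LoadBatch itself rejects an unpaired "_link" column or treats "Restricted" values case-sensitively is an assumption.

```python
def check_optional_columns(header, rows):
    """Return a list of problems found in optional columns (empty if none)."""
    problems = []
    names = set(header)
    suffix = "_link"

    # A "<name>_link" column should accompany a "<name>" column.
    for col in header:
        if col.endswith(suffix):
            base = col[:-len(suffix)]
            if base not in names:
                problems.append(f'column "{col}" has no matching "{base}" column')

    # Restricted accepts "yes" or "no"; an empty cell falls back to the default.
    for i, row in enumerate(rows, start=1):
        value = row.get("Restricted", "")
        if value not in ("", "yes", "no"):
            problems.append(
                f'row {i}: Restricted must be "yes" or "no", got "{value}"'
            )
    return problems
```

Here `header` is the list of column names from the first line of the file and `rows` is a list of dictionaries, one per metadata line, as produced for example by `csv.DictReader` with `delimiter="\t"`.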


Examples

Example 1: single experiment metadata line with optional columns
Screen Shot 2014-01-23 at 2.53.15 PM.png

The resulting experiment looks like this: http://genomevolution.org/CoGe/ExperimentView.pl?eid=193