Metadata

From CoGepedia

Metadata is data about data. For example, the name of a genome is metadata about that genome.

Metadata on User Profile page

The Metadata pane on the User Profile page shows statistics about metadata for experiments, genomes and notebooks, and allows uploading of new metadata. To upload a file containing metadata and attach it to experiments, genomes or notebooks, click the appropriate "Upload" button and then select the file you want to upload.

File Format

The file you upload must be a tab-delimited text file. It can be thought of as a table with rows (lines in the file) and columns (the text on each line, separated by tab characters).

  • The first line is a header row and must contain the metadata keys for each of the columns
  • The remaining lines contain the metadata values to attach
  • The first column of each row must contain one or more comma separated ids that correspond to the experiments/genomes/notebooks you are annotating
  • The first column of the first row, corresponding to the ids column, is ignored but must be present

Example

(Columns are aligned here for readability; in the actual file they are separated by tab characters.)

Experiment ID  Individual  Sex  Gestation Treatment  Postnatal Treatment  Sow  Sire  Kill Day  Adipose Index  Liver Index  Longissimus dorsi Index
1366           17          F    TNTN                 HS                   112  1     2         5              5            5
1367,1368      18          M    TNTN                 HS                   112  1     4         3              2            5
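
The format rules above can be sketched in Python as follows. This is only an illustration of how such a file might be read, not CoGe's own parser: the function name parse_metadata and the returned structure are assumptions, and the sample text reuses just the first three columns of the example.

```python
import csv
import io


def parse_metadata(text):
    """Map each id in the first column to a dict of {key: value}.

    Hypothetical helper for illustration; not part of CoGe.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    keys = header[1:]  # the first header cell (ids column) is ignored
    annotations = {}
    for row in reader:
        if not row or not row[0].strip():
            continue  # skip blank lines and rows without ids
        # The first column may hold several comma-separated ids.
        ids = [i.strip() for i in row[0].split(",") if i.strip()]
        values = dict(zip(keys, row[1:]))
        for item_id in ids:
            annotations.setdefault(item_id, {}).update(values)
    return annotations


sample = (
    "Experiment ID\tIndividual\tSex\n"
    "1366\t17\tF\n"
    "1367,1368\t18\tM\n"
)
result = parse_metadata(sample)
print(result["1368"])
```

Each id listed in a row receives the same set of key/value pairs, so experiments 1367 and 1368 end up with identical annotations.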

Batch Loading

File Format

The LoadBatch tool enables users to load batches of genome and experiment data sets. A metadata file, e.g. metadata.txt or any other name ending in .txt, is required along with any number of FASTA, GFF (note: still under development), and experiment data files.

The metadata file is a tab-separated file ending in .txt that contains a header line followed by a metadata line for each genome or experiment. Genomes and experiments cannot be mixed in the same file. There are some required columns and any number of free-form optional columns.

Genome Metadata: Required Columns

  1. Filename: the name of the file containing the genome's data. Supported file types: .fasta. See LoadGenome for details.
  2. Name: the name of the genome
  3. Organism: the organism ID for the genome

Experiment Metadata: Required Columns

  1. Filename: the name of the file containing the experiment's data. Supported file types: .csv, .bam, .bed, .gff, .vcf. See LoadExperiment for details.
  2. Name: the name of the experiment
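
Before uploading, a batch metadata file can be checked against the required columns listed above. The sketch below is a hypothetical pre-flight check, not part of LoadBatch: the function name check_batch_metadata is invented for illustration, and it assumes that column names are matched exactly as written and that only the listed file extensions are accepted.

```python
import csv
import io
import os

# Required columns and supported extensions as listed above.
REQUIRED = {
    "genome": ["Filename", "Name", "Organism"],
    "experiment": ["Filename", "Name"],
}
EXTENSIONS = {
    "genome": {".fasta"},
    "experiment": {".csv", ".bam", ".bed", ".gff", ".vcf"},
}


def check_batch_metadata(text, kind):
    """Return a list of problems found in a batch metadata file.

    Hypothetical helper; assumes exact, case-sensitive column names.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    problems = [
        f"missing required column: {col}"
        for col in REQUIRED[kind]
        if col not in header
    ]
    if problems:
        return problems
    for row in reader:
        line_no = reader.line_num  # line of the row just read
        for col in REQUIRED[kind]:
            if not (row[col] or "").strip():
                problems.append(f"line {line_no}: empty {col}")
        ext = os.path.splitext(row["Filename"] or "")[1].lower()
        if ext not in EXTENSIONS[kind]:
            problems.append(f"line {line_no}: unsupported file type {ext!r}")
    return problems


# The organism ID and file names below are made up for the example.
good = "Filename\tName\tOrganism\ngenome1.fasta\tMy genome\t12345\n"
bad = "Filename\tName\nreads.txt\tMy experiment\n"
print(check_batch_metadata(good, "genome"))
print(check_batch_metadata(bad, "experiment"))
```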

Genome/Experiment Metadata: Optional Columns

  • Description: a description of the genome or experiment
  • Source: the source of the data file (e.g., JGI)
  • Version: the version number
  • Restricted: restrict the data from public access ("yes" or "no", default is no)
  • Add your own unique column names

Note: columns can be given in any order. Appending "_link" to a column name marks that column as holding the link for the column with the same base name (e.g., columns "citation" and "citation_link").
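
As an illustration of the "_link" convention, the sketch below writes a one-line experiment metadata file with columns in arbitrary order and then groups each column with its "_link" companion. The helper pair_links, the file name, and all values are hypothetical; how CoGe itself resolves these columns is not described here.

```python
import csv
import io


def pair_links(row):
    """Group each column with its optional '<name>_link' companion.

    Returns {column: (value, link_or_None)}. Hypothetical helper.
    """
    paired = {}
    for key, value in row.items():
        if key.endswith("_link") and key[: -len("_link")] in row:
            continue  # picked up together with its base column
        paired[key] = (value, row.get(key + "_link"))
    return paired


# Columns may appear in any order; all values are made up for the example.
columns = ["Name", "citation_link", "Filename", "citation", "Restricted"]
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=columns, delimiter="\t", lineterminator="\n"
)
writer.writeheader()
writer.writerow({
    "Name": "Example experiment",
    "Filename": "example.csv",
    "citation": "Example et al.",
    "citation_link": "https://example.org/paper",
    "Restricted": "yes",
})

# Read the tab-separated text back and pair up the link columns.
row = next(csv.DictReader(io.StringIO(buffer.getvalue()), delimiter="\t"))
paired = pair_links(row)
print(paired["citation"])
```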

Examples

Example 1: single experiment metadata line with optional columns
[Image: Screen Shot 2014-01-23 at 2.53.15 PM.png]

The resulting experiment looks like this: http://genomevolution.org/CoGe/ExperimentView.pl?eid=193


Example 2: three genome metadata lines with optional columns
[Image: Screen Shot 2014-09-22 at 10.49.58 AM.png]