Metadata

From CoGepedia
Latest revision as of 14:57, 27 October 2015

Metadata is data about data. For example, the name of a genome is metadata about that genome.

Metadata on User Profile page

The Metadata pane on the User Profile page shows statistics about metadata for experiments, genomes and notebooks, and allows uploading of new metadata. To upload a file containing metadata and attach it to experiments, genomes or notebooks, click the appropriate "Upload" button and then select the file you want to upload.

File Format

The file you upload must be a tab-delimited text file. The file can be thought of as a table with rows (lines in the file) and columns (text on each line delimited by tab characters).

  • The first line is a header row and must contain the metadata key for each column.
  • The remaining lines contain the metadata values to attach.
  • The first column of each row must contain one or more comma-separated IDs that correspond to the experiments, genomes, or notebooks you are annotating.
  • The first column of the first row, which heads the ID column, is ignored but must be present.
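The rules above can be sketched in Python. This is only an illustration of the file layout, not CoGe's own parser; the function name `parse_metadata` and the returned dictionary shape are assumptions made for this example.

```python
import csv
import io


def parse_metadata(text):
    """Map each ID in the first column to a dict of {metadata key: value}.

    Hypothetical helper for illustration; CoGe's real loader may differ.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    keys = header[1:]  # the first header cell is ignored, but it must be present
    annotations = {}
    for row in body:
        if not row or not row[0].strip():
            continue  # skip blank lines
        # The first column may hold several comma-separated IDs.
        ids = [i.strip() for i in row[0].split(",") if i.strip()]
        values = dict(zip(keys, row[1:]))
        for item_id in ids:
            annotations[item_id] = values
    return annotations


sample = "Experiment ID\tIndividual\tSex\n1366\t17\tF\n1367,1368\t18\tM\n"
parsed = parse_metadata(sample)
print(parsed["1368"])  # {'Individual': '18', 'Sex': 'M'}
```

Every ID listed in a row's first column receives the same set of key/value pairs, which is why `1367` and `1368` share one entry here.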

Example

Experiment ID | Individual | Sex | Gestation Treatment | Postnatal Treatment | Sow | Sire | Kill Day | Adipose Index | Liver Index | Longissimus dorsi Index
1366          | 17         | F   | TNTN                | HS                  | 112 | 1    | 2        | 5             | 5           | 5
1367,1368     | 18         | M   | TNTN                | HS                  | 112 | 1    | 4        | 3             | 2           | 5
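
One way to produce a file like the example above is Python's `csv` module with a tab delimiter. This is a minimal sketch: the variable names are arbitrary, and the text is written to an in-memory buffer here; to upload it you would save `tsv_text` to a file first.

```python
import csv
import io

# Header row: the first cell heads the ID column and is ignored on upload.
header = ["Experiment ID", "Individual", "Sex", "Gestation Treatment",
          "Postnatal Treatment", "Sow", "Sire", "Kill Day",
          "Adipose Index", "Liver Index", "Longissimus dorsi Index"]

# One row per set of metadata values; the first cell lists the target IDs.
rows = [
    ["1366", "17", "F", "TNTN", "HS", "112", "1", "2", "5", "5", "5"],
    ["1367,1368", "18", "M", "TNTN", "HS", "112", "1", "4", "3", "2", "5"],
]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
tsv_text = buffer.getvalue()
print(tsv_text)
```

Because the delimiter is a tab, the comma inside `1367,1368` is written as-is and stays within the first column.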

Batch Loading

File Format

The LoadBatch tool enables users to load batches of genome and experiment data sets. A metadata file, e.g. metadata.txt or any other name ending in .txt, is required along with any number of FASTA, GFF (note: still under development), and experiment data files.

The metadata file is a tab-separated file ending in .txt that contains a header line followed by a metadata line for each genome or experiment. Genomes and experiments cannot be mixed in the same file. There are some required columns and any number of free-form optional columns.

Genome Metadata: Required Columns

  1. Filename: the name of the file containing the genome's data. Supported file types: .fasta. See LoadGenome for details.
  2. Name: the name of the genome
  3. Organism: the organism ID for the genome

Experiment Metadata: Required Columns

  1. Filename: the name of the file containing the experiment's data. Supported file types: .csv, .bam, .bed, .gff, .vcf. See LoadExperiment for details.
  2. Name: the name of the experiment
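
Before uploading a batch, the header line can be checked against the required columns listed above. The sketch below is an assumption-laden illustration: the helper `missing_columns` is hypothetical, and it compares column names case-insensitively, which this page does not state LoadBatch does.

```python
# Required columns as listed on this page.
REQUIRED = {
    "genome": ["Filename", "Name", "Organism"],
    "experiment": ["Filename", "Name"],
}


def missing_columns(header_line, kind):
    """Return the required columns absent from a tab-separated header line.

    Matching is case-insensitive here; that is an assumption, not documented
    LoadBatch behavior.
    """
    present = {c.strip().lower() for c in header_line.rstrip("\n").split("\t")}
    return [c for c in REQUIRED[kind] if c.lower() not in present]


header = "Name\tFilename\tDescription"
print(missing_columns(header, "experiment"))  # []
print(missing_columns(header, "genome"))      # ['Organism']
```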

Genome/Experiment Metadata: Optional Columns

  • Description: a description of the genome or experiment
  • Source: the source of the data file (e.g., JGI)
  • Version: the version number
  • Restricted: restrict the data from public access ("yes" or "no"; the default is "no")
  • Custom columns: add your own unique column names

Note: columns can be given in any order. Adding "_link" to the end of a column name denotes a link for the column of the same name without the suffix (e.g., columns "citation" and "citation_link").
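
The "_link" naming convention and the "Restricted" default can be illustrated with a short Python sketch. The helper `pair_links`, the output shape, and all values in `record` are placeholders invented for this example; how CoGe stores the pairing internally is not described on this page.

```python
def pair_links(record):
    """Attach each "<column>_link" value to its base column, if one exists."""
    paired = {}
    for key, value in record.items():
        if key.endswith("_link") and key[: -len("_link")] in record:
            continue  # folded into its base column below
        paired[key] = {"value": value, "link": record.get(key + "_link")}
    return paired


# Placeholder metadata line (hypothetical values, not real CoGe data).
record = {
    "Name": "example experiment",
    "citation": "placeholder citation",
    "citation_link": "http://example.org/citation",
}

paired = pair_links(record)
print(paired["citation"]["link"])  # http://example.org/citation

# "Restricted" is optional; per the list above it defaults to "no".
restricted = record.get("Restricted", "no").lower() == "yes"
print(restricted)  # False
```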

Examples

Example 1: single experiment metadata line with optional columns
[Screenshot: Screen Shot 2014-01-23 at 2.53.15 PM.png]

The result looks like this: http://genomevolution.org/CoGe/ExperimentView.pl?eid=193


Example 2: three genome metadata lines with optional columns
[Screenshot: Screen Shot 2014-09-22 at 10.49.58 AM.png]