Difference between revisions of "Metadata"
Revision as of 11:14, 22 September 2014
Metadata is data about data. For example, the name of a genome is metadata about that genome.
CoGe Metadata File Format
The LoadBatch tool enables users to load batches of genome and experiment data sets. A metadata file, e.g. metadata.txt or any other name ending in .txt, is required along with any number of FASTA, GFF (note: still under development), and experiment data files.
The metadata file is a tab-separated file ending in .txt that contains a header line followed by a metadata line for each genome or experiment. Genomes and experiments can be mixed in the same file. There are some required columns and any number of free-form optional columns.
Required Columns (in the order below):
- Filename: the name of the file containing the experiment's data
  - supported file types: .csv, .bam, .bed, .gff, .vcf ... see LoadExperiment for more info
- Name: the name of the experiment
- Organism: the CoGe ID of the organism to assign the genome to (note: only required for genome FASTA/GFF files)
Optional Columns (in any order):
- Description: a description of the experiment
- Source: the source of the data file (e.g., JGI)
- Version: the version number
- Restricted: restrict the data from public access ("yes" or "no", default is no)
- Add your own unique column names
Note: adding "_link" to the end of a field name marks that column as the link for the column of the same name (e.g., columns "citation" and "citation_link")
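As an illustration of the format described above, the following is a minimal Python sketch that builds a tab-separated metadata file with the three required columns first, followed by optional columns. The file names, experiment names, organism ID (12345), citation, and link are hypothetical placeholders, and the helper name build_metadata is not part of CoGe. Leaving Organism empty for an experiment row is an assumption based on the note that the column is only required for genome files.

```python
import csv
import io

# Required columns, in the order given above.
REQUIRED = ["Filename", "Name", "Organism"]

# Optional columns may appear in any order; "citation" and "citation_link"
# are user-defined columns illustrating the "_link" convention.
OPTIONAL = ["Description", "Source", "Version", "Restricted",
            "citation", "citation_link"]


def build_metadata(rows, optional_columns):
    """Return tab-separated text: one header line, then one line per row."""
    header = REQUIRED + list(optional_columns)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=header, delimiter="\t",
                            lineterminator="\n", restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)  # columns missing from a row are left empty
    return buf.getvalue()


# Hypothetical rows: one experiment and one genome mixed in the same file.
rows = [
    {"Filename": "sample_expression.csv",
     "Name": "Sample expression experiment",
     "Description": "Placeholder description",
     "Source": "JGI",
     "Version": "1",
     "Restricted": "no",
     "citation": "Placeholder citation",
     "citation_link": "http://example.org/paper"},
    {"Filename": "sample_genome.fasta",
     "Name": "Sample genome",
     "Organism": "12345",  # hypothetical CoGe organism ID
     "Version": "1",
     "Restricted": "yes"},
]

metadata_text = build_metadata(rows, OPTIONAL)
print(metadata_text)
```

Writing metadata_text to a file whose name ends in .txt (for example metadata.txt) would give a file in the layout described above; whether LoadBatch accepts every detail of this sketch has not been verified here.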
An example of how the loaded metadata looks: http://genomevolution.org/CoGe/ExperimentView.pl?eid=193