LoadBatch: Difference between revisions

From CoGepedia

Latest revision as of 18:58, 30 March 2015

LoadBatch lets you conveniently load a set of genomes or experiments in a single operation. Loading the same set with LoadGenome and LoadExperiment would require running the tool once for each genome or experiment, which is very time-consuming for large data sets.

Inputs

Metadata File

The load requires a single metadata file that describes the data files being loaded. For details, see Metadata.

Data File(s)

Data files can be given individually or together as a compressed tar archive file (ending in .tar.gz, also known as a "tarball").

Valid combinations of input files include:

  • tarball of metadata file and data file(s)
  • metadata file and tarball of data file(s)
  • separate metadata file and data files

Note: tarballs must not contain subdirectories.
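
Because tarballs must not contain subdirectories, it can help to build the archive so that every file sits at its top level, and to check an existing archive before uploading it. The following is a minimal sketch using Python's standard tarfile module. It is not part of CoGe, and the function names and the file names in the usage note are hypothetical.

```python
import os
import tarfile


def make_flat_tarball(archive_path, file_paths):
    """Write the given files into a .tar.gz archive with no directory components."""
    seen = set()
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in file_paths:
            name = os.path.basename(path)
            # Two files with the same base name would collide once flattened.
            if name in seen:
                raise ValueError(f"duplicate file name in flat archive: {name}")
            seen.add(name)
            # arcname drops the directory part so the member sits at the archive root.
            archive.add(path, arcname=name)
    return archive_path


def has_subdirectories(archive_path):
    """Return True if any member is a directory or has a path separator in its name."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return any(m.isdir() or "/" in m.name for m in archive.getmembers())
```

For example, make_flat_tarball("batch.tar.gz", ["metadata.txt", "data/sample1.csv"]) would store both files at the top level of batch.tar.gz, and has_subdirectories("batch.tar.gz") would then return False. The sketch assumes every entry in file_paths is a regular file, not a directory.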

The interface allows you to select and retrieve data files located at:

  • The iPlant Data Store
  • An FTP server
  • Your computer (Upload)

Data Formats

For supported genome data file formats, see LoadGenome.

For supported experiment data file formats, see LoadExperiment.