How to load genomes into CoGe: Difference between revisions

Latest revision as of 15:38, 28 May 2020

You can load your own genome sequence, annotation, and quantitative data for use with CoGe's tools. These data may be kept private and shared with collaborators, or made fully public.

1. Register for a CyVerse user account if you don't have one: https://user.cyverse.org
   1. Add CoGe as a CyVerse Service.
2. Log into CoGe: https://genomevolution.org
3. Go to your User Profile Page by clicking My Data in the menu bar.
4. Click New -> New Genome.
5. See the LoadGenome page for information on how to use LoadGenome.
   1. Once your genome is loaded into the system, you can add annotation and quantitative data to it using the LoadAnnotation and LoadExperiment features.
6. Note: Make sure your GFF file is in the correct format for CoGe (see the GFF ingestion page).
7. If you get an error when accessing the CyVerse Data Store, make sure that CoGe has been added as a service to your CyVerse account: https://user.cyverse.org
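
Before uploading an annotation file, it can help to run a quick local sanity check. The sketch below is only an illustration: it tests the general GFF conventions (nine tab-separated columns, integer start and end with start not greater than end, and a recognized strand symbol). It does not implement CoGe's own requirements, which are described on the GFF ingestion page. The function name `check_gff_lines` is hypothetical.

```python
def check_gff_lines(lines):
    """Return a list of (line_number, message) problems found in GFF-style text.

    Only generic GFF conventions are checked; CoGe-specific rules are NOT covered.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.strip() == "##FASTA":
            break  # embedded sequence data follows; nothing more to check
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        fields = line.split("\t")
        if len(fields) != 9:
            problems.append(
                (number, f"expected 9 tab-separated columns, found {len(fields)}")
            )
            continue
        start, end, strand = fields[3], fields[4], fields[6]
        if not (start.isdecimal() and end.isdecimal()):
            problems.append((number, "start and end must be integers"))
        elif int(start) > int(end):
            problems.append((number, "start is greater than end"))
        if strand not in {"+", "-", ".", "?"}:
            problems.append((number, f"unexpected strand value {strand!r}"))
    return problems
```

To check a file, pass an open file handle, for example `check_gff_lines(open("annotation.gff"))` where the file name is a placeholder. An empty list means none of these generic problems were found. It does not guarantee that CoGe will accept the file.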

@@ Line 1: / Line 1: @@
-There are two general programs to run:
+You can load your own genome sequence, annotation, and quantitative data for use with CoGe's tools.  These data may be kept private and shared with collaborators, or made fully public.
-*'''fasta_genome_loader.pl''':
-**Loads FASTA sequences into CoGe
-*annotation loader:
-**usually some version of '''gff_annotation_loader.pl''' or some other program for loading text based gene models and annotations
+#Register for a CyVerse user account if you don't have one:  https://user.cyverse.org
-==UAGC Example==
+## Add CoGe as a CyVerse Service:
-The UAGC produces many genomic sequences.  This is to help them streamline their procedure for loading genomes into CoGe
+## [[File:Screen Shot 2017-08-14 at 3.41.40 PM.png|400px]]
-#Get 454AllContigs.fna
+#Log into CoGe:  https://genomevolution.org
-##This is the usual contig-level genome assembly from the 454 genome sequencing pipeline
+## [[File:Screen Shot 2020-05-28 at 8.52.36 AM.png|400px]]
-#run fasta_genome_loader.pl
+#Go to your [[User|User Profile Page]] by clicking My Data in the menu bar.
-  ~/projects/CoGeX/scripts/fasta_genome_loader.pl \
+## [[File:Screen Shot 2020-05-28 at 8.54.09 AM.png|400px]]
- -org_name "Acidovorax sp. strain JS42 substrain KSJ2" \
+#Click '''New''' -&gt; '''New Genome'''
- -org_desc "Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Comamonadaceae;Acidovorax;" \
+##[[File:Screen Shot 2020-05-28 at 8.54.30 AM.png|400px]]
- -source_name "University of Arizona Genetics Core" \
+# Follow [[LoadGenome|this link]] for information on how to use [[LoadGenome]].
- -source_link "http://uagc.arl.arizona.edu/" \
+## Once your genome is loaded into the system, you can add annotation and quantitative data to it using the [[LoadAnnotation]] and [[LoadExperiment]] features.
- -ds_version .1 \
+#'''Note:''' Make sure your GFF file is in the correct format for CoGe: [[GFF ingestion]]
- -nt KSJ2_454AllContigs.fna \
+#If you get an error when accessing the CyVerse Data Store, make sure that CoGe has been added as a service to your CyVerse account:  https://user.cyverse.org
- -dsg_restricted 1
+## [[File:Screen Shot 2017-08-14 at 3.41.40 PM.png|400px]]
-===Important Notes===
-*CoGe organizes genomes as a collection of datasets (often abbreviated as '''ds''') grouped into a dataset_group (abbreviated as '''dsg''').  The general idea is that a genome may consist of multiple files, and we want to track the provenance of each file.  If you search for a genome/organism in [[OrganismView]], you'll see that a dsg is listed as "genome", and that each genome has an associated dsgid.
-===Option Descriptions===
-*-org_name : the name of the organism
-*-org_desc : the GenBank taxonomic description of the organism
-*-source_name : the source of the data
-*-source_desc (optional) : description of the source of the data
-*-source_link (optional) : an http:// URL for the place that generated the data (or that owns the data)
-*-ds_version : version number for the genome
-*-ds_link (optional) : an http:// URL for the place where the data file was downloaded
-*-nt : path to the nucleotide sequence (FASTA) file
-*-dsg_restricted (optional) : make this genome private
-====Additional Options====
-*-org_id : if the organism is already entered into CoGe, you can use its internal CoGe ID (available by searching for the organism in [[OrganismView]]).  This will automatically use the associated name and description.
-*-source_id : if the data source is already entered into CoGe, you can use its internal CoGe ID (available by searching for the organism in [[OrganismView]]).  This will automatically use the associated name, description, and link.
-*-dsg_name (optional) : specify a name for the genome (dsg).  If not used, will default to the name of the organism
-*-dsg_desc (optional) : specify a description for the genome (dsg).
-*-seq_type_id (optional) : specify a different type of sequence for the genome (e.g. masked).  By default, unmasked is assumed.  Available types:
- mysql> select * from genomic_sequence_type;
- +--------------------------+-----------------------------------+---------------------------------------------------------------------------------------------------------------+
- | genomic_sequence_type_id | name                              | description                                                                                                   |
- +--------------------------+-----------------------------------+---------------------------------------------------------------------------------------------------------------+
- |                        1 | unmasked                          | unmasked sequence data                                                                                        |
- |                        2 | masked repeats 50x                | repeats with more than 50x occurrence have been masked                                                        |
- |                        3 | 50X mask +syntenic thread with Os | double masked: 50x repeats and non-coding sequences.  CNS sequences with Os retained                          |
- |                        4 | masked repeats 40x                | repeats with more than 40x occurrence have been masked                                                        |
- |                        5 | super masked repeats 50x          | repeats with more than 50x occurrence have been masked.  Additional processing was needed for these sequences |
- |                        6 | te+kmer masked                    | transposons and kmer hard masked (Bao method)                                                                 |
- |                        7 | masked by JGI                     | downloaded masked                                                                                             |
- |                        8 | masked by genoscope               | NULL                                                                                                          |
- |                        9 | RepeatMasker                      | with MIPS repeat data                                                                                         |
- |                       10 | masked by Cacao Genome Database   | NULL                                                                                                          |
- |                       11 | Repeat masked by Andrea Zuccolo   | NULL                                                                                                          |
- |                       12 | masked by GMGC-nt                 | NULL                                                                                                          |
- |                       13 | masked by GMGC                    | NULL                                                                                                          |
- +--------------------------+-----------------------------------+---------------------------------------------------------------------------------------------------------------+
-*-seq_type_name (optional) : specify
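
The command-line invocation in the removed revision above can be sketched as a small helper that assembles the argument list from the documented options. This is an illustrative sketch, not part of CoGe: the function names `build_loader_command` and `as_shell_line` are hypothetical, only options listed in the removed text are used, and the script path must be supplied by the caller.

```python
import shlex


def build_loader_command(script, org_name, org_desc, source_name,
                         ds_version, nt, source_link=None, restricted=False):
    """Assemble an argument list for fasta_genome_loader.pl.

    Only options named in the earlier revision of this page are emitted.
    """
    args = [
        script,
        "-org_name", org_name,
        "-org_desc", org_desc,
        "-source_name", source_name,
    ]
    if source_link:
        args += ["-source_link", source_link]  # optional per the option list
    args += ["-ds_version", str(ds_version), "-nt", nt]
    if restricted:
        args += ["-dsg_restricted", "1"]  # optional: make the genome private
    return args


def as_shell_line(args):
    """Render the argument list as one safely quoted shell command line."""
    return shlex.join(args)
```

Building a list rather than a single string keeps values that contain spaces or semicolons, such as the organism name and taxonomic description, intact when the list is passed to `subprocess.run`. `as_shell_line` is only for displaying or logging the command.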
