FAQs

From CoGepedia
Revision as of 12:50, 9 June 2016 by Mbomhoff (Talk | contribs) (Lines of Code Breakdown)

Commonly asked questions about CoGe.

What is CoGe?

CoGe is an online system for making the retrieval and comparison of genomic information and sequence data quick and easy.

Why call it CoGe?

CoGe (pronounced /kō:jē/ ) stands for Comparative Genomics.

Why make another comparative genomics system?

We found that existing comparative genomics systems were limited in their ability to accommodate genomic information and make it easily accessible for comparative analyses. We designed CoGe from the ground up to solve four major limitations:

  1. Store multiple versions of multiple genomes from multiple organisms in a single platform
  2. Quickly find sequences of interest in genomes of interest (with associated information)
  3. Compare multiple genomic regions using any algorithm
  4. Visualize the results of analyses in such a way as to make the identification of "interesting" patterns quick and easy

All told, we wanted a comparative genomics system that would allow us to test our ideas and hypotheses as quickly as possible, so we could spend more time thinking about genomes and their evolution instead of trying to get and analyze genomic sequences.

Also, we realized that we wanted a system that allowed us to quickly develop new tools and add new genomic data as they become available. This means that when we load a new genome into CoGe, all the tools of CoGe are immediately available to analyze it. Likewise, if we develop a new tool to solve one particular problem with one set of genomes, it is immediately available to all the genomes in CoGe.

How is CoGe designed and put together?

CoGe's core design principle is to keep things easy and efficient. This extends from the underlying computational infrastructure to the web-based tools. The web-based tools are the primary way to drive analyses, although large-scale comparative genomics analytics are best done programmatically through CoGe's API. Follow this link for an overview of CoGe's system design.

What is needed to run CoGe?

Not much. Just a web browser and a connection to the internet.

What is CoGe's sequence analysis workflow or pipeline?

While we designed CoGe to make it easy to find and compare genomic sequences, there is no single, linear workflow through the system. Instead, CoGe's tools create an open-ended analysis network. There are central tools and access points that let you enter the system to find sequences of interest, and "hub" points that take you from one part of the system to another. This lets you generate ideas while working in CoGe and quickly branch out to investigate any number of interesting phenomena you find. An analysis ends when you have your answer.

For example, you start with your favorite genome (mouse), do a whole-genome comparison of it to human using SynMap, identify a region with an inversion, compare the breakpoints of that region in high detail using GEvo, extract the human sequence using SeqView, find all the protein-coding regions using FeatView, use them to find homologs in other vertebrate genomes (e.g. chimp, mouse, and platypus) using CoGeBlast, validate putative syntenic regions using GEvo, find a particular gene especially interesting because of its copy-number variation in this syntenic region and get its sequence using FeatView once again, find putative intra- and inter-specific homologs of it using CoGeBlast, generate a fasta file of those putative homologs using FastaView, which you can align using CoGeAlign, and then use to build a phylogenetic tree using TreeView or export to more expansive phylogenetic tool sets such as CIPRES. While waiting for your trees to be reconstructed, you decide to check out the codon and protein usage variation of the genes using FeatList, notice that there is some interesting variation in a couple of genes, check their overall GC content and wobble-position GC content using FeatView, wonder if these have been horizontally transferred from the mitochondria, send those sequences to CoGeBlast to search mitochondrial genomes, find a putative homolog in several of those genomes, and then compare mitochondrial genomes to determine if there are inversions near those homologs using GEvo ...

In other words, there is no predefined end-point to an analysis.

What web browser do you recommend and how should it be configured?

We only test CoGe using Firefox and Chrome. For CoGe to function properly, make sure to allow pop-ups, enable JavaScript, and install Adobe's Flash Player.

What programming languages/software are used in CoGe?

Different parts of CoGe use different languages, based on what works best and what language(s) a programmer knows. In no particular order:

  • Operating System: Ubuntu
  • Web Server: Apache
  • Database: MySQL
  • Web interface: HTML, Perl, Python, JavaScript, jQuery, Flash
  • Web services: Perl and Mojolicious
  • Algorithms and other stuff: Python, C/C++, Java. Note that many programs used by CoGe are written by other programmers (e.g. Blast, Lagan, DiAlign, DAGChainer, codeml) and use a variety of languages not listed here.

Lines of Code

cloc /opt/apache2/coge --exclude-dir=jbrowse,bin,tmp,data,old,SSWAP,jstree,vendor   --force-lang=html,tmpl
   1161 text files.
    744 unique files.
    652 files ignored.
github.com/AlDanial/cloc v 1.69  T=5.37 s (104.6 files/s, 35295.7 lines/s)
------------------------------------------------------------------------------- 
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Perl                           341          14143          12702          78647
JavaScript                      55           5080           4528          27724 
HTML                            66           1406            396          17596
make                            13           2206           1297           6318
Python                          17           1380           1368           4490
CSS                             14            505            372           3556
Haxe                            10            310            884           1769
SQL                              2             84            128            617
XML                              1              0              0            602
JSON                            14              0              0            522
Bourne Shell                     8             77             70            278
RobotFramework                   8             37             54            273
YAML                            12              0              0            264
Markdown                         1              8              0             18
-------------------------------------------------------------------------------
SUM:                           562          25236          21799         142674
-------------------------------------------------------------------------------
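The totals in the breakdown above can be cross-checked with a few lines of Python. This is only a minimal sketch: the numbers are copied by hand from the "code" column of the cloc table, and the names CODE_LINES and code_share are illustrative, not part of CoGe.

```python
# Code-line counts copied from the "code" column of the cloc table above.
CODE_LINES = {
    "Perl": 78647,
    "JavaScript": 27724,
    "HTML": 17596,
    "make": 6318,
    "Python": 4490,
    "CSS": 3556,
    "Haxe": 1769,
    "SQL": 617,
    "XML": 602,
    "JSON": 522,
    "Bourne Shell": 278,
    "RobotFramework": 273,
    "YAML": 264,
    "Markdown": 18,
}


def code_share(counts):
    """Return each language's share of the total code lines, in percent."""
    total = sum(counts.values())
    return {lang: 100.0 * n / total for lang, n in counts.items()}


if __name__ == "__main__":
    print(f"Total code lines: {sum(CODE_LINES.values())}")
    # List languages from largest to smallest share.
    shares = code_share(CODE_LINES)
    for lang, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{lang:<16}{pct:6.1f}%")
```

The computed total matches the SUM row of the table, and Perl accounts for a little over half of the counted code.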

How many lines of code are in CoGe?

Around 143,000 lines of code according to the cloc breakdown above, of which roughly 79,000 are Perl. Not all of it that pretty. More being created...

Can you add a genome?

Yes, including annotation and quantitative data!

Follow these directions: How to load a private genome into CoGe?

What is CoGe's data release policy?

Genomes in CoGe are obtained, provided, and released under the policies described at the Fort Lauderdale Convention.

What is CoGe's Graphical Genomic Visualization Library?

CoGe uses its own genomic visualization library called GeLo.

Can CoGe be installed locally?

First, remember that CoGe is a multi-component system designed to run on Linux servers with Apache, MySQL, Perl, and Python, and it requires a plethora of other bioinformatics algorithms. Unfortunately, CoGe hasn't been packaged for redistribution. This is because our development team is very small, CoGe's code base has many independent components, and CoGe was developed to meet research goals (that is, to get the job done). As the code base for CoGe matures, we will be releasing the source code. In the meantime, under certain circumstances, we will help you install a local version of the system. Please see our System support page for additional information.

How can I link to CoGe?

Since CoGe's tools are all web-based, it is easy to make URLs that link directly into CoGe's applications with specific genomic regions and data pre-loaded. For documentation on how to do that please see this section of CoGe's Documentation.
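As a rough illustration of the idea, a link with a region pre-loaded is just a tool's page address followed by a URL-encoded query string. The sketch below is not CoGe's documented scheme: the base URL, the page name, and every parameter name are placeholders chosen for the example, so take the real ones from CoGe's documentation.

```python
from urllib.parse import urlencode

# Hypothetical address of a CoGe tool page; NOT a real CoGe URL.
BASE_URL = "https://coge.example.org/GEvo.pl"


def build_link(base_url, **params):
    """Return base_url followed by params as a URL-encoded query string."""
    return f"{base_url}?{urlencode(params)}"


# Parameter names here are placeholders, not CoGe's documented ones.
link = build_link(
    BASE_URL,
    genome_id=1234,
    chromosome="2",
    start=100000,
    stop=150000,
)
print(link)
```

Building the query string with urlencode rather than by hand ensures that any special characters in the values are escaped correctly.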

My web site does stuff that CoGe doesn't do. Can you add a link from CoGe to our site?

Most likely yes, especially if your site analyzes sequences in ways that CoGe does not. We are always looking for ways to extend CoGe's utility for scientists, and sending sequences to other websites is a great way to achieve that. Just send an e-mail to CoGe Support.

Can I publish a link to CoGe in a paper and will it always work?

Yes. We are committed to keeping all links to CoGe working such that any analysis done with the system at any time will be completely reproducible for as long as CoGe exists.

Why is CoGe so hard to use?

We try our best to make CoGe as easy to use as possible. However, there is a learning curve. Take a look at some of our Tutorials to see if any of them help get you started. If you still find yourself lost, please feel free to Contact us and we'll do our best to help.

Help! I don't know how to do something.

If you have an additional question, please e-mail CoGe Support. If you have a problem, chances are someone else is having the same problem. Your questions help drive the documentation of CoGe, and we will usually incorporate your questions into a tutorial or some other part of CoGepedia.

What is your data policy?

Long story short: your data is your data. We do our best to keep it safe and secure, but make no promises. Read more...