CoGepedia:Current events


Chocolate genome added: from the International Cacao Genome Sequencing Consortium

Jan. 26th 2011

You can view this genome: http://genomevolution.org/CoGe/OrganismView.pl?dsgid=10997


Version 2 of the Maize Genome, Now With Gene Models

Both the 50x super masked and unmasked versions of the B73_refgen2 maize genome are now updated with the new gene models released by maizesequence.org over Thanksgiving break. The new genome annotation consists of 110,028 genes, many with alternative transcripts, which can be broken down as follows:

  • 29,082 transposon related genes
  • 17,615 putative pseudogenes
  • 63,276 "real" genes. Please note that while these genes were annotated as "protein coding" in the current release, they include predicted microRNA genes.

Maintenance Complete

Sept. 16th 2010

CoGe's servers have successfully been moved to a new rack space. Thanks to James, Bao, and Brent for making this happen.

Pending CoGe Maintenance

Sept. 15th 2010

We have received word from the UC Data Center, which houses CoGe, that we need to move our servers to a new rack space. This should only take an hour or two. Our tentatively scheduled time for the move is:

Sept 16th 2010 at 1pm (PCT)

We apologize for any inconvenience this may cause any of CoGe's users.

SynMap update

Aug. 26th 2010

Organisms selected in SynMap have links in their taxonomic descriptions. If you click on a term in the taxonomic description, that term is automatically entered into the organism description search. All organisms with a matching taxonomic term will be displayed. This makes it faster to find organisms related to the one in which you are interested.


OrganismView update

Aug. 26th 2010

OrganismView now has more links for finding information about an organism, and to internal CoGe tools.

External searches under organism information:

  • NCBI
  • Wikipedia
  • Google

Internal CoGe links:

  • CodeOn: automatically generates a table of amino acid usage as a function of the GC content of CDS sequences.
  • SynMap: (under Genome information) automatically loads SynMap with both genomes set to the one selected. This makes it quick to start generating whole genome comparisons and syntenic dotplots.
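
As a rough illustration of the kind of table CodeOn produces, the sketch below tallies amino acid usage for CDS sequences grouped by GC content. It is not CoGe's code: the function names, the 10% bin width, and the use of the standard genetic code are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def gc_percent(seq):
    """Return the GC content of seq as a whole-number percentage (0-100)."""
    seq = seq.upper()
    if not seq:
        return 0
    return (seq.count("G") + seq.count("C")) * 100 // len(seq)


def amino_acid_usage_by_gc(cds_sequences, bin_width=10):
    """Count amino acids per GC bin; keys are the lower edge of each bin."""
    usage = defaultdict(Counter)
    for cds in cds_sequences:
        cds = cds.upper()
        gc_bin = gc_percent(cds) // bin_width * bin_width
        # Walk the sequence codon by codon, ignoring a trailing partial codon.
        for i in range(0, len(cds) - len(cds) % 3, 3):
            aa = CODON_TABLE.get(cds[i:i + 3])
            if aa and aa != "*":  # skip stop codons and ambiguous codons
                usage[gc_bin][aa] += 1
    return usage
```

Calling `amino_acid_usage_by_gc(["ATGGCC", "ATGAAATAA"])` puts the first sequence in the 60% bin and the second in the 10% bin, each with its own amino acid counts.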

Home Page update

Aug. 26th 2010

CoGe's homepage menu "Latest Genomes" now has links to search for the organism name in

This makes it quicker to find information on an organism, especially if you have no idea what it is. This is helpful considering that there are nearly 9,000 organisms in CoGe.

CoGeBlast update

Aug. 26th 2010

CoGeBlast now has support for specifying blastn, tblastx, lastz, megablast, and discontinuous megablast when searching with nucleotide sequences.

10,000th genome loaded!

Aug. 4th 2010

Brassica rapa has been added to CoGe and represents the 10,000th genome loaded in CoGe. Its sequence was generated by the BGI located in China. This relative of Arabidopsis is a wonderful addition to sequenced plant genomes. Their lineages share a series of whole genome duplication events (commonly known as alpha, beta, and gamma -- the last happening prior to the radiation of the eudicots). Since their divergence, Brassica rapa has had a triploidy while Arabidopsis has had none.

Genome update from NCBI

June 28th 2010

A new update of genomes from NCBI has finished. This includes genomes from all domains of life. CoGe now has genomic sequence from 8,872 organisms comprising 9,999 genomes. There is also a new option on the homepage to list the most recently added genomes.

SIP 2010 workshop syllabus

June 23rd 2010

The syllabus for a day-long workshop on how to use CoGe for the Society for Invertebrate Pathology's conference (SIP 2010) is now available. This workshop focuses on:

  1. Getting an overview of how CoGe is designed for allowing scientists to create their own open-ended analyses
  2. Learning what the various tools in CoGe do and how to use them
  3. Working through specific sets of example problems focused on analyzing two groups of organisms important for invertebrate pathology: baculoviruses and Bacillus thuringiensis

The workshop's syllabus is available: SIP2010

CoGe's update progress

June 18th 2010

The switch to the new server went as smoothly as I could have hoped.

Besides new hardware (which should greatly accelerate many of CoGe's analyses and improve system stability), this installation welcomes a new version of CoGe too!

This new version of CoGe has:

  1. Updated UI
  2. Various feature extensions on existing tools
  3. Updated algorithms (new blast API with support for the megablast families, LastZ)
  4. New database additions
  5. Update of core modules for database API
  6. New configuration files that will help deployment of CoGe to new sites

Please contact Eric Lyons if you find any bugs!

Today is the day

June 17th 2010

Going through with the switch today. Expect some downtime with CoGe and some support systems being temporarily offline.

New CoGe Server Update

June 10th 2010

It appears that most of the software updates and migration to the new server are working. We have deployed the new server to the UC Data Center, but due to some complications with rack-space, IP address allocation, sub-nets, firewalls, etc., things may be in flux for a while. We've had to take our development server (aka toxic) offline and put the new server on its IP address till those things get sorted out. In the meantime, we plan on making the switch to production on the new server soon (hopefully next week). When this happens, expect CoGe to be offline for a couple of hours, but we will do our best to keep downtime to a minimum.

New CoGe Server is being readied!

June 2nd 2010

We have our new server for CoGe! Its deployment will not only include performance improvements due to more computing power, but also several changes and additions to CoGe:

  1. new user interface
  2. new algorithm options
  3. new structure of the underlying code-base to make it easier to redeploy (in anticipation of eventually getting the code-base released to those interested)

We are planning on moving the new server to the UC data center this Fri. After some more testing and bug hunting, we will switch our current production server's IP address to this machine. There is a high chance that there will be some downtime for CoGe during this switch and we will post announcements as to when this change will happen! In the meanwhile, if anyone is interested in testing new CoGe, please e-mail Eric Lyons.

SGRP: (Sanger Institute) yeast genomes added to CoGe

May 18th 2010

75 Yeast genomes from SGRP (Saccharomyces Genome Resequencing Project) have been added to CoGe. For a complete list of Organisms, please see SGRP: Sanger Institute Yeast Genomes.

CoGe post on The OpenHelix

May 5th 2010

Eric Lyons wrote a piece about CoGe for The OpenHelix Blog

Version 2 of Maize B73 genome added to CoGe

May 3rd 2010

This release does not have annotations (yet)!

You can view the genome in CoGe at: http://genomevolution.org/CoGe/OrganismView.pl?dsgid=9106

This sequence was obtained from: http://www2.genome.arizona.edu/genomes/maize

You can read about differences in the assembly between versions 1 and 2 here.

Version 2 of Vitis vinifera (grapevine) genome added to CoGe

Apr. 10th 2010

You can view the genome in CoGe at: http://genomevolution.org/CoGe/OrganismView.pl?dsgid=9048

Version 2 with 12x coverage was obtained from Genoscope.

There are some changes to the assembly, with new contig orders and additional sequence added to the pseudomolecules, which can be seen here.

New NCBI Genome Update. CoGe surpasses 8,900 genomes from 8,200 organisms

Apr. 9th 2010

Finished an update from NCBI. However, this is not a complete listing of all genomes available at NCBI due to some API problems getting some genomes. You can read about this problem below.

Version 3 of Medicago truncatula added to CoGe

Apr. 9th 2010

You can view the genome in CoGe at: http://genomevolution.org/CoGe/OrganismView.pl?dsgid=8976

Syntenic comparison of version 3 to version 2 shows extensive changes in the primary sequence. Some chromosomes have had their sequence substantially updated.

Prunus persica (peach tree) added to CoGe

Apr. 9th 2010

You can view its genome in CoGe at: http://www.genomevolution.org/CoGe/OrganismView.pl?oid=30980

Its genome was produced by the International Peach Genome Initiative and its sequence was obtained from phytozome. This genome is currently unpublished and therefore under the publication restrictions of the Fort Lauderdale Convention.

Peach is a eudicot in the Rosaceae family.

Automatic NCBI Genome Loader Update

Apr. 8th 2010

The automatic NCBI genome loader is running today. It has been a while since I last ran it after running into an API problem with NCBI's eutils tools three months ago. The issue is still unresolved and even after checking in every two weeks for a status update, I have yet to receive any word as to when the bug will be fixed. For those interested, here is my bug report sent at the end of January:

Issue (http://jira.be-md.ncbi.nlm.nih.gov/browse/HD-1843): 

               Key: HD-1843
           Summary: Unable to get some genomes using eutils
              Type: Task
            Status: In Progress
          Priority: Normal
           Assignee: Matten, Wayne  	
          Reporter: Nobody

Description:

Hi,

I've been checking which genomes are available from NCBI using eutils by getting a list of all the genome project ids (genomeprj)
and then retrieving their associated genome ids.  I've found that a lot of the recently deposited genomes (usually with
accessions CPXXXXXX) have a genomeprj id but no associated genome id.  For example, genomeprj=30031.

It is listed in this list: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=genomeprj&term=all%5Bfilter%5D&retmax=999999

But has no genome id: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?db=genome&dbfrom=genomeprj&id=30031

However, it does have an entry in genbank:
http://www.ncbi.nlm.nih.gov/nuccore/CP001637.1?ordinalpos=3&itool=EntrezSystem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum

I am probably missing something obvious.  Can you help me figure out how to get a list of all the genomes at NCBI?  I am using
these data in an NSF funded  and publicly available comparative genomics platform (http://synteny.cnr.berkeley.edu/), and have
programs that check for new genomes and new versions of existing genomes from NCBI on a periodic basis.  It is important for
this system to be as up to date as possible with regards to the large number of genomes that are becoming available as there
are many researchers using this tool for their work.

Thanks in advance for your help,
-Eric Lyons

If anyone has any solutions to this problem, please contact me.
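
For readers who want to reproduce the check described in the report, here is a minimal sketch that builds the same two eutils requests and pulls linked ids out of an elink reply. It performs no network access. The helper names are hypothetical, the sample XML is hand-written to resemble an elink reply with no links, and the database names (genomeprj, genome) are taken from the report and may no longer be served by NCBI.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

EUTILS_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=999999):
    """Build an esearch URL listing every id in db that matches term."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


def elink_url(dbfrom, db, uid):
    """Build an elink URL asking which records in db are linked to uid."""
    query = urlencode({"db": db, "dbfrom": dbfrom, "id": uid})
    return f"{EUTILS_BASE}/elink.fcgi?{query}"


def linked_ids(elink_xml):
    """Return the linked ids found in an elink XML reply (may be empty)."""
    root = ET.fromstring(elink_xml)
    return [node.text for node in root.findall(".//LinkSetDb/Link/Id")]


# Hand-written sample (an assumption, not a captured response): a reply
# for a genome project that has no linked genome record.
SAMPLE_NO_LINK = """<eLinkResult><LinkSet><DbFrom>genomeprj</DbFrom>
<IdList><Id>30031</Id></IdList></LinkSet></eLinkResult>"""

print(esearch_url("genomeprj", "all[filter]"))
print(elink_url("genomeprj", "genome", 30031))
print(linked_ids(SAMPLE_NO_LINK))
```

An empty list from `linked_ids` corresponds to the situation in the report: the project id exists, but no genome id is linked to it.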

Major bug fix in SynMap

Mar. 26th 2010

While testing the prior bug fix, I discovered that SynMap wasn't working on genomic sequence comparisons (as opposed to CDS sequence comparisons). This was due to the new analytical pipeline's data processing requiring unique names for each blast hit. Otherwise, multiple hits to the same sequence name would get removed as a local duplicate. As all hits to a genomic sequence were named according to the chromosome, all such hits were flagged as local duplicates and removed from the analysis.

As always, if you find a problem in CoGe, feel free to email Eric Lyons and let him know what you've found. There are now too many options and buttons to click in CoGe for me to test with each update.
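
The genomic-comparison bug described above can be illustrated with a toy duplicate filter. This is not the pipeline's code: the filter, the hit dictionaries, and the `chromosome_start_stop` naming scheme are assumptions chosen only to show why per-chromosome names collapse and unique names do not.

```python
def drop_repeated_names(hits):
    """Toy duplicate filter: keep only the first hit seen for each name."""
    seen = set()
    kept = []
    for hit in hits:
        if hit["name"] not in seen:
            seen.add(hit["name"])
            kept.append(hit)
    return kept


def with_unique_names(hits):
    """Return copies of hits renamed as chromosome_start_stop."""
    return [
        {**hit, "name": f"{hit['chrom']}_{hit['start']}_{hit['stop']}"}
        for hit in hits
    ]


# Three distinct genomic hits that all carry the chromosome as their name.
hits = [
    {"name": "chr1", "chrom": "chr1", "start": 100, "stop": 900},
    {"name": "chr1", "chrom": "chr1", "start": 5000, "stop": 5600},
    {"name": "chr1", "chrom": "chr1", "start": 9000, "stop": 9800},
]

print(len(drop_repeated_names(hits)))                     # names collide
print(len(drop_repeated_names(with_unique_names(hits))))  # names are unique
```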

Minor bug fix in SynMap

Mar. 25th 2010

With SynMap's new analytical pipeline, there are still some bugs to be worked through. I hopefully fixed one today in the script that converts blast files to bed format, which is required for the program to find local duplicates in the compared genomes. These local duplicates are excluded from the input to the algorithm that finds collinear series of putative homologous genes used to infer syntenic regions. Also, the local duplicate files are available in the download section of the results in case they are wanted for other analyses.
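
For context, a blast-to-bed conversion can be sketched as below. This is not the SynMap script: it assumes the common 12-column tabular blast output (query id, subject id, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score) and writes the subject coordinates as a six-column BED line. The function name is hypothetical.

```python
def blast_hit_to_bed(line):
    """Convert one tab-delimited blast hit into a six-column BED line.

    The BED interval describes the subject (hit) side of the alignment.
    """
    fields = line.rstrip("\n").split("\t")
    query_id, subject_id = fields[0], fields[1]
    sstart, send = int(fields[8]), int(fields[9])
    bit_score = float(fields[11])

    # Blast reports reverse-strand hits with start > end; BED wants
    # start < end plus an explicit strand column.
    strand = "+" if sstart <= send else "-"
    start = min(sstart, send) - 1  # BED starts are 0-based
    end = max(sstart, send)        # BED ends are exclusive (1-based end)
    score = min(1000, int(bit_score))  # BED scores nominally run 0-1000

    return "\t".join(
        [subject_id, str(start), str(end), query_id, str(score), strand]
    )
```

The off-by-one on the start coordinate is the detail most often missed: blast coordinates are 1-based and inclusive, while BED intervals are 0-based and half-open.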

Hosting local tiny URL encoding

Mar. 24th 2010

Replaced tinyurl.com with a local installation of a URL hashing and redirecting service. This makes generating short URLs faster and allows for customized names. Note: existing tinyurl.com links will still work.
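
The general idea of a hashing and redirecting service with optional custom names can be sketched as follows. Everything here is an assumption for illustration (the class, the `example.org` base URL, the SHA-1 hash truncated to eight characters, and the in-memory dictionary); it says nothing about how CoGe's installation is actually implemented.

```python
import hashlib


class LocalShortener:
    """In-memory sketch of a URL hashing and redirecting service."""

    def __init__(self, base_url="http://example.org/r/"):
        self.base_url = base_url
        self.targets = {}  # short key -> long URL

    def shorten(self, long_url, name=None):
        """Return a short URL, using a custom name when one is given."""
        key = name or hashlib.sha1(long_url.encode("utf-8")).hexdigest()[:8]
        existing = self.targets.get(key)
        if existing is not None and existing != long_url:
            raise ValueError(f"short name already in use: {key}")
        self.targets[key] = long_url
        return self.base_url + key

    def resolve(self, short_url):
        """Return the long URL a short URL redirects to."""
        return self.targets[short_url[len(self.base_url):]]
```

Because the key is derived from the URL itself, shortening the same long URL twice yields the same short URL, while a custom name gives a second, more readable alias for it.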

Sequenced plant genomes

Mar. 13th 2010

James Schnable has created a page detailing all of the sequenced plant genomes including:

  • overview of their genomic content
  • publications
  • status of completion
  • interesting factoids (e.g. The average US American eats 25lbs of bananas a year.)

Read about them here: Sequenced plant genomes

The JGI's Manihot esculenta (cassava) genome has been added

Mar. 13th 2010

This genome from the JGI brings CoGe up-to-date with phytozome v5.0.

You can access cassava in CoGe here, and get more information from phytozome.

The JGI's Cucumis sativus (cucumber) genome has been added

Mar. 12th 2010

You can access it in CoGe here, or get more information about it from phytozome. This is apparently a distinct sequence from the one published in Nature Genetics last November. That sequence was from "'Chinese long' inbred line 9930"; this version comes from the inbred line Gy14. More details here.

SynMap updated

Mar. 12th 2010

After a month of work, SynMap has undergone several significant changes, incorporating new algorithms written by Haibao Tang and Brent Pedersen:

  • new merging function for overlapping and neighboring diagonals (program: quota alignment)
  • new method for detecting tandem gene duplicates
  • better reporting of all intermediate files used in the analysis, including tandem duplicates

These changes are also intended to increase the stability of SynMap, which, due to its long pipeline, has been known to crash for some genomes and/or specific parameter configurations. Please let Eric Lyons know if you have any problems with an analysis. Please send along the names of the organisms/genomes compared and a copy of the log file produced by each SynMap run (if possible).

Persistent GEvo bug fixed

Mar. 11th 2010

A long-standing, but intermittent and annoying, bug in GEvo has finally been fixed. This (hopefully) solves the problem where, once in a while, GEvo would return blank results to its interactive viewer, gobe. The crux of the bug, and why it was intermittent (and hence difficult to reproduce and troubleshoot), was a race condition between asynchronous client javascript code and server perl code. Perl was responsible for generating a random session id for the analysis, but it occasionally failed to return that id to the client code before the analysis was sent back to the server for processing. When this happened, the analysis being processed received a default id, and multiple analyses could be merged if the default id had been used within the past 24 hours (the length of time an analysis stays on the server before being deleted). When gobe tried to process the results, the stored data and what was specified for initialization did not match, thus causing gobe to fail and return blank results. The solution: have javascript generate the analysis session id so there is no chance of a delay before the analysis is sent to the server for processing.

However, if anyone does come across this bug again (or any others), please let me know: Eric Lyons
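
The failure mode and the fix described above can be modelled in a few lines. The actual fix was made in GEvo's javascript; the Python below is only a sketch of the idea, and the class, the `"default"` id, and the use of `uuid4` are assumptions for illustration.

```python
import uuid

DEFAULT_ID = "default"


class AnalysisStore:
    """Toy server-side store that groups results by session id."""

    def __init__(self):
        self.results = {}

    def save(self, session_id, result):
        # A missing id falls back to the shared default, which is how
        # unrelated analyses end up stored together.
        key = session_id or DEFAULT_ID
        self.results.setdefault(key, []).append(result)


def new_session_id():
    """Generate the id on the client, before anything is sent."""
    return uuid.uuid4().hex


# Buggy flow: the id did not arrive in time, so both analyses share one key.
buggy = AnalysisStore()
buggy.save(None, "analysis A")
buggy.save(None, "analysis B")

# Fixed flow: each analysis carries an id it generated itself.
fixed = AnalysisStore()
fixed.save(new_session_id(), "analysis A")
fixed.save(new_session_id(), "analysis B")

print(len(buggy.results[DEFAULT_ID]))           # merged under the default id
print([len(v) for v in fixed.results.values()])  # one result per session
```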

Rice Version 6.1 loaded

Mar. 10th 2010

You can view it in GenomeView. This was retrieved from MSU's Rice Genome Annotation Project.

The classic set of Maize Genes

Mar. 9th 2010

The classical maize gene list

James Schnable manually evaluated ~460 classic maize genes available from MaizeGDB and NCBI, determined their genomic positions in the maize genome, and found their syntenic regions within maize (from its most recent whole genome duplication event), sorghum, rice, and brachypodium. This list contains links to compare these syntenic regions using GEvo.

New plant genomes in CoGe

Feb. 10th 2010

Mimulus guttatus (monkey flower): http://synteny.cnr.berkeley.edu/CoGe/OrganismView.pl?oid=30760 Mimulus is an outgroup to the rosids (in the sister group, the asterids)

Populus trichocarpa (Poplar; cotton wood): http://synteny.cnr.berkeley.edu/CoGe/OrganismView.pl?oid=324 Version 2 of poplar!

Both are from the JGI.

MaizeGDB links to GenomeView

Feb. 8th 2010

MaizeGDB is now linking to CoGe's GenomeView so maize researchers can find maize-sorghum syntenic gene sets and quickly perform syntenic analyses using GEvo. For an example view from MaizeGDB's genome browser:

http://gbrowse.maizegdb.org/cgi-bin/gbrowse/maize/?name=chr1:1000000..1200000

For instructions on how to perform this workflow: MaizeGDB and CoGe

For more information on maize-sorghum syntenic analyses: Maize-Sorghum genome analyses

For a quick video walk-through of the new connections: MaizeGDB and CoGe's Maize-Sorghum Orthologies

Syntelog visualization in GenomeView

Feb. 5th 2010

GenomeView has been updated to auto-detect genomic features with annotations that are links to GEvo. These links provide an analysis of a genomic feature (e.g. gene) against previously identified syntologous sets of features. Currently, this has been implemented using syntelogs from maize and sorghum, but with the code in place, we will expand annotations for genomic features from other organisms for which we have generated syntologous gene sets. For an example of this visualization in GenomeView, please see this GenomeView of sorghum. Also, for an expanded list of glyphs used in GenomeView, please refer to these examples.

Easy exporting and downloading of genomes

Jan. 16th 2010

OrganismView has new options for easily downloading the sequences of a genome in fasta format and retrieving all of its annotations in a GFF file. To access them, just search for an organism and genome of interest, and look for the links under "Genome Information".

FastaView is linked to phylogeny.fr for one-click phylogenetics

Jan. 10th 2010

We've linked to phylogeny.fr for quick and easy phylogenetic tree reconstruction. Now, you can build a list of fasta sequences and display them in FastaView, select protein or DNA sequences, edit them if necessary (e.g. add or remove sequences manually), and press a button to send them off to phylogeny.fr for:

  1. multiple sequence alignment (MUSCLE)
  2. maximum likelihood phylogenetic tree reconstruction (PhyML)
  3. tree visualization (TreeDyn)

For an example, use this link to FastaView and press the button "phylogeny.fr" at the bottom of the screen.

Special thanks to Haibao Tang for pointing out this incredible web resource!

Haibao Tang joins the Freeling lab

Jan. 4th 2010

Haibao Tang, an expert in plant comparative genomics and genome evolution, as well as a great Python programmer, has joined the Freeling lab. His input and contributions will be most valued!

New tutorials added

Jan. 4th 2010

New tutorials have been added:

Linked to ProSite for protein domain searching

Dec. 24th 2009

FastaView is now linked to ProSite when viewing a protein sequence for protein domain searching. See this FastaView example and click on the link at the bottom of the page.

Improved implementation of DAGChainer in SynMap

Dec. 15th 2009

Thanks again to Brent Pedersen for some fantastic programming. He discovered that the makefile for DAGChainer's C++ code did not include the -O3 optimization, rewrote the input/output methods of the compiled binary to read from STDIN instead of a file, and rewrote the perl front-end in python. Together, these changes speed up CoGe's DAGChainer implementation in SynMap by 2-4 fold.

You can download his code at: svn co http://bpbio.googlecode.com/svn/trunk/scripts/dagchainer

CoGe Workshop being taught at SIP 2010

Nov. 30th 2009

Genomics: What every invertebrate pathologist needs to know. http://www.sip2010.org/index.php/Bioinformatics-Workshop.html

CoGe on OpenHelix and James and the Giant Corn

Nov. 18th 2009

Phillipe Lamesch from TAIR passed along a link to openhelix.com highlighting CoGe's tool GEvo. They put together a nice video showing GEvo. They, in turn, found this on a posting at the blog of James and the Giant Corn who had used GEvo for a grant proposal.

Maize Pseudomolecule Assembly with Gene Models Released

Oct. 20th 2009

Thanks to maizesequence.org for providing the sequence and annotations. The current pseudomolecule assembly of maize has been loaded into CoGe.

CoGe surpasses 7000 organisms in its database!

More fun for everyone!

NCBI Genome Loader Updated

CoGe's automated NCBI genome loader has been updated and is once again checking NCBI regularly for new and updated genomes. You can get a snapshot of the number of organisms and genomic sequences in CoGe by checking its homepage, or search for your genome of interest using OrganismView.

CoGe is linked to TARGeT: Tree Analysis of Related Genes and Transposons

You can send a set of fasta sequences generated by FastaView directly to TARGeT.

New version of Gobe released!

Read general announcement Gobe. Major feature: transparent wedges are drawn to connect regions of sequence similarity.

Version 3 of CoGe is released!

Read general announcement CoGe version 3.