Apr. 25th 2012
CoGe was down/offline yesterday for two reasons:
While the CoGe team tries to maximize uptime, this type of downtime does happen once in a while. Our apologies to everyone whose work was interrupted or delayed.
Mar. 28th 2012
Last (http://last.cbrc.jp/) has been added as a comparison algorithm in SynMap. Its performance is phenomenal! This is still under testing, so please let us know if you have any problems with it. Also, special thanks to Haibao Tang for writing the parallelized adapter for Last that is used by SynMap. Without this program, the integration would not have happened as quickly, easily, or smoothly.
Mar. 24th 2012
I heard that there was a secret message in the JCVI synthetic genome: Mycoplasma mycoides JCVI-syn1.0. Using CoGe, the DNA containing the secret messages was identified and decoded. Here is the walk-through of how this was done: Mycoplasma mycoides JCVI-syn1.0 Decoded.
For those interested in doing the puzzle, this article has a good summary of the challenge:
And you will probably need the original article (and the Supplementary Data):
Mar. 2nd 2012
iPlant has a forums site available: http://forums.iplantcollaborative.org
CoGe, as part of the "Powered by iPlant" program, has a section there where users can post questions about how to do various tasks or about CoGe in general, and can offer suggestions. I'll be posting questions that are emailed to me there, and hopefully it will become a good place for people to ask questions, find answers, and help one another.
Powered by iPlant Forum: https://forums.iplantcollaborative.org/viewforum.php?f=8
Feb. 11th 2012
Mike Freeling from UC Berkeley has found an interesting bug in BlastN in which a relatively large BLAST hit (HSP) appears or disappears depending on the amount of sequence compared between Arabidopsis and Brassica. James Schnable from UC Berkeley further characterized this by identifying a comparison in which a difference of 1 nucleotide (out of ~750) causes this effect. Images of this BLAST error, a characterization of the BLAST results, and a breakdown of the parameters used are available here: GEvo Blastn Bug
Feb. 4th 2012
CoGe's entire system has been migrated to the new server hosted by the iPlant Collaborative (iplantcollaborative.org). This includes:
Please contact us if you come across any problems!
Feb. 3rd 2012
Update on genomes available from Phytozome.
The genomes of
have both been added to iPlant CoGe. Head over and check them out, but remember that these genomes are protected by the Fort Lauderdale agreement for the next twelve months or until the genome paper is published.
Are we missing plant genomes you'd like to be studying? Let us know!
Dec. 18th 2011
The data security model of CoGe has been updated. This includes CoGe Groups, which permit the creation of user groups. Members of a user group may access a private set of genomes that is not accessible to other CoGe users.
To use this, you will need to create an account with iPlant in order to be a registered CoGe user:
Dec. 4th 2011
Work is nearing completion for a new version of CoGe. While there are many minor improvements, additions, and changes to the tools, the major improvements are on the backend of the system including:
Since the holidays are coming and usage of CoGe tends to decrease, hopefully any bugs won't affect too many people while they are being fixed. The domain names registered to CoGe will be migrated once the new server has been reasonably tested. Other CoGe services (e.g. this wiki) will migrate after that.
CoGe domain names:
Nov. 29th, 2011
The International Initiative for Pigeonpea Genomics has released the pigeonpea genome.
The pigeonpea genomes may be accessed in CoGe: http://genomevolution.org/CoGe/OrganismView.pl?oid=34028
Please see this link for a syntenic dotplot between pigeonpea and Medicago: http://genomevolution.org/r/49ua
This syntenic dotplot colors syntenic gene pairs by their evolutionary distance to differentiate orthologous from out-paralogous syntenic regions.
Nov. 28th, 2011
The NCBI genome loading program for CoGe has been updated and is currently adding thousands of genomes from NCBI. Keeping CoGe current with all of the genomes at NCBI has been a challenge as NCBI's underlying data model for storing and organizing genomes evolves. The new program crawls all of NCBI's BioProjects, searching for those with genomes and associated sequence. Prior to this data load there were approximately 12,100 genomes from 10,600 organisms. Approximately 40% of NCBI's BioProjects have been crawled so far, and the current genome stats are:
Genomic Features: 99,814,749
For those that are curious, CoGe has maintained a MySQL DB transaction rate of 2000-3000 per second (majority writes/inserts) for the past 24 hours, thanks in no small part to its SSD configuration.
Note: After more performance monitoring, peak DB transactions top 9000 per second during heavy use from the genome loading programs and website activity.
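For a sense of scale, multiply the sustained rates quoted above by the 86,400 seconds in a day. This gives the approximate number of transactions handled over those 24 hours. It is simple arithmetic on the quoted figures, not a measured count:

```latex
\begin{aligned}
2000\ \text{s}^{-1} \times 86{,}400\ \text{s} &= 1.728 \times 10^{8}\ \text{transactions} \\
3000\ \text{s}^{-1} \times 86{,}400\ \text{s} &= 2.592 \times 10^{8}\ \text{transactions}
\end{aligned}
```

The reported peak of 9000 per second is therefore three to four and a half times the sustained rate.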
Nov. 22nd, 2011
Which direction does the DNA spin? Depending on how your mind is interpreting the dark and light colored dots of the DNA molecule as being "near" or "far", the helix can spin in both directions.
Thanks to Don McCarty for pointing this out.
Nov. 19th, 2011
Ensembl (www.ensembl.org) version 64 genomes of Lamprey, Anole, and Frog have been added to CoGe:
Petromyzon marinus (lamprey): http://genomevolution.org/CoGe/OrganismView.pl?oid=30737
Xenopus (Silurana) tropicalis (western clawed frog): http://genomevolution.org/CoGe/OrganismView.pl?oid=33964
Anolis carolinensis (green anole): http://genomevolution.org/CoGe/OrganismView.pl?oid=33828
Thanks to Bill Spollen for requesting these genomes.
Nov. 10th 2011
The CoGePedia Sequenced plant genomes page has been updated with the latest published genomes, including the just-published genomes of both potato and pigeonpea! We have also added two new pages that may interest those who (like me) constantly have to pull together introduction sections and can't remember the right citation for well-known genomic information:
Both pages are clearly works in progress so please continue to contact us if we've missed genomes, whole genome duplications, or citations which should be on the list.
Nov. 3rd 2011
7:00 (Pacific time, USA) / 14:00 (GMT)
Last night I ran a table repair on the main CoGe database. The repair apparently ran into some problems and failed. I am currently hunting down the problem, and the main CoGe site is offline. Technically, the tools are all available, but some of them are not working. The problem appears to be located in the "locations" table of the CoGe database. This table records the locations of all of CoGe's genomic features. Anyone who needs to get some work done with CoGe is welcome to use the development server hosted at:
This version of CoGe has been under development to federate CoGe's user authentication system with the authentication system provided by the iPlant Collaborative. As such, there have been many code changes dealing with registered users and access to restricted/private genomes. These changes are NOT fully tested and may cause some problems. Also, the development server is using an out-of-date version of the main CoGe database (though most of the genomes should be there). If you use the development server and run into any of these problems, please feel free to send Eric Lyons an email. I'd appreciate reports of any bugs as well as your patience with the current situation.
In case of catastrophic failure of the main database, please know that in addition to the development server, there are full backups of the main CoGe database, which are generated weekly.
Also, thanks to Ben Field for notifying me of the problem. I deeply appreciate the help of community members in alerting me to problems with the site as well as suggestions for making it better.
Oct. 24th 2011
A comprehensive open-access tutorial on using CoGe has been published in Maydica: http://www.maydica.org/articles/56_1763.pdf
Of all the major plant groups, the grasses, with the complete genomes of five species, are the best positioned to take advantage of comparative genomics to obtain insight into functional genetic elements. Of all the grasses, maize is the best characterized in terms of genetics, development, and evolution. We provide several examples of how the web-based comparative genomics system CoGe may be used to aid in the interpretation of the maize genome sequence. These examples include verifying gene models, identifying differences between genome assemblies, identifying conserved non-coding sequences, identifying syntenic regions between species and polyploidies, and identifying homeologs within maize and orthologs between maize and other grass genomes. In addition, a comprehensive list of orthologous gene sets is provided between maize and Sorghum, foxtail millet, rice, and Brachypodium.
While the article focuses on the maize genome as its primary genome, the methods are applicable to any genome.
Sept. 29th 2011
Phil Stinard identified several classical maize genes that had been incorrectly assigned as present in B73. Thanks to Mary Schaeffer for passing along this information and to James Schnable for correcting these in the Classical Maize Gene and Syntelog List.
The following genes are now listed as not present in B73:
Sept. 12th, 2011
There are a couple of new options available in SynMap:
Force dotplot to be a square: You can find this option under the "Display Options" Tab with the line "Dotplot axes relations".
SVG Version of the Dotplot: A new file, "SVG Version of the Syntenic Dotplot", will be available for download in the "Links and Downloads" section of the results. This file only appears if some form of synonymous rate is calculated and visualized (available under the "Analysis Options" tab).
Thanks to James Schnable for creating the SVG program for SynMap!
Sept. 3rd, 2011
Genome published: http://www.nature.com/nature/journal/v475/n7355/full/nature10158.html
The genome added is that of the doubled monoploid S. tuberosum Group Phureja clone DM1-3 516R44 (DM):
Please note: this version of the genome does not have annotations available.
Thanks to Will Spooner for the notification!
Sept. 3rd, 2011
Genome published: http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.919.html#/group-1
Sequenced by: BGI
Brassica rapa has had a hexaploidy event subsequent to the most recent tetraploidy event in the Arabidopsis lineage.
Thanks to Will Spooner for the notification!
Aug. 23rd, 2011
SynMap has an option to assemble one genome against another using synteny. Such syntenic path assemblies may be used to create a pseudoassembly of a genome when only a contig-level assembly exists, and SynMap makes generating them easy. A pseudoassembly of the 175,000-contig Cannabis sativa genome was built against the peach genome (read here to learn why peach was chosen). This pseudoassembly was loaded back into CoGe, making it possible to use CoGe's tools to compare the Cannabis genome at multiple levels of resolution.
To see this example: Cannabis sativa cultivar Chemdawg (marijuana)
Pseudoassemblies may be quite useful as more genomes are sequenced cheaply. Such sequencing projects yield low-quality draft genomes that are usually assembled into tens of thousands of contigs, and pseudoassemblies permit the rapid generation of large sequences that are easier to use in comparative genomic analyses.
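The idea behind a syntenic path assembly can be sketched in a few lines of Python. This is a simplified illustration, not SynMap's actual implementation. It assumes that syntenic anchors (gene pairs linking a draft contig to a reference chromosome) have already been identified, and it represents them with a hypothetical `Anchor` record. Each contig is placed on the reference chromosome holding most of its anchors, at the median anchor position, and is oriented by majority vote. Contigs with no syntenic signal never receive a placement.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Anchor:
    """Hypothetical record: one syntenic gene pair linking a contig to a reference chromosome."""
    contig: str
    ref_chrom: str
    ref_pos: int       # position of the anchored gene on the reference chromosome
    same_strand: bool  # True if the contig gene and reference gene share orientation


def syntenic_path_assembly(anchors, ref_order):
    """Tile contigs along reference chromosomes.

    Returns {ref_chrom: [(contig, orientation), ...]} ordered along each chromosome.
    Contigs without anchors do not appear; chromosomes not in ref_order are omitted.
    """
    by_contig = defaultdict(list)
    for a in anchors:
        by_contig[a.contig].append(a)

    tiles = defaultdict(list)
    for contig, hits in by_contig.items():
        # Assign the contig to the reference chromosome holding most of its anchors.
        chrom, _ = Counter(h.ref_chrom for h in hits).most_common(1)[0]
        on_chrom = [h for h in hits if h.ref_chrom == chrom]
        # Place it at the median reference position of those anchors.
        pos = median(h.ref_pos for h in on_chrom)
        # Orient it by majority vote of anchor strands (ties count as forward).
        forward = sum(h.same_strand for h in on_chrom)
        orientation = "+" if forward * 2 >= len(on_chrom) else "-"
        tiles[chrom].append((pos, contig, orientation))

    return {
        chrom: [(contig, ori) for _, contig, ori in sorted(tiles[chrom])]
        for chrom in ref_order
        if chrom in tiles
    }
```

For example, suppose two reverse-strand anchors place `ctg1` around position 125 on `peach1`, and two forward-strand anchors place `ctg3` around 600. The function then returns `ctg1` (as `-`) ahead of `ctg3` (as `+`) on that chromosome. A real pipeline would also need to handle contigs split across chromosomes and fill gaps between tiles.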
Aug. 22nd, 2011
The genome of Cannabis sativa cultivar Chemdawg (marijuana) has been added to CoGe: http://genomevolution.org/CoGe/OrganismView.pl?oid=33804
This genome was sequenced by Medicinal Genomics (located in the Netherlands). It was sequenced with one lane of the Illumina HiSeq (2x100) platform and assembled with CLCbio’s workbench. Additional information about the assembly and genome may be found at: http://www.medicinalgenomics.com/the-c-sativa-genome/
You can access Cannabis in CoGe: http://genomevolution.org/CoGe/OrganismView.pl?oid=33804
Cannabis is a member of the plant order Rosales. Of the sequenced genomes in that order, the peach genome is a fantastic comparator. This is due to its high-quality sequence and assembly, and to its genomic evolutionary history, which does not contain any whole genome duplication event subsequent to the eudicot paleohexaploidy shared by nearly all dicots (at least the eurosids and the asterids). As such, its genome structure is probably very similar to that of the common ancestor of the order Rosales, and perhaps of the eudicots as a whole. This likely ancestral state makes the peach genome quite suitable for generating pseudoassemblies of highly fractured, low-quality genome assemblies such as this Cannabis genome. CoGe's tool SynMap has an algorithm to tile contigs along any other "reference" genome in CoGe.
The Syntenic path assembly of Cannabis to the peach genome may be viewed: http://genomevolution.org/wiki/index.php/Syntenic_path_assembly#Cannabis_sativa_.28marijuana.29_v._Prunus_persica_.28peach.29
This shows the Cannabis genome sequence contains nearly the entire gene content of Peach.
Aug. 17th, 2011
The genome of the extremophile crucifer Eutrema parvulum (Thellungiella parvula) has been added to CoGe: http://genomevolution.org/CoGe/OrganismView.pl?dsgid=12242
You can read about this genome in this Nature Genetics Letter: http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.889.html
Aug. 16th, 2011
Version 2.1 of Setaria italica has been added to CoGe. This genome was obtained from JGI/phytozome: http://phytozome.net
Unmasked version: http://genomevolution.org/CoGe/OrganismView.pl?dsgid=12240
Masked version: http://genomevolution.org/CoGe/OrganismView.pl?dsgid=12241
Thanks to Gina Turco for the request.
Aug. 11th, 2011
Version 1.1 of Fragaria vesca (woodland strawberry) has been added to CoGe: http://genomevolution.org/CoGe/OrganismView.pl?dsgid=12186
This version contains gene models, which permit more fun with syntenic dotplots: http://genomevolution.org/r/3wdb
This dotplot is strawberry versus peach. Besides making a great summer fruit salad, this dotplot colors syntenic gene pairs based on their synonymous substitution values. From it, it is easy to see that neither genome has had an independent whole genome duplication since the eudicot paleohexaploidy event.
Thanks to Aaron Liston for requesting this genome.
Aug. 3rd, 2011
You can get all your water flea genomics here: http://genomevolution.org/CoGe/OrganismView.pl?oid=33760
Thanks to Mike Freeling for the request.
July 29th, 2011
Additional bugs resulting from the major code update to CoGe's internal services were squashed today. Part of the update included further modularization of the web services from the backend services. A few of the ancillary support programs for CoGe's web services were not being passed the base configuration file for a given web deployment and were therefore crashing. This has been corrected, but please email Eric Lyons if you encounter any problems.
July 29th, 2011
GenomeList has been updated to:
Example GenomeList link: http://genomevolution.org/r/3v8n
July 27th, 2011
CoGe has undergone a major update of its web-based system today. It includes a few bug fixes and feature enhancements. The major addition is GenomeList, which creates a list of genomes, gives an overview of their genomic content, and then sends the list to other tools (e.g. CoGeBlast).
Behind the scenes, the web interface was further modularized from the backend support services and modules. The primary reason for this is to enable the creation of multiple CoGe installations. There have been a few requests for CoGe installations specific to a clade or group of organisms. With iPlant's cyberinfrastructure support, this should be possible (provided the code base supports it).
There were some sticking points this morning migrating server-specific changes from the iPlant development server to the main CoGe server, but hopefully this didn't affect too many people. However, there is a high likelihood of additional bugs in the system that I failed to catch! Please email Eric Lyons if you find any problems.
Otherwise, we are hoping to make a full migration to iPlant's resources in the near future. iPlant's CoGe server is being upgraded with additional attached storage for the continued growth of the platform.
July 4th, 2011
You can find its genome in OrganismView: http://genomevolution.org/CoGe/OrganismView.pl?dsgid=11942
This is a very rough genome (50,000+ contigs; the largest is 470 kb, and 13 are larger than 300 kb). However, the syntenic path assembly in SynMap, with the option to remove any contig that lacks a syntenic signal, makes identifying syntenic regions a breeze (see the above link).
See this example of micro-synteny as seen in GEvo: http://genomevolution.org/r/3oxa
Thanks to: Haibao Tang, Devin O'Connor, and Jim Leebens-Mack for requesting this genome.
July 14th, 2011
The masked version of the Palm genome has been created and added to CoGe: http://genomevolution.org/CoGe/OrganismView.pl?dsgid=11947
Thanks to Haibao Tang for providing the masking procedure.
You can find its genome here: http://genomevolution.org/CoGe/OrganismView.pl?oid=33537 (masked and unmasked sequence)
Thanks to Josquin Tibbits for recommending this genome!
June 30th, 2011
The "High Quality" sequences generated by the 1001 Genomes project for the resequencing of several Arabidopsis strains have been added to CoGe. This includes:
Thanks to Maggie Woodhouse for this suggestion!
June 22nd, 2011
OrganismView has a minor update to where the lists of genomic features are displayed. The old version displayed the summary list of genomic features below all the information panels, so each newly generated summary list replaced the prior one. For example, retrieving the list for the entire genome and then for a particular chromosome would leave only the chromosome's list on screen. Now, each information panel's genomic feature list appears to the right of its information summary. This allows the entire genome's feature list to be displayed simultaneously with a chromosome's feature list.
June 21st, 2011
The entire set of sequences and associated annotations for Coccidioides has been added to CoGe. These soil fungi are pathogenic and can cause coccidioidomycosis, aka valley fever, in humans. The original data may be obtained from: http://www.broadinstitute.org/annotation/genome/coccidioides_group/MultiHome.html
And accessed through OrganismView: http://genomevolution.org/CoGe/OrganismView.pl?org_desc=Coccidioides
Thanks to Marc Orbach for suggesting and locating these genomes.
June 12th, 2011
The UC Berkeley Data Center power upgrade went smoothly. CoGe has booted up and is back online.
June 3rd, 2011
CoGe will be down on June 12th due to maintenance on the power infrastructure at the UC Berkeley Data Center. We will do our best to bring CoGe back up as soon as possible.
Here is their announcement:
Description: The UC Berkeley campus data center has been a valuable resource for campus computing for the past seven years. Demand for this highly secure, highly available, and network-redundant facility continues to rise. The current facility has reached its power and cooling capacity and Capital Projects has initiated a major renovation project intended to increase each of these capacities, while also integrating newer, more efficient systems to help the campus achieve its long-term energy conservation goals.
As part of this effort, the replacement of some core components of the data center’s power infrastructure is required. For safety reasons, a full power outage to the data center is scheduled for Sunday, June 12, 2011, from 7:00 am to 3:00 pm. The data center will rely entirely on outside air, rather than air conditioning, to provide cooling for the duration of this period. A minimal number of systems with broad campus impact, including CalMail, CalAgenda, and the campus home page, will be provided with temporary power during this outage. In the unlikely event that the data center air temperature exceeds a level appropriate for the safe operation of equipment, some of these systems may need to be shut down as well.
The list of widely used systems that are intended to remain available is below. This list is still being finalized, so additional systems may be added as campus needs require. This list will not include systems for which departments have made separate arrangements.
May 6th 2011
The genomes of:
have been added to CoGe. These were sequenced by JGI.
May 6th 2011
James Schnable has updated the phylogeny of angiosperms for sequenced plant genomes.
Apr. 19th 2011
Here is the outline/syllabus of the workshop held at Berkeley and hosted by the iPlant Collaborative, the Department of Plant and Microbial Biology, QB3-CGRL (Computational Genomics Resource Laboratory), ARS-Plant Gene Expression Center, and the Freeling lab: 2011 Berkeley Workshop
This outline contains links to specific analyses used in the workshop.
Mar. 31st 2011
Here is a fun example of a mitochondrial genome being inserted into a plant chromosome: Horizontal transfer of mitochondria genome
Mar. 29th 2011
For those times when scrolling to the top of the screen to find the "Run GEvo Analysis!" button is too much work, a second button has been added at the bottom of the configuration box. This is quite useful when comparing >6 genomic regions.
Thanks to David Braun for this suggestion!
Mar. 29th 2011
Thanks to Damon Lisch for pointing out a bug in FeatView that was exposed by Firefox v4. This bug was also affecting Google Chrome (but not Safari). Please let Eric Lyons know of any problems you have running Firefox v4 (or other problems in general).
Mar. 11th 2011
Mar. 7th 2011
You can now select to compare protein sequences between genomes with annotated protein coding features (CDS).
Thanks to Angelique D'Hont for the suggestion.
Mar. 2nd 2011
You can find Cochliobolus heterostrophus C5 in OrganismView: http://genomevolution.org/CoGe/OrganismView.pl?dsgid=11258
Both masked (by JGI) and unmasked versions of the genome are available.
For a syntenic dotplot between C. heterostrophus and Pyrenophora tritici-repentis strain Pt-1C-BFP (the closest relative I could find in CoGe), please follow: http://genomevolution.org/r/2m0n
This is a neat syntenic dotplot showing extensive synteny and intrachromosomal rearrangements (though these are both contig-level assemblies).
Thanks to Daniel Lawrence for the request.
Feb. 26th 2011
After a couple of requests, SynMap now has an option to sort chromosomes by name instead of by size. You can read how to set this option here.
for this suggestion.
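The difference between the two orderings can be illustrated with a short Python sketch. This is not SynMap's code. It assumes chromosome lengths are available as a dictionary, and it uses a "natural" sort so that `chr10` comes after `chr2`; a plain string sort would put it right after `chr1`. Whether SynMap's name sort treats numbers this way is an assumption.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so 'chr10' sorts after 'chr2'."""
    # re.split with a capturing group alternates text (even indices) and digits (odd indices),
    # so keys from different names always compare str-to-str and int-to-int.
    return [int(tok) if tok.isdigit() else tok.lower() for tok in re.split(r"(\d+)", name)]


def order_chromosomes(lengths, by="size"):
    """Return chromosome names ordered by descending length or by natural name order."""
    if by == "size":
        # Largest first; break ties by name so the order is deterministic.
        return sorted(lengths, key=lambda n: (-lengths[n], natural_key(n)))
    if by == "name":
        return sorted(lengths, key=natural_key)
    raise ValueError(f"unknown ordering: {by!r}")
```

With lengths `{"chr1": 300, "chr2": 500, "chr10": 400, "chrC": 150}`, sorting by size gives `chr2, chr10, chr1, chrC`. Sorting by name gives `chr1, chr2, chr10, chrC`.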
Feb. 22nd 2011
If you have a CoGe installation, have access to the main CoGe server, or are just curious to know what is needed to load a genome into CoGe, here is a page on how to load genomes into CoGe. This is all run from the command line. When CoGe's user permission and data management system matures, this procedure will be made available via the web.
Feb. 19th 2011
You can see the genome in CoGe at: http://genomevolution.org/CoGe/OrganismView.pl?dsgid=11227
This was one of the first big genomes sequenced using only Next Generation Sequencing Technology and assembled de novo. As a result, the assembly is rather poor compared to a fully assembled genome like the dog genome. However, through comparative genomics with SynMap, identifying syntenic regions and determining that nearly full coverage was obtained is as easy as a few mouse clicks: syntenic path assembly of the WGS panda genome to the fully sequenced dog genome. This will be quite useful as more and more large genomes are sequenced using these techniques (fast, cheap, and still very useful!)
Feb. 19th 2011
Technically, there is no reason why CoGe can't store metagenomes. Its core data model stores a collection of sequences that, thus far, has been organized into a genome, but can accommodate any collection of sequences. So the first metagenome was loaded into CoGe from NCBI:
And can be seen in CoGe: http://genomevolution.org/CoGe/OrganismView.pl?oid=32988
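To make the point concrete, here is a minimal Python sketch of the idea. The class and field names are hypothetical and do not reflect CoGe's actual database schema. The point is only that nothing in a "collection of sequences" model requires the sequences to come from a single organism's genome.

```python
from dataclasses import dataclass, field


@dataclass
class Sequence:
    name: str      # e.g. a chromosome, a contig, or a metagenomic scaffold
    residues: str


@dataclass
class SequenceCollection:
    """Hypothetical container: a 'genome' is one way of organizing a set of sequences."""
    organism: str
    kind: str  # illustrative labels such as "genome" or "metagenome"
    sequences: list[Sequence] = field(default_factory=list)

    def add(self, name: str, residues: str) -> None:
        self.sequences.append(Sequence(name, residues))

    def total_length(self) -> int:
        return sum(len(s.residues) for s in self.sequences)
```

The same container (and any tool that walks over its sequences) serves equally well for a chromosome-level genome or a bag of metagenomic scaffolds.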
Feb. 18th 2011
Feb. 10th 2011
For a complete list of PAG sessions: http://www.intl-pag.org/19/19-workshops.html
Eric Lyons, iPlant Collaborative and University of Arizona, Tucson AZ
Eric Lyons, iPlant Collaborative and the University of Arizona, Tucson AZ
Feb. 4th 2011
Thanks to CIRAD for sharing their cacao gene models. These have been added to the Theobroma cacao genome in CoGe: http://genomevolution.org/CoGe/OrganismView.pl?dsgid=10997
For an example of how these gene models may be used in whole genome comparisons, see this analysis between chocolate and peach: Chocolate-peach syntenic dotplots. It shows how the evolutionary distance between syntenic gene pairs may be visualized to differentiate between orthologous syntenic regions derived from the divergence of these lineages, and out-paralogous syntenic regions derived from their shared paleohexaploidy ancestry.
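The logic of using evolutionary distance to separate the two kinds of syntenic regions can be sketched as follows. This is an illustrative simplification, not how SynMap colors its dotplots. It assumes synonymous substitution (Ks) values have already been computed for each syntenic gene pair, and it summarizes each syntenic block by its median Ks. It then applies a single cutoff chosen by inspecting the Ks distribution for the two genomes being compared. Orthologous blocks formed at the more recent speciation should fall below the cutoff. Out-paralogous blocks dating back to the shared paleohexaploidy should fall above it. The block names and values in the test are made up, not measured for cacao and peach.

```python
from statistics import median


def classify_syntenic_blocks(blocks, ks_cutoff):
    """Label each syntenic block by the median Ks of its gene pairs.

    blocks: {block_id: [Ks value per syntenic gene pair, or None if Ks could not be computed]}
    ks_cutoff: assumed threshold separating the younger (orthologous) Ks peak
               from the older (out-paralogous) one.
    """
    labels = {}
    for block_id, ks_values in blocks.items():
        valid = [ks for ks in ks_values if ks is not None]
        if not valid:
            labels[block_id] = "unknown"
        elif median(valid) < ks_cutoff:
            labels[block_id] = "orthologous"
        else:
            labels[block_id] = "out-paralogous"
    return labels
```

Using the median rather than the mean keeps a few saturated or misaligned gene pairs from dragging a whole block across the cutoff.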
Jan. 27th 2011
Version 10 of the Arabidopsis thaliana genome has been added to CoGe: http://genomevolution.org/CoGe/OrganismView.pl?dsgid=11022
Thanks to the folks at TAIR for all their work.
For a syntenic dotplot of version 9 versus version 10 of Arabidopsis thaliana (with the evolutionary distances of syntenic gene pairs calculated) see: http://genomevolution.org/r/2hiz
Jan. 26th 2011
The genome of Theobroma cacao has been published: http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.736.html
You can view this genome in CoGe at: http://genomevolution.org/CoGe/OrganismView.pl?dsgid=10997
To view some Syntenic dotplots of Cacao: Cacao syntenic dotplots
Of note, this genome has not had any whole genome duplication events since the paleohexaploidy event at the base of the eurosids.
Both the 50x super-masked and unmasked versions of the B73_refgen2 maize genome have been updated with the new gene models released by maizesequence.org over Thanksgiving break. The new genome annotation consists of 110,028 genes, many with alternative transcripts, which can be broken down as follows:
Sept. 16th 2010
CoGe's servers have successfully been moved to a new rack space. Thanks to James, Bao, and Brent for making this happen.
Sept. 15th 2010
We have received word from the UC Data Center, which houses CoGe, that we need to move our servers to a new rack space. This should only take an hour or two. The move is tentatively scheduled for:
Sept. 16th 2010 at 1 pm (Pacific time)
We apologize for any inconvenience this may cause any of CoGe's users.
Aug. 26th 2010
Organisms selected in SynMap have links in their taxonomic descriptions. If you click on a term in the taxonomic description, that term is automatically entered into the organism description search. All organisms with a matching taxonomic term will be displayed. This makes it faster to find organisms related to the one in which you are interested.
Aug. 26th 2010
OrganismView now has more links for finding information about an organism, and to internal CoGe tools.
External searches under organism information:
Internal CoGe links:
Aug. 26th 2010
CoGe's homepage menu "Latest Genomes" now has links to search for the organism name in
This makes it quicker to find information on an organism, especially if you have no idea what it is. This is helpful considering that there are nearly 9,000 organisms in CoGe.
Aug. 26th 2010
CoGeBlast now has support for specifying blastn, tblastx, lastz, megablast, and discontinuous megablast when searching with nucleotide sequences.
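For readers who run these searches outside CoGe, the five options correspond roughly to the following invocations of the NCBI BLAST+ and LASTZ command-line tools. This sketch is for orientation only. It assumes BLAST+ and LASTZ are installed and compares a query FASTA file directly against a subject FASTA file. It does not reproduce the parameters CoGeBlast actually passes.

```python
def nucleotide_search_command(program, query_fasta, subject_fasta):
    """Build an argv list for a nucleotide query against a FASTA subject file.

    The program names mirror CoGeBlast's options; the mapping onto standalone
    tools is illustrative, not CoGeBlast's internal configuration.
    """
    # megablast, discontinuous megablast, and plain blastn are all tasks of BLAST+ blastn.
    blastn_tasks = {
        "blastn": "blastn",
        "megablast": "megablast",
        "discontinuous megablast": "dc-megablast",
    }
    if program in blastn_tasks:
        return ["blastn", "-task", blastn_tasks[program],
                "-query", query_fasta, "-subject", subject_fasta]
    if program == "tblastx":
        # tblastx translates both query and subject in all six frames.
        return ["tblastx", "-query", query_fasta, "-subject", subject_fasta]
    if program == "lastz":
        # LASTZ takes the target sequence first, then the query.
        return ["lastz", subject_fasta, query_fasta]
    raise ValueError(f"unsupported program: {program!r}")
```

The returned list can be handed to `subprocess.run(...)`. Megablast is fastest for near-identical sequences, discontinuous megablast tolerates more divergence, and tblastx is the most sensitive but slowest for distant homologs.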
Aug. 4th 2010
Brassica rapa has been added to CoGe and represents the 10,000th genome loaded in CoGe. Its sequence was generated by BGI, located in China. This relative of Arabidopsis is a wonderful addition to the sequenced plant genomes. Their lineages share a series of whole genome duplication events, commonly known as alpha, beta, and gamma. The last of these, gamma, happened prior to the radiation of the eudicots. Since their divergence, Brassica rapa has had a triploidy while Arabidopsis has had none.
June 28th 2010
A new update of genomes from NCBI has finished. This includes genomes from all domains of life. CoGe now has genomic sequence from 8,872 organisms comprising 9,999 genomes. There is also a new option on the homepage to list the most recently added genomes.
June 23rd 2010
The syllabus for a day-long workshop on how to use CoGe for the Society for Invertebrate Pathology's conference (SIP 2010) is now available. This workshop focuses on:
The workshop's syllabus is available: SIP2010
June 18th 2010
The switch to the new server went as smoothly as I could have hoped.
Besides new hardware (which should greatly accelerate many of CoGe's analyses and improve system stability), this installation welcomes a new version of CoGe too!
This new version of CoGe has:
Please contact Eric Lyons if you find any bugs!
June 17th 2010
Going through the switch today. Expect some downtime with CoGe, and some support systems will be temporarily offline.
June 10th 2010
It appears that most of the software updates and the migration to the new server are working. We have deployed the new server to the UC Data Center, but due to some complications with rack space, IP address allocation, subnets, firewalls, etc., things may be in flux for a while. We've had to take our development server (aka toxic) offline and put the new server on its IP address until those things get sorted out. In the meantime, we plan to make the switch to production on the new server soon (hopefully next week). When this happens, expect CoGe to be offline for a couple of hours, but we will do our best to keep downtime to a minimum.
June 2nd 2010
We have our new server for CoGe! Its deployment will bring not only performance improvements due to more computing power, but also several changes and additions to CoGe:
We are planning on moving the new server to the UC data center this Friday. After some more testing and bug hunting, we will switch our current production server's IP address to this machine. There is a high chance of some downtime for CoGe during this switch, and we will post announcements as to when this change will happen! In the meantime, if anyone is interested in testing the new CoGe, please e-mail Eric Lyons.
May 18th 2010
May 5th 2010
Eric Lyons wrote a piece about CoGe for The OpenHelix Blog
May 3rd 2010
This release does not have annotations yet!
You can view the genome in CoGe at: http://genomevolution.org/CoGe/OrganismView.pl?dsgid=9106
This sequence was obtained from: http://www2.genome.arizona.edu/genomes/maize
You can read about the differences in the assembly between versions 1 and 2 here.
Apr. 10th 2010
You can view the genome in CoGe at: http://genomevolution.org/CoGe/OrganismView.pl?dsgid=9048
Version 2 with 12x coverage was obtained from Genoscope.
There are some changes to the assembly, with new contig orders and additional sequence added to the pseudomolecules, which can be seen here.
Apr. 9th 2010
Finished an update from NCBI. However, this is not a complete listing of all genomes available at NCBI due to some API problems getting some genomes. You can read about this problem below.
Apr. 9th 2010
You can view the genome in CoGe at: http://genomevolution.org/CoGe/OrganismView.pl?dsgid=8976
Syntenic comparison of version 3 to version 2 shows extensive changes in the primary sequence. Some chromosomes have had their sequence substantially updated.
Apr. 9th 2010
You can view its genome in CoGe at: http://www.genomevolution.org/CoGe/OrganismView.pl?oid=30980
Its genome was produced by the International Peach Genome Initiative and its sequence was obtained from phytozome. This genome is currently unpublished and therefore under the publication restrictions of the Fort Lauderdale Convention.
Peach is a eudicot in the Rosaceae family.
Apr. 8th 2010
The automatic NCBI genome loader is running today. It has been a while since I last ran it, because I ran into an API problem with NCBI's eutils tools three months ago. The issue is still unresolved: even though I have checked in every two weeks for a status update, I have yet to receive any word as to when the bug will be fixed. For those interested, here is my bug report sent at the end of January:
Issue (http://jira.be-md.ncbi.nlm.nih.gov/browse/HD-1843):
Key: HD-1843
Summary: Unable to get some genomes using eutils
Type: Task
Status: In Progress
Priority: Normal
Assignee: Matten, Wayne
Reporter: Nobody
Description: Hi, I've been checking which genomes are available from NCBI using eutils by getting a list of all the genome project ids (genomeprj) and then retrieving their associated genome ids. I've found that a lot of the recently deposited genomes (usually with accessions CPXXXXXX) have a genomeprj id but no associated genome id. For example, genomeprj=30031. It is listed in this list: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=genomeprj&term=all%5Bfilter%5D&retmax=999999 but has no genome id: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?db=genome&dbfrom=genomeprj&id=30031 However, it does have an entry in GenBank: http://www.ncbi.nlm.nih.gov/nuccore/CP001637.1?ordinalpos=3&itool=EntrezSystem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum I am probably missing something obvious. Can you help me figure out how to get a list of all the genomes at NCBI? I am using these data in an NSF funded and publicly available comparative genomics platform (http://synteny.cnr.berkeley.edu/), and have programs that check for new genomes and new versions of existing genomes from NCBI on a periodic basis. It is important for this system to be as up to date as possible with regard to the large number of genomes that are becoming available, as there are many researchers using this tool for their work. Thanks in advance for your help, -Eric Lyons
If anyone has any solutions to this problem, please contact me.
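For readers curious about the check described in the report, here is a minimal sketch of it in Python: build the genomeprj-to-genome elink query and flag project ids whose response lists no linked genome id. It assumes the elink XML layout eLinkResult/LinkSet/LinkSetDb/Link/Id, and the function names are hypothetical. NCBI has since reorganized its project databases (genomeprj was superseded by BioProject), so these exact queries may no longer work; the sketch only parses responses and does not contact NCBI.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base URL as given in the 2010 bug report (illustrative only).
EUTILS = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def elink_url(project_id: str) -> str:
    """Build the genomeprj -> genome elink query used in the report."""
    query = urlencode({"db": "genome", "dbfrom": "genomeprj", "id": project_id})
    return EUTILS + "elink.fcgi?" + query


def linked_genome_ids(elink_xml: str) -> list:
    """Return genome ids found under LinkSet/LinkSetDb/Link/Id (assumed layout)."""
    root = ET.fromstring(elink_xml)
    return [node.text for node in root.findall("./LinkSet/LinkSetDb/Link/Id")]


def projects_without_genomes(responses: dict) -> list:
    """Given {project_id: elink XML response}, list projects linked to no genome."""
    return [pid for pid, xml in responses.items() if not linked_genome_ids(xml)]
```

A project such as genomeprj=30031 would show up in the output of projects_without_genomes even though its sequence exists in GenBank, which is exactly the gap the report describes.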
Mar. 26th 2010
While testing the prior bug fix, I discovered that SynMap wasn't working on genomic sequence comparisons (as opposed to CDS sequence comparisons). This was due to the new analytical pipeline's data processing requiring unique names for each blast hit. Otherwise, multiple hits to the same sequence name would get removed as a local duplicate. As all hits to a genomic sequence were named according to the chromosome, all such hits were flagged as local duplicates and removed from the analysis.
As always, if you find a problem in CoGe, feel free to email Eric Lyons and let him know what you've found. There are now too many options and buttons to click in CoGe for me to test with each update.
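To illustrate why chromosome-only names broke the genomic comparisons, here is a minimal Python sketch of a name-keyed duplicate filter applied to hits named two ways. The "chrom:start-end" naming scheme and the function names are hypothetical; they only show the principle that each hit needs a unique name, not the exact fix applied to SynMap.

```python
def dedupe_by_name(hits, name_of):
    """Keep only the first hit seen for each name, like a name-keyed duplicate filter."""
    seen, kept = set(), []
    for hit in hits:
        key = name_of(hit)
        if key not in seen:
            seen.add(key)
            kept.append(hit)
    return kept


def chromosome_name(hit):
    """Old behavior: every hit on a chromosome shares the chromosome's name."""
    chrom, _start, _end = hit
    return chrom


def unique_name(hit):
    """Hypothetical unique name built from the hit's coordinates."""
    chrom, start, end = hit
    return f"{chrom}:{start}-{end}"
```

With hits [("chr1", 100, 500), ("chr1", 9000, 9400), ("chr2", 50, 300)], naming by chromosome silently discards the second chr1 hit, while coordinate-based names keep all three.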
Mar. 25th 2010
With SynMap's new analytical pipeline, there are still some bugs to be worked through. Hopefully we fixed one today in the script that converts blast input files to BED format, which is required for the program to find local duplicates in the compared genomes. These local duplicates are removed from the input to the algorithm that finds collinear series of putative homologous genes, which are used to infer syntenic regions. The local duplicate files are also displayed in the download section of the results in case they are wanted for other analyses.
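A minimal sketch of one common way to flag local (tandem) duplicates from BED-style gene coordinates and blast hits: genes on the same chromosome that hit each other and lie within a few genes of one another are grouped, and all but one member of each group can be dropped before collinearity detection. This is a generic illustration, not the script Haibao Tang and Brent Pedersen wrote for SynMap; the `max_gap` window of 10 genes, the tuple layouts, and the function names are hypothetical.

```python
from collections import defaultdict


def local_duplicate_groups(bed, hits, max_gap=10):
    """Group genes that hit each other and sit within max_gap genes on one chromosome.

    bed:  iterable of (chrom, start, end, name) records (hypothetical layout)
    hits: iterable of (query_name, subject_name) pairs from a blast
    """
    order = sorted(bed, key=lambda rec: (rec[0], rec[1]))
    # Rank of each gene in position order; within one chromosome the
    # difference of two ranks is how many genes apart they are.
    index = {rec[3]: (rec[0], i) for i, rec in enumerate(order)}
    parent = {name: name for name in index}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in hits:
        if a == b or a not in index or b not in index:
            continue  # skip self-hits and genes without coordinates
        (chrom_a, pos_a), (chrom_b, pos_b) = index[a], index[b]
        if chrom_a == chrom_b and abs(pos_a - pos_b) <= max_gap:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for rec in order:
        groups[find(rec[3])].append(rec[3])
    return [members for members in groups.values() if len(members) > 1]


def genes_to_drop(groups):
    """Every member of each group except the first (leftmost) gene."""
    return {name for members in groups for name in members[1:]}
```

Keeping the leftmost gene of each group is an arbitrary choice here; a real pipeline might instead keep the longest gene or the best-scoring hit.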
Mar. 24th 2010
Replaced tinyurl.com with a local installation of a URL hashing and redirecting service. This makes generating short URLs faster and allows for customized names. Note: existing tinyurls will still work.
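As a rough illustration of what a URL hashing and redirecting service does, here is a minimal in-memory Python sketch: a URL is hashed to a short base62 key (or stored under a custom name), and the key is later resolved back to the full URL for redirection. The `Shortener` class, the 6-character key length, and the use of SHA-256 are assumptions for illustration; this is not the software CoGe installed.

```python
import hashlib
import string

# 62 symbols: 0-9, a-z, A-Z
ALPHABET = string.digits + string.ascii_letters


class Shortener:
    """Hypothetical in-memory URL shortener with optional custom names."""

    def __init__(self, length=6):
        self.length = length
        self.table = {}  # short key -> full URL

    def _hash(self, url):
        # Take 64 bits of SHA-256 and write them in base62.
        n = int.from_bytes(hashlib.sha256(url.encode()).digest()[:8], "big")
        chars = []
        for _ in range(self.length):
            n, r = divmod(n, 62)
            chars.append(ALPHABET[r])
        return "".join(chars)

    def shorten(self, url, custom=None):
        key = custom or self._hash(url)
        existing = self.table.get(key)
        if existing is not None and existing != url:
            raise ValueError(f"key {key!r} already maps to another URL")
        self.table[key] = url
        return key

    def resolve(self, key):
        # A web front end would issue an HTTP redirect to this URL.
        return self.table[key]
```

Because the key is derived from the URL itself, shortening the same URL twice returns the same key, while custom names let users pick memorable keys instead.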
Mar. 13th 2010
James Schnable has created a page detailing all of the sequenced plant genomes including:
Read about them here: Sequenced plant genomes
Mar. 13th 2010
Mar. 12th 2010
You can access it in CoGe here, or get more information about it from Phytozome. This is apparently a distinct sequence from the one published in Nature Genetics last November. That sequence was from the 'Chinese long' inbred line 9930, whereas this version comes from the inbred line Gy14. More details here.
Mar. 12th 2010
After a month of work, SynMap has undergone several significant changes, incorporating new algorithms written by Haibao Tang and Brent Pedersen:
These changes are also intended to increase the stability of SynMap, which, due to its long pipeline, has been known to crash for some genomes and/or specific parameter configurations. Please let Eric Lyons know if you have any problems with an analysis. Please send along the names of the organisms/genomes compared and a copy of the log file produced by each SynMap run (if possible).
Mar. 11th 2010
However, if anyone does come across this bug again (or any others), please let me know: Eric Lyons
Mar. 10th 2010
Mar. 9th 2010
James Schnable manually evaluated ~460 classic maize genes available from MaizeGDB and NCBI, determined their genomic positions in the maize genome, and found their syntenic regions within maize (from its most recent whole genome duplication event), sorghum, rice, and Brachypodium. This list contains links to compare these syntenic regions using GEvo.
Feb. 10th 2010
Mimulus guttatus (monkey flower): http://synteny.cnr.berkeley.edu/CoGe/OrganismView.pl?oid=30760
Mimulus is an outgroup to the rosids (it belongs to their sister group, the asterids).
Populus trichocarpa (poplar; cottonwood): http://synteny.cnr.berkeley.edu/CoGe/OrganismView.pl?oid=324
This is version 2 of poplar!
Both are from the JGI.
Feb. 8th 2010
MaizeGDB is now linking to CoGe's GenomeView so maize researchers can find maize-sorghum syntenic gene sets and quickly perform syntenic analyses using GEvo. For an example view from MaizeGDB's genome browser:
For instructions on how to perform this workflow: MaizeGDB and CoGe
For more information on maize-sorghum syntenic analyses: Maize-Sorghum genome analyses
For a quick video walk-through of the new connections: MaizeGDB and CoGe's Maize-Sorghum Orthologies
Feb. 5th 2010
GenomeView has been updated to auto-detect genomic features with annotations that are links to GEvo. These links provide an analysis of a genomic feature (e.g. a gene) against previously identified syntologous sets of features. Currently, this has been implemented using syntelogs from maize and sorghum, but with the code in place, we will expand annotations to genomic features from other organisms for which we have generated syntologous gene sets. For an example of this visualization in GenomeView, please see this GenomeView of sorghum. Also, for an expanded list of glyphs used in GenomeView, please refer to these examples.
Jan. 16th 2010
OrganismView has new options for easily downloading the sequences of a genome in fasta format and retrieving all of its annotations in a GFF file. To access them, just search for an organism and genome of interest, and look for the links under "Genome Information".
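Once downloaded, both files can be processed with standard tools. The sketch below is a minimal Python reader, assuming a plain FASTA file and a GFF3-style file with nine tab-separated columns (feature type in the third column); the exact layout of CoGe's exported files is not confirmed here, and the function names are hypothetical.

```python
from collections import Counter


def read_fasta(lines):
    """Return {sequence id: sequence}; the id is the first word after '>'."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def count_gff_features(lines):
    """Count features by type (column 3), skipping comments and malformed lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a nine-column feature line
        counts[cols[2]] += 1
    return counts
```

Both functions accept any iterable of lines, so an open file handle can be passed directly, e.g. `read_fasta(open("genome.faa"))` with a hypothetical file name.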
Jan. 10th 2010
We've linked to phylogeny.fr for quick and easy phylogenetic tree reconstruction. Now, you can build a list of fasta sequences and display them in FastaView, select protein or DNA sequences, edit them if necessary (e.g. add or remove sequences manually), and press a button to send them off to phylogeny.fr for:
For an example, use this link to FastaView and press the button "phylogeny.fr" at the bottom of the screen.
Special thanks to Haibao Tang for pointing out this incredible web resource!
Jan. 4th 2010
Haibao Tang, an expert in plant comparative genomics and genome evolution, as well as a great Python programmer, has joined the Freeling lab. His input and contributions will be most valued!
Jan. 4th 2010
New tutorials have been added:
Dec. 24th 2009
Dec. 15th 2009
Thanks again to Brent Pedersen for some fantastic programming. He discovered that the makefile for DAGChainer's C++ code did not include the -O3 optimization flag, rewrote the input/output methods of the compiled binary to read from STDIN instead of a file, and rewrote the Perl front-end in Python. Together, these changes speed up CoGe's DAGChainer implementation in SynMap 2- to 4-fold.
You can download his code at: svn co http://bpbio.googlecode.com/svn/trunk/scripts/dagchainer
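Reading from STDIN lets one pipeline step stream its output straight into the next without writing an intermediate file. Here is a minimal Python sketch of the same idea, assuming tab-delimited DAGChainer-style input (chromosome, gene, start, end for each side of a match, then an e-value); the column layout, the `read_hits` name, and the 1e-5 e-value cutoff are assumptions for illustration, not Brent Pedersen's code.

```python
import sys
from typing import Iterator, Optional, TextIO, Tuple

# (chrom_a, gene_a, start_a, end_a, chrom_b, gene_b, start_b, end_b, evalue)
Hit = Tuple[str, str, int, int, str, str, int, int, float]


def read_hits(stream: Optional[TextIO] = None, max_evalue: float = 1e-5) -> Iterator[Hit]:
    """Yield parsed matches from a tab-delimited stream (STDIN by default)."""
    stream = sys.stdin if stream is None else stream
    for line in stream:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        chr_a, gene_a, s_a, e_a, chr_b, gene_b, s_b, e_b, evalue = cols[:9]
        if float(evalue) <= max_evalue:
            yield (chr_a, gene_a, int(s_a), int(e_a),
                   chr_b, gene_b, int(s_b), int(e_b), float(evalue))
```

Called with no argument, read_hits() consumes sys.stdin, so a script built on it can sit at the end of a shell pipe; because it is a generator, matches are processed one at a time rather than loaded into memory all at once.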
Nov. 30th 2009
Genomics: What every invertebrate pathologist needs to know. http://www.sip2010.org/index.php/Bioinformatics-Workshop.html
Nov. 18th 2009
Philippe Lamesch from TAIR passed along a link to openhelix.com highlighting CoGe's tool GEvo. They put together a nice video showing GEvo. They, in turn, found this in a posting on the blog James and the Giant Corn, whose author had used GEvo for a grant proposal.
Oct. 20th 2009
Thanks to maizesequence.org for providing the sequence and annotations. The current pseudomolecule assembly of maize has been loaded into CoGe.
More fun for everyone!
CoGe's automated NCBI genome loader has been updated and is once again checking NCBI regularly for new and updated genomes. You can get a snapshot of the number of organisms and amount of genomic sequence in CoGe by checking its homepage, and search for your genome of interest using OrganismView.
You can send a set of fasta sequences generated by FastaView directly to TARGeT.
Read the general announcement: Gobe. Major feature: transparent wedges are drawn to connect regions of sequence similarity.
Read the general announcement: CoGe version 3.