Linking to GEvo

From CoGepedia
Revision as of 12:37, 24 August 2009 by Elyons (Talk | contribs)

Linking into GEvo is straightforward: you build a URL whose parameters specify the sequences to be pulled, either from CoGe's database or from NCBI using an accession.

Quick Example

PlantGDB has done this for maize BACs. An example link, deconstructed below, is:

http://synteny.cnr.berkeley.edu/CoGe/GEvo.pl?accn1=Os02g51290;dr1up=50000;dr1down=50000;accn2=Sb04g027950;dr2up=50000;dr2down=50000;gbaccn3=AC190647;num_seqs=3;

(X) refers to the sequence number where the first sequence is 1, the second is 2, the third is 3, etc.

  • accn(X): retrieve sequence from CoGe's database using an accession.
  • dr(X)up: the amount of sequence to the left of the accession (in base pairs)
  • dr(X)down: the amount of sequence to the right of the accession (in base pairs)
  • gbaccn(X): the GenBank accession used to retrieve a sequence from NCBI.
  • num_seqs=(Y): the number of sequences being submitted
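The deconstruction above can also be done programmatically. The following is a minimal sketch in Python that splits the example link on its `;` separators and groups the parameters by sequence number. The function name `parse_gevo_link` and the regular expression are illustrative assumptions, not part of GEvo.

```python
import re
from urllib.parse import urlsplit

# The example link from this page.
EXAMPLE = (
    "http://synteny.cnr.berkeley.edu/CoGe/GEvo.pl?"
    "accn1=Os02g51290;dr1up=50000;dr1down=50000;"
    "accn2=Sb04g027950;dr2up=50000;dr2down=50000;"
    "gbaccn3=AC190647;num_seqs=3;"
)

# A per-sequence key is letters, then the sequence number, then optional
# letters (accn1, dr1up, gbaccn3). Keys without a number, such as num_seqs,
# do not match and are treated as general parameters.
PER_SEQUENCE_KEY = re.compile(r"([a-z]+)(\d+)([a-z]*)")


def parse_gevo_link(url):
    """Return (per_sequence, general) parameter dictionaries for a GEvo link.

    Hypothetical helper for illustration only.
    """
    per_sequence = {}  # sequence number -> {"accn(X)": value, ...}
    general = {}       # parameters not tied to one sequence
    for pair in urlsplit(url).query.split(";"):
        if not pair:
            continue  # the trailing ';' leaves an empty piece
        key, _, value = pair.partition("=")
        match = PER_SEQUENCE_KEY.fullmatch(key)
        if match:
            prefix, number, suffix = match.groups()
            name = f"{prefix}(X){suffix}"
            per_sequence.setdefault(int(number), {})[name] = value
        else:
            general[key] = value
    return per_sequence, general


if __name__ == "__main__":
    sequences, general = parse_gevo_link(EXAMPLE)
    for number in sorted(sequences):
        print(number, sequences[number])
    print(general)
```

The query string is split by hand because the link uses `;` rather than `&` between parameters.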

So, the above link specifies three sequences:

  • Seq1: database retrieval on name Os02g51290 with 50,000 bp to the left and right
  • Seq2: database retrieval on name Sb04g027950 with 50,000 bp to the left and right
  • Seq3: NCBI retrieval for accession AC190647
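Going the other way, a link like the one above can be assembled from a list of sequence descriptions. This is a sketch only: the function `build_gevo_link` and the dictionary keys `accn`, `gbaccn`, `up`, and `down` are hypothetical names chosen for illustration. It attaches dr(X)up and dr(X)down only to database accessions, mirroring the example link; whether GEvo honors them for NCBI retrievals is not stated here.

```python
from urllib.parse import quote

# Base URL taken from the example link on this page.
GEVO_URL = "http://synteny.cnr.berkeley.edu/CoGe/GEvo.pl"


def build_gevo_link(sequences, base_url=GEVO_URL):
    """Build a GEvo link from a list of dicts (hypothetical helper).

    Each dict holds either "accn" (CoGe database accession, optionally
    with "up" and "down" in base pairs) or "gbaccn" (GenBank accession).
    """
    params = []
    for number, seq in enumerate(sequences, start=1):  # sequences are numbered from 1
        if "gbaccn" in seq:
            params.append((f"gbaccn{number}", seq["gbaccn"]))
        else:
            params.append((f"accn{number}", seq["accn"]))
            if "up" in seq:
                params.append((f"dr{number}up", seq["up"]))
            if "down" in seq:
                params.append((f"dr{number}down", seq["down"]))
    params.append(("num_seqs", len(sequences)))
    # Parameters are separated (and terminated) by ';' as in the example link.
    query = "".join(f"{key}={quote(str(value))};" for key, value in params)
    return f"{base_url}?{query}"


if __name__ == "__main__":
    print(build_gevo_link([
        {"accn": "Os02g51290", "up": 50000, "down": 50000},
        {"accn": "Sb04g027950", "up": 50000, "down": 50000},
        {"gbaccn": "AC190647"},
    ]))
```

Called with the three sequences listed above, the sketch reproduces the example link character for character.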

All options

CoGe Database Retrieval

(X) refers to the sequence number where the first sequence is 1, the second is 2, the third is 3, etc.

  • accn(X): retrieve sequence from CoGe's database using an accession.
  • fid(X): CoGe's database id for a specific genomic feature. This is probably not useful for most people.
  • x(X): specify a genomic coordinate/position for anchoring a sequence to be extracted
  • chr(X): specify a chromosome (needed when using a genomic coordinate/position as the anchor point for the extracted sequence)
  • dsid(X): CoGe's database id for a specific dataset. This or dsgid is needed when specifying a genomic position. This information can be found using OrganismView.
  • dsgid(X): CoGe's database id for a specific dataset_group. This or dsid is needed when specifying a genomic position. This information can be found using OrganismView.
  • dr(X)up: the amount of sequence to the left of the accession (in base pairs)
  • dr(X)down: the amount of sequence to the right of the accession (in base pairs)
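As an illustration of anchoring on a genomic position rather than an accession, the sketch below assembles the x(X), chr(X), and dsid(X)/dsgid(X) parameters for one sequence and refuses to proceed when neither database id is given. The function name `position_params` is hypothetical, the ids and coordinates in the usage lines are made-up placeholders (real ids come from OrganismView), and the sketch assumes dr(X)up and dr(X)down extend a position anchor the same way they extend an accession.

```python
def position_params(number, x, chrom, dsid=None, dsgid=None, up=None, down=None):
    """Return the URL parameters for one position-anchored sequence.

    Hypothetical helper: `number` is the sequence number (X), `x` the
    genomic coordinate, `chrom` the chromosome name.
    """
    if dsid is None and dsgid is None:
        # A dataset or dataset_group id is needed with a genomic position.
        raise ValueError("a genomic position needs dsid or dsgid")
    params = [(f"x{number}", x), (f"chr{number}", chrom)]
    if dsid is not None:
        params.append((f"dsid{number}", dsid))
    if dsgid is not None:
        params.append((f"dsgid{number}", dsgid))
    # Assumption: flanking sizes work for a position as for an accession.
    if up is not None:
        params.append((f"dr{number}up", up))
    if down is not None:
        params.append((f"dr{number}down", down))
    return "".join(f"{key}={value};" for key, value in params)


if __name__ == "__main__":
    # Placeholder values, not real CoGe ids.
    print(position_params(1, 1500000, 3, dsid=42, up=10000, down=10000))
```

The returned fragment is meant to be appended to the GEvo.pl URL together with num_seqs.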

Masking sequence

You can mask various portions of a sequence sent to GEvo. To set a mask:

  • mask(X): set a genomic region to have some of its sequence masked. Valid values are:
    • cds: mask all coding sequence
    • rna: mask all RNA sequence (rRNA, tRNA, mRNA, miRNA, etc.)
    • non-cds: mask everything that is NOT coding sequence (introns and UTRs will be masked)
    • non-genic: mask everything that is NOT a gene
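Because mask(X) accepts only the four values listed, it can help to check a value before putting it in a link. A minimal sketch, with the hypothetical helper name `mask_param`:

```python
# The four mask values documented on this page.
VALID_MASKS = ("cds", "rna", "non-cds", "non-genic")


def mask_param(number, mask):
    """Return the mask parameter for sequence `number` (hypothetical helper)."""
    if mask not in VALID_MASKS:
        raise ValueError(f"mask must be one of {', '.join(VALID_MASKS)}; got {mask!r}")
    return f"mask{number}={mask};"


if __name__ == "__main__":
    # Mask everything except coding sequence in the first sequence.
    print(mask_param(1, "non-cds"))
```

For example, `mask_param(1, "non-cds")` yields `mask1=non-cds;`, which can be appended to a link alongside the other parameters for sequence 1.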