Linking to GEvo

From CoGepedia
Revision as of 13:02, 24 August 2009 by Elyons


It is easy to link into GEvo: you build a URL that specifies the sequences to retrieve from CoGe's database or from NCBI, typically by accession.

Quick Example

PlantGDB has done this for maize BACs. An example link, deconstructed below, is:

http://synteny.cnr.berkeley.edu/CoGe/GEvo.pl?accn1=Os02g51290;dr1up=50000;dr1down=50000;accn2=Sb04g027950;dr2up=50000;dr2down=50000;gbaccn3=AC190647;num_seqs=3;

(X) refers to the sequence number: the first sequence is 1, the second is 2, the third is 3, and so on.

  • accn(X): retrieve a sequence from CoGe's database using an accession.
  • dr(X)up: the amount of sequence to the left of the accession (in base pairs)
  • dr(X)down: the amount of sequence to the right of the accession (in base pairs)
  • gbaccn(X): the GenBank accession used to retrieve a sequence from NCBI.
  • num_seqs=(Y): the number of sequences being submitted

So, the link above specifies three sequences:

  • Seq1: CoGe database retrieval of Os02g51290 with 50,000 bp to the left and right
  • Seq2: CoGe database retrieval of Sb04g027950 with 50,000 bp to the left and right
  • Seq3: NCBI retrieval of GenBank accession AC190647
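As a rough illustration of how these parameters fit together, the Python sketch below rebuilds the example link from a list of per-sequence options. The helper name `gevo_url` and its `{n}` placeholder convention are hypothetical; the `;` separators, the trailing `;`, and the `num_seqs` count simply mirror the example link rather than any documented requirement.

```python
from urllib.parse import quote

# Base URL taken from the example link above.
BASE_URL = "http://synteny.cnr.berkeley.edu/CoGe/GEvo.pl"


def gevo_url(sequences, base=BASE_URL):
    """Build a GEvo link from a list of per-sequence option dicts.

    Keys are parameter names with "{n}" where the sequence number (X)
    goes, e.g. "dr{n}up" becomes "dr1up" for the first sequence.
    """
    pairs = []
    for n, options in enumerate(sequences, start=1):
        for template, value in options.items():
            # Percent-encode values so unusual accessions cannot break the URL.
            pairs.append(f"{template.format(n=n)}={quote(str(value), safe='')}")
    # num_seqs is the total number of sequences being submitted.
    pairs.append(f"num_seqs={len(sequences)}")
    # The example link separates pairs with ';' and ends with a trailing ';'.
    return base + "?" + ";".join(pairs) + ";"


example = gevo_url([
    {"accn{n}": "Os02g51290", "dr{n}up": 50000, "dr{n}down": 50000},
    {"accn{n}": "Sb04g027950", "dr{n}up": 50000, "dr{n}down": 50000},
    {"gbaccn{n}": "AC190647"},
])
print(example)
```

With these three dicts the function reproduces the example link exactly. Appending another dict adds a fourth sequence, and `num_seqs` is updated automatically.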

All options

(X) refers to the sequence number: the first sequence is 1, the second is 2, the third is 3, and so on.

CoGe Database Retrieval

  • accn(X): retrieve a sequence from CoGe's database using an accession.
  • fid(X): CoGe's database ID for a specific genomic feature. This is probably not useful for most people.
  • x(X): a genomic coordinate (position) used to anchor the sequence to be extracted
  • chr(X): the chromosome containing that position (required when a genomic coordinate/position is used as the anchor point for sequence extraction)
  • dsid(X): CoGe's database ID for a specific dataset. Either this or dsgid(X) is required when specifying a genomic position. The ID can be found using OrganismView.
  • dsgid(X): CoGe's database ID for a specific dataset_group. Either this or dsid(X) is required when specifying a genomic position. The ID can be found using OrganismView.
  • dr(X)up: the amount of sequence to the left of the accession, in base pairs (DEFAULT: 10000)
  • dr(X)down: the amount of sequence to the right of the accession, in base pairs (DEFAULT: 10000)
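Coordinate-based retrieval combines several of these options: a position x(X), its chromosome chr(X), and a dsid(X) or dsgid(X). The Python sketch below assembles them for one sequence, using the `dr1up`/`dr1down` naming from the example link. The helper names, the position 1234567, and the dataset_group ID 12345 are hypothetical placeholders (real IDs come from OrganismView), and it is an assumption that the dr values apply around an anchor position the same way they do around an accession.

```python
from urllib.parse import quote


def anchored_options(n, x, chrom, dsgid=None, dsid=None, up=10000, down=10000):
    """Options for sequence n, anchored at genomic position x on chromosome chrom.

    A genomic position needs a dataset_group ID (dsgid) or a dataset ID (dsid).
    up/down default to the documented 10000 bp.
    """
    if dsgid is None and dsid is None:
        raise ValueError("a genomic position requires dsgid or dsid")
    params = {f"x{n}": x, f"chr{n}": chrom}
    if dsgid is not None:
        params[f"dsgid{n}"] = dsgid
    if dsid is not None:
        params[f"dsid{n}"] = dsid
    # Same dr(X)up / dr(X)down naming as the example link (dr1up, dr1down).
    params[f"dr{n}up"] = up
    params[f"dr{n}down"] = down
    return params


def to_query(params):
    """Join options as ';'-separated key=value pairs, as in the example link."""
    return ";".join(f"{k}={quote(str(v), safe='')}" for k, v in params.items())


# Hypothetical position and dataset_group ID, for illustration only.
print(to_query(anchored_options(1, x=1234567, chrom="1", dsgid=12345)))
```

This prints `x1=1234567;chr1=1;dsgid1=12345;dr1up=10000;dr1down=10000`, which can be appended to the GEvo URL together with `num_seqs`.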

Retrieval from NCBI

  • gbaccn(X): Specify a GenBank accession and the entry will be automatically retrieved from NCBI
  • gbstart(X): The start position in the sequence retrieved from NCBI (DEFAULT: 1)
  • gblength(X): The length of the sequence to extract, starting at the position specified by gbstart(X). If left blank, the entire sequence is used (DEFAULT: blank)
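A minimal Python sketch of the NCBI options follows, assuming 1-based positions (suggested by the default gbstart of 1). The helper `genbank_options` is hypothetical and leaves out any option that would only repeat its documented default. The accession AC190647 comes from the example link; the 20,000 bp window at position 5001 is an arbitrary illustration.

```python
def genbank_options(n, accession, start=1, length=None):
    """Options for sequence n retrieved from NCBI by GenBank accession.

    gbstart(X) defaults to 1 and gblength(X) to blank (use the entire
    sequence), so each is emitted only when it differs from its default.
    """
    if start < 1:
        # Assumption: positions are 1-based, since the default start is 1.
        raise ValueError("gbstart must be >= 1")
    if length is not None and length < 1:
        raise ValueError("gblength must be positive")
    params = {f"gbaccn{n}": accession}
    if start != 1:
        params[f"gbstart{n}"] = start
    if length is not None:
        params[f"gblength{n}"] = length
    return params


# Whole entry versus a hypothetical 20 kb window starting at position 5001.
print(genbank_options(3, "AC190647"))
print(genbank_options(3, "AC190647", start=5001, length=20000))
```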

Additional Options

  • rev(X): set a sequence to be reverse complemented in the analysis. Set to "1" to turn on. (VALUES: 1, 0)
  • ref(X): use a sequence as a "reference" sequence to which all other sequences are compared. This is turned on by default for every sequence, so the option is usually used to turn it off for a particular sequence (e.g., ref1=0). Set to "0" to turn off. (VALUES: 1, 0)
  • mask(X): mask part of the sequence in a genomic region. Valid values are:
    • cds: mask all coding sequence
    • rna: mask all RNA sequence (rRNA, tRNA, mRNA, miRNA, etc.)
    • non-cds: mask everything that is NOT coding sequence (introns and UTRs will be masked)
    • non-genic: mask everything that is NOT a gene
  • pad_gs: The amount, in base pairs, to add to both sides of all specified genomic regions (DEFAULT: 0)
  • prog: Which sequence comparison algorithm (program) to use for the analysis. Valid values are:
    • blastn: BlastN: DNA-DNA Local Alignment Algorithm. Good for finding small regions of conserved sequence.
    • blastz: BlastZ: DNA-DNA Local Alignment Algorithm. Good for finding large regions of conserved sequence.
    • CHAOS: Chaos: DNA-DNA Local Alignment Algorithm. Good for finding small regions of conserved sequence. Because it uses fuzzy matches, it can seed its alignments on smaller sequences than BlastN can. However, it is slower than BlastN.
    • DiAlign_2: DiAlign2: DNA-DNA Global Alignment Algorithm. The global alignment can be seeded using a local alignment algorithm. Good for aligning entire sequences.
    • LAGAN: Lagan: DNA-DNA Glocal Alignment. Uses a hybrid alignment approach that combines global and local alignment.
    • tblastx: TBlastX: Translated DNA-Translated DNA Local Alignment Algorithm. Good for finding small regions of divergent, but evolutionarily conserved, genomic sequence where protein translated sequence is more conserved than DNA sequence.
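The additional options are plain key=value pairs, some global (prog, pad_gs) and some per-sequence (rev(X), ref(X), mask(X)). The Python sketch below checks values against the lists on this page before emitting them. The helper name `analysis_options` and the sample choices are hypothetical, and the value spellings are copied from the lists above; whether GEvo accepts other capitalizations is not stated here.

```python
# Allowed values as listed on this page (spelling and case copied verbatim).
VALID_MASKS = {"cds", "rna", "non-cds", "non-genic"}
VALID_PROGS = {"blastn", "blastz", "CHAOS", "DiAlign_2", "LAGAN", "tblastx"}


def analysis_options(prog=None, pad_gs=None, rev=(), no_ref=(), masks=None):
    """Build the Additional Options part of a GEvo link.

    rev:    sequence numbers to reverse complement (sets rev(X)=1)
    no_ref: sequence numbers not to use as a reference (sets ref(X)=0,
            since ref(X) is on by default)
    masks:  mapping of sequence number -> mask type
    """
    params = {}
    if prog is not None:
        if prog not in VALID_PROGS:
            raise ValueError(f"unknown prog: {prog!r}")
        params["prog"] = prog
    if pad_gs is not None:
        params["pad_gs"] = pad_gs
    for n in rev:
        params[f"rev{n}"] = 1
    for n in no_ref:
        params[f"ref{n}"] = 0
    for n, kind in (masks or {}).items():
        if kind not in VALID_MASKS:
            raise ValueError(f"unknown mask type: {kind!r}")
        params[f"mask{n}"] = kind
    return params


# Hypothetical run: BlastZ, 2 kb padding, reverse complement sequence 2,
# don't use sequence 1 as a reference, and mask coding sequence in sequence 1.
print(analysis_options(prog="blastz", pad_gs=2000, rev=[2], no_ref=[1], masks={1: "cds"}))
```

Validating locally catches typos such as `prog=blast` before the link is shared, instead of relying on how GEvo handles an unrecognized value.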

Image parameters

These options change aspects of the graphical results.

  • iw: width in pixels of the genomic region panel graphics (DEFAULT: 1000)
  • fh: height in pixels of features drawn on the genomic region panel (DEFAULT: 20)
  • padding: height in pixels of the space between features drawn on separate tracks in the genomic region panel (DEFAULT: 2)
  • gc: color the background of the genomic region panel based on the GC content of the sequence. GC-rich regions are colored greener, AT-rich regions whiter (VALUES: 0, 1; DEFAULT: 0)
  • nt: color the background of the genomic region panel based on the masked and unsequenced content of the sequence. Unsequenced positions ("N") are colored orange and masked positions ("X") purple (VALUES: 0, 1; DEFAULT: 1)
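Because every image parameter has a default, a link only needs the ones you want to change. The hypothetical Python helper `image_options` below encodes the defaults listed above and drops any value that matches its default; leaving defaults out of the URL is a design choice of this sketch, not a GEvo requirement.

```python
# Documented defaults for the image parameters.
IMAGE_DEFAULTS = {"iw": 1000, "fh": 20, "padding": 2, "gc": 0, "nt": 1}


def image_options(**overrides):
    """Return only the image parameters that differ from their defaults."""
    unknown = set(overrides) - set(IMAGE_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown image parameters: {sorted(unknown)}")
    for flag in ("gc", "nt"):
        # gc and nt are on/off switches.
        if flag in overrides and overrides[flag] not in (0, 1):
            raise ValueError(f"{flag} must be 0 or 1")
    return {k: v for k, v in overrides.items() if v != IMAGE_DEFAULTS[k]}


# A wider panel with GC coloring; nt=1 is already the default, so it is dropped.
print(image_options(iw=1600, gc=1, nt=1))
```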