Linking to GEvo

From CoGepedia
Revision as of 13:42, 24 August 2009 by Elyons (Talk | contribs)

Linking into GEvo is straightforward: you build a URL whose parameters specify the sequences to be pulled from CoGe's database or, using an accession, from NCBI.

Quick Example

PlantGDB has done this for maize BACs. An example link, deconstructed in the lists that follow, is:

http://synteny.cnr.berkeley.edu/CoGe/GEvo.pl?accn1=Os02g51290;dr1up=50000;dr1down=50000;accn2=Sb04g027950;dr2up=50000;dr2down=50000;gbaccn3=AC190647;num_seqs=3;

(X) refers to the sequence number where the first sequence is 1, the second is 2, the third is 3, etc.

  • accn(X): retrieve sequence from CoGe's database using an accession.
  • dr(X)up: the amount of sequence to the left of the accession (in base pairs)
  • dr(X)down: the amount of sequence to the right of the accession (in base pairs)
  • gbaccn(X): the genbank accession to retrieve a sequence from NCBI.
  • num_seqs=(Y): the number of sequences being submitted

So, the above link specifies three sequences:

  • Seq1: database retrieval on name Os02g51290 with 50,000bp to the left and right
  • Seq2: database retrieval on name Sb04g027950 with 50,000bp to the left and right
  • Seq3: NCBI retrieval for accession AC190647
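
The deconstruction above can be sketched in Python as a small helper that numbers each sequence's options and joins them with semicolons. This is only an illustrative sketch: the function name gevo_url and the TEMPLATES table are hypothetical, and values are not URL-encoded because the accessions in the example link contain no special characters.

```python
BASE_URL = "http://synteny.cnr.berkeley.edu/CoGe/GEvo.pl"

# Hypothetical mapping from a short option name to its numbered form.
# Note that the sequence number sits in the middle of dr(X)up and dr(X)down.
TEMPLATES = {
    "accn": "accn{n}",
    "drup": "dr{n}up",
    "drdown": "dr{n}down",
    "gbaccn": "gbaccn{n}",
}


def gevo_url(sequences, base_url=BASE_URL):
    """Build a GEvo link from a list of per-sequence option dicts."""
    parts = []
    for n, seq in enumerate(sequences, start=1):  # sequences are numbered from 1
        for key, value in seq.items():
            parts.append(f"{TEMPLATES[key].format(n=n)}={value}")
    parts.append(f"num_seqs={len(sequences)}")
    return base_url + "?" + ";".join(parts) + ";"


example = gevo_url([
    {"accn": "Os02g51290", "drup": 50000, "drdown": 50000},
    {"accn": "Sb04g027950", "drup": 50000, "drdown": 50000},
    {"gbaccn": "AC190647"},
])
print(example)
```

Run as written, this should reproduce the PlantGDB example link character for character.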

All options

(X) refers to the sequence number where the first sequence is 1, the second is 2, the third is 3, etc.

CoGe Database Retrieval

  • accn(X): retrieve sequence from CoGe's database using an accession.
  • fid(X): CoGe's database id for a specific genomic feature. This is probably not useful for most people.
  • x(X): Specify a genomic coordinate/position for anchoring a sequence to be extracted
  • chr(X): specify a specific chromosome (needed when using a genomic coordinate/position to specify an anchor point for extract sequence)
  • dsid(X): CoGe's database id for a specific dataset. This or dsgid is needed when specifying a genomic position. This information can be found using OrganismView.
  • dsgid(X): CoGe's database id for a specific dataset_group. This or dsid is needed when specifying a genomic position. This information can be found using OrganismView.
  • dr(X)up: the amount of sequence to the left of the accession in base pairs (DEFAULT: 10000)
  • dr(X)down: the amount of sequence to the right of the accession in base pairs (DEFAULT: 10000)
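
As a rough sketch of how the position-based options fit together, the Python helper below assembles the parameters for one sequence and refuses to proceed unless a dataset id or dataset_group id is supplied. The function name position_params is hypothetical, the coordinate, chromosome, and dsid values are made up for illustration (real ids come from OrganismView), and the flanking-sequence options follow the dr1up/dr1down spelling used in the example link.

```python
def position_params(n, x, chrom, dsid=None, dsgid=None, up=10000, down=10000):
    """Return GEvo parameters anchoring sequence n at a genomic position."""
    # A genomic position is only meaningful within a dataset or dataset_group.
    if dsid is None and dsgid is None:
        raise ValueError("a genomic position needs dsid or dsgid")
    params = [f"x{n}={x}", f"chr{n}={chrom}"]
    if dsid is not None:
        params.append(f"dsid{n}={dsid}")
    if dsgid is not None:
        params.append(f"dsgid{n}={dsgid}")
    # Flanking sequence to the left and right of the anchor, in base pairs.
    params += [f"dr{n}up={up}", f"dr{n}down={down}"]
    return params


# Made-up values for illustration only.
print(";".join(position_params(1, x=1500000, chrom=2, dsid=12345)))
```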

Retrieval from NCBI

  • gbaccn(X): Specify a GenBank accession and the entry will be automatically retrieved from NCBI
  • gbstart(X): The start position in the sequence retrieved from NCBI (DEFAULT: 1)
  • gblength(X): The length of the sequence to be extracted starting at the position specified by gbstart(X). If blank, will use the entire sequence (DEFAULT: blank)
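
Because gblength(X) is a length rather than an end position, a region given as start and end coordinates has to be converted first. The Python sketch below does that conversion; it assumes positions are 1-based and inclusive (suggested by the default gbstart of 1, but not stated outright), and the function name genbank_params is hypothetical.

```python
def genbank_params(n, accession, start=1, end=None):
    """Return GEvo parameters for pulling sequence n from NCBI."""
    params = [f"gbaccn{n}={accession}", f"gbstart{n}={start}"]
    if end is not None:
        if end < start:
            raise ValueError("end must not be smaller than start")
        # Assumes 1-based, inclusive coordinates: 1001..2000 is 1000 bp.
        params.append(f"gblength{n}={end - start + 1}")
    # With no end given, gblength is left out so the entire sequence is used.
    return params


print(";".join(genbank_params(3, "AC190647", start=1001, end=2000)))
```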

Additional Options

  • rev(X): set a sequence to be reverse complemented in the analysis. Set to "1" to turn on. (VALUES: 1, 0)
  • ref(X): use a sequence as a "reference" sequence to which all other sequences are compared. This is turned on by default for every sequence, so the option is usually used to stop a sequence from serving as a reference (e.g. ref1=0). Set to "0" to turn off. (VALUES: 1, 0)
  • mask(X): set a genomic region to have some of its sequence masked. Valid values are:
    • cds: mask all coding sequence
    • rna: mask all RNA sequence (rRNA, tRNA, mRNA, miRNA, etc.)
    • non-cds: mask everything that is NOT coding sequence (introns and UTRs will be masked)
    • non-genic: mask everything that is NOT a gene
  • pad_gs: The amount, in base pairs, to add to both sides of all specified genomic regions (DEFAULT: 0)
  • prog: Which sequence comparison algorithm (program) to use for the analysis. Valid values are:
    • blastn: BlastN: DNA-DNA Local Alignment Algorithm. Good for finding small regions of conserved sequence.
    • blastz: BlastZ: DNA-DNA Local Alignment Algorithm. Good for finding large regions of conserved sequence.
    • CHAOS: Chaos: DNA-DNA Local Alignment Algorithm. Good for finding small regions of conserved sequence. Uses fuzzy matches so it can seed its alignment on smaller sequences than BlastN. However, it is slower than BlastN.
    • DiAlign_2: DiAlign2: DNA-DNA Global Alignment Algorithm. Global alignment can be seeded using a local alignment algorithm. Good for aligning the entire sequence.
    • LAGAN: Lagan: DNA-DNA Glocal Alignment. Uses a hybrid alignment approach.
    • tblastx: TBlastX: Translated DNA-Translated DNA Local Alignment Algorithm. Good for finding small regions of divergent, but evolutionarily conserved, genomic sequence where protein translated sequence is more conserved than DNA sequence.
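
Since prog and mask(X) only accept the values listed above, a script that generates links may want to check them before building the URL. The Python sketch below is one way to do that; the function name analysis_params is hypothetical, and it assumes the values must be spelled with exactly the capitalization shown in the lists.

```python
VALID_PROGS = {"blastn", "blastz", "CHAOS", "DiAlign_2", "LAGAN", "tblastx"}
VALID_MASKS = {"cds", "rna", "non-cds", "non-genic"}


def analysis_params(prog, masks=None, pad_gs=0):
    """Return GEvo analysis parameters.

    masks maps a sequence number to a mask value, e.g. {1: "cds"}.
    """
    if prog not in VALID_PROGS:
        raise ValueError(f"unknown prog: {prog!r}")
    params = [f"prog={prog}"]
    for n, mask in sorted((masks or {}).items()):
        if mask not in VALID_MASKS:
            raise ValueError(f"unknown mask for sequence {n}: {mask!r}")
        params.append(f"mask{n}={mask}")
    if pad_gs:  # 0 is the default, so it is left out of the link
        params.append(f"pad_gs={pad_gs}")
    return params


print(";".join(analysis_params("blastz", {2: "non-cds", 1: "cds"}, pad_gs=500)))
```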

Image parameters

These options change aspects of the graphical results

  • iw: width in pixels of the genomic region panel graphics (DEFAULT: 1000)
  • fh: height in pixels of features drawn on genomic region panel (DEFAULT: 20)
  • padding: height in pixels of space between features drawn on separate tracks in genomic region panel (DEFAULT: 2)
  • gc: color background of genomic region panel based on GC content of sequence. GC rich regions are colored greener, AT rich regions are colored whiter (VALUES: 0, 1; DEFAULT: 0)
  • nt: color background of genomic region panel based on masked and unsequenced content of sequence. Unsequenced regions ("N") are colored orange and masked regions ("X") are colored purple (VALUES: 0, 1; DEFAULT: 1)
  • cbc: color coding sequence based on the percent GC in the wobble position of the codon. Coding regions that are GC rich in the wobble position are colored green, AT rich are colored red, and those that are ~50/50 GC/AT are colored yellow. (VALUES: 0, 1; DEFAULT: 0)
  • skip_feat_overlap: controls whether overlapping features (e.g. genes) are auto-detected and drawn above one another. WARNING: This can be slow! Set to 0 to turn detection on (sorry for the backwards logic) (VALUES: 0, 1; DEFAULT: 1)
  • skip_hsp_overlap: controls whether overlapping HSPs (e.g. blast hits) are auto-detected and drawn above one another. WARNING: This can be slow! Set to 0 to turn detection on (sorry for the backwards logic) (VALUES: 0, 1; DEFAULT: 1)
  • hsp_overlap_limit: Don't draw HSPs if more than (VALUE) occur at the same location. This prevents simple sequence repeats or other duplicated sequence from being drawn. Set to 0 to turn off, otherwise any number will be the limit. (DEFAULT: 0)
  • hsp_size_limit: Don't draw HSPs smaller than (VALUE) base pairs. Set to 0 to turn off (DEFAULT: 0)
  • show_cns: Draw previously annotated conserved non-coding sequences (probably identified by the Freeling lab). These will most likely be found in plant genomes. (VALUES: 0, 1; DEFAULT: 0)
  • show_gene_space: Draw previously annotated gene spaces (probably identified by the Freeling lab). This will most likely be found in plant genomes. (VALUES: 0, 1; DEFAULT: 0)
  • show_contigs: Draw contigs as red boxes. Useful when proofing genome assemblies. (VALUES: 0, 1; DEFAULT: 0)
  • feat_labels: Draw the name on genomic features. Valid values are:
    • staggered: Labels drawn alternating on features at top, middle, and bottom of feature. This helps readability when there are densely drawn features.
    • linear: Labels drawn in middle of all features.
    • 0 or blank: No labels are drawn. (DEFAULT)
  • hsp_labels: Draw the HSP number on regions of similar sequence. Valid values are:
    • staggered: Labels drawn alternating on features at top, middle, and bottom of feature. This helps readability when there are densely drawn features.
    • linear: Labels drawn in middle of all features.
    • 0 or blank: No labels are drawn. (DEFAULT)
  • draw_model: draw a subset of gene models. Valid values are:
    • full: draw everything (DEFAULT)
    • gene: only genes
    • mRNA: only mRNAs
    • CDS: only coding regions
    • RNA: all RNAs (tRNA, rRNA, mRNA, siRNA, etc)
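
The inverted meaning of skip_feat_overlap and skip_hsp_overlap is easy to get wrong, so a link-generating script might hide it behind positively named switches. The Python sketch below shows one possible wrapper; the function name image_params and its detect_* arguments are hypothetical, and any other image option is simply passed through unchanged.

```python
def image_params(detect_feature_overlap=False, detect_hsp_overlap=False, **other):
    """Return image options as a semicolon-separated string.

    The detect_* switches are translated into the skip_* options,
    where 0 means "detect overlaps" and 1 (the default) means "skip".
    """
    params = {
        "skip_feat_overlap": 0 if detect_feature_overlap else 1,
        "skip_hsp_overlap": 0 if detect_hsp_overlap else 1,
    }
    params.update(other)  # e.g. iw, fh, feat_labels, draw_model
    return ";".join(f"{key}={value}" for key, value in params.items())


print(image_params(detect_feature_overlap=True, iw=1200, feat_labels="staggered"))
```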

Help for parameters not listed here

If there is an option that you can select or configure in GEvo that is not listed here, please contact Eric Lyons.