Conserved Non-Coding Sequence

From CoGepedia
Revision as of 15:45, 18 January 2012 by Diane (Talk | contribs)

Conserved noncoding sequences (CNS) are regions of the genome which do not code for proteins, yet show significantly slower rates of sequence change than truly nonfunctional sequences. Current evidence suggests that many or most conserved noncoding sequences are involved in regulating the expression of neighboring genes.[1]

CNS in plants tend to be much smaller than those found in animals.

Peach-Chocolate-example.png

An example of some 5' conserved noncoding sequences identified by comparing syntenic orthologs in the genomes of peach and chocolate (cacao) using the CNS Discovery Pipeline.

Examples of CNS with experimentally defined function

Vgt1

Vgt1 (Vegetative to generative transition 1) is a quantitative trait locus identified from studies of changes in flowering time within maize mapping populations. The quantitative trait was fine mapped to one of a pair of conserved noncoding sequences which regulate an AP2-like gene called ZmRap2.7. In the late flowering allele of this QTL the CNS is disrupted by the insertion of a MITE transposon, resulting in lower expression of ZmRap2.7. This regulatory region is located 70 kb upstream of the gene it regulates (ZmRap2.7).

Salvi, S. et al. (2007). Conserved noncoding genomic sequences associated with a flowering-time quantitative trait locus in maize. Proceedings of the National Academy of Sciences 104: 11376-11381.

Knotted1

GEvo comparison of knotted1 to syntenic orthologs in rice and sorghum. GEvo link: http://genomevolution.org/r/4bv3

Knotted1 is a gene involved in the regulation of meristem identity and is the tenth most studied gene in maize. Knockouts of knotted1 tend to be lethal, so much of the early work characterizing this gene was conducted on dominant mutants which show ectopic expression of knotted1 in leaves, resulting in striking phenotypes as leaf cells take on a somewhat meristematic identity.[2]

These dominant mutant phenotypes are caused by transposon insertions landing within a 310 bp region of the largest intron of knotted1.[2] This region, a negative regulator of knotted1, has been shown to contain a cluster of conserved noncoding sequences (identified by comparing knotted1 to its ortholog in rice).[3]


Lateral Suppressor

Lateral Suppressor (LAS) is a tightly regulated gene expressed specifically at the adaxial boundary of newly initiating leaf primordia. In 2011 Bodo Raatz and co-workers reported that this very specific expression pattern is regulated by a 3' (downstream) enhancer/suppressor which induces expression of reporter genes within the usual expression domain of LAS and represses their expression in other contexts. The authors also showed that the equivalent sequence in tomato (an asterid) retains the same function and that this conserved noncoding sequence is present downstream of orthologs of LAS in both eudicots and grasses (monocots).

Raatz, B. et al. (2011). Specific expression of LATERAL SUPPRESSOR is controlled by an evolutionarily conserved 3′ enhancer. The Plant Journal 68: 400-412.


ATML1

Arabidopsis ATML1 and PDF2 genes encode HD-ZIP homeodomain proteins that are expressed in the epidermal cell layer and are required for epidermal specification. Abe et al. (2001) identified an 8 bp L1 box sequence (TAAATG(C/T)A) present in the promoters of L1-layer-specific genes, including ATML1, and showed that the L1 box can be bound in vitro by the ATML1 protein and that L1 cell-layer-specific expression is abolished when the box is mutated or deleted. The importance of the L1 box for epidermal-specific expression was confirmed by Takada and Jürgens (2007).

From our CNS pipeline we independently identified a 45 bp CNS present in ATML1 and its homeolog PDF2 that is conserved in orthologous genes from other eudicots (grape, peach, and columbine) as well as in rice. The L1 box is present within this CNS.


Abe, M. et al. (2001) Identification of a cis-regulatory element for L1 layer-specific gene expression, which is targeted by an L1-specific homeodomain protein. The Plant Journal 26: 487-494.

Takada, S. and G. Jürgens (2007) Transcriptional regulation of epidermal cell fate in the Arabidopsis embryo. Development 134: 1141-1150.


Squamosa promoter binding protein

Arabidopsis Squamosa Promoter-Binding Protein-like transcription factors SPL3, SPL4, and SPL5 mediate flowering time via activation of floral meristem identity genes. Gandikota et al. (2007) showed that expression of SPL transcription factors is post-transcriptionally regulated via translational repression by miR156. Altering the miRNA response element (MRE) in an SPL3 transgene resulted in an early flowering phenotype.

From our CNS pipeline we identified a 21 bp sequence in the 3' UTR of the SPL4 and SPL5 genes that is conserved in orthologous genes from grape, peach, and columbine. This CNS coincides with the miRNA response element for miR156/157.

Gandikota, M. et al. (2007) The miRNA 156/157 recognition element in the 3' UTR of the Arabidopsis SBP box gene SPL3 prevents early flowering by translational inhibition in seedlings. The Plant Journal 49: 683-693.


THIC

THIC (Os03g47610) is a rice gene required for thiamine biosynthesis. It has a conserved region in its 3' UTR that has been shown by Wachter et al. (2007) to bind thiamine pyrophosphate (TPP), a product of the thiamine biosynthetic pathway. When TPP levels are low, an intron in the 3' UTR is not spliced out, and transcript processing at a site within that intron yields a short 3' UTR and high THIC expression; when TPP levels are high, TPP binds the conserved aptamer and allows splicing to occur, which removes the processing site and results in a long 3' UTR and low THIC expression.

TPP-sensing riboswitches have been found in the 5' UTRs, introns, and 3' UTRs of genes in a variety of organisms, both prokaryotic and eukaryotic. Besides rice, the TPP-binding aptamer in the 3' UTR is found in the moss Physcomitrella patens, the conifer Pinus taeda, and the eudicot Arabidopsis thaliana.

Wachter, A. et al. (2007) Riboswitch control of gene expression in plants by splicing and alternative 3' end processing of mRNAs. The Plant Cell 19: 3437-3450.


Identifying CNS

Different groups have developed different criteria for what constitutes a CNS. For our own research we define a CNS as a BLAST hit at least as significant as a 15 base pair exact match, present at a syntenic location relative to a pair of homeologous or orthologous genes. For these criteria to work, the two genomes being compared should have a modal synonymous substitution rate (Ks) of between 0.5 and 0.99. In more closely related genomes, sequences that are not functionally constrained can still retain significant sequence similarity (carryover), while in genomes with greater divergence, base pair substitutions often render even functional CNS undetectable.

In animals, where conserved noncoding elements tend to be larger and evolve at a glacial rate relative to those of plants, different parameters may apply.
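The criteria above can be sketched in a few lines of Python. This is only an illustration and not the code of the CNS Discovery Pipeline: the function names are hypothetical, the match reward of +1 is an assumption, and the Karlin-Altschul parameters (lambda and K) are assumed example values that in practice depend on the BLAST scoring scheme and should be taken from the BLAST output. The sketch converts the raw score of a 15 bp exact match to a bit score, uses that as the significance threshold for a hit, and separately checks that the modal synonymous substitution rate of the genome pair falls in the 0.5 to 0.99 window.

```python
import math

# ASSUMED Karlin-Altschul parameters, for illustration only; the real values
# depend on the scoring scheme used for the BLAST comparison.
ASSUMED_LAMBDA = 1.374
ASSUMED_K = 0.711
ASSUMED_MATCH_REWARD = 1  # assumed score awarded per matching base


def bit_score(raw_score, lam=ASSUMED_LAMBDA, k=ASSUMED_K):
    """Convert a raw alignment score S to bits: (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def exact_match_threshold(length=15, reward=ASSUMED_MATCH_REWARD,
                          lam=ASSUMED_LAMBDA, k=ASSUMED_K):
    """Bit score of an exact match of the given length (15 bp by default)."""
    return bit_score(length * reward, lam, k)


def is_cns_candidate(hit_bit_score, is_syntenic, threshold=None):
    """A hit qualifies if it is syntenic and scores at least the threshold."""
    if threshold is None:
        threshold = exact_match_threshold()
    return is_syntenic and hit_bit_score >= threshold


def ks_in_window(modal_ks, low=0.5, high=0.99):
    """Check that the genome pair's modal Ks lies in the usable range."""
    return low <= modal_ks <= high
```

Under these assumptions, a syntenic hit is kept only when its bit score reaches that of the 15 bp exact match, and a genome pair is considered suitable only when `ks_in_window` returns `True` for its modal synonymous substitution rate.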

To automate the process of identifying and filtering conserved noncoding sequences, the Freeling lab has developed the CNS Discovery Pipeline.

  1. Freeling, M. and Subramaniam, S. (2009). Conserved noncoding sequences (CNSs) in higher plants. Curr. Opin. Plant Biol. 12: 126-132.
  2. Greene, B. et al. (1994). Mutator Insertions in an Intron of the Maize knotted1 Gene Result in Dominant Suppressible Mutations. Genetics 138: 1275-1285.
  3. Inada, D.C. et al. (2003). Conserved Noncoding Sequences in the Grasses. Genome Res. 13: 2030-2041.