Conserved noncoding sequences (CNS) are regions of the genome that do not code for proteins, yet change significantly more slowly than truly nonfunctional sequences. Current evidence suggests that many, perhaps most, CNS are involved in regulating the expression of neighboring genes.
CNS in plants tend to be much smaller than those found in animals.
Figure: Examples of 5' conserved noncoding sequences identified by comparing syntenic orthologs in the genomes of peach and chocolate using the CNS Discovery Pipeline. A second figure shows another CNS example from the grasses.
Vgt1 (Vegetative to generative transition 1) is a quantitative trait locus (QTL) identified in studies of flowering-time variation in maize mapping populations. The QTL was fine-mapped to a pair of conserved noncoding sequences that regulate an AP2-like gene called ZmRap2.7. In the late-flowering allele of this QTL, the CNS has been disrupted by the insertion of a MITE transposon, resulting in lower expression of ZmRap2.7. This regulatory region lies 70 kb upstream of ZmRap2.7, the gene it regulates.
Salvi, S. et al. (2007) Conserved noncoding genomic sequences associated with a flowering-time quantitative trait locus in maize. Proceedings of the National Academy of Sciences 104: 11376-11381.
Knotted1 is a gene involved in regulating meristem identity and is the tenth most studied gene in maize. Knockouts of knotted1 tend to be lethal, so much of the early work characterizing this gene was conducted on dominant mutants that show ectopic expression of knotted1 in leaves, producing striking phenotypes as leaf cells take on a partially meristematic identity.
These dominant mutant phenotypes are caused by transposon insertions within a 310 bp region of the largest intron of knotted1. This region, a negative regulatory element of knotted1, has been shown to contain a cluster of conserved noncoding sequences (identified by comparing knotted1 to its ortholog in rice).
Lateral Suppressor (LAS) is a tightly regulated gene expressed specifically at the adaxial boundary of newly initiating leaf primordia. In 2011, Bodo Raatz and co-workers reported that this highly specific expression pattern is controlled by a 3' (downstream) enhancer/suppressor that induces expression of reporter genes within the normal expression domain of LAS and represses their expression elsewhere. The authors also showed that the equivalent sequence in tomato (an asterid) retains the same function, and that this conserved noncoding sequence is present downstream of LAS orthologs in both eudicots and grasses (monocots).
Raatz, B. et al. (2011) Specific expression of LATERAL SUPPRESSOR is controlled by an evolutionarily conserved 3′ enhancer. The Plant Journal 68: 400-412.
Arabidopsis ATML1 and PDF2 encode HD-ZIP homeodomain proteins that are expressed in the epidermal cell layer and are required for epidermal specification. Abe et al. (2001) identified a 6 bp L1 box sequence present in the promoters of L1-layer-specific genes, including ATML1, and showed that the L1 box is bound in vitro by ATML1 and PDF2 proteins; when the L1 box is mutated or deleted, L1-layer-specific expression is abolished. The importance of the L1 box for epidermal-specific expression was confirmed by Takada and Jürgens (2007).
Using our CNS pipeline, we independently identified a 45 bp CNS present in ATML1 and its homeolog PDF2 that is conserved in orthologous genes from other eudicots (grape, peach, and Columbine) as well as in rice. The L1 box lies within this CNS.
Abe, M. et al. (2001) Identification of a cis-regulatory element for L1 layer-specific gene expression, which is targeted by an L1-specific homeodomain protein. The Plant Journal 26: 487-494.
Abe, M. et al. (2003) Regulation of shoot epidermal cell differentiation by a pair of homeodomain proteins in Arabidopsis. Development 130: 635-643.
Takada, S. and Jürgens, G. (2007) Transcriptional regulation of epidermal cell fate in the Arabidopsis embryo. Development 134: 1141-1150.
Arabidopsis AGL15, a member of the MIKC subgroup of MADS-domain transcription factors, preferentially accumulates during embryogenesis, starting as early as the octant stage of embryo development. AGL15 can either induce its target genes or repress them by recruiting an HDAC complex. Its targets include several B3-domain transcription factors that regulate embryogenesis, TIR1 (an F-box auxin receptor that mediates auxin-dependent degradation of Aux/IAA proteins), and a gibberellin (GA) oxidase.
Negative autoregulation of AGL15 was suggested by changes in expression of a GUS reporter in response to increased AGL15 accumulation or to modifications that strengthen its transcriptional activation (Zhu and Perry 2005). Three potential AGL15 binding sites were identified. One of these, CArG3, coincides with a 16 bp CNS that we independently identified using our CNS pipeline and that is conserved in other eudicots (grape, peach, chocolate, and Columbine). AGL15 present in Arabidopsis and Brassica napus extracts binds CArG3, and site-directed mutagenesis of CArG3 significantly decreases expression of a GUS reporter without altering its spatial pattern of activity.
Zhu, C. and Perry, S.E. (2005) Control of expression and autoregulation of AGL15, a member of the MADS-box family. The Plant Journal 41: 583-594.
Arabidopsis SHI/STY RING-like zinc finger proteins act as transcriptional activators of auxin biosynthetic genes. A 14-15 bp conserved region in the promoters of five SHI/STY genes contains a functional GCC box that is required for transcriptional activation by the AP2/ERF transcription factor DRNL (Eklund et al. 2011). Mutations in the GCC box abolish expression of a STY1pro:GUS fusion in the aerial organs of adult plants.
Using our CNS pipeline, we identified a 20 bp sequence in the 5' UTR or 5'-proximal region of three Arabidopsis SHI/STY family members (SHI, STY1, and STY2). This sequence, which includes the GCC box, is conserved in grape, peach, Columbine, and the rice gene Os06g49830.
Eklund, D.M. et al. (2011) Expression of Arabidopsis SHORT INTERNODES/STYLISH family genes in auxin biosynthesis zones of aerial organs is dependent on a GCC box-like regulatory element. Plant Physiology 157: 2069-2080.
YABBY genes encode transcription factors with zinc finger and high mobility group (HMG)-related domains. The YABBY genes FIL and YAB3 are expressed on the abaxial side of emerging lateral primordia and may be involved in abaxial cell-type specification. Using promoter deletion and mutational analysis, Watanabe and Okada (2003) identified a 12 bp sequence present in the promoters of FIL and YAB3 that suppresses expression on the adaxial side.
Using our CNS pipeline, we identified a 67 bp CNS in the 5'-distal region of FIL/YAB3 orthologs from grape, peach, chocolate, and Columbine. The 12 bp sequence identified by Watanabe and Okada is contained within this CNS.
Watanabe, K. and Okada, K. (2003) Two discrete cis elements control the abaxial side-specific expression of the FILAMENTOUS FLOWER gene in Arabidopsis. Plant Cell 15: 2592-2602.
Negative regulation of HD-ZIP III genes by miR165 and miR166 is critical to a number of fundamental plant developmental processes, including temporal regulation of floral stem cells, establishment of xylem patterning in the root, and establishment of leaf polarity. In roots, a gradient of miR165/166 levels arises from non-cell-autonomous radial movement of miR165/166 in both directions from the endodermis.
In leaves, MIR165 and MIR166 are transcribed in the abaxial epidermis. Yao et al. (2009) identified two conserved subsequences, within a 40 bp conserved region of the 5' UTR, that are needed to repress adaxial expression. MIR165a promoter:GUS fusions lacking this 40 bp region lose abaxial specificity, and mutational analysis shows that the first subsequence is necessary for repressing adaxial expression.
Using our CNS pipeline, we identified a 24 bp sequence in the 5' UTR of MIR165a and MIR165b that is also conserved in the orthologous Columbine gene. This sequence coincides with the two subsequences identified by Yao et al.
Yao, X. et al. (2009) Two types of cis-acting elements control the abaxial epidermis-specific transcription of the MIR165a and MIR166a genes. FEBS Letters 583: 3711-3717.
Arabidopsis SQUAMOSA PROMOTER BINDING PROTEIN-LIKE (SPL) transcription factors SPL3, SPL4, and SPL5 regulate flowering time by activating floral meristem identity genes. Gandikota et al. (2007) showed that SPL expression is regulated post-transcriptionally through translational repression by miR156. Altering the miRNA response element (MRE) in an SPL3 transgene resulted in an early-flowering phenotype.
Using our CNS pipeline, we identified a 21 bp sequence in the 3' UTR of the SPL4 and SPL5 genes that is conserved in orthologous genes from grape, peach, chocolate, and Columbine. This CNS coincides with the miRNA response element for miR156/157.
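Because a plant miRNA response element works through near-perfect base pairing with the miRNA, one quick way to ask whether a conserved 3' UTR sequence could serve as an MRE is to count how many of its positions fail to pair. The Python sketch below is illustrative only: the function name site_mismatches is hypothetical, the miRNA and candidate site are assumed to be already aligned end to end with equal length, and G:U wobble pairs are counted as full mismatches for simplicity (many target-prediction methods penalize them less).

```python
# Complement table for RNA bases (Watson-Crick pairs only).
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def site_mismatches(mirna: str, site: str) -> int:
    """Count positions where a candidate target site fails to Watson-Crick
    pair with a miRNA of the same length.

    Both sequences are given 5'->3'; T is accepted and treated as U.
    G:U wobble pairs count as mismatches in this simplified sketch.
    """
    mirna = mirna.upper().replace("T", "U")
    site = site.upper().replace("T", "U")
    if len(mirna) != len(site):
        raise ValueError("miRNA and site must have the same length")
    # The target pairs antiparallel with the miRNA, so a perfect site is the
    # reverse complement of the miRNA.
    perfect_site = mirna.translate(RNA_COMPLEMENT)[::-1]
    return sum(a != b for a, b in zip(site, perfect_site))


if __name__ == "__main__":
    # Toy sequences for illustration, not real miR156 or SPL sequences.
    print(site_mismatches("AACCG", "CGGUU"))  # perfect site -> 0
    print(site_mismatches("AACCG", "CGGTA"))  # one mismatch -> 1
```

In practice the mature miRNA sequence would come from a curated database and the candidate site from the conserved CNS itself.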
Gandikota, M. et al. (2007) The miRNA 156/157 recognition element in the 3' UTR of the Arabidopsis SBP box gene SPL3 prevents early flowering by translational inhibition in seedlings. The Plant Journal 49: 683-693.
MYB44 is a core stress-responsive transcription factor. Activation of MPK3 by multiple biotic and abiotic stresses results in phosphorylation of the bZIP transcription factor VIP1. Once phosphorylated, VIP1 localizes to the nucleus, where it activates a number of stress-dependent genes, including MYB44, by binding to VIP1 response elements (VREs; consensus ACNGCT). Pitzschke et al. (2009) showed that a 137 bp MYB44 promoter fragment containing three VREs is sufficient for VIP1-dependent induction of a GUS reporter gene, and that mutating these elements abolishes VIP1-dependent induction. They also showed that VIP1 binding to VREs is enhanced in vivo when the MPK3 pathway is stimulated.
Using our CNS pipeline, we identified a 26 bp sequence containing a VRE that is conserved in the 5'-proximal region of the Arabidopsis MYB44 and MYB77 promoters, as well as in the promoters of orthologous genes from Columbine and chocolate.
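Checking whether a CNS contains a VRE amounts to scanning both strands of the sequence for the degenerate consensus ACNGCT. The Python sketch below is a minimal illustration: the helper names are hypothetical, only a small subset of IUPAC codes is supported, and the toy sequence is invented.

```python
import re

# Subset of IUPAC nucleotide codes; N matches any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]"}
# Complements for the same subset (R<->Y; N, W and S are self-complementary).
COMPLEMENT = str.maketrans("ACGTNRYWS", "TGCANYRWS")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence or IUPAC motif."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_to_regex(motif: str) -> str:
    """Translate an IUPAC motif into a regular expression (KeyError on unsupported codes)."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return sorted (start, strand) pairs for every match of `motif` on either
    strand of `seq`. Starts are 0-based positions on the given (+) strand;
    the lookahead pattern lets overlapping matches be reported."""
    seq = seq.upper()
    hits = []
    for strand, m in (("+", motif), ("-", revcomp(motif))):
        pattern = re.compile("(?=" + motif_to_regex(m) + ")")
        hits.extend((match.start(), strand) for match in pattern.finditer(seq))
    return sorted(hits)


if __name__ == "__main__":
    # Invented toy sequence containing one VRE (ACAGCT) on the + strand.
    print(find_motif("TTACAGCTTT", "ACNGCT"))  # [(2, '+')]
```

A non-empty result means the sequence contains at least one VRE match. Assuming equal base frequencies, a fixed 5-of-6 motif like ACNGCT is expected by chance about once per 1 kb per strand, so a match is informative mainly when combined with the conservation evidence. A palindromic motif would be reported once per strand at the same position.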
Pitzschke, A. et al. (2009) VIP1 response elements mediate mitogen-activated protein kinase 3-induced stress gene expression. Proceedings of the National Academy of Sciences 106: 18414-18419.
Light regulation of the gene encoding the chloroplast-localized glyceraldehyde-3-phosphate dehydrogenase subunit A (GAPA) has been proposed (Jeong and Shih 2003) to be controlled by binding of a GATA transcription factor to GATA motifs in the 5'-proximal promoter region. GATA-1 binds in vitro to a DNA fragment spanning positions -66 to -47, and a mutation in either of the two GATA motifs abolishes binding.
Using our CNS pipeline, we identified a conserved 23 bp sequence in the 5' UTR of the Arabidopsis GAPA duplicate gene pair AT1G12900 and AT3G26650. This CNS is conserved in orthologous genes from grape, peach, and Columbine and coincides with the Arabidopsis fragment experimentally shown to bind GATA-1.
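Promoter positions such as the -66 to -47 GATA-1 binding fragment are counted upstream of a reference position +1, which makes it easy to be off by one when pulling the fragment out of a sequence file. The sketch below, with a hypothetical function name, extracts such a range from a promoter string. It assumes the string ends at position -1 and that there is no position 0, a common convention; which landmark the original study counted from is an assumption here.

```python
def promoter_slice(promoter: str, start: int, end: int) -> str:
    """Return the bases from position `start` to `end` (inclusive), where both
    are negative coordinates upstream of a reference position +1.

    Assumes `promoter` ends at position -1 and that there is no position 0.
    """
    if not start <= end <= -1:
        raise ValueError("expected start <= end <= -1")
    n = len(promoter)
    if -start > n:
        raise ValueError("promoter sequence is too short for this range")
    # Position -k is stored at string index n - k.
    return promoter[n + start : n + end + 1]


if __name__ == "__main__":
    # Toy 100 bp promoter whose positions -66..-47 are all C.
    toy = "A" * 34 + "C" * 20 + "G" * 46
    fragment = promoter_slice(toy, -66, -47)
    print(len(fragment), fragment == "C" * 20)  # 20 True
```

Because both endpoints are inclusive, the -66 to -47 fragment is 20 bp long.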
Jeong, M.-J. and Shih, M.-C. (2003) Interaction of a GATA factor with cis-acting elements involved in light regulation of nuclear genes encoding chloroplast glyceraldehyde-3-phosphate dehydrogenase in Arabidopsis. Biochemical and Biophysical Research Communications 300: 555-562.
THIC (Os03g47610) is a rice gene required for thiamine biosynthesis. Wachter et al. (2007) showed that a conserved region in its 3' UTR acts as an aptamer that binds thiamine pyrophosphate (TPP), a product of the thiamine biosynthetic pathway. When TPP levels are low, the part of the 3' UTR containing the aptamer is spliced out, producing a short 3' UTR and high THIC expression; when TPP levels are high, TPP binds to the aptamer and prevents splicing, producing a long 3' UTR and low THIC expression.
TPP-sensing riboswitches have been found in the 5' UTRs, introns, and 3' UTRs of genes in both prokaryotes and eukaryotes. Besides rice, the TPP-binding aptamer in the 3' UTR is found in the moss Physcomitrella patens, the conifer Pinus taeda, and the eudicot Arabidopsis thaliana.
Wachter, A. et al. (2007) Riboswitch control of gene expression in plants by splicing and alternative 3' end processing of mRNAs. The Plant Cell 19: 3437-3450.
Different groups have developed different criteria for what constitutes a CNS. For our own research, we define a CNS as a BLAST hit at least as significant as a 15 bp exact match, located at a syntenic position relative to a pair of homeologous or orthologous genes. For these criteria to work, the two genomes being compared should have a modal synonymous substitution rate (Ks) between 0.5 and 0.99. In more closely related genomes, sequences that are not functionally constrained can still retain significant sequence similarity (carryover), while in more divergent genomes, base substitutions often render even functional CNS undetectable.
In animals, where conserved noncoding elements tend to be larger and evolve far more slowly than in plants, different parameters may apply.
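The plant CNS definition given above can be written as a simple filter over BLAST hits. The Python sketch below is a hypothetical illustration, not code from the CNS Discovery Pipeline. It converts the raw score of an ungapped 15 bp exact match into a bit score with the Karlin-Altschul formula S' = (λS - ln K)/ln 2, using assumed parameters (+1 per match, λ = 1.28, K = 0.46) that should be replaced with the values reported for the scoring scheme actually used, and it assumes synteny has already been determined for each hit.

```python
import math
from dataclasses import dataclass

# Assumed scoring parameters (replace with the values for the real scheme).
MATCH_REWARD = 1        # raw score per matching base
LAMBDA = 1.28           # Karlin-Altschul lambda (assumed)
K = 0.46                # Karlin-Altschul K (assumed)
MIN_EXACT_MATCH_BP = 15          # benchmark: a 15 bp exact match
KS_WINDOW = (0.5, 0.99)          # acceptable modal Ks between the genomes


def bit_score(raw_score: float) -> float:
    """Normalized bit score: S' = (lambda * S - ln K) / ln 2."""
    return (LAMBDA * raw_score - math.log(K)) / math.log(2)


# Bit score an ungapped 15 bp exact match would receive.
THRESHOLD_BITS = bit_score(MIN_EXACT_MATCH_BP * MATCH_REWARD)


@dataclass
class Hit:
    query_id: str
    subject_id: str
    bits: float      # bit score reported by the aligner
    syntenic: bool   # hit lies in the syntenic region flanking the gene pair


def genomes_comparable(modal_ks: float) -> bool:
    """True if the genome pair's modal Ks lies inside the working window."""
    low, high = KS_WINDOW
    return low <= modal_ks <= high


def is_cns(hit: Hit) -> bool:
    """Keep a hit only if it is syntenic and at least as significant as a
    15 bp exact match."""
    return hit.syntenic and hit.bits >= THRESHOLD_BITS


def call_cns(hits: list[Hit], modal_ks: float) -> list[Hit]:
    """Filter hits to putative CNS, refusing genome pairs outside the Ks window."""
    if not genomes_comparable(modal_ks):
        raise ValueError(f"modal Ks {modal_ks} is outside {KS_WINDOW}")
    return [h for h in hits if is_cns(h)]


if __name__ == "__main__":
    print(round(THRESHOLD_BITS, 1))  # 28.8 under the assumed parameters
```

Under these assumed parameters the threshold is about 28.8 bits. Comparing bit scores rather than E-values keeps the cutoff independent of database size, which is one reason to phrase the criterion this way in a sketch; that choice is itself an assumption.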
To automate the process of identifying and filtering conserved noncoding sequences, the Freeling lab has developed the CNS Discovery Pipeline.