Plant Physiology | DeepDyve

journal article

LitStream Collection

2006 Plant Physiology

Do All Protein Phosphatases 2C Negatively Regulate ABA Signal Transduction?

Phosphorylation/dephosphorylation events mediated by a complex network of protein kinases and protein phosphatases play a critical role in abscisic acid (ABA) signaling. Protein phosphatases type 2C (PP2Cs) have been identified as a major component of ABA signaling based on pioneering studies of the ABA-insensitive abi1-1 and abi2-1 mutants. Currently, at least four PP2Cs, ABI1, ABI2, PP2CA, and HAB1, are known to serve as negative regulators of ABA signaling in Arabidopsis (Arabidopsis thaliana). Saez et al. (pp. 1389–1399) have generated double knockout mutants of PP2Cs to determine whether PP2Cs are redundant or additive in their functions. The phenotypic effects observed in single hab1-1, abi1-2, and abi1-3 mutants were notably reinforced in double mutants, which showed both enhanced responsiveness to ABA and drought avoidance. Transpirational water loss under drought conditions was also noticeably reduced in the double mutants as compared to the single parental mutants. These results reveal a cooperative negative regulation of ABA signaling by ABI1 and HAB1. Thus, the combined inactivation of specific PP2Cs involved in ABA signaling could potentially provide an approach for improving crop performance under drought stress conditions. It now appears, however, that not all PP2Cs serve as negative regulators of ABA signaling. Reyes et al. (pp. 1414–1424) present some surprising results concerning an ABA-induced PP2C (FsPP2C2) that had previously been isolated from beech (Fagus sylvatica) seeds. Since transgenic work is not possible in beech, the authors overexpressed FsPP2C2 in Arabidopsis to provide genetic evidence concerning FsPP2C2 function in seed dormancy and other plant responses.
Unlike other PP2Cs described to date, the constitutive expression of FsPP2C2 in Arabidopsis, under the cauliflower mosaic virus 35S promoter, produced enhanced sensitivity to ABA and abiotic stress in seeds and vegetative tissues, as well as a dwarf phenotype and delayed flowering. Moreover, all these effects were reversed by GA3 application. In marked contrast to other plant protein phosphatases 2C that have been demonstrated to act as negative regulators of ABA signaling, these results support the hypothesis that FsPP2C2 is a positive regulator of ABA signaling. They also indicate the existence of potential cross-talk between ABA signaling and GA biosynthesis.

Retinoblastoma-Related Proteins: Developmental Switches

As cells exit the shoot apical meristem, the heterotrophic cells of the meristem rapidly gain an autotrophic capability by synthesizing and assembling components of the chloroplast. At the same time, cells undergo enlargement via vacuolization. Despite significant advances in the characterization of the transcriptional networks involved in meristem maintenance and leaf determination, our understanding of the actual mechanism of meristem cell differentiation remains limited. Using a microinduction technique, Wyrzykowska et al. (pp. 1338–1348) show that a local, transient overexpression of a retinoblastoma-related (RBR) protein in the shoot apical meristem is sufficient to trigger cells in the meristem to undergo the initial stages of differentiation in tobacco (Nicotiana tabacum). Interestingly, the cessation of meristem growth and the partial differentiation of the meristem cells persisted long after a short pulse of RBR expression, suggesting that RBR caused an irreversible change in cell behavior.
The cytological demonstration that the overexpression of RBR protein induces cell differentiation is complemented by the finding that the overexpression of RBR also up-regulates a photosynthetic gene and down-regulates cell cycle genes as well as at least one gene involved in maintaining meristem identity. Taken together with the recent demonstration that the RBR protein plays a major role in restricting stem cell differentiation in the root apical meristem, these findings contribute to an emerging picture of RBR protein as a central part of the mechanism controlling meristem cell differentiation.

Pollen Tube Secretion and Evanescent Wave Microscopy

Although the technique of evanescent wave microscopy (EWM) was developed more than two decades ago, it has recently proven to be a very useful tool for the study of secretion. By restricting fluorescence excitation to the vicinity of a dielectric interface and thus suppressing out-of-focus background fluorescence from deeper within the specimen, EWM allows for improved detection and superior depth discrimination of fluorescent structures. Moreover, due to the light confinement of evanescent wave excitation, photobleaching and phototoxic reactions are generally minor compared to conventional epi-excitation or confocal laser scanning microscopy. Wang et al. (pp. 1591–1603) have used EWM to visualize secretory vesicle motions in living pollen tubes of Meyer spruce (Picea meyeri) after labeling the vesicles with the endocytotic/exocytotic tracer FM4-64. This amphiphilic styryl dye has been used previously to investigate endocytosis in living fungal hyphae. Two-dimensional trajectories of individual vesicles were obtained from the resulting time-resolved image stacks and were used to characterize the vesicles in terms of their average fluorescence and mobility. The velocity and direction of vesicle motions, frame-to-frame displacement, and vesicle trajectories were also calculated.
Analysis of individual vesicles revealed that two types of motion are present, and that vesicles in living pollen tubes exhibit complicated behaviors and oscillations that differ from simple Brownian motion. Furthermore, disruption of the actin cytoskeleton had a much more pronounced effect on vesicle mobility than did disruption of the microtubules, suggesting that the actin cytoskeleton plays a primary role in vesicle mobility.

A Novel Polyamine Oxidase

The polyamines putrescine (Put), spermidine (Spd), and spermine (Spm) are small aliphatic amines commonly found in both prokaryotic and eukaryotic cells. In higher plants, polyamines are key players in a number of developmental processes and have been implicated in plant responses to various abiotic stresses and plant-pathogen interactions. In the polyamine back-conversion pathway that has been elucidated in animals, Spm and Spd are first acetylated by Spd/Spm N1-acetyltransferase and then oxidized by polyamine oxidase (PAO) to produce Spd and Put, respectively. To date, the only types of PAOs that have been characterized in plants seem to be involved in the terminal catabolism of polyamines and not in the animal-type polyamine back-conversion pathway. In this issue, Tavladoraki et al. (pp. 1519–1532) present evidence that an animal-like PAO homolog does exist in higher plants, suggesting that a polyamine back-conversion pathway may exist in plants. A database search within the Arabidopsis genome sequence showed the presence of a gene (AtPAO1) encoding a putative PAO with 45% amino acid sequence identity with maize (Zea mays) PAO. The AtPAO1 cDNA was isolated and cloned in a vector for heterologous expression in Escherichia coli. The purified recombinant protein was shown to be a flavoprotein able to oxidize Spm, norspermine, and N1-acetylspermine. Analysis of the reaction products showed that AtPAO1 produces Spd from Spm, demonstrating a substrate oxidation mode similar to that of animal PAO.
To the authors' knowledge, AtPAO1 is the first plant PAO reported to be involved in a polyamine back-conversion pathway.

Phloem Loading of Exogenous Salicylic Acid

Salicylic acid (SA) plays an important role in plant defense against pathogen attack by functioning as an endogenous signal in the transmission of systemic acquired resistance (SAR). SA has been demonstrated to move from inoculated leaves to other tissues by phloem transport. The mechanism by which SA is transported across membranes, however, is poorly understood. Rocher et al. (pp. 1684–1693) have evaluated the ability of exogenous SA to accumulate in the castor bean (Ricinus communis) phloem by chemical analyses of phloem sap collected from the severed apical part of seedlings. Time-course experiments indicated that SA was transported to the root system via the phloem and redistributed upward in small amounts via the xylem. According to the authors, the involvement of two long-distance transport pathways helps to explain several seeming discrepancies in the literature concerning SA distribution within the plant in response to biotic stress and exogenous SA application. Phloem loading of SA at 1, 10, or 100 μM was dependent on the pH of the cotyledon incubating solution, and accumulation in the phloem sap was the highest (about 10-fold) at the most acidic pH values tested (pH 4.6 and 5.0). Moreover, SA, in terms of its pKa value and octanol/water partitioning coefficient, is nearly ideal for phloem systemicity by way of the ion-trap mechanism. However, SA uptake still occurred at pH values close to neutrality, i.e. when SA is predicted to be only in its dissociated form. In addition, the analog 3,5-dichlorosalicylic acid, which previous models have predicted to be nonmobile, also moved in the sieve tubes.
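The ion-trap argument can be made concrete with the Henderson-Hasselbalch relation. The sketch below is illustrative only: it assumes a pKa of about 2.98 for SA (a commonly cited literature value, not a figure taken from Rocher et al.), treats SA as a simple monoprotic acid, and uses a hypothetical function name.

```python
def undissociated_fraction(ph: float, pka: float = 2.98) -> float:
    """Fraction of a monoprotic weak acid that is protonated at a given pH.

    Henderson-Hasselbalch: [A-]/[HA] = 10 ** (pH - pKa), so the protonated
    fraction is 1 / (1 + 10 ** (pH - pKa)).
    The default pKa is an assumed literature value for salicylic acid.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Compare the acidic incubation pH values with a near-neutral pH.
for ph in (4.6, 5.0, 7.0):
    print(f"pH {ph}: {undissociated_fraction(ph):.4%} undissociated")
```

Under these assumptions only a few percent of SA is undissociated at pH 4.6, and far less than 0.1% near neutrality, which is why uptake at near-neutral pH is hard to explain by the ion trap alone.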
These discrepancies and other data suggest the possible involvement of a pH-dependent carrier system translocating aromatic monocarboxylic acids, in addition to the ion-trap mechanism, in the loading of SA into the phloem.

Cytosolic Triacylglycerol Biosynthesis in Oilseeds

Vegetable oils are the major source of edible lipids, accounting for more than 75% of the total lipids consumed across the world. The global demand for plant oils has intensified efforts to genetically modify oil crops to enhance oil yield. During triacylglycerol (TAG) biosynthesis, acyl-CoA:diacylglycerol acyltransferase (DGAT) catalyzes the final step that acylates diacylglycerol to form TAG. It is well known that TAG biosynthesis occurs in microsomal membranes, but Saha et al. (pp. 1533–1543) report the presence of TAG biosynthetic machinery in the cytosol of developing peanut (Arachis hypogaea) cotyledons. The authors identified and purified a soluble DGAT, the activity of which was NaF insensitive and acyl-CoA dependent, from immature peanuts. The isolated gene shared less than 10% identity with the previously identified DGAT1 and DGAT2 families. Expression of peanut cDNA in E. coli resulted in the formation of labeled TAG and wax ester from [14C]acetate. Several observations indicate that the identified DGAT is cytosolic in nature. (1) The activity is associated with the 150,000g supernatant. (2) The enzyme is purified by successive column chromatographic separations without detergent. (3) The isolated gene (AhDGAT) has neither membrane-spanning regions nor signal peptide sequences. These data suggest that the cytosol is an alternative site for TAG biosynthesis in oilseeds. The identified pathway may present opportunities for the bioengineering of oil-yielding plants for increased oil production.

Author notes www.plantphysiol.org/cgi/doi/10.1104/pp.104.900199.
© 2006 American Society of Plant Biologists This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

journal article

ADD COLOR!

Ort, Donald R.

2006 Plant Physiology

doi: 10.1104/pp.104.900200 pmid: N/A

I am excited to announce two new initiatives that will make adding color to your Plant Physiology article more affordable, even free!

Free online color will be available for all accepted papers submitted after October 1, 2006. With the online use of Plant Physiology growing at a rapid rate, online color has replaced, or very soon will replace, print color in importance. This new feature will allow you to have color images in the online version of your article and black and white in the print version at no charge. Online-only color adds value when color is not critical for data interpretation but aids in presentation. The use of online-only color will be subject to editorial review to ensure that color adds significance to the image or value to the reader.

Additionally, starting in January 2007, charges for the first printed color image in Plant Physiology articles will be waived for those corresponding authors who are American Society of Plant Biologists members. This is in addition to the already offered discount in page charges for ASPB members. For example, a corresponding author who is an ASPB member would pay $550 for a 10-page article with one color image, whereas a nonmember would pay $1,100 for the same article. In addition to a discount in page charges and one free color image, membership in the Society also comes with a discounted rate on all ASPB publications, free electronic access to Plant Physiology and The Plant Cell, and a discount on registration fees for ASPB meetings. Annual membership in the Society is $115 for regular members (http://www.aspb.org/membership/).

I also want to give you an update on Plant Physiology Open Access (OA), which is an author option permitting immediate free access for all users to the online publication of your article. For about what we used to pay for reprints, you can select this option and have your article immediately accessible to anyone who has an Internet connection.
Is there evidence that OA drives higher citation by accelerating recognition and dissemination of research findings? Since we introduced this option with the December 2005 issue, OA articles have on average been accessed about 10% more often and downloaded approximately 20% more often than the non-OA articles published in the same volumes. Although it is too early for citation data to be meaningful, we believe that this early recognition will translate into an increase in article citations. A recent bibliometric analysis (Eysenbach, 2006) of OA versus non-OA papers published over a 6-month period in Proceedings of the National Academy of Sciences supports this hope. Even in a journal widely available in research libraries, OA articles were found to be twice as likely to be cited in the first 4 to 10 months compared to non-OA articles. While it is still too early to have a full picture, the study projected, based on citation information out to 16 months, that the early recognition is being sustained and is resulting in more total citations over time.

Before closing, I can't resist bragging on behalf of Plant Physiology about recently released citation and publication data. Plant Physiology's Impact Factor is now 6.114. Plant Physiology is the most highly cited plant biology journal, with 39,766 cites in 2005. Plant Physiology is the fastest plant biology journal from submission to publication, appearing online in less than 10 weeks. Plant Physiology is available FREE to the developing world through HINARI and AGORA.

LITERATURE CITED

Eysenbach G (2006) Citation advantage for open access articles. PLoS Biol 4: e157

Author notes www.plantphysiol.org/cgi/doi/10.1104/pp.104.900200 © 2006 American Society of Plant Biologists

journal article

Genevestigator. Facilitating Web-Based Gene-Expression Analysis

Grennan, Aleel K.

2006 Plant Physiology

doi: 10.1104/pp.104.900198 pmid: 16896229

Microarray data contain a wealth of information. With many journals requiring these data to be deposited in public databases as a condition of publishing, a good deal of information is now publicly available. Data generated from a specific experiment could be of interest to other researchers investigating very different questions, and information gleaned from multiple experiments can be combined, increasing confidence in the results. Public microarray databases can be used to determine what happens to your gene(s) of interest during a specific growth stage or under stress conditions. Conversely, the databases can be mined to look at genome-wide changes in gene expression. Using a database is also an effective way, in terms of time and money, to pare down large candidate gene lists before validating function. Many tools are available for analyzing publicly available databases. One such tool, Genevestigator, was presented in our September 2004 issue in the article “GENEVESTIGATOR. Arabidopsis Microarray Database and Analysis Toolbox” by Zimmermann et al. As of July 2006, it had been cited 144 times according to Thomson ISI (Thomson ISI Web of Science, http://www.isinet.com).

BACKGROUND

Genevestigator (https://www.genevestigator.ethz.ch/) is a publicly available microarray database coupled with expression-data analysis tools. The online analysis tools allow a large range of questions to be asked about gene expression during developmental stages, stress conditions, or by tissue/organ specificity for either specific genes or for exploring more global expression patterns. The program was validated with several genes with known expression patterns. In all instances the expected gene-expression pattern was obtained using the expression-data analysis tools, demonstrating that the tools produced accurate, reproducible results. As more microarray data become available and the size of the dataset increases, the quality of the information from these programs will continue to improve.
Currently, there are 2,620 Arabidopsis microarray chips available for query on the Genevestigator Web site. The database was initially set up to focus on a single organism (Arabidopsis) and to utilize data generated on the same platform (Affymetrix GeneChip) to ensure high-quality results from the analysis tools, which the authors believe will allow the “identification of biologically meaningful expression patterns of individual genes” (pp. 2621–2622). Thus, at present, comparing array data from other species with those from Arabidopsis is not possible with Genevestigator, but the database is being extended to other organisms such as mouse, for which 3,110 arrays are already available (Laule et al., 2006). When array analysis tools rely on information from public databases, care must be exercised when selecting which data to include in the database, as data from microarrays can be of low technical quality owing either to the microarray itself or to sample quality. Low-quality data will have a negative impact on results by giving erroneous associations and can also lead to problems with reproducibility (Rensink and Buell, 2005). Thus, the quality of data, along with annotation, needs to be assessed before inclusion. To overcome this potential problem, all of the array data included in the Genevestigator database have been manually assessed. Another important consideration is the signal intensity of a given gene. Genes that are weakly expressed will have higher background and can give false reports. When selecting which chips to use for analysis, the number of chips available is shown and should be taken into consideration when interpreting results from any of the tools in the toolbox.
The original Genevestigator toolbox contained six analysis tools enabling users to make queries about signal intensity values for individual genes or to take a more “genome-centric” approach for chosen criteria and get a list of genes expressed under those conditions (Zimmermann et al., 2004). With the 2005 update, the database was expanded, the existing tools upgraded, and a new tool (Mutant Surveyor) was added. Documentation and FAQ sections were also updated (Zimmermann et al., 2005). The Documentation section contains additional information about the tools and the data, as well as a section on tips, pitfalls, and precautions to help avoid common misconceptions about results from the tools. Currently, the toolbox consists of eight analysis tools, briefly outlined below. Since the available arrays include data using both wild-type and mutant plants, users have the option to select all available arrays, wild-type only, or Columbia wild-type only, with the obvious exception of the Mutant Surveyor tool. The tool Digital Northern can answer either of the following questions: “How is my gene of interest (or a set of genes) expressed throughout selected experiments?” or “In which arrays is my gene of interest most strongly expressed?” The user selects GeneChip experiments from a menu that fits their criteria of developmental stage, organ type, or environmental factors, and inputs up to 10 gene identifiers. The resulting signal intensity data is returned in either graph or tabular form. To investigate how two genes are coexpressed over selected arrays, the signal intensity values of two genes are compared in Gene Correlator within the user-selected experiments. Gene Atlas answers the questions “How strongly is my gene of interest expressed in different organs or tissues?” and “Which genes are expressed preferentially in a selection of organs or tissues?” In this tool, the organs are organized into groups each containing subgroups of specific organs. 
Selecting the main organ group would include all the chips of the subgroups in addition to all those from whole organ extractions. Selecting a subgroup would include chips of that specific subcategory only. An example would be the group “seedlings.” By selecting “seedlings,” chips representing whole seedlings, as well as those from the subgroups of cotyledons, hypocotyls, and radicles, would be included. In contrast, if the subcategory “hypocotyl” is selected, only those chips containing hypocotyl material would be included. The Gene Chronologer tool addresses queries about the expression of a gene of interest at a specific growth stage or, more globally, which genes are expressed during a growth stage. Growth stages are grouped into 10 subcategories from seed germination to plant senescence. Each growth stage is listed with the number of chips available, and users are cautioned to exercise care in interpreting results for growth stages that are not heavily represented. Response Viewer is used for making queries based on single genes or global queries on which genes respond to a specific stress or to combined stresses. The corresponding control for the stress exposure is also available. Experiments where multiple treatments were used are not included. Gene Atlas, Gene Chronologer, and Response Viewer do not allow multiple gene queries; only a single gene identifier can be entered. Meta-Analyzer is similar to these three tools in that the expression profiles of genes from organ type, stress response, or growth stage can be investigated, but it also allows multiple genes to be queried simultaneously. Mutant Surveyor was added in 2005 and shows how a mutation can alter the expression of a gene of interest (Zimmermann et al., 2005). The Gene Annotator tool provides ontologies and annotations for genes. The gene ontology annotations are from The Arabidopsis Information Resource (TAIR), and the user can select from biological process, molecular function, or cellular component.
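The organ-group selection rule described for Gene Atlas can be sketched in a few lines of Python. This is a minimal illustration of the rule as stated in the text, not Genevestigator's actual implementation; the chip identifiers, the organ hierarchy, and the function name are all hypothetical.

```python
# Hypothetical organ hierarchy: main group -> subgroups (illustration only).
ORGAN_GROUPS = {"seedling": ["cotyledon", "hypocotyl", "radicle"]}

# Hypothetical chip annotations: (chip identifier, organ the sample came from).
CHIPS = [
    ("chip_A", "seedling"),      # whole-seedling extraction
    ("chip_B", "cotyledon"),
    ("chip_C", "hypocotyl"),
    ("chip_D", "radicle"),
    ("chip_E", "rosette leaf"),  # belongs to a different organ group
]


def select_chips(category, chips=CHIPS, groups=ORGAN_GROUPS):
    """Return chip IDs for a main organ group or for a single subgroup."""
    # A main group matches its whole-organ chips plus all of its subgroups;
    # a subgroup has no entry in `groups`, so it matches only itself.
    wanted = {category, *groups.get(category, [])}
    return [chip_id for chip_id, organ in chips if organ in wanted]


print(select_chips("seedling"))   # whole seedlings plus the three subgroups
print(select_chips("hypocotyl"))  # hypocotyl chips only
```

Selecting the main group returns the whole-seedling chip together with the cotyledon, hypocotyl, and radicle chips, whereas selecting the subgroup returns the hypocotyl chip alone.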
THE IMPACT

This program was conceived with the goal of helping researchers put their gene-expression data into context, allowing them to validate hypotheses and generate new ones, enabling further, more directed research into gene function. This technical validation of gene expression with the microarray databases, although it does not validate gene function, confirms gene expression and can identify candidate genes for further studies of gene function (Clarke and Zhu, 2006). Information from Genevestigator has been used for just such purposes: to support experimental findings on gene expression, to demonstrate where a gene is expressed, and to confirm gene expression in a particular tissue type. It has also been used to determine the expression of a gene in mutant backgrounds (McGrath et al., 2005). Another goal of Zimmermann et al. was that Genevestigator would allow the building of hypotheses about gene expression. A study on folate transport into chloroplasts by Bedhomme et al. (2005) used Genevestigator in tandem with quantitative RT-PCR analysis to determine that their gene of interest is constitutively expressed at all growth stages, allowing them to hypothesize when the phenotype of the null mutant could be detected. Expression data from Genevestigator have been used with expression data obtained from Massively Parallel Signature Sequencing (MPSS), a quantitative measure of gene expression from a particular tissue. McCormack et al. (2005) used primary sequences available for known calmodulin and calmodulin-like genes, and compared expression data from the MPSS database (Meyers et al., 2004; http://mpss.udel.edu/at/) with that from the microarray database available through Genevestigator to determine expression patterns during development, as well as organ specificity and stimulus response. Although there were some discrepancies, both techniques yielded similar findings.
Genevestigator was also used in parallel with MPSS and whole-genome arrays to demonstrate the expression of galacturonosyltransferase (GalAT) superfamily members to add support to a hypothesis about pectin synthesis (Sterling et al., 2006). Galacturonic acid is a main component of pectin and is present in all three types of pectin. Although the activity of GalATs has been detected, no gene had been identified for a GalAT that was enzymatically verified. Genevestigator was one of the bioinformatics tools that aided in the functional identification of a GalAT involved in the biosynthesis of the pectin homogalacturonan.

CONCLUDING REMARKS

There are other online tools available for analyzing Arabidopsis microarray data, such as TAIR (Rhee et al., 2003), MAPMAN (Thimm et al., 2004), and The Botany Array Resource (Toufighi et al., 2005). As more array data become available for other plants, so do online analysis options such as BarleyBase (Shen et al., 2005) and Sol Genomics Network for members of the Solanaceae family (Mueller et al., 2005). Each of these online analysis suites, like Genevestigator, offers different tools and variations in its databases. As more gene-expression data become available and statistical methods for comparing data originating from different technologies advance, the accuracy of “virtual laboratories” will continue to improve, enabling further advancement of functional genomics.

LITERATURE CITED

Bedhomme M, Hoffmann M, McCarthy EA, Gambonnet B, Moran RG, Rebeille F, Ravanel S (2005) Folate metabolism in plants: an Arabidopsis homolog of the mammalian mitochondrial folate transporter mediates folate import into chloroplasts. J Biol Chem 280: 34823–34831
Clarke JD, Zhu T (2006) Microarray analysis of the transcriptome as a stepping stone towards understanding biological systems: practical considerations and perspectives. Plant J 45: 630–650
Laule O, Hirsch-Hoffmann M, Hruz T, Gruissem W, Zimmermann P (2006) Web-based analysis of the mouse transcriptome using Genevestigator. BMC Bioinformatics 7: 311
McCormack E, Tsai YC, Braam J (2005) Handling calcium signaling: Arabidopsis CaMs and CMLs. Trends Plant Sci 10: 383–389
McGrath KC, Dombrecht B, Manners JM, Schenk PM, Edgar CI, Maclean DJ, Scheible WR, Udvardi MK, Kazan K (2005) Repressor- and activator-type ethylene response factors functioning in jasmonate signaling and disease resistance identified via a genome-wide screen of Arabidopsis transcription factor gene expression. Plant Physiol 139: 949–959
Meyers BC, Vu TH, Tej SS, Ghazal H, Matvienko M, Agrawal V, Ning JC, Haudenschild CD (2004) Analysis of the transcriptional complexity of Arabidopsis thaliana by massively parallel signature sequencing. Nat Biotechnol 22: 1006–1011
Mueller LA, Solow TH, Taylor N, Skwarecki B, Buels R, Binns J, Lin C, Wright MH, Ahrens R, Wang Y, et al (2005) The SOL Genomics Network. A comparative resource for Solanaceae biology and beyond. Plant Physiol 138: 1310–1317
Rensink WA, Buell CR (2005) Microarray expression profiling resources for plant genomics. Trends Plant Sci 10: 603–609
Rhee SY, Beavis W, Berardini TZ, Chen GH, Dixon D, Doyle A, Garcia-Hernandez M, Huala E, Lander G, Montoya M, et al (2003) The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Res 31: 224–228
Shen L, Gong J, Caldo RA, Nettleton D, Cook D, Wise RP, Dickerson JA (2005) BarleyBase—an expression profiling database for plant genomics. Nucleic Acids Res (Database issue) 33: D614–D618
Sterling JD, Atmodjo MA, Inwood SE, Kolli VSK, Quigley HF, Hahn MG, Mohnen D (2006) Functional identification of an Arabidopsis pectin biosynthetic homogalacturonan galacturonosyltransferase. Proc Natl Acad Sci USA 103: 5236–5241
Thimm O, Blasing O, Gibon Y, Nagel A, Meyer S, Kruger P, Selbig J, Muller LA, Rhee SY, Stitt M (2004) MAPMAN: a user-driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes. Plant J 37: 914–939
Toufighi K, Brady SM, Austin R, Ly E, Provart NJ (2005) The Botany Array Resource: e-Northerns, Expression Angling, and promoter analysis. Plant J 43: 153–163
Zimmermann P, Hennig L, Gruissem W (2005) Gene-expression analysis and network discovery using Genevestigator. Trends Plant Sci 10: 407–409
Zimmermann P, Hirsch-Hoffmann M, Hennig L, Gruissem W (2004) GENEVESTIGATOR. Arabidopsis microarray database and analysis toolbox. Plant Physiol 136: 2621–2632

Author notes www.plantphysiol.org/cgi/doi/10.1104/pp.104.900198. © 2006 American Society of Plant Biologists

journal article

Genome-Wide Analysis of Basic/Helix-Loop-Helix Transcription Factor Family in Rice and Arabidopsis

Li, Xiaoxing; Duan, Xuepeng; Jiang, Haixiong; Sun, Yujin; Tang, Yuanping; Yuan, Zheng; Guo, Jingkang; Liang, Wanqi; Chen, Liang; Yin, Jingyuan; Ma, Hong; Wang, Jian; Zhang, Dabing

2006 Plant Physiology

doi: 10.1104/pp.106.080580

journal article

Cosecretion of Protease Inhibitor Stabilizes Antibodies Produced by Plant Roots

Komarnytsky, Slavko; Borisjuk, Nikolai; Yakoby, Nir; Garvey, Alison; Raskin, Ilya

2006 Plant Physiology

doi: 10.1104/pp.105.074419 pmid: 16896231

Abstract

A plant-based system for continuous production of monoclonal antibodies based on the secretion of immunoglobulin complexes from plant roots into a hydroponic medium (rhizosecretion) was engineered to produce high levels of single-chain and full-size immunoglobulins. Replacing the original signal peptides of monoclonal antibodies with a plant-derived calreticulin signal increased the levels of antibody yield 2-fold. Cosecretion of Bowman-Birk Ser protease inhibitor reduced degradation of the immunoglobulin complexes in the default secretion pathway and further increased antibody production to 36.4 μg/g root dry weight per day for single-chain IgG1 and 21.8 μg/g root dry weight per day for full-size IgG4 antibodies. These results suggest that constitutive cosecretion of a protease inhibitor combined with the use of the plant signal peptide and the antibiotic marker-free transformation system offers a novel strategy to achieve high yields of complex therapeutic proteins secreted from plant roots.

Therapeutic recombinant proteins have been produced in many different hosts, both prokaryotic and eukaryotic (Fischer et al., 1999). Each of them provides a unique set of advantages and can be tailored to the production of a target protein, depending on the specific requirements imposed by the manufacturing process. When the protein of interest originates from a eukaryotic source, the manufacturing method of choice primarily depends on the yield, codon usage, solubility, and set of complex posttranslational modifications required for structural integrity and biological activity of the protein (Higgins and Hames, 1999). Most first-generation recombinant proteins were well-characterized peptides, such as insulin and other hormones, which functioned as therapeutic agents just as they normally would (Gibbons, 1991).
Many second- and third-generation recombinant products, however, are complex monoclonal antibodies (mAbs) that require multiple processing steps to preserve their original bioactivity. Therefore, high costs and limited production capacities remain the major obstacles to many long-term therapies based on mAb treatments (Maloney et al., 1997). In general, plant-based systems compare favorably with alternative expression platforms, both in terms of quality and cost of complex therapeutic proteins. After a routine transformation protocol was developed for plants, two research groups successfully expressed full-size recombinant antibodies in tobacco (Nicotiana tabacum) leaf tissue (During, 1988; Hiatt et al., 1989). Since then, a variety of antibody fragments and/or full-length mAbs have been produced in plants (Stoger et al., 2002). Biologically active mAbs require a number of assembly steps and posttranslational modifications that are carried out in the endoplasmic reticulum (ER). Once the recombinant protein is directed to the ER, it is generally secreted to the apoplast following the default secretion pathway (Deneke et al., 1990), targeted to the vacuole (Frigerio et al., 2002), or retained in the ER by the addition of the KDEL C-terminal sequence (Conrad and Fiedler, 1998). Proteases released during plant tissue harvesting, extraction, and downstream protein purification often result in antibody degradation (Ma et al., 1994; Sharp and Doran, 2001). Using the nondestructive secretion process that provides high yields of recombinant proteins over the lifetime of a plant and facilitates downstream purification can circumvent this manufacturing challenge. Two related plant production systems have been designed recently to achieve a nondestructive production process utilizing rhizosecretion (Borisjuk et al., 1999) or guttation (Komarnytsky et al., 2000). 
The rhizosecretion of a functional murine mAb from the roots of previously transformed tobacco plants, resulting in a mean antibody yield of 12 μg/g root dry weight per day, was demonstrated subsequently (Drake et al., 2003). Here, we describe an optimized antibiotic-free transformation and rhizosecretion system for stable high-yield production of complex proteins based on the pRYG transformation vector (Komarnytsky et al., 2004). The system was engineered to provide enhanced levels of tissue-specific expression of the human single-chain IgG1 and full-length IgG4 immunoglobulin complexes and improve production rates based on the use of plant-derived signal peptides. Additionally, we demonstrate that cosecretion of the Bowman-Birk Ser protease inhibitor (BBI) into the plant growth medium significantly enhances antibody stability and yield. RESULTS AND DISCUSSION Speed of development, as well as increasing stability and yield of the target protein, are the most important factors if plants are to become a system for the commercial manufacturing of therapeutic recombinant proteins (Peeters et al., 2001). The utilization of the pRYG transformation vector harboring a cluster of rol genes is a fast and effective method for generating transgenic plants without the introduction of antibiotic resistance (Komarnytsky et al., 2004). This vector rapidly induces a large number of independently transformed adventitious roots, enabling efficient screening of individual root clones, selection of the best producers, and subsequent regeneration of fertile plants from them (Gaume et al., 2003). To estimate the efficiency and production capacity of the system, we attempted a rhizosecretion of both single-chain and full-length human mAbs. 
Selection of Genetic Elements for Single-Chain IgG1 Production To fully capitalize on rhizosecretion capacity, we have used an amplification-promoting sequence (aps; known to stabilize and enhance expression levels of heterologous genes in plants as described by Borisjuk et al. [2000]) linked to the strong tissue-specific mas2′ promoter (Leung et al., 1991) to localize the target gene expression within the root tissue of the transgenic plant, where rhizosecretion occurs (Fig. 1, A and B).
Figure 1. Transformation vectors designed for root-specific expression of a single-chain IgG1, full-length IgG4, and constitutive expression of BBI. A and B, Schematic illustrations of a single-chain (sc) IgG1 expression cassette containing the original signal peptide (A) or cIgG1, c (B). C, Full-length IgG4 expression cassette in which mAb light-chain (lc) and heavy-chain (hc) sequences are separated by a cluster of root proliferation (rol) genes. D, Secreted BBI expression cassette inserted next to the kanamycin-selection marker gene (nptII). E, Cytosolic BBI expression cassette inserted next to the nptII gene. Individual arrows identify the orientation of regulatory genetic elements, such as aps; a root-specific promoter, mas2′; a constitutive CaMV 35S promoter; and three promoters located in the cluster of the rol genes. Terminator sequences are not shown.
The 5′-untranslated region of a single-chain IgG1 gene was also modified to include the CCACC Kozak motif (Kozak, 1986) immediately upstream of the initiation codon. Another important element, which is often overlooked in expression studies, is the signal peptide that directs a target protein to the ER and further into the default secretion pathway. Although protein translocation, folding, and assembly are believed to be conserved between plants and animals, considerable variability has been noted in the levels of protein production, depending on the source of the signal peptide used (compare Hiatt et al., 1989; Hein et al., 1991; De Neve et al., 1993). To estimate the effect of different signal peptides on protein accumulation in the plant growth medium, an original single-chain IgG1 expression cassette (Fig. 1A) and its modified variant carrying the plant-derived calreticulin signal peptide (Borisjuk et al., 1998) cloned in frame with the single-chain IgG1 gene (Fig. 1B) were used to construct pRYG-based transformation vectors and to generate transgenic tobacco plants. For further comparison studies, we selected individual transgenic lines that expressed the single-chain IgG1 transcripts at similar levels, as confirmed by northern-blot analysis (data not shown). The antibody concentrations in the plant growth medium were determined by ELISA after excised axenic tobacco shoots were allowed to root, then transferred to fresh medium for 7 d. Production rates as high as 9.7 μg/g root dry weight per day were observed for single-chain IgG1 protein targeted by the original secretion peptide (n = 30).
In contrast, cIgG1-directed secretion of single-chain IgG1 using the pRYG(single-chain cIgG1) vector (n = 32) resulted in a 2-fold increase in antibody production rates (P < 0.05; Fig. 2A).
Figure 2. Production of the single-chain (sc) IgG1 containing either original or modified signal peptide to direct the immunoglobulin complex to the default secretion pathway. A, ELISA quantification of average immunoglobulin production after IgG1 (n = 30) and cIgG1 (n = 32) plants were grown in fresh medium for 7 d (mean ± SEM; *, P < 0.05). B, Western-blot analysis of a single-chain IgG1 in hydroponic medium samples under nonreducing conditions. Total soluble proteins in the root supernatants of the transgenic plant lines secreting the original antibody (IgG1) or the modified variant for which the signal peptide sequence was changed to plant-based cIgG1 were separated by PAGE and probed with the peroxidase-conjugated goat anti-human IgG (H + L) antibody.
To further characterize the single-chain IgG1 rhizosecreted into the hydroponic medium, the root supernatant proteins were separated on SDS-PAGE under both reducing and nonreducing conditions and subjected to western-blot analysis. Under reducing conditions, a major protein band of about 45 kD was detected corresponding to the expected molecular mass of the single-chain IgG1 monomer (data not shown). Under nonreducing conditions, two bands of about 85 and 45 kD were detected corresponding to the expected sizes of dimerized single-chain IgG1 and its monomer unit. Additional bands of various molecular masses were also observed, especially in transgenic plants producing higher levels of the modified single-chain IgG1 (Fig. 2B). When compared with immunoglobulin complexes secreted in cell culture (Sharp and Doran, 2001) or from roots of previously transformed plants (Drake et al., 2003), these molecular mass distribution patterns most likely suggest extracellular degradation of the antibody in the apoplast and plant growth medium. Protective Effect of BBI on Antibody Accumulation and Stability Extracellular degradation significantly reduces the levels of functional immunoglobulin complexes once they are synthesized and assembled (Sharp and Doran, 2001). In addition to being metabolically wasteful, protein degradation fragments contaminate the final product with nonfunctional proteins that are difficult to separate. Although antibody degradation can be partially prevented by continuous recovery on purification columns, this procedure is laborious and expensive. Therefore, there is a need to develop strategies that reduce extracellular degradation of the secreted antibody in the apoplast and in the hydroponic medium. An attempt to use externally supplied bacitracin, a small toxic peptide of microbial origin, to prevent degradation of the immunoglobulin complexes released from the plant cell achieved little success (Sharp and Doran, 1999). 
In this study, we hypothesized that codirection of a recombinant protease inhibitor into the default secretion pathway used by the recombinant antibody may partially protect the assembled immunoglobulin complexes at all stages of the secretion process, including the ER, apoplast, and hydroponic medium. Initially, we evaluated the protective effect of soybean (Glycine max) BBI (Birk, 1985) against antibody degradation in vitro under various physiological conditions. BBI has a strong Ser protease inhibitory activity; the protein is available commercially and the gene encoding this protein can be effectively expressed (Yakoby and Raskin, 2004). Once exogenously supplied to the antibody solution in vitro, BBI provided some stabilization effect to the human IgG1 antibody kept on a rotary shaker at room temperature in the dark for 24 h, as measured by ELISA. A much stronger protective effect was observed when the antibody solution was subjected to light or when external protease was supplied to the medium (P < 0.05 or P < 0.01; Fig. 3A).
Figure 3. Protective effect of BBI protease inhibitor on antibody stability under various conditions, including dark, light, and protease treatment. A, ELISA quantification of exogenously added IgG1 antibody remaining in the supernatants after 24 h in the presence of 0.1 μM BBI or BSA (mean ± SEM; *, P < 0.05; **, P < 0.01 compared with the respective control). Sample collected immediately at the beginning of the experiment (IgG1) was used as a 100% reference point. B, Representative western-blot analysis of the same supernatants under nonreducing conditions probed with peroxidase-conjugated goat anti-human IgG (H + L) antibody.
Western-blot analysis of the media samples further confirmed the protective effect (Fig. 3B). BBI has previously been shown to act as a potent, yet selective, tissue radioprotector due to the presence of its chromophore (Dittmann et al., 2005), which may also explain the greater level of antibody protection observed with light exposure. As expected, exogenously supplied equimolar amounts of bovine serum albumin (BSA), which lacks protease inhibitory activity, had a significantly reduced protective effect on the antibody under the same conditions. Putative Interaction between BBI and Immunoglobulin Complexes The nature of BBI-induced protection of immunoglobulin complexes is unknown. To test the hypothesis that BBI stabilizes the antibodies by direct binding to the immunoglobulin complexes, as reported for the inter-α-trypsin inhibitor and the IgG complexes in human serum (Salier et al., 1983), the ELISA wells were coated with 1 μg/mL BBI or BSA, subsequently blocked with 5% nonfat dry milk, and incubated with various concentrations of monoclonal human IgG1. Measurable binding between BBI and IgG1 was observed; no such interaction was noted when ELISA wells were coated with BSA (P < 0.05; Fig. 4).
Figure 4. Antibody binding to immobilized BBI protein. IgG1 antibody concentrations were measured by ELISA at OD430 using peroxidase-conjugated goat anti-human IgG (H + L) antibody and the trace metal-buffered substrate (mean ± SEM; *, P < 0.05).
Therefore, it is possible that BBI stabilizes antibodies by direct interaction with the IgG molecules. However, the fact that the BBI molecule interacts with the immunoglobulins is only indirect evidence of the putative mechanism behind its protection effect. The observed interaction was weak and did not increase linearly with the increased concentration of the immunoglobulins in the medium, suggesting that an interaction between the molecules alone is not sufficient to explain the observed effect. Whether physical interaction between a protease inhibitor and a target heterologous protein is required for protection, and whether the presence of a protease inhibitor in the medium interferes with downstream purification of a recombinant protein, remains to be elucidated. Directing BBI to the Default Secretion Pathway To develop a transgenic tobacco line producing BBI protein directed into the default secretion pathway, the gene encoding the BBI protease inhibitor (Yakoby and Raskin, 2004) was fused in frame to the plant-derived calreticulin signal peptide (Borisjuk et al., 1998). The resulting expression cassette was ligated into the pBin19-derived transformation vector under the control of the constitutive cauliflower mosaic virus (CaMV) 35S promoter and the aps element (Fig. 1D). An otherwise identical cytosolic BBI expression cassette lacking the signal peptide was used to generate transgenic plants that produce, but do not secrete, BBI (Fig. 1E). To confirm successful transformation events, independent transgenic lines (n = 8–10) from each group were analyzed for the presence of BBI mRNA using northern-blot analysis (Fig. 5A, top), as well as for accumulation of BBI protein in the cytoplasm after the apoplast liquid was extracted from the plant tissue.
Figure 5. Analysis of transgenic tobacco lines expressing BBI. A, Northern-blot quantification of BBI mRNA expression and western-blot detection of cytosolic (lanes 1–5) or secreted (lanes 6–10) form of BBI in plant tissue of individual tobacco transgenic lines after the apoplast liquid was extracted. B, Western-blot detection of multimeric BBI complexes in the apoplast of the BBI-secreting plants under nonreducing conditions. C, Western-blot detection of the secreted BBI protein in root exudates under reducing conditions using rabbit anti-BBI polyclonal antibodies.
Plants engineered to secrete BBI showed negligible amounts of recombinant BBI in the cytoplasm, thus confirming the correct targeting of the recombinant protein (Fig. 5A, bottom). Three transgenic plant lines that expressed high levels of BBI mRNA were grown to maturity, and the presence of functional BBI in the apoplast liquid was determined by protease inhibitory assay after axenic transgenic tobacco shoots were excised, allowed to root, and transferred to fresh medium for 7 d (data not shown) and western-blot analysis (Fig. 5B).
The proteins in root exudates were separated by SDS-PAGE under reducing conditions and analyzed by western blot using rabbit polyclonal antibodies produced against BBI. A band at the molecular mass of a BBI monomer (8 kD) was observed in transgenic lines secreting higher amounts of BBI protein, whereas, under natural conditions, BBI exists as numerous multimers of higher molecular mass (Fig. 5C). No significant amount of BBI was detected in the hydroponic medium of plants engineered to produce a cytosolic version of the protein (data not shown). BBI-secreting transgenic line 9 was used as a master plant line to study the cosecretion of the human single-chain IgG1 and full-length IgG4 antibodies, whereas cytosolic BBI transgenic line 5 was used to generate the control plant coexpressing BBI and immunoglobulins but not secreting BBI (Fig. 5). Stable High-Level Secretion of the Immunoglobulin Protein Complexes To estimate the effect of cosecretion of BBI and the immunoglobulin complex, the pRYG(single-chain cIgG1) and pRYG(cIgG4) vectors (Fig. 1, B and C) were transformed into the BBI-secreting tobacco line 9, and the BBI-producing tobacco line 5 was used as a control. The typical root proliferation response was observed 2 weeks after inoculation, as expected (Fig. 6A); however, the constitutive coexpression of the protease inhibitor incurred some fitness costs on the transgenic tobacco plants, as evident from smaller plant size (Fig. 6B) and lessened lifetime reproductive output (data not shown), similar to a previously described study (Zavala et al., 2004).
Figure 6. pRYG-based production of the single-chain IgG1 and full-length IgG4 immunoglobulin complexes in transgenic tobacco plants cosecreting BBI. A, Antibiotic-free selection of transgenic root clones based on the morphological root proliferation response of rol genes. B, Wild-type (WT) and BBI-secreting transgenic tobacco line 9 (BBI9) cultivated in sterile hydroponic medium. BBI9 plant was slightly smaller than its nontransgenic control of the same age; however, its root system was more branched than the roots of a nontransformed plant. C and D, Antibody production rates were estimated for the single-chain IgG1 (lanes 1 and 2) and full-length IgG4 (lanes 3 and 4) immunoglobulins produced by transgenic plant lines expressing the cytosolic (lanes 1 and 3) or secretory (lanes 2 and 4) form of BBI by ELISA quantification after plants were grown in fresh medium for 7 d (mean ± SE; *, P < 0.05; C) or western-blot detection under nonreducing conditions after probing with the peroxidase-conjugated goat anti-human IgG (H + L) antibody (D).
The antibody concentration in the hydroponic medium was determined by ELISA after excised axenic tobacco shoots were allowed to root and transferred to fresh medium for 7 d. In this experiment, the single-chain cIgG1 antibody was rhizosecreted from the roots of BBI-secreting plant lines at 36.4 μg/g root dry weight per day (n = 16), whereas full-length IgG4 antibody accumulated in the growth media at the rate of 21.8 μg/g root dry weight per day (n = 18), as determined by ELISA (Fig. 6C). Control plants that did not secrete BBI protein produced 18.9 μg/g root dry weight per day of the single-chain antibody (n = 11) and 8.2 μg/g root dry weight per day of the full-size antibody (n = 9). To further characterize the immunoglobulin complexes rhizosecreted into the hydroponic medium, the supernatant proteins from representative transgenic lines generated in this experiment were separated on SDS-PAGE under nonreducing conditions and subjected to western-blot analysis. One major band of about 85 kD was detected in the hydroponic medium of the BBI-secreting plants engineered to cosecrete IgG1, which is the expected size for a fully dimerized IgG1 antibody (Fig. 6D). The fully assembled IgG4 mAb was also detected in the root supernatant of BBI-secreting plants; however, two other bands of about 120 and 60 kD were also observed. This suggests that cosecreted BBI was not able to completely protect the full-length antibody, although the extent of proteolysis was greatly reduced as compared to previous studies (Sharp and Doran, 2001; Drake et al., 2003) or control plants expressing, but not secreting, BBI (Fig. 6D). Also, IgG4 accumulated in the plant growth medium at a rate of 21.8 μg/g root dry weight per day, a nearly 2-fold increase over the previously reported yield (Drake et al., 2003).
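The fold differences implied by these production rates can be checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the rates quoted in the text; the dictionary layout and the names `RATES`, `DRAKE_2003_RATE`, and `fold_change` are illustrative assumptions and are not part of the original study.

```python
# Production rates quoted in the text, in ug antibody per g root dry weight per day.
# The data layout and all names here are illustrative, not from the original study.
RATES = {
    "single-chain IgG1": {"secreted_bbi": 36.4, "cytosolic_bbi": 18.9},
    "full-length IgG4": {"secreted_bbi": 21.8, "cytosolic_bbi": 8.2},
}

# Mean yield reported for rhizosecretion of a murine mAb (Drake et al., 2003).
DRAKE_2003_RATE = 12.0


def fold_change(treated: float, control: float) -> float:
    """Return the ratio of a treated rate to its control rate."""
    if control <= 0:
        raise ValueError("control rate must be positive")
    return treated / control


if __name__ == "__main__":
    for antibody, rate in RATES.items():
        ratio = fold_change(rate["secreted_bbi"], rate["cytosolic_bbi"])
        print(f"{antibody}: {ratio:.2f}-fold with secreted BBI")
    igg4 = RATES["full-length IgG4"]["secreted_bbi"]
    print(f"IgG4 vs. Drake et al. (2003): {fold_change(igg4, DRAKE_2003_RATE):.2f}-fold")
```

On these numbers, BBI cosecretion corresponds to roughly a 1.9-fold gain for single-chain IgG1 and a 2.7-fold gain for full-length IgG4 over the cytosolic-BBI controls, and the IgG4 rate is about 1.8 times the 12 μg/g root dry weight per day reported by Drake et al. (2003).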
Unlike previous reports on the secretion of mAbs in plant tissue culture or from the roots of the transformed plants, this study demonstrates an antibiotic-free transformation system that allows efficient introduction of both light and heavy chains of mAb and simultaneous modification of the plant root system to further capitalize on the rhizosecretion ability of rol-induced hairy roots (Gaume et al., 2003). Our study also provides crucial initial evidence for the potential of cosecreted protease inhibitors as effective tools to stabilize and enhance the yield of recombinant secreted proteins. This study also emphasizes the importance of carefully selected genetic elements in achieving stable, high-level expression and secretion of the target transgene. We believe that this approach could have potential applications in further development of plant-based systems for manufacturing complex therapeutic proteins and may provide a tool for efficient in vivo study of multiple protein targets. MATERIALS AND METHODS Vector Construction Genes encoding for human single-chain IgG1 (pcDNA3) as well as light (pDONRL) and heavy (pDONRH) chains of monoclonal IgG4 were kindly provided by Dr. Subinay Ganguly (Bristol-Myers Squibb). The single-chain IgG1 gene was amplified using the pair of primers that introduced the SalI site and the Kozak motif to the 5′ end of the amplified sequence and BsiEI site to the 3′ end (forward 5′-CCAGTCGACACCAATGGGTGTACT and reverse 5′-TTGCCGGCCGTCGCACTCATTTAC primer, respectively). The resulting PCR product was isolated from gel using the Qiaquick PCR purification kit (Qiagen) and cloned into the pCR2.1 plasmid using the TOPO TA cloning kit (Invitrogen). 
From there, the SalI-BsiEI fragment was cloned into the SalI-SacII sites of the pLit-aps-mas-β-glucuronidase (GUS) plasmid (Komarnytsky et al., 2004), effectively replacing the original GUS sequence and placing the single-chain IgG1 gene under the control of the aps element, mas2′ promoter, and nos terminator. At the final cloning step, the KpnI-BsiWI (blunted) fragment containing the entire expression cassette was cloned into the KpnI-EcoRI (blunted) sites of the pRYG plasmid to produce the pRYG(single-chain IgG1) transformation vector (Fig. 1A). When specified, a native immunoglobulin signal peptide was replaced with the plant-based calreticulin signal sequence, PCR amplified from a previously isolated cDNA clone (Borisjuk et al., 1998) using 5′-GTCGACGATCTCACAACAGTGG and 5′-CACGTGCATTGCTACCTCAGCGGA primers. The reverse primer contained a PmlI restriction site that was later used for in-frame fusion of the signal peptide to the IgG1 coding sequence. The modified sequence for the single-chain IgG1 gene was cloned into the pRYG transformation vector following the exact strategy outlined above for the original sequence, resulting in the pRYG(single-chain cIgG1) vector (Fig. 1B). Similar cloning steps have been repeated for replacing the signal peptides of light and heavy chains of full-length IgG4. At the last cloning step, both expression cassettes were inserted into the same transformation vector using the KpnI-EcoRI (blunted) sites of the pRYG plasmid to clone the light chain, and the XbaI (blunted) site of the pRYG plasmid to construct the final transformation vector pRYG(cIgG4), carrying both expression cassettes separated by the rol genes (Fig. 1C). 
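The restriction sites introduced by the primers listed above can be verified by a simple sequence search. The sketch below, in Python, is an illustration only: the primer sequences are copied from the text, whereas the recognition sequences (SalI, GTCGAC; BsiEI, CGRYCG; PmlI, CACGTG) are standard enzyme data supplied here as an assumption rather than taken from the article, and the names `PRIMERS`, `SITES`, and `find_sites` are hypothetical.

```python
import re

# Primer sequences as given in the text (5' to 3'). The dictionary keys are illustrative labels.
PRIMERS = {
    "IgG1 forward": "CCAGTCGACACCAATGGGTGTACT",
    "IgG1 reverse": "TTGCCGGCCGTCGCACTCATTTAC",
    "calreticulin forward": "GTCGACGATCTCACAACAGTGG",
    "calreticulin reverse": "CACGTGCATTGCTACCTCAGCGGA",
}

# Recognition sequences in IUPAC notation (assumed standard values, not from the article).
SITES = {"SalI": "GTCGAC", "BsiEI": "CGRYCG", "PmlI": "CACGTG"}

# Minimal IUPAC-to-regex mapping covering the codes used above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}


def site_pattern(site: str) -> re.Pattern:
    """Compile an IUPAC recognition sequence into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in site))


def find_sites(primer: str) -> dict:
    """Map each enzyme name to the 0-based position of its first site in the primer."""
    hits = {}
    for enzyme, site in SITES.items():
        match = site_pattern(site).search(primer)
        if match:
            hits[enzyme] = match.start()
    return hits


if __name__ == "__main__":
    for name, sequence in PRIMERS.items():
        print(name, find_sites(sequence))
```

Only the strand as written is searched, which is sufficient here because all three recognition sequences are palindromic.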
To generate a transformation vector for cosecretion of BBI, the NruI (blunted)-XhoI fragment containing the previously constructed bbi gene was cloned in frame to cIgG1 using NcoI (blunted) and XhoI sites, thereby replacing the green fluorescent protein (GFP) coding region downstream of the CaMV 35S promoter in the pNB-car-GFP plasmid (Borisjuk et al., 2000). The resulting plasmid was then restricted with XbaI and HindIII to insert a CaMV 35S transcriptional terminator downstream of the bbi sequence. Finally, the complete expression cassette was cloned into the HindIII site of the pBin19-aps plasmid (Borisjuk et al., 2000) to construct the final transformation vector pBin-carBBI (Fig. 1D). pBin-BBI, a vector that expresses a cytosolic form of BBI protein, was constructed using the above strategy once cIgG1 was excised from the pNB-car-GFP plasmid prior to cloning (Fig. 1E). Plant Transformation The transgenic lines of tobacco (Nicotiana tabacum) expressing the secreted version of BBI were generated following the standard Agrobacterium-mediated transformation protocol using kanamycin-based selection (Horsch et al., 1985). Plant transformation and antibiotic-free selection for individual transgenic lines expressing either IgG1 or IgG4 were performed essentially as described earlier for the pRYG-based transformation system (Komarnytsky et al., 2004). Northern Blots and RT-PCR Total RNA was isolated from plants as described elsewhere (Chomczynski and Sacchi, 1987). Ten micrograms of total RNA were loaded and subjected to electrophoresis on a 1% denaturing agarose gel containing formaldehyde before capillary blotting onto a Hybond N+ nylon membrane (Amersham). Hybridizations with 32P-labeled DNA probes were performed according to published procedures (Sambrook et al., 1989). Apoplast Liquid Collection and Tissue Extraction Leaf apoplast liquid was collected after vacuum infiltration (Terry and Bonner, 1980) with an ice-cold buffer (50 mM Tris-HCl, 10 mM EDTA, pH 8.0).
The remaining plant tissue was frozen in liquid nitrogen, homogenized, and extracted with the same buffer to recover the total soluble proteins present in the tissue after apoplast removal. All samples were used immediately or stored at −20°C. Western Blots and ELISA Total protein in the sample was determined by Bradford dye-binding assay (Bio-Rad). For PAGE protein analysis, protein samples were separated in Tris-Gly gels under reducing/nonreducing conditions. For western-blot analysis of immunoglobulin complexes, proteins separated by PAGE (20 μg) were transferred to a polyvinylidene difluoride membrane. The membrane was blocked with 5% nonfat dry milk and incubated with the goat anti-human IgG (H + L) conjugated to horseradish peroxidase (HRP; Pierce) for immunoglobulin detection. BBI was detected by treating the membranes with custom-raised rabbit anti-BBI polyclonal antibodies (Pocono Rabbit Farm and Lab) followed by murine anti-rabbit IgG-HRP conjugate (Promega). Protein bands were visualized by exposure to x-ray film (Kodak) after treatment with ECL chemiluminescence substrate (Amersham). The concentrations of the IgG1 and IgG4 immunoglobulin complexes in the plant growth medium were determined using a sandwich ELISA. In short, 96-well ELISA plates were coated with 1 μg/mL goat anti-human IgG (H + L; Rockland). The plates were blocked with PTB buffer (phosphate-buffered saline, 1% BSA, 0.05% Tween 20) and incubated with the samples or various concentrations of the reference standard (human IgG1-κ; Sigma). The plates were then incubated with the goat anti-human IgG (H + L) conjugated to HRP (Pierce) and stained by adding trace metal-buffered substrate (Amersham). The plates were monitored in a microplate reader (HTS-7000; Perkin-Elmer) at OD650 and, once the reaction was stopped by 2 M H2SO4, were subsequently read at OD450.
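The conversion from an OD450 reading to an antibody concentration, and from there to a production rate, can be sketched as follows. This Python example is a hedged illustration, not the calculation used in the study: it substitutes simple piecewise-linear interpolation between reference standards for whatever curve fit was actually applied, and every numeric value (standard concentrations, OD readings, medium volume, root dry weight) is hypothetical and chosen only to make the example run.

```python
from bisect import bisect_left

# HYPOTHETICAL standard curve: (concentration in ug/mL, OD450), sorted by increasing OD.
# These values are invented for illustration and are not data from the article.
HYPOTHETICAL_STANDARDS = [(0.0, 0.05), (0.1, 0.25), (0.2, 0.45), (0.4, 0.85)]


def interpolate_concentration(od: float, standards: list) -> float:
    """Estimate concentration by linear interpolation between bracketing standards."""
    ods = [reading for _, reading in standards]
    if not ods[0] <= od <= ods[-1]:
        raise ValueError("OD outside the standard range; dilute or extend the standards")
    i = bisect_left(ods, od)
    if ods[i] == od:
        return standards[i][0]
    (c0, o0), (c1, o1) = standards[i - 1], standards[i]
    return c0 + (c1 - c0) * (od - o0) / (o1 - o0)


def production_rate(conc_ug_per_ml: float, volume_ml: float,
                    root_dry_weight_g: float, days: float) -> float:
    """Convert a medium concentration to ug antibody per g root dry weight per day."""
    return conc_ug_per_ml * volume_ml / (root_dry_weight_g * days)


if __name__ == "__main__":
    # Hypothetical sample: OD450 of 0.35, 50 mL medium, 0.1 g root dry weight, 7 d.
    conc = interpolate_concentration(0.35, HYPOTHETICAL_STANDARDS)
    print(f"concentration: {conc:.3f} ug/mL")
    print(f"rate: {production_rate(conc, 50.0, 0.1, 7.0):.1f} ug/g root dry weight per day")
```

Readings outside the range of the standards raise an error rather than being extrapolated, since an ELISA response is generally not linear beyond the calibrated range.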
Protease Inhibitory Activity Assays Inhibitory activity against trypsin and chymotrypsin was visualized by staining of plant growth medium samples spotted on the surface of the polyvinylidene difluoride membrane (normalized for total soluble protein content). After the membranes were soaked in 100 μg/mL enzyme in 50 mM Tris-HCl, pH 8.0, for 10 min at 37°C, the freshly prepared reaction mixture containing 0.75 mM N-acetyl-DL-phenylalanine-β-naphthyl ester, 1 mM o-dianisidine tetrazotized dye, and 10% N,N-dimethylformamide in 50 mM Tris-HCl, pH 8.0, was applied to the surface of the membranes and the clear zone indicative of the inhibitory activity was recorded. All chemicals were purchased from Sigma, unless noted otherwise. Antibody Stability Monoclonal human IgG1 (Sigma) was added to 10 mL of phosphate-buffered saline in triplicate 100-mL shake flasks at the concentration of 1 μg/mL. The flasks were subsequently incubated at 25°C on the orbital shaker (120 rpm) under dark/light conditions as specified by individual treatments. When necessary, BBI or BSA (negative control) was added to the flask at the final concentration of 0.1 μM at the same time, whereas trypsin was supplied at the final concentration of 1 μg/mL. Samples were taken immediately after antibody addition and after 24 h to be analyzed by ELISA and western blotting. Antibody Binding Assay To evaluate the potential binding activity of the immunoglobulin complexes to BBI, 96-well ELISA plates were coated with 1 μg/mL BBI or BSA (negative control). The plates were then blocked with 5% nonfat dry milk and incubated with various concentrations of monoclonal human IgG1 (Sigma). The plates were subsequently incubated with the goat anti-human IgG (H + L) conjugated to HRP and stained following the procedure described above for the ELISA assay. ACKNOWLEDGMENTS We thank Dr. Stanton Gelvin (Purdue University) for the mas2′ promoter and Dr. Thomas Schmülling (Freie Universität Berlin) for the cluster of the rol genes.
We are grateful to Ivan Jenkins for his technical assistance in the greenhouse.

Author notes

1 This work was supported in part by grants from Phytomedics, Inc. (Dayton, NJ), Rutgers, the State University of New Jersey, and N.J. Agricultural Experiment Station, and in part by Vaadia-BARD (postdoctoral award no. FI–302–2000 to N.Y.).

2 Present address: Thomas Jefferson University, 1020 Locust St., Philadelphia, PA 19107.

3 Present address: Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544.

* Corresponding author; e-mail [email protected]; fax 732–932–6535.

The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Slavko Komarnytsky ([email protected]).

www.plantphysiol.org/cgi/doi/10.1104/pp.105.074419.

© 2006 American Society of Plant Biologists. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

A Versatile and Reliable Two-Component System for Tissue-Specific Gene Induction in Arabidopsis

Brand, Lukas; Hörler, Mirjam; Nüesch, Eveline; Vassalli, Sara; Barrell, Philippa; Yang, Wei; Jefferson, Richard A.; Grossniklaus, Ueli; Curtis, Mark D.

2006 Plant Physiology

doi: 10.1104/pp.106.081299 pmid: 16896232

Abstract

Developmental progression and differentiation of distinct cell types depend on the regulation of gene expression in space and time. Tools that allow spatial and temporal control of gene expression are crucial for the accurate elucidation of gene function. Most systems to manipulate gene expression allow control of only one factor, space or time, and currently available systems that control both temporal and spatial expression of genes have their limitations. We have developed a versatile two-component system that overcomes these limitations, providing reliable, conditional gene activation in restricted tissues or cell types. This system allows conditional tissue-specific ectopic gene expression and provides a tool for conditional cell type- or tissue-specific complementation of mutants. The chimeric transcription factor XVE, in conjunction with Gateway recombination cloning technology, was used to generate a tractable system that can efficiently and faithfully activate target genes in a variety of cell types. Six promoters/enhancers, each with different tissue specificities (including vascular tissue, trichomes, root, and reproductive cell types), were used in activation constructs to generate different expression patterns of XVE. Conditional transactivation of reporter genes was achieved in a predictable, tissue-specific pattern of expression, following the insertion of the activator or the responder T-DNA in a wide variety of positions in the genome. Expression patterns were faithfully replicated in independent transgenic plant lines. Results demonstrate that we can also induce mutant phenotypes using conditional ectopic gene expression. One of these mutant phenotypes could not have been identified using noninducible ectopic gene expression approaches.
Advances in inducible gene expression technologies will facilitate more precise functional analyses of endogenous and exogenous genes, revealing new roles for genes that act at multiple stages in the plant life cycle. Such analyses will assist the development of new, improved crop varieties. Conditional and cell type-specific gene expression systems allow precise functional complementation of mutants, disclosing the spatial and temporal significance of a gene's expression profile at different stages in development. Eukaryotic genomes have evolved through numerous rearrangements, producing duplicated genes with functional redundancy, a characteristic that is particularly evident in plants. In Arabidopsis (Arabidopsis thaliana; for review, see Curtis and Grossniklaus, 2005), duplications comprise more than 60% of the genome (Blanc et al., 2000). The functional significance of redundantly acting genes can be determined only by combining several loss-of-function mutations in genes of the same family, or by expressing one such gene ectopically (Eshed et al., 2001). Typically, genes are ectopically expressed using a constitutive and broadly active promoter, such as the cauliflower mosaic virus (CaMV) 35S promoter (Odell et al., 1985). This approach has been the basis of many gene function studies, including investigations of single-copy genes (Mizukami and Ma, 1992; Jack et al., 1994). Ubiquitous and constitutive gene expression can, however, result in lethality if a gene of critical importance to an early stage in development is misexpressed. Problems can also occur in complementation studies if a gene is expressed not only in the tissue type required, but also in other tissues (Laufs et al., 2003). 
The combination of cell type-specific complementation and loss-of-function mutation provides powerful tools to elucidate the role of genes, particularly those acting at multiple stages of development (Gross-Hardt et al., 2002), and can expose the noncell-autonomous nature of a gene's activity (An et al., 2004). Genetic complementation at early stages of development can be achieved using the promoter of another gene with a similar early expression profile (Gross-Hardt et al., 2002); however, alternative promoters are not always available. In such cases, a system that allows cell type-specific conditional complementation is invaluable. Such a system that could also allow conditional cell type-specific activation of randomly tagged genes would provide the versatility required to identify classes of mutants with tissue-specific effects or mutants with early lethal effects. Such mutants could be rescued in noninduced plants or plant sectors. None of the currently available systems provides such versatility (for review, see Curtis and Grossniklaus, 2006). Although inducible activation-tagging systems that allow temporal control of gene expression have been developed (Matsuhara et al., 2000; Zuo et al., 2000), none allows tissue-specific activity, which can be critically important, as demonstrated by the induced ectopic expression of expansin in restricted tissue (meristem fractions) leading to leaf formation (Pien et al., 2001). Here, gene activity was controlled by the careful, manual application of a chemical to a subset of meristematic cells. This would be impractical for experiments in less accessible cell types, such as reproductive tissues within the gynoecium. There are, of course, systems that transactivate genes using tissue-specific promoters, but these have limitations. 
The ethanol-inducible system (Deveaux et al., 2003; Maizel and Weigel, 2004), for example, is limited by the volatile nature of the inducer, which can cause unwanted gene activation in neighboring plants (Roslan et al., 2001) and toxic effects on the induced plant (Roslan et al., 2001); the system can also be activated by endogenous inducers under low-oxygen conditions (Salter et al., 1998; Roslan et al., 2001). These features make it difficult to produce the distinct sectors of induced and uninduced gene expression that are essential to determine the cell autonomy of a phenotype and to address hitherto intractable problems encountered during reproductive developmental studies (where gene activation can prevent production of viable seeds). Some systems, based on nonvolatile inducers, have shown leaky activity (Martinez et al., 1999; De Veylder et al., 2000) or an inability to reliably activate responder T-DNAs randomly inserted in the genome (Baroux et al., 2005). This prohibits their use in precise activation-tagging screens. We have solved these problems by developing a system that allows localized, conditional gene induction within sectors of the plant exposed to the inducer. The system conditionally activates randomly integrated responder T-DNAs (regardless of their insertion position) at a high frequency for use in activation-tagging screens and reliably restricts expression to predicted tissue types. This allows the isolation of mutant phenotypes affecting seed development, which could not easily be identified using conventional misexpression approaches. We enhanced our system by incorporating Gateway cloning sites (Hartley et al., 2000) so that researchers and biotechnologists can use the growing number of tissue-specific Gateway-compatible cis-elements (An et al., 2004) and full-length cDNAs (Gong et al., 2004) that are now available for a broad range of applications.
RESULTS

Our aim was to generate a reliable and versatile two-component tissue-specific inducible gene expression system to provide a method by which randomly tagged genes, or candidate genes, could be conditionally activated in restricted sectors of a plant in restricted tissue types. These demands have resulted in the production of a stringent system with broad applications (Fig. 1).

Figure 1. A schematic illustration of the Gateway-compatible constructs. A, Activator vector. B, Responder vectors. C, Activator/responder vector. The activator vector pMDC150 (A) contains a Gateway cloning cassette, flanked by unique AscI and PacI restriction recognition sites upstream of a CaMV 35S minimal promoter (min 35S) and chimeric transcription factor XVE. The inclusion of a minimal promoter allows the insertion of both enhancer sequences and promoters in the constructs (the inclusion of a minimal promoter does not interfere with the specificity of the promoters used in this study). The vector pMDC150 also contains a Nos promoter to drive the expression of a kanamycin-resistance gene for plant selection. All responder vectors (B) contain an XVE-responsive promoter (OlexA-TATA) upstream of a Gateway cloning cassette, also flanked by unique AscI and PacI restriction recognition sites for the easy diagnosis of DNA insertions. Vectors pMDC160, pMDC220, and pMDC221 contain a Nos promoter to drive the expression of the BAR- or hygromycin-resistance genes, respectively, for plant selection. The vectors pMDC160, pMDC220, and pMDC221 also contain the pBluescript vector sequence (CLONTECH), which can be used for plasmid rescue procedures because it encodes an ampicillin-resistance gene for bacterial selection and the ColE1 origin of replication. The pMDC220 vector also contains a second XVE-responsive promoter adjacent to the right border (RB) sequence, so that this vector can be used for conditional activation tagging experiments. The activator/responder vector pLB12 (C) contains both an activator unit and a responder unit, separated by a kanamycin-resistance gene driven by a Nos promoter for plant selection. The sequences and detailed maps of these vectors can be downloaded from http://www.unizh.ch/botinst/Devo_Website/curtisvector. LB, Left border; TE9, TE9 terminator; T3A, terminator; attR1 and attR2, att recombination sites; CMr, bacterial chloramphenicol resistance; ccdB, bacterial toxin gene for negative selection.

Components of the Inducible Transactivation System

The system comprises an activator unit and a responder unit. The activator T-DNA (pMDC150) contains the transcriptional activator, XVE (Zuo et al., 2000), with a minimal CaMV 35S promoter. The responder T-DNA contains an XVE-responsive promoter that can be used to misexpress candidate genes or reporter genes (i.e. pMDC160 and pMDC221) or to activate randomly tagged genes (i.e. pMDC220). Both activator and responder T-DNAs contain Gateway recombination sites (Fig. 1, A and B). The vector pLB12 contains both an activator unit and a β-glucuronidase (GUS) reporter (Jefferson et al., 1987) within a responder unit (Fig. 1C).

Vector-Dependent Regulation of Gene Expression

While developing this technology, we identified that responder T-DNAs containing the 35S promoter (i.e. a pCAMBIA-derived vector [http://www.cambia.org]), which regulates the antibiotic resistance marker, can lead to uninduced transgene expression in responder constructs (data not shown). This uninduced expression was observed even when a 3-kb fragment containing the entire pUC vector sequence (Invitrogen) was introduced between the 35S promoter and the responder cassette (data not shown).
Similar interference by the 35S promoter has recently been reported in the pCAMBIA vector series (Yang et al., 2005) and the pPZP series (Yoo et al., 2005). No uninduced expression of reporter genes was observed with vectors pMDC160, pMDC221, pMDC220, or pLB12, which were derived from the pMoa vector series containing a Nos (nopaline synthase) promoter to regulate the antibiotic resistance marker. As a result, all plant vectors used to develop the two-component inducible system were derived from the pMoa vector series with the Nos promoter (rather than the 35S promoter) to regulate the expression of the selectable marker.

Conditional Gene Expression

To test the system, an enhancer fragment from the CaMV 35S promoter was inserted upstream of XVE, producing the activator T-DNA, pMDC150-35S. This was used to transactivate the GUS reporter in Arabidopsis plants previously transformed by a pMDC160-GUS responder construct. Sectors of induced gene activity were observed when leaf material was treated with 2 μM 17-β-estradiol (0.01% Silwet 77) using an artist's paint brush and GUS stained 24 h later (Fig. 2). Varying the 17-β-estradiol exposure time resulted in altered reporter gene activity, peaking between 24 and 48 h. A similar peak in activity was reported using the PER8 vector (Zuo et al., 2000). Although GUS activity remains largely in the tissue directly beneath the site of 17-β-estradiol application, there is slow spreading, which results in a halo of weak GUS activity that gradually spreads throughout the entire leaf over a 72- to 96-h period (data not shown).

Figure 2. Arabidopsis leaves showing sectors of induced GUS expression 24 h after induction with 2 μM 17-β-estradiol (0.01% Silwet 77) using an artist's paint brush.
Seedlings grown on 10 μM 17-β-estradiol plates showed strong induced GUS activity throughout the plant tissue when compared to seedlings grown on mock-inoculated plates (which show no expression), demonstrating that 17-β-estradiol can permeate the aerial parts of the plant when only the roots are exposed to the inducer (Supplemental Fig. 1); however, when inflorescences are allowed to take up 17-β-estradiol by transpiration (2 μM 17-β-estradiol in water), the inducer tends to promote most GUS activity in the vasculature and adjacent tissue, but will eventually permeate the stem, cauline leaves, and even the ovules within the gynoecium of the flower after 96 h of exposure (Supplemental Fig. 2). The chosen method of application will depend on the developmental stage of interest to be studied in the plant (i.e. 17-β-estradiol application in media would be most appropriate for early seedling development studies, whereas inducer application to inflorescence tissue by transpiration or topical application using a paint brush and a spreading agent [Silwet 77 or Break thru S240] might be more appropriate for floral or reproductive developmental studies).

Efficiency of Transactivation

The value of a two-component inducible transactivation system depends on its ability to deliver reliable and conditional tissue type-specific gene expression at a high frequency in independent transgenic plant lines, particularly when screening for inducible phenotypes in restricted cell types that result from activation-tagging approaches. A pMDC150-35S activator plant line was used to establish the frequency with which random pMDC160-GUS responder insertions can be activated (Table I). Leaves were excised from transformants and analyzed histochemically for GUS activity, with and without induction by 2 μM 17-β-estradiol (Fig. 3A). Of the plant lines with responders inserted at 23 independent loci, 87% gave inducible ubiquitous expression. The reciprocal experiment was also performed (Table I). Fourth-generation pMDC160-GUS responder plant lines were supertransformed with the pMDC150-35S activator. Here, 95.8% of plants showed predictable induced expression with activators inserted at 48 independent loci.

Table I. Comparative expression analysis of independent activator and responder T-DNAs positioned throughout the genome

Primary Plant Line Contains T-DNA | Supertransformed with T-DNA | Expected | Aberrant | None | Total Lines
pMDC150-35S activator | pMDC160-GUS responder | 20 | 0 | 3 | 23
pMDC150-SUC2 activator | pMDC160-GUS responder | 28 | 0 | 2 | 30
pMDC150-GL2 activator | pMDC160-GUS responder | 27 (2 weak) | 0 | 3 | 30
Subtotal | | 75 | 0 | 8 | 83
pMDC160-GUS responder | pMDC150-35S activator | 46 | 0 | 2 | 48
pMDC160-GUS responder | pMDC150-SUC2 activator | 12 | 0 | 1 | 13
pMDC160-GUS responder | pMDC150-TobRB7 activator | 21 | 0 | 5 | 26
Subtotal | | 79 | 0 | 8 | 87
pMDC150-GL2 activator | pMDC220-GUS tagging | 1,040 (289a) | 271 | 154 | 1,465
Total lines | | 1,194 | 271 | 170 | 1,635

a Plants with expression in trichomes and other tissues. These patterns of expression were also observed in some control transgenic plant lines containing pMDC163-GL2 promoter-GUS T-DNAs.

Figure 3. Cell-specific expression patterns of a GUS reporter gene after induction. Uninduced and induced GUS expression (uninduced shown on the left of each image, respectively) in Arabidopsis plant lines 24 h after induction (2 μM 17-β-estradiol in 0.01% [v/v] ethanol) or mock induction (0.01% [v/v] ethanol). A, Leaves of a plant line containing pMDC150-35S and pMDC160-GUS (induced ubiquitous expression can be seen across the leaf). B and C, GUS expression in plant lines containing pMDC150-GL2 and pMDC160-GUS. B, 3-d-old seedlings (induced expression is restricted to the atrichoblast cells). C, Mature leaves of 15-d-old plants (induced expression restricted to the trichomes). D, Whole plants at 7-to-10-d old (left and middle) and a flower (right) from plant lines containing pMDC150-RolC and pMDC160-GUS (induced expression is restricted to the vascular tissue). E, Mature leaves (left and middle) and a flower (right), including petals, from plant lines containing pMDC150-SUC2 and pMDC160-GUS (induced expression is observed in the vascular tissue [companion cells]). F and G, Roots in mature plant lines containing pMDC150-TobRB7 and pMDC160-GUS (induced expression is restricted to the tissue above the root apical meristem [RAM]). H, Plant line containing pLB12-EASE (induced expression is observed in the egg apparatus). Bars = 0.5 mm (G) and 20 μm (H).

Stringency of Gene Expression System

The stringency of the two-component Gateway-compatible system was tested using pMDC150-35S activator lines supertransformed with the responder pMDC221, containing the cytotoxic diphtheria A-chain (DT-A; Maxwell et al., 1986; Harrison et al., 1991; pMDC221-DT-A). The DT-A gene kills cells by ribosylating elongation factor-2, leading to the inhibition of protein synthesis (Collier, 1967). Here, 13 independent transgenic plant lines containing both the pMDC150-35S activator and a pMDC221-DT-A responder showed no phenotypic effects in the absence of the 17-β-estradiol inducer. However, induction with as little as 2 μM 17-β-estradiol led to signs of cell death in all 13 plant lines (Fig. 4). This demonstrates the tight regulation of genes adjacent to the XVE-responsive promoter. The DT-A toxin has been used as a tool to ablate specific plant tissues (Weijers et al., 2003; Yang et al., 2005). By replacing the 35S promoter in pMDC150 with tissue-specific promoters, expression of the DT-A gene could be restricted to specific tissues, allowing inducible cell type-specific ablation.

Figure 4. Stringent regulation of gene expression using the system is demonstrated by induction of DT-A in seedlings. Arabidopsis plant lines containing both the pMDC150-35S activator T-DNA and a pMDC221-DT-A responder T-DNA after 13 d of growth under uninduced conditions (A; mock inoculated Murashige and Skoog media) and under induced conditions (B; 2 μM 17-β-estradiol in Murashige and Skoog medium).

Inducible Tissue-Specific Transactivation

To test the system's ability to deliver inducible tissue-specific gene expression, five activator constructs were generated using promoter or enhancer elements with different tissue specificities. These included elements with vascular-specific (RolC from Agrobacterium tumefaciens and AtSUC2 companion cell specific; An et al., 2004), root-specific (NtTobRB7; An et al., 2004), trichome/atrichoblast-specific (AtGL2; Szymanski et al., 1998), and egg apparatus-specific (AtEASE; Yang et al., 2005) activities. These elements were all tested in the pMDC150 activator T-DNA, except for AtEASE, which was tested in the vector pLB12 (Fig. 1). The T-DNA pLB12-AtEASE contains both the activator and responder units, greatly facilitating expression analysis of AtEASE by placing both units within the same haploid cells of the T1 female gametophytes. This provides the most stringent of tests for the system because the egg apparatus is inaccessible, located at the micropylar end of the ovule, deep within the gynoecium, surrounded by the reproductive tissue of the flower. Each activator construct was tested for inducible cell type-specific activity.
After induction, the pattern of expression was shown to be identical to those obtained with the same cis-elements in the GUS reporter construct pMDC163 (Curtis and Grossniklaus, 2003; data not shown) and to previously published patterns of expression (Szymanski et al., 1998; An et al., 2004; Yang et al., 2005). Inducible transactivation patterns of GUS expression are shown in Figure 3. Activation in Trichomes Inducible GUS activity observed in Arabidopsis transformed with pMDC150-GL2 and a pMDC160-GUS responder T-DNA was consistent with patterns of expression previously observed (Hung et al., 1998; Szymanski et al., 1998). The AtGL2 promoter fragment used in this study contained the 500-bp EcoRV/XbaI DNA fragment previously identified as necessary to direct GUS expression in differentiating hairless epidermal cells in the hypocotyls and roots (atrichoblasts). After induction, GUS expression was observed in the differentiating hairless epidermal cells that form the atrichoblast cells (Fig. 3B). In leaf primordia and developing leaves, pMDC150-GL2 was able to inducibly transactivate the GUS responder T-DNA in developing trichomes and in surrounding epidermal cells. At later stages in leaf development, this pattern of expression became more tightly restricted to trichome cells, with GUS activity also observed at the petiole and base of the leaf. In mature leaves, expression was restricted entirely to trichomes and was maintained throughout the lifetime of these cells (Fig. 3C). Similar patterns of expression (data not shown) were also observed in Arabidopsis plants transformed by the same AtGL2 promoter fragment inserted upstream of a GUS reporter in the vector pMDC163 (Curtis and Grossniklaus, 2003). Activation in Vascular Tissue The previously reported phloem-specific expression pattern of the RolC promoter (Booker et al., 2003) was faithfully reproduced using pMDC150-RolC and a pMDC160-GUS responder T-DNA. 
Upon induction, GUS activity was observed in the phloem of the roots, stem, leaves, and floral organs (such as sepals, petals, anther filaments, and style; Fig. 3D). The promoter fragment used (An et al., 2004) is nearly identical to that described by Booker and colleagues (2003). Similarly, when using the AtSUC2 promoter to drive XVE expression using pMDC150-SUC2, the GUS gene was conditionally expressed in companion cells, as described by Truernit and Sauer (1995; Fig. 3E). GUS expression was also detected in the phloem throughout the plant; but, whereas Truernit and Sauer (1995) observed no GUS activity in the petals, we observed induced activity in the vascular tissue of all floral organs, including petals (Fig. 3E). In contrast to the promoter fragment of Truernit and Sauer (1995), which extends 156 bp into the protein coding sequence, the promoter fragment we used (An et al., 2004) ends at the start codon of the AtSUC2 gene. Activation in Roots Arabidopsis plants, transformed with pMDC150-TobRB7 and pMDC160-GUS responder T-DNA, showed induced reporter gene expression in mature plants that mimicked the expression pattern described for tobacco (Nicotiana tabacum) plants transformed with TobRB7-GUS constructs (Yamamoto et al., 1991; Fig. 3F). GUS activity in mature roots was restricted to a region of the root above the root apical meristem (Fig. 3G). Activation in the Egg Apparatus Reporter gene expression in the egg apparatus was conditionally activated in Arabidopsis plants transformed with pLB12-AtEASE. The activation unit of this construct contains five tandem repeats of 77-bp AtEASE (a modified version of that previously described by Yang et al., 2005). Upon induction with 17-β-estradiol, GUS activity was observed in the egg apparatus (Fig. 3H). 
Induced expression was occasionally observed in the entire embryo sac, which is consistent with the previous report that AtEASE is sometimes active prior to cellularization of the female gametophyte (Yang et al., 2005). Efficiency of Tissue-Specific Transactivation To determine whether tissue-specific expression patterns are maintained in independent plant lines, regardless of the genomic position of pMDC160-GUS responder insertions, numerous transformants were generated. The plants used had activation T-DNAs in fixed positions in the genome, showing different patterns of XVE expression (pMDC150-SUC2 and pMDC150-GL2; Table I). Leaves were excised from transformants and analyzed histochemically for GUS activity, with and without induction by 2 μM 17-β-estradiol; 91.7% of plant lines, with responders inserted at 60 independent loci, gave inducible expression in the expected tissue type (Table I). This confirms that our system could be used to create an inducible activation-tagging system to activate randomly tagged genes in specific tissues or cell types. The reciprocal experiment was also performed, using the AtSUC2 and NtTobRB7 promoters to investigate the ability of randomly inserted pMDC150-promoter T-DNAs to activate a pMDC160-GUS responder plant line (Table I). Fourth-generation responder plant lines were supertransformed with pMDC150-promoter activator T-DNAs. Again, leaves were excised from transformants and analyzed histochemically for GUS activity with and without induction by 17-β-estradiol. Here, 84.6% of plant lines, with pMDC150-promoter activators inserted at 39 independent loci, showed inducible expression in the expected tissue type (Table I). In a pilot study, we produced a T-DNA activation-tagging construct, pMDC220-GUS, and analyzed the GUS expression of 1,465 independent insertions for induced activity in trichomes of a plant line containing the pMDC150-GL2 activator T-DNA (Table I). 
Here, 71% of plant lines with randomly inserted activation-tagging constructs produced tissue-specific expression that faithfully mimicked both the previously described expression patterns (Szymanski et al., 1998) and that observed in control plant lines; 18.5% showed aberrant expression and 10.5% showed no expression. If we also take the results obtained with other specific promoters into account, our data suggest that 73% of transformants produce an inducible pattern of expression that faithfully mimics that of the promoter/enhancer selected to transactivate the responder T-DNA. A low number of insertions (10.4%) show no expression, suggesting that the activator and the responder are equally likely to insert into regions of the genome that affect their activity. Furthermore, a proportion (16.6%) of transgenic plant lines show aberrant expression, a percentage similar to that observed in experiments with promoter-GUS constructs. Induction of Specific cDNAs Using the 35S XVE Activator To demonstrate the value of the system for high-throughput gene analysis, cDNAs were inducibly expressed in plant tissues corresponding to the expression pattern of the CaMV 35S promoter during seedling development. These cDNAs were selected for their variety of clearly visible phenotypes early in development. They included KNOTTED-LIKE FROM ARABIDOPSIS 1 (KNAT1; Lincoln et al., 1994; Fig. 5), which produces lobed leaves when misexpressed using the 35S promoter, BABY BOOM (BBM; Boutilier et al., 2002; Fig. 6), and LEAFY COTYLEDON 2 (LEC2; Stone et al., 2001; Fig. 7), both triggering a conversion from vegetative to embryonic growth when misexpressed using the 35S promoter. Using this system, we were able to inducibly reproduce phenotypes previously described in the literature.
Figure 5. Inducible gain-of-function phenotypes: Overexpression of KNAT1 leads to lobed leaf formation. A, 27 d of growth under uninduced conditions. B, 27 d of growth under induced conditions. A and B, Siblings from a plant line containing both the pMDC150-35S activator T-DNA and a pMDC221-KNAT1 responder T-DNA.
Figure 6. Inducible expression of BBM leads to the formation of somatic embryos on cotyledons and leaves. A, 13 d of growth under uninduced conditions (mock inoculated Murashige and Skoog medium). B, 13 d of growth under induced conditions (5 μM 17-β-estradiol in Murashige and Skoog medium). A and B, Sibling plant lines containing both the pMDC150-35S activator T-DNA and a pMDC221-BBM responder T-DNA. C and D, Scanning electron micrographs of induced somatic embryos (C) in plant shown in B. D, Scanning electron micrographs of cotyledons and leaves with induced somatic embryos and a leaf-like outgrowth with a trichome on a cotyledon (arrow). E, 13-d-old plants that constitutively express BBM under the control of the CaMV 35S promoter (seeds kindly provided by Kim Boutilier).
Figure 7. Induced ectopic LEC2 expression results in the formation of somatic embryos on cotyledons. A, 29 d of growth under uninduced conditions (mock inoculated Murashige and Skoog medium). B, 29 d of growth under induced conditions (5 μM 17-β-estradiol in Murashige and Skoog medium). A and B, Sibling plant lines containing both the pMDC150-35S activator T-DNA and a pMDC221-LEC2 responder T-DNA.
To determine whether this system could identify the phenotype of a gene that would be overlooked by conventional ectopic gene expression methods, we inducibly expressed the FUSCA3 (FUS3) gene. Because mutations in this gene can cause viviparous seed development (Raz et al., 2001), our prediction was that ectopic expression of FUS3 would cause seed dormancy. In conventional ectopic gene expression approaches, dormant seeds would be indistinguishable from nontransformants on selection plates after transformation. Plants containing both pMDC150-35S and pMDC221-FUS3 were selected in the absence of induction and showed no mutant phenotype (Fig. 8A). The T1 generation double transformants showed delayed germination when exposed to 2 μM 17-β-estradiol (Fig. 8B), with a stronger seed dormancy phenotype when exposed to 5 μM 17-β-estradiol (Fig. 8C).
Figure 8. The ectopic expression of FUSCA3 produces a seed dormancy phenotype. Plant lines containing both the pMDC150-35S activator T-DNA and a pMDC221-FUS3 responder T-DNA. A, Uninduced. B, Induced with 2 μM 17-β-estradiol. C, Induced with 5 μM 17-β-estradiol after 14 d.
Similar plant lines misexpressing the GUS reporter gene, instead of FUS3, in the presence of 2 and 5 μM 17-β-estradiol grow normally. Seedlings containing both pMDC150-35S and pMDC221-FUS3 T-DNAs that were able to germinate on 2 μM 17-β-estradiol showed abnormal growth, with extended hypocotyls and a tendency to produce reduced leaves (data not shown). Seeds that showed prolonged seed dormancy when incubated in the presence of the inducer occasionally germinated several weeks later when left on the same plates. Degradation of the light-sensitive 17-β-estradiol may account for this low rate of late germination. When dormant seeds exposed to 5 μM 17-β-estradiol were transferred to noninductive media, within a short period of 5 d, 65% of seeds germinated and looked normal, no longer showing any aberrant growth phenotypes associated with ectopic FUS3 expression. When these transferred seeds were examined 9 d later, 92% had germinated; however, some of the later germinating seeds (17% of the total) showed the abnormal growth phenotype observed earlier, with extended hypocotyls and a tendency to produce reduced leaves. In fact, our findings are consistent with previously published data (Zuo et al., 2006), which suggest that induced phenotypes, in general, return to wild type in the absence of the inducer after a period of 5 to 7 d. 
The time taken, however, to return to a wild-type phenotype is gene dependent, perhaps reflecting the stability of a transcript or the type of downstream effects that result from its ectopic expression. DISCUSSION Local Inducible Expression Here, we describe a highly versatile, inducible gene expression system that provides both spatial and temporal control of gene expression in plants. The system allows rapid production of cell type-specific activation constructs. These activation constructs can faithfully reproduce expression patterns previously described for six promoters/enhancers with different tissue or cell type specificities (Odell et al., 1985; Truernit and Sauer, 1995; Szymanski et al., 1998; Booker et al., 2003; An et al., 2004; Yang et al., 2005). About 90% of plant lines containing the pMDC150-derived activator and pMDC160-derived responder showed inducible expression in the expected tissue type regardless of the position of the T-DNAs in the genome (Table I). These results demonstrate that the system can be used to generate a library of tissue-specific transactivator plant lines for the activation of LexA responder T-DNAs and that the genomic location of the activator or responder is equally likely to affect the reliability of the system. Such efficiency could not be easily achieved using systems in which gene expression is more sensitive to the position of the responder T-DNA (Baroux et al., 2005). In the system described by Baroux et al. (2005), a similar percentage of plant lines (to data presented here) showed transactivated GUS expression (i.e. 86% of plant lines with GUS responders inserted at 37 independent loci), but expression was often weak (e.g. 93.5% of these plant lines showed weak expression). 
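As a rough cross-check of the efficiencies quoted above, each percentage reported in the Results can be converted back to an approximate number of plant lines and the experiments pooled. The sketch below is illustrative only: the counts are reconstructed by rounding (they are not taken from Table I, which is not reproduced here), the experiment labels are shorthand introduced for this example, and the pooling is an assumption about how such summary figures could arise, not a statement of how the authors calculated them.

```python
# Percentages and numbers of independent loci as quoted in the Results.
# Labels are informal shorthand; counts are reconstructed, not from Table I.
EXPERIMENTS = [
    ("35S activator line, random GUS responders", 87.0, 23),
    ("GUS responder line, random 35S activators", 95.8, 48),
    ("SUC2/GL2 activator lines, random GUS responders", 91.7, 60),
    ("GUS responder line, random SUC2/TobRB7 activators", 84.6, 39),
    ("pMDC220-GUS activation-tagging pilot", 71.0, 1465),
]


def approx_count(percent, n_loci):
    """Reconstruct an approximate number of lines from a rounded percentage."""
    return round(percent / 100 * n_loci)


def pooled_percent(rows):
    """Pool experiments by summing reconstructed counts over total loci."""
    hits = sum(approx_count(pct, n) for _, pct, n in rows)
    total = sum(n for _, _, n in rows)
    return 100 * hits / total


if __name__ == "__main__":
    for label, pct, n in EXPERIMENTS:
        print(f"{label}: ~{approx_count(pct, n)} of {n} lines")
    # Activator/responder experiments only (first four rows).
    print(f"activator/responder pooled: {pooled_percent(EXPERIMENTS[:4]):.1f}%")
    # All experiments, including the activation-tagging pilot.
    print(f"all experiments pooled: {pooled_percent(EXPERIMENTS):.1f}%")
```

Under these assumptions, the four activator/responder experiments pool to roughly 90%, and adding the much larger activation-tagging pilot brings the pooled figure to roughly 73%, in line with the values quoted in the text.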
In reciprocal experiments in which one of their strongly responsive pOp-GUS responder plant lines was supertransformed or crossed to a variety of promoter-LhG4 activators, 78.4% of plant lines showed strong GUS expression in predictable patterns with promoter-LhG4 activators at 74 independent loci. The positional effects observed for responders in the LhG4 system may be caused by the relatively weak activity of the Gal4 activation domain used in the LhG4 chimeric transcription factor. The more active VP16 activation domain used in XVE, when tethered to DNA-binding domains, stimulates transcription by targeting histone-acetylating complexes to nucleosomal templates (Tumbar et al., 1999; Vignali et al., 2000), resulting in the decondensation of heterochromatin. Similar decondensation of chromatin structure has been observed around the target sites of the estrogen receptor (Nye et al., 2002), suggesting that elements of XVE's estrogen receptor may also contribute to the higher efficiency of strong gene expression observed in our system. These results suggest that our system could be used to create cell type-specific and inducible activation-tagging technology. Our activation-tagging pilot study showed that 71% of plant lines produced tissue-specific expression that faithfully mimicked both the previously described expression pattern (Szymanski et al., 1998) and that observed in control plant lines; 18.5% showed aberrant expression and 10.5% showed no expression. This frequency is sufficient for use in activation-tagging screens. Tightly Regulated Gene Expression A good inducible gene expression system must deliver tightly regulated gene expression. Experiments using the cytotoxin DT-A gene demonstrated that our system is stringently regulated: Despite containing a cytotoxic gene, plants develop normally in the absence of induction. 
Such a system that provides stringent control of a cytotoxic gene, in combination with the ability to restrict expression to a subset of cells, creates the opportunity to study plants in which certain cell types are ablated. Production of Induced Sectors of Gene Expression The nonvolatile nature of the 17-β-estradiol inducer provides a further advantage because it can be applied to restricted sectors of the plant. This means that interesting mutants with lethal effects can be rescued in noninduced sectors. This is of particular importance when attempting to deregulate gene expression in reproductive tissues, where gene induction may produce sterility or lethal effects in the next generation. Gateway Compatibility and Induced Gene Expression Our system also benefits from the inclusion of Gateway cloning sites, making the system compatible with the growing collections of full-length cDNA entry clones that are available in the Arabidopsis Stock Center (Gong et al., 2004) and promoter/enhancer cis-elements (An et al., 2004) that allow these genes to be expressed inducibly in any plant cell-type. We used one such cDNA from the Stock Center, pYAT5G17430 (BBM; Gong et al., 2004) to induce somatic embryos on vegetative tissues, faithfully reproducing the previously described phenotype (Boutilier et al., 2002). We also generated Gateway-compatible cDNAs and inducibly misexpressed both KNAT1 (replicating the lobed-leaf phenotype described by Lincoln et al., 1994) and LEC2 (conditionally reproducing the development of somatic embryos in vegetative tissue described by Stone et al., 2001). Plant lines that show inducible FUS3 expression illustrate the value of our gene expression system. In the absence of induction, transformants can be selected on antibiotic plates and their phenotype determined after induction in subsequent generations. 
Because ectopic expression of FUS3 using the constitutive and near-ubiquitous CaMV 35S promoter would lead to seed dormancy (as shown with the inducible system), primary transformants would be overlooked because they would be indistinguishable from nontransformants on a selection plate. Despite its strong effect on dormancy when ectopically expressed, FUS3 expression during seed development shows only marginal differences between strongly or moderately dormant Arabidopsis wild-type accessions (Baumbusch et al., 2004). This may reflect the sensitivity of seeds to the expression of FUS3. Significantly, FUS3 has been implicated as a positive regulator of abscisic acid synthesis (Gazzarrini et al., 2004) and abscisic acid is a hormone known to promote seed dormancy (Koornneef et al., 2002). The altered expression of such dormancy genes has the potential to resolve problems of preharvest sprouting in agriculturally important seeds (for review, see Gubler et al., 2005). Modulated Gene Expression A further advantage of an XVE-dependent system is that gene expression levels can be modulated using different concentrations of inducer (Zuo et al., 2002). In experiments with KNAT1 and FUS3 plant lines, different concentrations of 17-β-estradiol also modulated the severity of the phenotypes. This allows both weak and strong phenotypes to be examined in the same plant line—a real advantage. The method of inducer application can be adapted to suit requirements. We have developed a number of inducer application methods to enhance the flexibility of our system. Gene expression can be induced in roots, germinating seed, and juvenile plants by growing plants directly on Murashige and Skoog medium supplemented with 17-β-estradiol. Young seedlings can be induced after initial stages of development by transferring them to inductive media. 
In older plants, gene expression can be induced by allowing excised branches (inflorescences) to take up 17-β-estradiol by transpiration or, for reporter analysis, by submerging tissue in 17-β-estradiol solution. Alternatively 17-β-estradiol can be applied in 0.01% (v/v) Silwet 77 to plant sectors using an artist's paint brush. Less accessible cell types (e.g. the female gametophyte) can be induced by transpiration, painting, or dipping flowers in 17-β-estradiol solution containing 0.02% (v/v) Break thru S240 (Goldschmidt GmbH). This reduces surface tension and allows 17-β-estradiol to spread quickly over the hydrophobic cuticle. We do not recommend spraying 17-β-estradiol because, like glucocorticoid steroids, estrogens play a significant role in human physiology and should be used with caution. In summary, we have shown that the two-component, Gateway-compatible XVE system can be used to generate faithful patterns of expression at high frequencies. The frequency with which these patterns are observed is largely independent of the position of both the activator and the responder in the genome. As the number of activator plant lines with cell type-specific activity rapidly grows and more Gateway-compatible, full-length cDNA libraries become available, this system will allow the inducible expression of any gene to be studied in any plant tissue type. This type of system will help to determine the cell type in which a gene's activity is required (i.e. for complementation studies). Furthermore, the system can be used to generate conditional mutant alleles to complement the early lethal effects of a mutation, revealing the effects of the same mutation at later stages of development. Similarly, mutations affecting early zygotic development could be conditionally complemented to generate seeds for second-site mutagenesis, revealing bypass mutants to the primary lesion in viable progeny. 
Conditional cell type-specific gene expression could further the development and analysis of new phenotypic traits, such as apomixis or dwarfism. Such analysis will be of particular relevance to the development of novel crop traits in which widespread transgene expression could impair plant viability or fertility (Curtis and Grossniklaus, 2006). MATERIALS AND METHODS Plasmid Construction Standard gene-cloning methods (Sambrook and Russell, 2001) were used to make the constructs. An attR1-Cmr-ccdB-attR2 integration region from the Gateway cloning system (Invitrogen) was placed downstream of a promoter containing the LexA binding site (OlexA) and basal CaMV 35S promoter (TATA box; Zuo et al., 2000; Curtis and Grossniklaus, 2003). This responder cassette was subcloned into derivatives of the pCAMBIA vector series (http://www.cambia.org; pMDC8) and of the pMoa vector series (Barrell et al., 2002; pMDC160, pMDC220, pMDC221, and pLB12), to produce responder constructs with a variety of selectable markers (see Fig. 1). The AtGL2 promoter was amplified from Arabidopsis (Arabidopsis thaliana) Columbia genomic DNA using PCR with Gateway adapter-GL2 promoter-specific primers (5′-AAAAAGCAGGCTAAGCTTTTGAATTGTAGATAAATCATCTGC-3′ and 5′-AGAAAGCTGGGTGCTAGCTTCTTTGCTTAATTATGATCTCTTCCC-3′). This PCR fragment could not be further amplified with attB1 and attB2 adapter primers, as recommended by Invitrogen, and was, therefore, digested with EcoRI and NheI to yield a truncated fragment of 1.5 kb. This AtGL2-promoter fragment was cloned into the EcoRI and XbaI sites of the pBluescript vector (CLONTECH) and amplified using Gateway-compatible primers designed to anneal to the T7 and T3 primer sequences of the pBluescript vector. The forward primer contained the attB1 tail T7 sequence (5′-GGGGACAAGTTTGTACAAAAAAGCAGGCTGTAATACGACTCACTATAGGGC-3′) and the reverse primer contained the attB2 tail T3 sequence (5′-GGGGACCACTTTGTACAAGAAAGCTGGCTAATTAACCCTCACTAAAGGG-3′). 
The 35S promoter/enhancer region was amplified from the pCAMBIA 3300 plasmid (http://www.cambia.org) using PCR with the Gateway adapter-CaMV 35S promoter-specific primers 35S-F (5′-AAAAAGCAGGCTGTTTGCGTATTGGCTAGAGCAGCTTG-3′) and 35S-R (5′-AGAAAGCTGGGTGCGTCATCCCTTACGTCAGTGGAG-3′) and the AtEASE sequence was amplified from pWY-093.1 (Yang et al., 2005) using the Gateway adapter-AtEASE enhancer-specific primers EAFP (5′-GGGGACAAGTTTGTACAAAAAAGCAGGCTCCACGATGCAAATATATCG-3′) and EARP (5′-GGGGACCACTTTGTACAAGAAAGCTGGGTGCCTTAATATCATACGAAAG-3′). The plasmid pWY-093.1 contains four tandem repeats of AtEASE; however, due to these repeats, our amplified fragment fortuitously contained five tandem repeats. H. An and G. Coupland provided the AtSUC2, RolC, and NtTobRB7 promoters as entry clones (An et al., 2004). The Gateway-compatible PCR products AtGL2, CaMV 35S, and AtEASE were introduced into the Gateway pDONR207 (Invitrogen) vector using BP reactions to generate promoter entry clones. Different promoter fusions in the vectors pMDC163 (Curtis and Grossniklaus, 2003), pMDC150, and pLB12 were produced using LR (Invitrogen) reactions. The DT-A entry clone was generated by amplifying the DT-A chain from plasmid pIBI30-DT-A (Maxwell et al., 1986; Harrison et al., 1991), flanked by attB sites using the primers DIP-forward-attB1 (5′-AAAAAGCAGGCTATGGATCCTGATGATGTTGT-3′) and DIP-reverse-attB2 (5′-AGAAAGCTGGGTCACAAAGATCGCCTGACACG-3′). The amplified product was integrated into pDONR207 using BP clonase and subsequently integrated into pMDC221 using LR clonase. The pYAT5G17430 entry clone (Gong et al., 2004), containing the full-length cDNA of BBM, was obtained from the Arabidopsis Biological Resource Center (ABRC) stock center. This was integrated into the vector pMDC221 using LR clonase. 
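The primers listed above share a common layout: a Gateway attB adapter at the 5′ end followed by a template-specific sequence. Some carry the full 29-nucleotide adapter (for example EAFP and EARP), whereas others carry only a 12-nucleotide partial adapter (for example 35S-F, 35S-R, and the DT-A primers). The following minimal sketch splits a primer into these two parts. The adapter strings are simply the shared 5′ prefixes visible in the primers quoted in the text, and the function and dictionary names are hypothetical, introduced only for this illustration.

```python
# Adapter prefixes as they appear at the 5' ends of the primers in the text.
# Names in this block are illustrative, not part of any published tool.
ADAPTERS = {
    "attB1 (full)": "GGGGACAAGTTTGTACAAAAAAGCAGGCT",
    "attB2 (full)": "GGGGACCACTTTGTACAAGAAAGCTGGGT",
    "attB1 (partial)": "AAAAAGCAGGCT",
    "attB2 (partial)": "AGAAAGCTGGGT",
}

# A few of the primers quoted above (5' to 3').
PRIMERS = {
    "EAFP": "GGGGACAAGTTTGTACAAAAAAGCAGGCTCCACGATGCAAATATATCG",
    "EARP": "GGGGACCACTTTGTACAAGAAAGCTGGGTGCCTTAATATCATACGAAAG",
    "DIP-forward-attB1": "AAAAAGCAGGCTATGGATCCTGATGATGTTGT",
    "DIP-reverse-attB2": "AGAAAGCTGGGTCACAAAGATCGCCTGACACG",
}


def split_primer(primer):
    """Return (adapter name, template-specific part) for a primer.

    If no known adapter prefix is found, the adapter name is None and the
    whole sequence is returned unchanged.
    """
    seq = primer.upper().replace(" ", "")
    for name, adapter in ADAPTERS.items():
        if seq.startswith(adapter):
            return name, seq[len(adapter):]
    return None, seq


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        adapter, specific = split_primer(seq)
        print(f"{name}: {adapter} + {specific}")
```

A primer that does not begin with one of these exact prefixes is reported with no adapter, which makes deviations from the expected layout easy to spot.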
The KNAT1 entry clone was generated by subcloning a KpnI-NotI fragment that contained the full-length cDNA from the clone U10690 (Yamada et al., 2003), obtained from the ABRC, into the KpnI-NotI sites of pENTR1A vector (Invitrogen). This fragment, flanked by attL1 and attL2 sites, was integrated into the vector pMDC221 using LR clonase. The FUS3 and LEC2 entry clones were generated by amplifying full-length cDNAs kindly provided by Francois Parcy, flanked by attB sites using the primers FUS3-forward-attB1 (5′-AAAAAGCAGGCTATGGTTGATGAAAATGTGG-3′) and FUS3-reverse-attB2 (5′-AGAAAGCTGGGTCTAGTAGAAGTCATCGAGAG-3′), and LEC2-forward-attB1 (5′-AAAAAGCAGGCTATGGATAACTTCTTACCCTTTCC-3′) and LEC2-reverse-attB2 (5′-AGAAAGCTGGGTTCACCACCACTCAAAGTCG-3′). The amplified products were integrated into pDONR221 using BP clonase and subsequently integrated into pMDC221 using LR clonase. Plant Materials, Growth Conditions, and Plant Transformation Arabidopsis Landsberg erecta plants were used for plant transformation using the floral-dip method (Clough and Bent, 1998). Plants were grown under 14-h white light/10-h dark at 22°C on Murashige and Skoog agar (1× Murashige and Skoog salts, 3% Suc, 0.8% agar) or in the greenhouse for mature plants. 17-β-Estradiol Induction Methods A stock of 20 mM 17-β-estradiol (Sigma-Aldrich) in 70% ethanol or 100% dimethylsulfoxide was made and stored at −20°C in small aliquots (17-β-estradiol is light sensitive and its activity slowly declines in a light intensity-dependent manner). Ethanol alone has no effect on transgene expression and at a concentration of ≤0.1% (v/v) in sterile culture media has no inhibitory effect on seed germination. Chemical treatments were carried out by either germinating seeds directly on Murashige and Skoog medium supplemented with 10, 5, or 2 μM 17-β-estradiol, or by transferring seedlings from a noninductive to an inductive medium. 
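The dilution arithmetic implied by these concentrations can be sketched as follows, using the standard relation C1 × V1 = C2 × V2. The example assumes the 20 mM stock in 70% ethanol described above and an arbitrary 1-liter batch of medium; the function names and the batch volume are illustrative choices, not values from the protocol.

```python
def stock_volume_ul(final_um, medium_volume_ml, stock_mm=20.0):
    """Volume of stock (in microliters) needed for a given final concentration.

    Uses C1 * V1 = C2 * V2, with the stock converted from mM to uM.
    """
    stock_um = stock_mm * 1000.0
    return final_um * medium_volume_ml * 1000.0 / stock_um


def solvent_percent(final_um, stock_mm=20.0, solvent_fraction=0.70):
    """Final solvent concentration (% v/v) carried over from the stock."""
    dilution = final_um / (stock_mm * 1000.0)
    return dilution * solvent_fraction * 100.0


if __name__ == "__main__":
    # Working concentrations quoted in the text, for 1 L of medium (assumed).
    for conc in (2, 5, 10):
        vol = stock_volume_ul(conc, 1000)
        eth = solvent_percent(conc)
        print(f"{conc} uM: {vol:.0f} uL stock per liter, ethanol {eth:.4f}% (v/v)")
```

Under these assumptions, even the highest working concentration (10 μM) is a 1:2,000 dilution of the stock and carries over about 0.035% (v/v) ethanol, below the 0.1% (v/v) level stated to have no effect on germination.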
Alternative methods include 17-β-estradiol uptake through transpiration or by submerging tissue in 17-β-estradiol solution. Restricted regions of the plant can be induced by applying 17-β-estradiol solution, supplemented with 0.01% Silwet 77, to sectors of the plant with an artist's paintbrush. When applying 17-β-estradiol to the exterior of the flower buds to induce expression in the egg apparatus deep within the carpels, 17-β-estradiol solution was supplemented with 0.02% Break thru S240 (Goldschmidt GmbH), which aids spreading. GUS Staining In situ GUS staining was carried out by vacuum infiltrating GUS staining solution (50 mM sodium phosphate buffer, pH 7.0, 1 mM EDTA, 0.5 mg/mL 5-bromo-4-chloro-3-indolyl β-D-GlcUA [X-Gluc; Biosynth AG], 0.4% Triton X-100, 100 mg/mL chloramphenicol, and 5 mM each of potassium ferri/ferrocyanide), and incubating at 37°C for 24 h. ACKNOWLEDGMENTS We thank Nam-Hai Chua (Rockefeller University) for kindly providing the vector pER8 and George Coupland and Hailong An (Max Planck Institut für Züchtungsforschung) for kindly providing the entry clones containing the promoters for NtTobRB7, AtSUC2, and RolC. We thank Ian Maxwell (University of Colorado) for the plasmid pIBI30-DT-A, the ABRC for distributing BBM entry clone pYAT5G17430 and the full-length cDNA clone U10690, and Francois Parcy for cDNA clones containing LEC2 and FUS3. We thank Valeria Gagliardini, Jana Schneider, and Brigitte Gabathuler for help with sequencing, Peter Kopf for technical assistance, and Urs Jauch for scanning electron microscopy. We are also grateful to Célia Baroux, Margaret Collinge, and Siân Curtis for critical reading of the manuscript. LITERATURE CITED An H, Roussot C, Suarez-Lopez P, Corbesier L, Vincent C, Pineiro M, Hepworth S, Mouradov A, Justin S, Turnbull C, et al ( 2004 ) CONSTANS acts in the phloem to regulate a systemic signal that induces photoperiodic flowering of Arabidopsis. 
Development 131 : 3615 –3626 Baroux C, Blanvillain R, Betts H, Batoko H, Craft J, Martinez A, Gallois P, Moore I ( 2005 ) Predictable activation of tissue-specific expression from a single gene locus using the pOp/LhG4 transactivation system in Arabidopsis. Plant Biotechnol J 3 : 91 –101 Barrell PJ, Yongjin S, Cooper PA, Conner AJ ( 2002 ) Alternative selectable markers for potato transformation using minimal T-DNA vectors. Plant Cell Tissue Organ Cult 70 : 61 –68 Baumbusch LO, Hughes DW, Galau GA, Jakobsen KS ( 2004 ) LEC1, FUS3, ABI3 and Em expression reveals no correlation with dormancy in Arabidopsis. J Exp Bot 55 : 77 –87 Blanc G, Barakat A, Guyot R, Cooke R, Delseny M ( 2000 ) Extensive duplication and reshuffling in the Arabidopsis genome. Plant Cell 12 : 1093 –1101 Booker J, Chatfield S, Leyser O ( 2003 ) Auxin acts in xylem-associated or medullary cells to mediate apical dominance. Plant Cell 15 : 495 –507 Boutilier K, Offringa R, Sharma VK, Kieft H, Ouellet T, Zhang L, Hattori J, Liu CM, van Lammeren AA, Miki BL, et al ( 2002 ) Ectopic expression of BABY BOOM triggers a conversion from vegetative to embryonic growth. Plant Cell 14 : 1737 –1749 Clough SJ, Bent AF ( 1998 ) Floral dip: a simplified method for Agrobacterium-mediated transformation of Arabidopsis thaliana. Plant J 16 : 735 –743 Collier RJ ( 1967 ) Effect of diphtheria toxin on protein synthesis: inactivation of one of the transfer factors. J Mol Biol 25 : 83 –98 Curtis MD, Grossniklaus U ( 2003 ) A gateway cloning vector set for high-throughput functional analysis of genes in planta. Plant Physiol 133 : 462 –469 Curtis MD, Grossniklaus U ( 2005 ) Thale cress (Arabidopsis thaliana) genome. In RA Meyers, ed, Encyclopedia of Molecular Cell Biology and Molecular Medicine. Wiley-VCH, Weinheim, Germany, pp 245–282 Curtis MD, Grossniklaus U ( 2006 ) Conditional gene expression in plants. 
In JA Teixeira da Silva, ed, Floriculture, Ornamental and Plant Biotechnology: Advances and Topical Issues, Ed 1, Vol 2. Global Science Books, London, pp 77–87 De Veylder L, Beeckman T, Van Montagu M, Inze D ( 2000 ) Increased leakiness of the tetracycline-inducible Triple-Op promoter in dividing cells renders it unsuitable for high inducible levels of a dominant negative CDC2aAt gene. J Exp Bot 51 : 1647 –1653 Deveaux Y, Peaucelle A, Roberts GR, Coen E, Simon R, Mizukami Y, Traas J, Murray JA, Doonan JH, Laufs P ( 2003 ) The ethanol switch: a tool for tissue-specific gene induction during plant development. Plant J 36 : 918 –930 Eshed Y, Baum SF, Perea JV, Bowman JL ( 2001 ) Establishment of polarity in lateral organs of plants. Curr Biol 11 : 1251 –1260 Gazzarrini S, Tsuchiya Y, Lumba S, Okamoto M, McCourt P ( 2004 ) The transcription factor FUSCA3 controls developmental timing in Arabidopsis through the hormones gibberellin and abscisic acid. Dev Cell 7 : 373 –385 Gong W, Shen YP, Ma LG, Pan Y, Du YL, Wang DH, Yang JY, Hu LD, Liu XF, Dong CX, et al ( 2004 ) Genome-wide ORFeome cloning and analysis of Arabidopsis transcription factor genes. Plant Physiol 135 : 773 –782 Gross-Hardt R, Lenhard M, Laux T ( 2002 ) WUSCHEL signaling functions in interregional communication during Arabidopsis ovule development. Genes Dev 16 : 1129 –1138 Gubler F, Millar AA, Jacobsen JV ( 2005 ) Dormancy release, ABA and pre-harvest sprouting. Curr Opin Plant Biol 8 : 183 –187 Harrison GS, Maxwell F, Long CJ, Rosen CA, Glode LM, Maxwell IH ( 1991 ) Activation of a diphtheria toxin A gene by expression of human immunodeficiency virus-1 Tat and Rev proteins in transfected cells. Hum Gene Ther 2 : 53 –60 Hartley JL, Temple GF, Brasch MA ( 2000 ) DNA cloning using in vitro site-specific recombination. 
Genome Res 10 : 1788 –1795 Hung CY, Lin Y, Zhang M, Pollock S, Marks MD, Schiefelbein J ( 1998 ) A common position-dependent mechanism controls cell-type patterning and GLABRA2 regulation in the root and hypocotyl epidermis of Arabidopsis. Plant Physiol 117 : 73 –84 Jack T, Fox GL, Meyerowitz EM ( 1994 ) Arabidopsis homeotic gene APETALA3 ectopic expression: transcriptional and posttranscriptional regulation determine floral organ identity. Cell 76 : 703 –716 Jefferson RA, Kavanagh TA, Bevan MW ( 1987 ) GUS fusions: beta-glucuronidase as a sensitive and versatile gene fusion marker in higher plants. EMBO J 6 : 3901 –3907 Koornneef M, Bentsink L, Hilhorst H ( 2002 ) Seed dormancy and germination. Curr Opin Plant Biol 5 : 33 –36 Laufs P, Coen E, Kronenberger J, Traas J, Doonan J ( 2003 ) Separable roles of UFO during floral development revealed by conditional restoration of gene function. Development 130 : 785 –796 Lincoln C, Long J, Yamaguchi J, Serikawa K, Hake S ( 1994 ) A knotted1-like homeobox gene in Arabidopsis is expressed in the vegetative meristem and dramatically alters leaf morphology when overexpressed in transgenic plants. Plant Cell 6 : 1859 –1876 Maizel A, Weigel D ( 2004 ) Temporally and spatially controlled induction of gene expression in Arabidopsis thaliana. Plant J 38 : 164 –171 Martinez A, Sparks C, Hart CA, Thompson J, Jepson I ( 1999 ) Ecdysone agonist inducible transcription in transgenic tobacco plants. Plant J 19 : 97 –106 Matsuhara S, Jingu F, Takahashi T, Komeda Y ( 2000 ) Heat-shock tagging: a simple method for expression and isolation of plant genome DNA flanked by T-DNA insertions. Plant J 22 : 79 –86 Maxwell IH, Maxwell F, Glode LM ( 1986 ) Regulated expression of a diphtheria toxin A-chain gene transfected into human cells: possible strategy for inducing cancer cell suicide. 
Cancer Res 46 : 4660 –4664 Mizukami Y, Ma H ( 1992 ) Ectopic expression of the floral homeotic gene AGAMOUS in transgenic Arabidopsis plants alters floral organ identity. Cell 71 : 119 –131 Nye AC, Rajendran RR, Stenoien DL, Mancini MA, Katzenellenbogen BS, Belmont AS ( 2002 ) Alteration of large-scale chromatin structure by estrogen receptor. Mol Cell Biol 22 : 3437 –3449 Odell JT, Nagy F, Chua NH ( 1985 ) Identification of DNA sequences required for activity of the cauliflower mosaic virus 35S promoter. Nature 313 : 810 –812 Pien S, Wyrzykowska J, McQueen-Mason S, Smart C, Fleming A ( 2001 ) Local expression of expansin induces the entire process of leaf development and modifies leaf shape. Proc Natl Acad Sci USA 98 : 11812 –11817 Raz R, Bergervoet JHW, Koornneef M ( 2001 ) Sequential steps for the developmental arrest in Arabidopsis seeds. Development 128 : 243 –252 Roslan HA, Salter MG, Wood CD, White MR, Croft KP, Robson F, Coupland G, Doonan J, Laufs P, Tomsett AB, et al ( 2001 ) Characterization of the ethanol-inducible alc gene-expression system in Arabidopsis thaliana. Plant J 28 : 225 –235 Salter MG, Paine JA, Riddell KV, Jepson I, Greenland AJ, Caddick MX, Tomsett AB ( 1998 ) Characterisation of the ethanol-inducible alc gene expression system for transgenic plants. Plant J 16 : 127 –132 Sambrook J, Russell D ( 2001 ) Molecular Cloning: A Laboratory Manual, Ed 3. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY Stone SL, Kwong LW, Yee KM, Pelletier J, Lepiniec L, Fischer RL, Goldberg RB, Harada JJ ( 2001 ) LEAFY COTYLEDON2 encodes a B3 domain transcription factor that induces embryo development. Proc Natl Acad Sci USA 98 : 11806 –11811 Szymanski DB, Jilk RA, Pollock SM, Marks MD ( 1998 ) Control of GL2 expression in Arabidopsis leaves and trichomes. 
Development 125 : 1161 –1171 Truernit E, Sauer N ( 1995 ) The promoter of the Arabidopsis thaliana SUC2 sucrose-H+ symporter gene directs expression of β-glucuronidase to the phloem: evidence for phloem loading and unloading by SUC2. Planta 196 : 564 –570 Tumbar T, Sudlow G, Belmont AS ( 1999 ) Large-scale chromatin unfolding and remodeling induced by VP16 acidic activation domain. J Cell Biol 145 : 1341 –1354 Vignali M, Steger DJ, Neely KE, Workman JL ( 2000 ) Distribution of acetylated histones resulting from Gal4-VP16 recruitment of SAGA and NuA4 complexes. EMBO J 19 : 2629 –2640 Weijers D, Van Hamburg JP, Van Rijn E, Hooykaas PJ, Offringa R ( 2003 ) Diphtheria toxin-mediated cell ablation reveals interregional communication during Arabidopsis seed development. Plant Physiol 133 : 1882 –1892 Yamada K, Lim J, Dale JM, Chen H, Shinn P, Palm CJ, Southwick AM, Wu HC, Kim C, Nguyen M, et al ( 2003 ) Empirical analysis of transcriptional activity in the Arabidopsis genome. Science 302 : 842 –846 Yamamoto YT, Taylor CG, Acedo GN, Cheng CL, Conkling MA ( 1991 ) Characterization of cis-acting sequences regulating root-specific gene expression in tobacco. Plant Cell 3 : 371 –382 Yang W, Jefferson RA, Huttner E, Moore JM, Gagliano WB, Grossniklaus U ( 2005 ) An egg apparatus-specific enhancer of Arabidopsis, identified by enhancer detection. Plant Physiol 139 : 1421 –1432 Yoo SY, Bomblies K, Yoo SK, Yang JW, Choi MS, Lee JS, Weigel D, Ahn JH ( 2005 ) The 35S promoter used in a selectable marker gene of a plant transformation vector affects the expression of the transgene. Planta 221 : 523 –530 Zuo J, Hare PD, Chua NH ( 2006 ) Applications of chemical-inducible expression systems in functional genomics and biotechnology. In J Salinas, JJ Sanchez-Serrano, eds, Methods in Molecular Biology-Arabidopsis Protocols. 
Humana Press, Totowa, NJ Zuo J, Niu QW, Chua NH ( 2000 ) Technical advance: an estrogen receptor-based transactivator XVE mediates highly inducible gene expression in transgenic plants. Plant J 24 : 265 –273 Zuo J, Niu QW, Frugis G, Chua NH ( 2002 ) The WUSCHEL gene promotes vegetative-to-embryonic transition in Arabidopsis. Plant J 30 : 349 –359 Author notes 1 This work was supported by the Swiss National Science Foundation (grant no. 3100A0–100281 to M.D.C. and grant no. 3100–064061 to U.G.), the University of Zürich, and the Forschungskredit of the University of Zürich (to M.D.C.). 2 Present address: Institute of Zoology, University of Zürich, Winterthurerstr. 190, CH–8057 Zürich, Switzerland. 3 Present address: Crop and Food Research, Private Bag 4704, Christchurch, New Zealand. * Corresponding author; e-mail [email protected]; fax 4116348204. The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Mark D. Curtis ([email protected]). [W] The online version of this article contains Web-only data. www.plantphysiol.org/cgi/doi/10.1104/pp.106.081299. © 2006 American Society of Plant Biologists This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

A Liquid Chromatography-Mass Spectrometry-Based Metabolome Database for Tomato

Moco, Sofia; Bino, Raoul J.; Vorst, Oscar; Verhoeven, Harrie A.; de Groot, Joost; van Beek, Teris A.; Vervoort, Jacques; de Vos, C.H. Ric

2006 Plant Physiology

doi: 10.1104/pp.106.078428; pmid: 16896233

Abstract

For the description of the metabolome of an organism, the development of common metabolite databases is of utmost importance. Here we present the Metabolome Tomato Database (MoTo DB), a metabolite database dedicated to liquid chromatography-mass spectrometry (LC-MS)-based metabolomics of tomato fruit (Solanum lycopersicum). A reproducible analytical approach consisting of reversed-phase LC coupled to quadrupole time-of-flight MS and photodiode array detection (PDA) was developed for large-scale detection and identification of mainly semipolar metabolites in plants and for the incorporation of the tomato fruit metabolite data into the MoTo DB. Chromatograms were processed using software tools for mass signal extraction and alignment, and intensity-dependent accurate mass calculation. The detected masses were assigned by matching their accurate mass signals with tomato compounds reported in literature and complemented, as much as possible, by PDA and MS/MS information, as well as by using reference compounds. Several novel compounds not previously reported for tomato fruit were identified in this manner and added to the database. The MoTo DB is available at http://appliedbioinformatics.wur.nl and contains all information so far assembled using this LC-PDA-quadrupole time-of-flight MS platform, including retention times, calculated accurate masses, PDA spectra, MS/MS fragments, and literature references. Unbiased metabolic profiling and comparison of peel and flesh tissues from tomato fruits validated the applicability of the MoTo DB, revealing that all flavonoids and α-tomatine were specifically present in the peel, while several other alkaloids and some particular phenylpropanoids were mainly present in the flesh tissue.

For understanding the dynamic behavior of a complex biological system, it is essential to follow, as unbiased as possible, its response to a conditional perturbation at the transcriptome, proteome, and metabolome levels.
To study the dynamics of the metabolome, to analyze fluxes in metabolic pathways, and to decipher the biological roles of metabolites, the identification of the participating metabolites should be as unambiguous as possible. Metabolomics is defined as the analysis of all metabolites in an organism and concerns the simultaneous (multiparallel) measurement of all metabolites in a given biological system (Dixon and Strack, 2003). However, this is a technically challenging task, as no single analytical method is capable of extracting and detecting all metabolites at once due to the enormous chemical variety of metabolites and the large range of concentrations at which metabolites can be present. Therefore, the characterization of a complete metabolome requires different complementary analytical technologies. Currently, mass spectrometry (MS) is the most sensitive method enabling the detection of hundreds of compounds within single extracts. Ideally, metabolome data should be incorporated into open access databases where information can be viewed, sorted, and matched. Different pathway resources are available that combine information from the omics technologies such as the Kyoto Encyclopedia of Genes and Genomes (http://www.genome.jp/kegg), MetaCyc (http://metacyc.org), or The Arabidopsis Information Resource (http://www.arabidopsis.org). Hitherto, research on plant metabolic profiling using chromatographic techniques coupled to MS technologies for database purposes has been accomplished by gas chromatography (GC)-MS analysis of extracts (Schauer et al., 2005; Tikunov et al., 2005). GC-MS entails high reproducibility in both chromatography and mass fragmentation patterns. This reproducibility enabled the development of common metabolite databases, e.g. GMD@CSB.DB (http://csbdb.mpimp-golm.mpg.de/csbdb/gmd/gmd.html) and the Fiehn-Library (http://fiehnlab.ucdavis.edu/compounds), that gather information mainly on primary metabolites.
Liquid chromatography (LC)-MS is the preferred technique for the separation and detection of the large and often unique group of semipolar secondary metabolites in plants. Specifically, high resolution accurate mass MS enables the detection of large numbers of parent ions present in a single extract and can provide valuable information on the chemical composition and thus the putative identity of large numbers of metabolites. Recently, accurate mass LC-MS was performed to detect secondary metabolites present in roots and leaves of Arabidopsis (Arabidopsis thaliana; von Roepenack-Lahaye et al., 2004), to study metabolic alterations in a light-hypersensitive mutant of tomato (Solanum lycopersicum; Bino et al., 2005), and to compare tubers of potato (Solanum tuberosum) of different genetic origin and developmental stages (Vorst et al., 2005). The variety of LC-MS systems and the generally poorer retention time reproducibility of LC compared to GC limit the establishment of a single optimized analytical procedure and hamper the comparison of LC-MS chromatograms between laboratories. Moreover, software tools able to automatically transform MS data into a list of (putative) plant metabolites, in particular for LC-MS, are not yet available. This implies that analyses of mass signal datasets are left to manual searches in the available chemical databases such as SciFinder, PubChem, or Dictionary of Natural Products. To extend the applicability of LC-MS in plant metabolomics, efforts should be made in (1) the establishment of a routine and reproducible LC-MS method, (2) the annotation of the large numbers of mass signals detected, (3) the unambiguous identification of compounds, and (4) the development of a common reference database and searching tools for secondary metabolites in plants. In this article we present an open access metabolite database for LC-MS, called Metabolome Tomato Database (MoTo DB), dedicated to tomato fruit.
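To make concrete the accurate-mass lookup that the text says is otherwise left to manual database searches, the following minimal Python sketch computes a monoisotopic mass from a molecular formula and matches an observed m/z against a small candidate list within a ppm window. It is an illustration only and not the MoTo DB software: the function names, the 5 ppm tolerance, the assumption of deprotonated [M-H]- ions, and the observed m/z value are all hypothetical, while the formulas are those listed in Table I.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes; standard values.
ISOTOPE_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON_MASS = 1.00727647  # u; gained or lost on ionization


def monoisotopic_mass(formula):
    """Mass of a plain formula such as 'C16H18O9' (no brackets, no charge)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ISOTOPE_MASS[element] * (int(count) if count else 1)
    return total


def ppm_error(observed, theoretical):
    """Relative deviation of an observed mass from the theoretical one, in ppm."""
    return (observed - theoretical) / theoretical * 1e6


def match_candidates(observed_mz, candidates, tolerance_ppm=5.0, negative_mode=True):
    """List (name, ppm error) for candidates whose ion m/z is within tolerance.

    Assumes singly charged [M-H]- ions (negative_mode) or [M+H]+ ions.
    """
    hits = []
    for name, formula in candidates.items():
        neutral = monoisotopic_mass(formula)
        ion_mz = neutral - PROTON_MASS if negative_mode else neutral + PROTON_MASS
        error = ppm_error(observed_mz, ion_mz)
        if abs(error) <= tolerance_ppm:
            hits.append((name, error))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Formulas as listed in Table I; the observed m/z is a made-up example value.
candidates = {
    "Chlorogenic acid": "C16H18O9",
    "4-O-Caffeoylquinic acid": "C16H18O9",
    "Naringenin": "C15H12O5",
    "Rutin": "C27H30O16",
}
hits = match_candidates(353.0880, candidates)
for name, error in hits:
    print(f"{name}: {error:+.2f} ppm")
```

In this sketch both caffeoylquinic acid isomers match the same observed mass because they share one molecular formula, which illustrates why accurate mass alone gives only a putative identity and has to be combined with further evidence such as retention time.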
This database is based on literature information combined with experimental data derived from LC-MS-based metabolomics experiments. A reproducible and robust C18-based reversed-phase LC-photodiode array detection (PDA)-electrospray ionization (ESI)-quadrupole time-of-flight (QTOF)-MS method was developed for the detection and putative identification of predominantly secondary metabolites of semipolar nature. The assignment of mass signals detected relies on the combination of the parameters: (1) accurate mass, (2) retention time, (3) UV/Vis spectral information, and (4) MS/MS fragmentation data. To demonstrate the applicability of the established LC-MS metabolomics platform including database searching, peel and flesh tissues from ripe tomato fruit were compared for differences in metabolic composition. Statistically significant differences in LC-QTOF MS profiles between the tissues were identified in an unbiased manner, and differential mass peaks were annotated by searching in the MoTo DB. Several compounds not previously reported in tomato were also identified and have been incorporated into the database. All available information in the MoTo DB can be searched at http://appliedbioinformatics.wur.nl.

RESULTS

Metabolites Present in Tomato Fruit According to Literature

First, a database was constructed based on literature research to include metabolites reported to be present in tomato fruit from both wild and cultivated varieties as well as transgenic tomato plants. Though some tomato varieties are known to contain anthocyanins in their fruit (Jones et al., 2003), so far, to our knowledge, there are no reports on the identification of this class of compounds in fruit tissue. Therefore, in our literature search we included reports on anthocyanin identification in seedlings of tomato.
Names (common and International Union of Pure and Applied Chemistry [IUPAC]), Chemical Abstracts Service (CAS) registry number, molecular formula, monoisotopic accurate mass, published references, and other properties of each metabolite are systematized in this database. The database includes polar, semipolar, and apolar compounds. Because the procedure used by us for extraction, separation, and detection (see below) is biased toward compounds of semipolar nature, we expected mostly secondary metabolites like (poly)phenols, alkaloids, and derivatives thereof to be detected.

Table I. List of secondary metabolites identified in tomato fruit extracts according to literature
Mol Form, Molecular formula; MM, monoisotopic molecular mass.

Compound | Mol Form | MM | Reference
p-Hydroxybenzoic acid | C7H6O3 | 138.0317 | Mattila and Kumpulainen (2002)
Salicylic acid | C7H6O3 | 138.0317 | Schmidtlein and Herrmann (1975), Petró-Turza (1987)
Cinnamic acid | C9H8O2 | 148.0524 | Petró-Turza (1987)
Protocatechuic acid | C7H6O4 | 154.0266 | Mattila and Kumpulainen (2002)a
m-Coumaric acid | C9H8O3 | 164.0474 | Hunt and Baker (1980)a
p-Coumaric acid | C9H8O3 | 164.0473 | Schmidtlein and Herrmann (1975)a, Hunt and Baker (1980)a, Petró-Turza (1987), Martinez-Valverde et al. (2002), Mattila and Kumpulainen (2002), Raffo et al. (2002), Le Gall et al. (2003a)bc
Vanillic acid | C8H8O4 | 168.0423 | Schmidtlein and Herrmann (1975), Mattila and Kumpulainen (2002)
Caffeic acid | C9H8O4 | 180.0423 | Schmidtlein and Herrmann (1975)a, Hunt and Baker (1980)a, Martinez-Valverde et al. (2002), Mattila and Kumpulainen (2002), Raffo et al. (2002), Sakakibara et al. (2003), Minoggio et al. (2003), Le Gall et al. (2003a)bc
Ferulic acid | C10H10O4 | 194.0579 | Schmidtlein and Herrmann (1975)a, Hunt and Baker (1980)a, Martinez-Valverde et al. (2002), Mattila and Kumpulainen (2002), Raffo et al. (2002), Minoggio et al. (2003)
Sinapic acid | C11H12O5 | 224.0685 | Schmidtlein and Herrmann (1975)a
Naringenin | C15H12O5 | 272.0685 | Hunt and Baker (1980)a, Justesen et al. (1998)a, Martinez-Valverde et al. (2002)a, Raffo et al. (2002), Minoggio et al. (2003)
Naringenin chalcone | C15H12O5 | 272.0685 | Hunt and Baker (1980)a, Krause and Galensa (1992), Muir et al. (2001), Le Gall et al. (2003b)b, Minoggio et al. (2003)
Kaempferol | C15H10O6 | 286.0477 | Stewart et al. (2000), Martinez-Valverde et al. (2002)a, Tokusoglu et al. (2003)a
Quercetin | C15H10O7 | 302.0427 | Hertog et al. (1992), Crozier et al. (1997)a, Justesen et al. (1998)a, Stewart et al. (2000), Martinez-Valverde et al. (2002)a, Raffo et al. (2002), Sakakibara et al. (2003), Tokusoglu et al. (2003)a
Myricetin | C15H10O8 | 318.0376 | Raffo et al. (2002), Sakakibara et al. (2003), Tokusoglu et al. (2003)a
p-Coumaric acid-O-β-D-glucoside | C15H18O8 | 326.1002 | Fleuriet and Macheix (1977), Reschke and Herrmann (1982)a, Winter and Herrmann (1986)c, Buta and Spaulding (1997)
p-Coumaroylquinic acid | C16H18O8 | 338.1002 | Fleuriet and Macheix (1977)
Caffeic acid-4-O-β-D-glucoside | C15H18O9 | 342.0951 | Fleuriet and Macheix (1977), Winter and Herrmann (1986)
Chlorogenic acid (3-O-caffeoylquinic acid) | C16H18O9 | 354.0951 | Fleuriet and Macheix (1977), Fleuriet and Macheix (1981), Winter and Herrmann (1986), Buta and Spaulding (1997), Martinez-Valverde et al. (2002), Mattila and Kumpulainen (2002), Raffo et al. (2002), Sakakibara et al. (2003), Minoggio et al. (2003), Le Gall et al. (2003a, 2003b)bc
4-O-Caffeoylquinic acid | C16H18O9 | 354.0951 | Winter and Herrmann (1986), Mattila and Kumpulainen (2002)
5-O-Caffeoylquinic acid | C16H18O9 | 354.0951 | Winter and Herrmann (1986)
Ferulic acid-O-β-D-glucoside | C16H20O9 | 356.1107 | Fleuriet and Macheix (1977), Reschke and Herrmann (1982), Winter and Herrmann (1986)
Feruloylquinic acid | C17H20O9 | 368.1107 | Fleuriet and Macheix (1977)
Tomatidine | C27H45NO2 | 415.3450 | Juvik et al. (1982)a, Friedman et al. (1998)a
Tomatidenol | C27H43NO2 | 413.3294 | Juvik et al. (1982)a, Friedman et al. (1994)a, Friedman et al. (1997)a, Friedman (2002)a
Naringenin-7-O-glucoside | C21H22O10 | 434.1213 | Hunt and Baker (1980), Le Gall et al. (2003a, 2003b)bc
Naringenin chalcone-glucoside | C21H22O10 | 434.1213 | Bino et al. (2005)
Astragalin | C21H20O11 | 448.1006 | Le Gall et al. (2003a, 2003b)bc
Dihydrokaempferol-7-O-hexoside and Dihydrokaempferol-?-O-hexoside | C21H22O11 | 450.1162 | Le Gall et al. (2003a, 2003b)bc
Isoquercitrin | C21H20O12 | 464.0955 | Muir et al. (2001)b, Le Gall et al. (2003a, 2003b)b
Myricitrin | C21H20O12 | 464.0955 | Sakakibara et al. (2003)
Naringin | C27H32O14 | 580.1792 | Bovy et al. (2002)abd
Kaempferol-3-O-rutinoside | C27H30O15 | 594.1585 | Bovy et al. (2002)bd, Le Gall et al. (2003b)bc
Kaempferol-3-7-di-O-glucoside | C27H30O16 | 610.1534 | Le Gall et al. (2003a, 2003b)bc
Rutin | C27H30O16 | 610.1534 | Fleuriet and Macheix (1977), Buta and Spaulding (1997), Stewart et al. (2000), Muir et al. (2001), Raffo et al. (2002), Le Gall et al. (2003a, 2003b)bc, Minoggio et al. (2003)
Quercetin-3-O-trisaccharide | C32H38O20 | 742.1956 | Muir et al. (2001), Minoggio et al. (2003)
p-Coumaric acid-rutin conjugate | C36H36O18 | 756.1902 | Buta and Spaulding (1997)
Kaempferol-3-O-rutinoside-7-O-glucoside | C33H40O20 | 756.2113 | Le Gall et al. (2003a, 2003b)bc
Delphinidin-3-O-rutinoside-5-O-glucoside | C33H41O21+ | 773.2135 | Mathews et al. (2003)bd
Petunidin-3-O-rutinoside-5-O-glucoside | C34H43O21+ | 787.2291 | Mathews et al. (2003)bd
Malvidin-3-O-rutinoside-5-O-glucoside | C35H45O21+ | 801.2448 | Mathews et al. (2003)bd
Delphinidin-3-O-(p-coumaroyl)rutinoside-5-O-glucoside | C42H47O23+ | 919.2503 | Mathews et al. (2003)bd
Petunidin-3-O-(p-coumaroyl)rutinoside-5-O-glucoside | C43H49O23+ | 933.2659 | Bovy et al. (2002)bd, Mathews et al. (2003)bd
Delphinidin-3-O-(caffeoyl)rutinoside-5-O-glucoside | C42H47O24+ | 935.2452 | Mathews et al. (2003)bd
Malvidin-3-O-(p-coumaroyl)rutinoside-5-O-glucoside | C44H51O23+ | 947.2816 | Bovy et al. (2002)bd, Mathews et al. (2003)bd
Petunidin-3-(caffeoyl)rutinoside-5-O-glucoside | C43H49O24+ | 949.2608 | Bovy et al. (2002)bd, Mathews et al. (2003)bd
Malvidin-3-(caffeoyl)rutinoside-5-O-glucoside | C44H51O24+ | 963.2765 | Mathews et al. (2003)bd
δ-Tomatine | C33H55NO7 | 577.3979 | Friedman et al. (1998)a
γ-Tomatine | C39H65NO12 | 739.4507 | Friedman et al. (1998)a
β-Tomatine | C45H75NO17 | 901.5035 | Friedman et al. (1998)a
Dehydrotomatine | C50H81NO21 | 1,031.5301 | Friedman et al. (1994), Kozukue and Friedman (2003)
α-Tomatine | C50H83NO21 | 1,033.5458 | Juvik et al. (1982), Willker and Leibfritz (1992)c, Friedman et al. (1994), Yahara et al. (1996), Friedman et al. (1997), Friedman et al. (1998), Friedman (2002), Bianco et al. (2002), Kozukue and Friedman (2003)
Lycoperoside H | C50H83NO22 | 1,049.5407 | Yahara et al. (1996)c, Yahara et al. (2004)c
Lycoperoside A | C52H85NO23 | 1,091.5512 | Yahara et al. (1996, 2004)c
Lycoperoside B | C52H85NO23 | 1,091.5512 | Yahara et al. (1996, 2004)c
Lycoperoside C | C52H85NO23 | 1,091.5512 | Yahara et al. (1996, 2004)c
Esculeoside B | C56H93NO28 | 1,227.5884 | Fujiwara et al. (2004)c, Yahara et al. (2004)c
Esculeoside A | C58H95NO29 | 1,269.5990 | Fujiwara et al. (2003, 2004)c, Yahara et al. (2004)c, Yoshizaki et al. (2005)c
Lycoperoside F | C58H95NO29 | 1,269.5990 | Yahara et al. (2004)c
Lycoperoside G | C58H95NO29 | 1,269.5990 | Yahara et al. (2004)c

a Identified after hydrolysis.
b Identified in transgenic tomato plants.
c Identified using NMR data.
d Identified in seedlings.
(2002), Kozukue and Friedman (2003) Lycoperoside H C50H83NO22 1,049.5407 Yahara et al. (1996),c, Yahara et al. (2004)c Lycoperoside A C52H85NO23 1,091.5512 Yahara et al. (1996, 2004)c Lycoperoside B C52H85NO23 1,091.5512 Yahara et al. (1996, 2004)c Lycoperoside C C52H85NO23 1,091.5512 Yahara et al. (1996, 2004)c Esculeoside B C56H93NO28 1,227.5884 Fujiwara et al. (2004),c, Yahara et al. (2004)c Esculeoside A C58H95NO29 1,269.5990 Fujiwara et al. (2003, 2004)c, Yahara et al. (2004),c, Yoshizaki et al. (2005)c Lycoperoside F C58H95NO29 1,269.5990 Yahara et al. (2004)c Lycoperoside G C58H95NO29 1,269.5990 Yahara et al. (2004)c Compound . Mol Form . MM . Reference . p-Hydroxybenzoic acid C7H6O3 138.0317 Mattila and Kumpulainen (2002) Salicylic acid C7H6O3 138.0317 Schmidtlein and Herrmann (1975), Petró-Turza (1987) Cinnamic acid C9H8O2 148.0524 Petró-Turza (1987) Protocatechuic acid C7H6O4 154.0266 Mattila and Kumpulainen (2002)a m-Coumaric acid C9H8O3 164.0474 Hunt and Baker (1980)a p-Coumaric acid C9H8O3 164.0473 Schmidtlein and Herrmann (1975),a, Hunt and Baker (1980),a, Petró-Turza (1987), Martinez-Valverde et al. (2002), Mattila and Kumpulainen (2002), Raffo et al. (2002), Le Gall et al. (2003a)bc Vanillic acid C8H8O4 168.0423 Schmidtlein and Herrmann (1975), Mattila and Kumpulainen (2002) Caffeic acid C9H8O4 180.0423 Schmidtlein and Herrmann (1975),a, Hunt and Baker (1980),a, Martinez-Valverde et al. (2002), Mattila and Kumpulainen (2002), Raffo et al. (2002), Sakakibara et al. (2003), Minoggio et al. (2003), Le Gall et al. (2003a)bc Ferulic acid C10H10O4 194.0579 Schmidtlein and Herrmann (1975),a, Hunt and Baker (1980),a, Martinez-Valverde et al. (2002), Mattila and Kumpulainen (2002), Raffo et al. (2002), Minoggio et al. 
(2003) Sinapic acid C11H12O5 224.0685 Schmidtlein and Herrmann (1975)a Naringenin C15H12O5 272.0685 (Hunt and Baker, 1980)a; (Justesen et al., 1998)a, (Martinez-Valverde et al., 2002)a, (Raffo et al., 2002), (Minoggio et al., 2003) Naringenin chalcone C15H12O5 272.0685 Hunt and Baker (1980),a, Krause and Galensa (1992), Muir et al. (2001), Le Gall et al. (2003b),b, Minoggio et al. (2003) Kaempferol C15H10O6 286.0477 Stewart et al. (2000), Martinez-Valverde et al. (2002),a, Tokusoglu et al. (2003)a Quercetin C15H10O7 302.0427 Hertog et al. (1992), Crozier et al. (1997),a, Justesen et al. (1998),a, Stewart et al. (2000), Martinez-Valverde et al. (2002),a, Raffo et al. (2002), Sakakibara et al. (2003), Tokusoglu et al. (2003)a Myricetin C15H10O8 318.0376 Raffo et al. (2002), Sakakibara et al. (2003), Tokusoglu et al. (2003)a p-Coumaric acid-O-β-d-glucoside C15H18O8 326.1002 Fleuriet and Macheix (1977), Reschke and Herrmann (1982),a, Winter and Herrmann (1986),c, Buta and Spaulding (1997) p-Coumaroylquinic acid C16H18O8 338.1002 Fleuriet and Macheix (1977) Caffeic acid-4-O-β-d-glucoside C15H18O9 342.0951 Fleuriet and Macheix (1977), Winter and Herrmann (1986) Chlorogenic acid (3-O-caffeoylquinic acid) C16H18O9 354.0951 Fleuriet and Macheix (1977), Fleuriet and Macheix (1981), Winter and Herrmann (1986), Buta and Spaulding (1997), Martinez-Valverde et al. (2002), Mattila and Kumpulainen (2002), Raffo et al. (2002), Sakakibara et al. (2003), Minoggio et al. (2003), Le Gall et al. (2003a, 2003b)bc 4-O-Caffeoylquinic acid C16H18O9 354.0951 Winter and Herrmann (1986), Mattila and Kumpulainen (2002) 5-O-Caffeoylquinic acid C16H18O9 354.0951 Winter and Herrmann (1986) Ferulic acid-O-β-d-glucoside C16H20O9 356.1107 Fleuriet and Macheix (1977), Reschke and Herrmann (1982), Winter and Herrmann (1986) Feruloylquinic acid C17H20O9 368.1107 Fleuriet and Macheix (1977) Tomatidine C27H45NO2 415.3450 Juvik et al. (1982),aFriedman et al. 
(1998)a Tomatidenol C27H43NO2 413.3294 Juvik et al. (1982),a, Friedman et al. (1994),a, Friedman et al. (1997),a, Friedman (2002)a Naringenin-7-O-glucoside C21H22O10 434.1213 Hunt and Baker (1980), Le Gall et al. (2003a, 2003b)bc Naringenin chalcone-glucoside C21H22O10 434.1213 Bino et al. (2005) Astragalin C21H20O11 448.1006 Le Gall et al. (2003a, 2003b)bc Dihydrokaempferol-7-O-hexoside and Dihydrokaempferol-?-O-hexoside C21H22O11 450.1162 Le Gall et al. (2003a, 2003b)bc Isoquercitrin C21H20O12 464.0955 Muir et al. (2001),b, Le Gall et al. (2003a, 2003b)b Myricitrin C21H20O12 464.0955 Sakakibara et al. (2003) Naringin C27H32O14 580.1792 Bovy et al. (2002)abd Kaempferol-3-O-rutinoside C27H30O15 594.1585 Bovy et al. (2002),bd, Le Gall et al. (2003b)bc Kaempferol-3-7-di-O-glucoside C27H30O16 610.1534 Le Gall et al. (2003a, 2003b)bc Rutin C27H30O16 610.1534 Fleuriet and Macheix (1977), Buta and Spaulding (1997), Stewart et al. (2000), Muir et al. (2001), Raffo et al. (2002); Le Gall et al. (2003a, 2003b)bc, Minoggio et al. (2003) Quercetin-3-O-trisaccharide C32H38O20 742.1956 Muir et al. (2001), Minoggio et al. (2003) p-Coumaric acid-rutin conjugate C36H36O18 756.1902 Buta and Spaulding (1997) Kaempferol-3-O-rutinoside-7-O-glucoside C33H40O20 756.2113 Le Gall et al. (2003a, 2003b)bc Delphinidin-3-O-rutinoside-5-O-glucoside C33H41O21+ 773.2135 Mathews et al. (2003)bd Petunidin-3-O-rutinoside-5-O-glucoside C34H43O21+ 787.2291 Mathews et al. (2003)bd Malvidin-3-O-rutinoside-5-O-glucoside C35H45O21+ 801.2448 Mathews et al. (2003)bd Delphinidin-3-O-(p-coumaroyl)rutinoside-5-O-glucoside C42H47O23+ 919.2503 Mathews et al. (2003)bd Petunidin-3-O-(p-coumaroyl)rutinoside-5-O-glucoside C43H49O23+ 933.2659 Bovy et al. (2002),bd, Mathews et al. (2003)bd Delphinidin-3-O-(caffeoyl)rutinoside-5-O-glucoside C42H47O24+ 935.2452 Mathews et al. (2003)bd Malvidin-3-O-(p-coumaroyl)rutinoside-5-O-glucoside C44H51O23+ 947.2816 Bovy et al. (2002),bd, Mathews et al. 
(2003)bd Petunidin-3-(caffeoyl)rutinoside-5-O-glucoside C43H49O24+ 949.2608 Bovy et al. (2002),bd, Mathews et al. (2003)bd Malvidin-3-(caffeoyl)rutinoside-5-O-glucoside C44H51O24+ 963.2765 Mathews et al. (2003)bd δ-Tomatine C33H55NO7 577.3979 Friedman et al. (1998)a γ-Tomatine C39H65NO12 739.4507 Friedman et al. (1998)a β-Tomatine C45H75NO17 901.5035 Friedman et al. (1998)a Dehydrotomatine C50H81NO21 1,031.5301 Friedman et al. (1994), Kozukue and Friedman (2003) α-Tomatine C50H83NO21 1,033.5458 Juvik et al. (1982), Willker and Leibfritz (1992),c, Friedman et al. (1994), Yahara et al. (1996), Friedman et al. (1997), Friedman et al. (1998), Friedman (2002), Bianco et al. (2002), Kozukue and Friedman (2003) Lycoperoside H C50H83NO22 1,049.5407 Yahara et al. (1996),c, Yahara et al. (2004)c Lycoperoside A C52H85NO23 1,091.5512 Yahara et al. (1996, 2004)c Lycoperoside B C52H85NO23 1,091.5512 Yahara et al. (1996, 2004)c Lycoperoside C C52H85NO23 1,091.5512 Yahara et al. (1996, 2004)c Esculeoside B C56H93NO28 1,227.5884 Fujiwara et al. (2004),c, Yahara et al. (2004)c Esculeoside A C58H95NO29 1,269.5990 Fujiwara et al. (2003, 2004)c, Yahara et al. (2004),c, Yoshizaki et al. (2005)c Lycoperoside F C58H95NO29 1,269.5990 Yahara et al. (2004)c Lycoperoside G C58H95NO29 1,269.5990 Yahara et al. (2004)c a Identified after hydrolysis. b Identified in transgenic tomato plants. c Identified using NMR data. d Identified in seedlings. Open in new tab summarizes all (poly)phenolic compounds (48) and alkaloids (15) so far reported to be present in tomato fruit extracts, including compounds that have been identified only in fruits of transgenic tomato plants. Many compounds were assigned before MS technologies became available. The number of compounds identified by NMR is very limited. 
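The MM column of Table I can be reproduced from the molecular formulas. The following is a minimal Python sketch and not part of the original study: the function name `monoisotopic_mass` and the small element table are illustrative assumptions, and only C, H, N, and O are covered because these are the only elements occurring in Table I. A trailing `+` (as in the anthocyanin formulas) is treated as a singly charged cation, for which one electron mass is subtracted.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
ELECTRON = 0.00054858  # electron mass (u)


def monoisotopic_mass(formula: str) -> float:
    """Monoisotopic mass for a formula such as 'C27H30O16' or 'C33H41O21+'."""
    is_cation = formula.endswith("+")
    body = formula.rstrip("+")
    mass = 0.0
    # Each match is (element symbol, optional count); a missing count means 1.
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", body):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    # A singly charged cation has lost one electron.
    return mass - ELECTRON if is_cation else mass


if __name__ == "__main__":
    for name, formula in [
        ("Rutin", "C27H30O16"),
        ("alpha-Tomatine", "C50H83NO21"),
        ("Delphinidin-3-O-rutinoside-5-O-glucoside", "C33H41O21+"),
    ]:
        print(f"{name}: {monoisotopic_mass(formula):.4f}")
```

Rounded to four decimals, the three formulas in the usage example give the values tabulated for rutin (610.1534), α-tomatine (1,033.5458), and delphinidin-3-O-rutinoside-5-O-glucoside (773.2135).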
Metabolite Extraction and LC-PDA-MS Analysis

A representative tomato fruit sample was obtained by combining fruits of 96 different tomato cultivars producing ripe red, orange-colored beef, round, or cherry type of fruits at different stages of ripening (Tikunov et al., 2005). In addition, some purple-skinned fruits were selected for analyses of anthocyanins, a class of tomato fruit compounds that occurs only in specific varieties (Jones et al., 2003) or in transgenic plants (Mathews et al., 2003). Peel material was chosen as the starting material, as this tissue contains the highest levels of flavonoids (Muir et al., 2001), which represent an important class of secondary metabolites. The 75% methanol/water extract enabled separation of semipolar metabolites by C18-reversed-phase LC and their detection by both PDA and MS.

Figure 1. Typical chromatograms obtained from reversed-phase LC-PDA-ESI-QTOF-MS analysis of tomato peel extract. A, Total ion signal (QTOF MS). B, Absorbance signal (PDA). Retention times (in minutes) are indicated for the most intense peaks (difference between the two detectors is 0.15 min). Inserts in A show accurate mass (I) and MS/MS spectrum (II), and in B absorbance spectrum (III) obtained for the compound rutin eluting at 23.3 min.

Figure 1 shows an example of a chromatogram obtained upon LC-PDA-QTOF-MS analysis of 75% methanol/water extracts from tomato peel.
These extracts were stable for several months at −20°C, as determined by comparing LC-PDA chromatograms. Only naringenin chalcone was observed to decay slowly into naringenin while standing in the autosampler (20°C) during a series of analyses (about 1.4 μg g−1 fresh weight h−1). To test the reproducibility of the LC system, chromatograms of the tomato fruit material that had been analyzed over a period of 2 years (>100 samples) were manually compared for retention time shifts using some typical tomato compounds (Table II). Within a single series of analyses, the standard deviation was very small (about 2 s) for all compounds tested. Between series of analyses over this time period, the maximum variation was 30 s, with a maximum retention time window of 1.1 min for naringenin chalcone. During this prolonged period, LC columns of different batches were used.

Table II. Retention time shifts observed during LC-QTOF-MS analysis of tomato fruit
Ret (min), Retention time, in minutes; Av, average; StDev, standard deviation; Wd, retention time window.

Ret (min) | Chlorogenic Acid Av | Chlorogenic Acid StDev | Chlorogenic Acid Wd | Rutin Av | Rutin StDev | Rutin Wd | Naringenin Chalcone Av | Naringenin Chalcone StDev | Naringenin Chalcone Wd
Within series (n = 13) | 14.42 | 0.03 | 0.09 | 23.40 | 0.04 | 0.13 | 41.81 | 0.03 | 0.11
In-between series (n = 6) | 14.92 | 0.33 | 0.79 | 23.85 | 0.50 | 0.99 | 42.26 | 0.50 | 1.12

Comparison of Ionization Modes

Since compounds may preferentially ionize in either positive or negative mode in our LC system, which is based on a gradient of acetonitrile acidified with formic acid, we analyzed tomato extracts sequentially in both modes and compared the absolute mass signal intensities, expressed in peak heights, of the monoisotopic parent ions of some identified compounds. Phenolic acids and their carboxylic acid derivatives ionized better in negative ionization mode, while flavonoids generated higher signal intensities in positive ionization mode (Fig. 2). Nitrogen-containing compounds such as Phe and some alkaloids ionized better in positive mode, and were mainly detected as formic acid adducts in negative mode. These adducts were formed in the ionization source and were readily recognized in MS/MS mode from the loss of 46 D (formic acid). A loss of 18 D, corresponding to a loss of water, was also regularly observed in negative ionization mode.

Figure 2. Peak intensity ratios, in logarithmic scale, of mass signals (peak height) obtained in positive and negative ionization modes for some metabolites found in tomato peel extracts.
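The ion masses expected in negative mode follow directly from the neutral monoisotopic masses in Table I. The sketch below is illustrative only and not taken from the study: the function names are hypothetical, and the constants are standard monoisotopic values for the proton, the formate anion (HCOO−), and neutral formic acid. It computes the m/z of the deprotonated ion (M-H)− and of the formic acid adduct, and shows that losing 46 D (formic acid) from the adduct gives back the (M-H)− ion.

```python
PROTON = 1.007276        # proton mass (u)
FORMATE = 44.998203      # HCOO- anion, including the extra electron (u)
FORMIC_ACID = 46.005479  # neutral HCOOH (u)


def deprotonated_mz(neutral_mass: float) -> float:
    """m/z of the singly charged (M-H)- ion."""
    return neutral_mass - PROTON


def formate_adduct_mz(neutral_mass: float) -> float:
    """m/z of the singly charged formic acid adduct (M+HCOO)-."""
    return neutral_mass + FORMATE


if __name__ == "__main__":
    rutin = 610.1534          # neutral monoisotopic mass from Table I
    alpha_tomatine = 1033.5458

    print(f"rutin (M-H)-: {deprotonated_mz(rutin):.4f}")
    adduct = formate_adduct_mz(alpha_tomatine)
    print(f"alpha-tomatine formate adduct: {adduct:.4f}")
    # Neutral loss of formic acid from the adduct yields the (M-H)- ion.
    print(f"after loss of formic acid: {adduct - FORMIC_ACID:.4f}")
```

With the Table I masses, this gives 609.1461 for rutin and 1,078.5440 for the α-tomatine adduct, the theoretical values listed for these ions in Table III.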
Automatic Mass Alignment and Exact Mass Calculation

First, the reproducibility of sample preparation and of the subsequent automated extraction and comparison of mass signal intensities, expressed as peak height, using metAlign software (Bino et al., 2005; Vorst et al., 2005) was assessed on a dataset obtained from LC-MS analysis of eight replicate extractions of tomato peel. The retention time correction used by the software to align all mass signals was, on average, 2.5 s, which is in accordance with the retention shift observed on manual inspection of the chromatograms (Table II). The overall variation in mass signal intensities between these replicate samples was <15%.

Automation of the calculation of the accurate mass of detected LC-MS signals was tested using a dataset of 44 tomato extracts obtained from both peel and flesh tissues analyzed in negative ionization mode. Upon metAlign-assisted data processing, 4,958 mass signals with signal-to-noise ratios >3 were extracted. It is known that exact mass measurements on QTOF instruments using lock mass correction provide the highest accuracy at analyte signal intensities that are similar to the lock mass signal (Colombo et al., 2004). To establish the dynamic range in signal intensity for producing high mass accuracy in our TOF MS, the deviation of manually measured mass (i.e. the mean of the three top scans of the extracted mass peak) from the theoretical mass was plotted against the parent mass signal intensity (ion counts at top scan) for some known tomato metabolites (Fig. 3).

Figure 3. Difference between observed and theoretical monoisotopic masses, calculated as Δppm (y axis), as a function of the parent ion signal intensity, expressed as ion counts/scan at center of peak (x axis, log10-transformed data) for some identified compounds in tomato peel extracts. Threshold levels for mass accuracies between +5 and −5 ppm, and for analyte mass signal intensities between 0.25 and 2.0 times the lock mass signal intensity are indicated with dotted lines.

Typically, accurate mass measurements derived from peak intensities lower than the lock mass intensity resulted in a positive deviation from the real mass, while mass measurements from peak intensities higher than lock mass intensity resulted in a negative deviation. High mass accuracies (i.e. mass deviation less than 5 ppm) were observed within an analyte signal intensity window of 0.25 to 2.0 times the lock mass. Thus, to automatically calculate correct accurate masses for signals extracted and aligned by metAlign, a script called metAccure (O. Vorst, H.A. Verhoeven, C.H.R. de Vos, C.A. Maliepaard, and R.C.H.J. van Ham, unpublished data) was programmed to use only those scans with mass signal intensities within this intensity window. In this way, appropriate accurate masses were automatically obtained for 479 (about 10%) of the total mass signals detected in ESI-negative mode, in which isotopes, adducts, and fragments are included. This number indicates that for the majority of extracted mass signals, although they had a chromatographically relevant signal-to-noise ratio of at least 3, the intensities in the samples analyzed were too low to properly estimate their accurate mass, either by automated calculation through metAccure or by manual calculation.
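The intensity-window rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the metAccure script itself, which is unpublished: the function names, the `(m/z, intensity)` scan representation, the choice to average the retained scans, and the example numbers are all hypothetical. Only the window limits (0.25 to 2.0 times the lock mass intensity) and the ppm definition come from the text.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass deviation in parts per million (observed minus theoretical)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def windowed_accurate_mass(
    scans: Sequence[Tuple[float, float]],
    lock_intensity: float,
    low: float = 0.25,
    high: float = 2.0,
) -> Optional[float]:
    """Average m/z over scans whose intensity lies within the lock mass window.

    scans holds (m/z, intensity) pairs for one extracted mass peak. Returns
    None when no scan falls inside the window, i.e. the signal is too weak
    or too strong for a reliable accurate mass.
    """
    usable = [
        mz for mz, intensity in scans
        if low * lock_intensity <= intensity <= high * lock_intensity
    ]
    return mean(usable) if usable else None


if __name__ == "__main__":
    # Hypothetical scans across one chromatographic peak: (m/z, ion counts).
    scans = [(609.1440, 100), (609.1450, 500), (609.1452, 1500), (609.1470, 5000)]
    mass = windowed_accurate_mass(scans, lock_intensity=1000)
    if mass is not None:
        print(f"accurate mass {mass:.4f}, {ppm_error(mass, 609.1461):+.2f} ppm")
```

Returning `None` rather than a low-confidence value mirrors the observation that most extracted signals were too weak for their accurate mass to be estimated properly.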
Identification of Tomato Metabolites

The identification of compounds reported to be present in tomato fruit was done using two approaches. First, 19 available standard compounds (see “Materials and Methods”) were injected and compared for retention time, accurate mass, and UV/Vis spectra with LC peaks detected in the extracts from the pooled peel material of the 96 tomato cultivars. In this way, chlorogenic acid (i.e. 3-caffeoylquinic acid), rutin, kaempferol-rutinoside, naringenin, naringenin chalcone, and α-tomatine were identified. Second, the chromatograms from the 44 LC-MS data sets were checked for the presence of accurate masses, as calculated by metAccure, corresponding to metabolites that were expected to be detected with our system (Table I). The accurate mass hits were subsequently combined with PDA and MS/MS fragmentation data for further identification and confirmation of metabolites. As an example, data of known tomato metabolites observed in extracts of the pooled peel material of the 96 tomato cultivars, derived by LC-PDA-MS and MS/MS analyses in negative mode, are listed in Table III.

Table III. Metabolites that have previously been reported in literature, identified by LC-PDA-ESI-QTOF-MS/MS (negative ionization mode) in tomato peel extracts
Ret (min), Retention time, in minutes; Av, average; StDev, standard deviation; Av m/z, average found mass signal; UV/Vis, absorbance maximums in the UV/Vis range; Mol Form, molecular formula of the metabolite; Theo. Mass, theoretical monoisotopic mass calculated for the ion (M-H)−; Mean Δ (ppm), deviation between the averages of found accurate mass and real accurate mass, in ppm; Putative ID, putative identification of metabolite; () FA, formic acid adduct; –, data not found; (S), identification confirmed by the standard compound; I, II, III, IV, V, and VI, different isomers (only one reported in literature).

Ret Av (min) | Ret StDev (min) | Av m/z | UV/Vis | MS/MS Fragments | Mol Form | Theo. Mass | Mean Δ (ppm) | Putative ID
9.45 | 0.09 | 341.0883 | – | 179, 135 | C15H18O9 | 341.0878 | 1.52 | Caffeic acid-hexose I
9.75 | 0.08 | 325.0930 | 294sh, 313 | 163 | C15H18O8 | 325.0929 | 0.25 | Coumaric acid-hexose I
10.32 | 0.08 | 341.0883 | 310 | 179, 161, 135 | C15H18O9 | 341.0878 | 1.58 | Caffeic acid-hexose II
11.35 | 0.08 | 341.0883 | 302sh, 318 | 281, 251, 233, 221, 179, 161, 135 | C15H18O9 | 341.0878 | 1.53 | Caffeic acid-hexose III
12.08 | 0.06 | 355.1036 | 290sh, 313 | 193, 177, 145 | C16H20O9 | 355.1035 | 0.31 | Ferulic acid-hexose I
12.58 | 0.07 | 341.0883 | – | 181, 179, 137, 135 | C15H18O9 | 341.0878 | 1.49 | Caffeic acid-hexose IV
13.32 | 0.05 | 341.0883 | – | 281, 221, 181, 179, 161, 137, 135 | C15H18O9 | 341.0878 | 1.39 | Caffeic acid-hexose V
13.43 | 0.07 | 353.0878 | 300sh, 327 | 191, 173, 127 | C16H18O9 | 353.0878 | 0.01 | 3-Caffeoylquinic acid
13.71 | 0.07 | 325.0929 | 285 | 163, 119 | C15H18O8 | 325.0929 | 0.05 | Coumaric acid-hexose II
14.41 | 0.10 | 353.0878 | 295sh, 327 | 179, 173 | C16H18O9 | 353.0878 | −0.08 | 5-Caffeoylquinic acid (S)
15.90 | 0.05 | 355.1036 | – | 193, 175, 160 | C16H20O9 | 355.1035 | 0.42 | Ferulic acid-hexose II
15.98 | 0.06 | 341.0886 | – | 179 | C15H18O9 | 341.0878 | 2.26 | Caffeic acid-hexose VI
16.76 | 0.07 | 353.0880 | 323 | 191, 173, 161, 127 | C16H18O9 | 353.0878 | 0.49 | 4-Caffeoylquinic acid
19.53 | 0.25 | 1,272.5901 | – | 1,227, 1,095, 1,065, 933, 866, 770 | C57H95NO30 | 1,272.5866 | 2.75 | (Esculeoside B) FA
21.42 | 0.04 | 741.1870 | 256, 299sh, 351 | 301, 271, 255 | C32H38O20 | 741.1884 | −1.82 | Quercetin-hexose-deoxyhexose-pentose
22.83 | 0.06 | 1,314.6001 | – | 1,269, 1,137, 1,107, 974, 770, 752 | C59H97NO31 | 1,314.5972 | 2.21 | (Lycoperoside G) FA or (Lycoperoside F) FA or (Esculeoside A) FA I
23.43 | 0.04 | 609.1451 | 256, 299sh, 355 | 301, 271, 255 | C27H30O16 | 609.1461 | −1.59 | Quercetin-Glc-rhamnose (S)
25.48 | 0.16 | 1,314.6005 | – | 1,269, 1,137, 1,107, 975, 908, 866, 812, 770, 752, 275, 179, 161, 149, 143, 125, 113 | C59H97NO31 | 1,314.5972 | 2.54 | (Lycoperoside G) FA or (Lycoperoside F) FA or (Esculeoside A) FA II
26.37 | 0.21 | 1,314.6021 | – | 1,270, 1,138, 1,108, 976, 909, 813, 753, 179, 161, 143, 125, 113 | C59H97NO31 | 1,314.5972 | 3.74 | (Lycoperoside G) FA or (Lycoperoside F) FA or (Esculeoside A) FA III
26.41 | 0.03 | 593.1505 | 368 | 285 | C27H30O15 | 593.1512 | −1.09 | Kaempferol-Glc-rhamnose (S)
26.44 | 0.39 | 1,094.5382 | – | 1,049 | C51H85NO24 | 1,094.5389 | −0.59 | (Lycoperoside H) FA
32.46 | 0.37 | 1,078.5463 | – | 1,033, 871, 738, 576, 161, 143 | C51H85NO23 | 1,078.5440 | 2.14 | (α-Tomatine) FA (S)
32.59 | 0.22 | 1,136.5539 | – | 1,091, 958, 928, 796, 635, 149, 143, 113 | C53H87NO25 | 1,136.5494 | 3.91 | (Lycoperoside C) FA or (Lycoperoside B) FA or (Lycoperoside A) FA3
32.65 | 0.02 | 433.1135 | 315sh, 368 | 271, 151 | C21H22O10 | 433.1140 | −1.21 | Naringenin chalcone-hexose I
41.43 | 0.05 | 271.0617 | 288, 303sh | 151, 119, 107 | C15H12O5 | 271.0612 | 1.84 | Naringenin (S)
41.86 | 0.05 | 271.0615 | 365 | 151, 119, 107 | C15H12O5 | 271.0612 | 1.15 | Naringenin chalcone (S)
In an analogous way, the presence of anthocyanins was confirmed by LC-PDA-QTOF-MS/MS analysis (positive mode) in peel extracts from purple-skin tomato fruits (data not shown). Using this primarily accurate mass-directed targeted approach, about 41% (25 compounds) of the metabolites cited in Table I were identified in both tomato peel samples. In addition, caffeic acid, ferulic acid, p-coumaric acid, quercetin, and kaempferol aglycones could be detected, but only after acid hydrolysis of the extract. All experimental LC-MS information gathered for these metabolites, including retention time window, accurate mass, PDA spectral information, and MS/MS data generated at different collision energies, was added to the MoTo DB.

Database Building

The data from Table I were used as the foundation of the tomato fruit LC-MS database. From the molecular formula, the accurate mass of each component was calculated using the “Isotopic compositions of the elements 1997” list (Rosman and Taylor, 1998). The observed mass, together with a mass accuracy setting, is the main search entry for this database (Fig. 4).

Figure 4. A, Strategy applied for data analysis and identification of metabolites in tomato fruit, using LC-PDA-QTOF MS. Key entry into the database is the (intensity-corrected) accurate mass. B, Screenshot from the MoTo database query frame. Detected masses can be filled in (in this example m/z 609 in negative-ionization mode) and searched against the database at user-defined mass accuracy (first frame). If at least one mass hit is found in the database, the elemental compositions, deviations from accurate masses, and IUPAC names of the corresponding metabolites are indicated, as well as links to PubChem, if applicable, and our own experimental data (second frame). The last frame shows the experimental and literature information available for the selected compound.

The entry form offers an ionization-specific correction of the mass spectrometer data, so that the mass value of the uncharged molecule is submitted to the database. Mass accuracy can be set from 1 to 1,000 ppm, thus enabling the matching of data from detectors generating masses with either low or high accuracy.
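The query logic described here (ionization-specific correction to the uncharged mass, followed by a search within a user-defined ppm window) can be pictured with a short sketch. It is an illustration under stated assumptions, not the MoTo DB implementation: the names `to_neutral` and `search_by_mass`, the three-entry list (theoretical [M-H]− values copied from Table III), and the choice to apply the ppm window to the neutral mass are all hypothetical.

```python
# Minimal sketch of a ppm-window mass search; not the MoTo DB code.
PROTON = 1.007276  # mass of a proton in Da (hydrogen atom minus one electron)

# Theoretical [M-H]- values taken from Table III; the list itself is illustrative.
TABLE_III_MH = {
    "Quercetin-Glc-rhamnose": 609.1461,
    "Kaempferol-Glc-rhamnose": 593.1512,
    "Naringenin chalcone": 271.0612,
}


def to_neutral(mz, mode):
    """Convert a singly charged ion m/z to the mass of the uncharged molecule."""
    if mode == "negative":  # [M-H]-: add the lost proton back
        return mz + PROTON
    if mode == "positive":  # [M+H]+: remove the extra proton
        return mz - PROTON
    raise ValueError("mode must be 'negative' or 'positive'")


def search_by_mass(observed_mz, mode, tolerance_ppm):
    """Return (name, deviation in ppm) for every entry within the tolerance."""
    if not 1 <= tolerance_ppm <= 1000:
        raise ValueError("tolerance must lie between 1 and 1,000 ppm")
    observed = to_neutral(observed_mz, mode)
    hits = []
    for name, mh in TABLE_III_MH.items():
        theoretical = to_neutral(mh, "negative")
        deviation = (observed - theoretical) / theoretical * 1e6
        if abs(deviation) <= tolerance_ppm:
            hits.append((name, round(deviation, 2)))
    return hits


# Observed m/z 609.1451 in negative mode (the rutin peak in Table III), 5 ppm window.
print(search_by_mass(609.1451, "negative", 5))
```

Widening the tolerance trades specificity for coverage: a tight window suits accurate-mass instruments, whereas a window of several hundred ppm would be needed for nominal-mass detectors and will generally return more candidates.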
All other properties of the compounds are stored in a table, which can be accessed from the hit list after mass searching. Each hit suggests either a metabolite previously found in literature and validated by experimental data (Table III) or a novel compound (Table IV).

Table IV. Novel metabolites identified or putatively assigned by LC-PDA-ESI-QTOF-MS/MS in tomato fruit extracts (abbreviations as in Table III)

| Ret Av (min) | Ret StDev (min) | Av m/z | UV/Vis | MS/MS Fragments | Mol Form | Theo. Mass | Mean Δ (ppm) | Putative ID |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4.74 | 0.05 | 299.0771 | 251 | 137 | C13H16O8 | 299.0772 | −0.48 | Hydroxybenzoic acid-hexose |
| 7.42 | 0.07 | 380.1558 | – | 146 | C15H27NO10 | 380.1562 | −1.11 | Pantothenic acid-hexose |
| 12.99 | 0.05 | 431.1557 | – | 269, 161, 143, 125, 119, 113, 101 | C19H28O11 | 431.1559 | −0.43 | Benzyl alcohol-dihexose |
| 14.76 | 0.05 | 771.1989 | 263sh, 351 | 609, 463, 301 | C33H40O21 | 771.1989 | −0.01 | Quercetin-dihexose-deoxyhexose |
| 15.47 | 0.06 | 595.1665 | – | 475, 385, 355 | C27H32O15 | 595.1668 | −0.51 | Naringenin chalcone-dihexose or Naringenin-dihexose |
| 15.82 | 0.04 | 401.1452 | – | 293, 269, 233, 191, 161, 149, 131, 125, 101 | C18H26O10 | 401.1453 | −0.37 | Benzyl alcohol-hexose-pentose |
| 24.77 | 0.15 | 1,312.5872 | – | 1,266, 1,135, 1,105 | C59H95NO31 | 1,312.5815 | 4.33 | (Dehydrolycoperoside G) FA or (Dehydrolycoperoside F) FA or (Dehydroesculeoside A) FA |
| 27.05 | 0.12 | 515.1193 | 301sh, 323 | 353, 335, 191, 179, 173 | C25H24O12 | 515.1195 | −0.45 | Dicaffeoylquinic acid I |
| 27.60 | 0.07 | 515.1191 | 301sh, 323 | 353, 191, 179 | C25H24O12 | 515.1195 | −0.72 | Dicaffeoylquinic acid II |
| 29.71 | 0.07 | 515.1188 | 301sh, 327 | 353, 299, 203, 191, 179, 173, 135 | C25H24O12 | 515.1195 | −1.40 | Dicaffeoylquinic acid III |
| 30.11 | 0.04 | 887.2246 | 256, 301sh, 323 | 741, 723, 301, 271, 255, 179 | C41H44O22 | 887.2251 | −0.57 | Quercetin-hexose-deoxyhexose-pentose-p-coumaric acid |
| 32.16 | 0.03 | 433.1137 | 307sh, 360 | 271, 151 | C21H22O10 | 433.1140 | −0.84 | Naringenin chalcone-hexose II |
| 38.40 | 0.08 | 677.1503 | 301sh, 327 | 515 | C34H30O15 | 677.1512 | −1.29 | Tricaffeoylquinic acid I |
| 39.78 | 0.11 | 677.1493 | 292sh, 325 | 515, 353, 335, 179, 173 | C34H30O15 | 677.1512 | −2.82 | Tricaffeoylquinic acid II |

Links with the PubChem and MedLine databases are available for extended, external searches on particular or related components. The information for each compound includes molecular formula, molecular mass, CAS number, IUPAC name, and analytical properties such as retention time, MS/MS fragments, and UV/Vis absorbance maxima, when available. Literature references related to the occurrence in tomato fruit are also listed.
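The theoretical masses listed in Tables III and IV follow directly from the molecular formulas. As a rough sketch (the helper names and the small isotope table are assumptions made for illustration, and only C, H, N, and O are covered), the monoisotopic m/z of the [M-H]− ion can be computed as follows. For formic acid adducts, the tables list the formula of the adduct, so the same calculation applies.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes of C, H, N, and O.
ISOTOPE_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
ELECTRON = 0.00054858  # electron mass in Da


def neutral_mass(formula):
    """Monoisotopic mass of an uncharged molecule given as e.g. 'C15H12O5'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A KeyError here means the element is outside this small sketch table.
        total += ISOTOPE_MASS[element] * (int(count) if count else 1)
    return total


def mz_deprotonated(formula):
    """m/z of the singly charged [M-H]- ion: one hydrogen atom is lost, its electron stays."""
    return neutral_mass(formula) - ISOTOPE_MASS["H"] + ELECTRON


print(f"{mz_deprotonated('C15H12O5'):.4f}")  # naringenin / naringenin chalcone
```

With these isotope masses, the values for C15H12O5, C27H30O16, and C51H85NO23 come out close to the 271.0612, 609.1461, and 1,078.5440 listed in Table III.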
Since our aim is to provide a compound database with data from literature and/or experimental MS/MS data, we did not include unknown or novel compounds that have not been validated.

Comparison of Metabolic Profiles of Peel and Flesh Tissues

The applicability of the LC-MS platform and metabolite database to automatically extract and annotate (differentially accumulating) mass signals was tested with red, ripe fruits of tomato cultivar Money Maker. Since we are interested in the differential distribution of metabolites and their biochemical pathways between tomato fruit tissues, peel and flesh material was separated from whole ripe fruits and analyzed by LC-PDA-ESI-QTOF-MS in both positive and negative ion modes. After automatic peak extraction and alignment of samples per ionization mode using metAlign, 2,944 mass signals (signal-to-noise ratio >3) were obtained in negative mode and 4,059 in positive mode. Since both tissues had similar water content (i.e. flesh: 94%, peel: 93%; n = 8; determined by freeze drying), the intensities of their mass signals were directly comparable. For each aligned mass peak, the extracts from both tissues were compared for significant differences in signal intensity (based on eight extraction repetitions) using the Student's t test tool within metAlign. As expected, the mass profiles of these fruit tissues were markedly different. About 38% of all mass signals detected were significantly (at least 1.5-fold) higher in the peel extracts than in the flesh extracts (1,095 signals for negative mode and 1,566 for positive mode), and about 25% were higher in flesh than in peel (794 for negative mode and 880 for positive mode). Chromatographic mass peaks detected in negative ionization mode that were significantly different between the extracts from both tissues are visualized in Figure 5.

Figure 5. Unbiased LC-QTOF MS-based comparative profiling of aqueous-methanol extracts from peel and flesh tissues from ripe tomato fruit (var. Moneymaker). Mass chromatograms (m/z 100–1,500) were acquired in ESI-negative mode. Retention times (in minutes) and nominal masses of the most intense signals are indicated in the chromatograms (plotted as base peak intensities [BPI], from 4–50 min). A, Representative original chromatogram of peel tissue. B, Representative original chromatogram of flesh tissue. C, Differential chromatogram for metabolites that are significantly (P < 0.05; n = 8 extracts) at least 1.5-fold higher in extracts from peel compared to flesh tissue (peaks pointing upwards) or higher in extracts from flesh compared to peel tissue (peaks pointing downwards). a, Coumaric acid-hexose II; b, quercetin-hexose-deoxyhexose-pentose; c, rutin; d, kaempferol-hexose-deoxyhexose-pentose or quercetin-dideoxyhexose-pentose; e, α-tomatine; f, naringenin; g, naringenin chalcone; h, caffeic acid-hexose II; i, 3-caffeoylquinic acid; j, spirosolanol-trihexose; and k, hydroxyfurostanol tetrahexoside.

Subsequent metAccure-assisted accurate mass calculation of the differential mass peaks and searching for analogous masses in the MoTo DB indicated that flavonoids and derivatives thereof, as well as α-tomatine, occurred mainly in the peel extracts. On the other hand, some phenylpropanoids (h, 52-fold; i, 2-fold) as well as glycosylated steroids such as glycosylated spirosolanols (j, 130-fold) were significantly higher in the flesh extracts. An intense mass signal, k, was detected solely in the extracts from flesh tissue and could be identified as the parent ion of a hydroxyfurostanol tetrahexose (e.g. tomatoside A) from the accurate mass observed ([M-H]− = 1,081.5442, C51H85O24−, 1.0 ppm difference from theoretical mass) and its MS/MS fragmentation pattern.

DISCUSSION

Metabolomics is developing as an important functional genomics tool. Technical improvements in the large-scale determination of metabolites in complex plant tissues and dissemination of metabolomics research data are essential (Sumner et al., 2003; Bino et al., 2004). A major challenge is to construct consolidated metabolite libraries and to develop metabolite-specific data management systems. Here we set out to establish a reproducible LC-PDA-MS-based metabolomics platform including an LC-MS metabolite database and mass-directed searching tools for a commonly used plant material, i.e. tomato fruit. An in-depth literature study was performed to obtain as much information as possible on metabolites previously detected in tomato fruits. Because tomato is an important crop, numerous analytical studies aimed at identifying its constituents have been performed.
However, a number of problems arise when building such a database from the literature. First, finding the exact identity of a specific natural compound can be troublesome since common names or non-IUPAC nomenclatures are often used. Second, studies performed without MS or NMR technologies might lead to questioning the validity of at least some of the assigned compounds. Third, it is known that using harsh conditions during sample preparation may produce artifacts, which can result in the correct identification, but of a compound not occurring in the original biological sample. For instance, it has long been thought that the flavanone naringenin instead of naringenin chalcone was the main tomato flavonoid (Krause and Galensa, 1992). This is probably due to unforeseen cyclization of the chalcone to the corresponding flavanone during sample preparation and compound isolation. Likewise, some of the metabolites reported in literature have been identified after an enzymatic or chemical hydrolysis step. In the nonhydrolyzed tomato peel extract we exclusively found a range of glycosylated forms of caffeic acid, coumaric acids, and the flavonols quercetin and kaempferol, while the corresponding aglycones were only detectable after acid hydrolysis of the same sample. The amount of information obtained by a single LC-QTOF MS analysis can be extensive and the use of dedicated software for data processing and comparison is crucial. The extraction of relevant mass signals and the subsequent alignment of chromatograms were performed using metAlign (Vorst et al., 2005). An average of 2 s variation within series of analyses and 30 s between analyses over a 2-year time period is an indication of high chromatographic reproducibility. These retention time shifts are sufficiently low to align correctly and thus compare samples when analyzed under the same chromatographic conditions. 
Variation in metabolite retention is a known and common obstacle in LC and thus important to take into account when searching LC-MS-based databases for comparable masses. Representative retention times and retention indexes of unknown mass peaks relative to tomato key compounds, such as rutin, chlorogenic acid, and naringenin, can be of use when comparing data generated by different LC systems or with a different type of C18-reversed-phase column.

MetAccure (O. Vorst, H.A. Verhoeven, C.H.R. de Vos, C.A. Maliepaard, and R.C.H.J. van Ham, unpublished data) is an important tool for automated accurate mass calculation of all aligned mass signals from the metAlign output. Within a specific range of mass signal intensities (depending on the specificities of the TOF MS and lock mass intensity used), the metAccure-assisted accurate mass calculations enabled the assignment of compounds. By calculating the average of all detected accurate masses of a certain aligned mass peak over all samples analyzed (taking into account only those scans with the correct range of ion intensities), high mass accuracies were obtained, i.e. frequently within 1 ppm and, in all cases, within 4 ppm deviation from the predicted mass (Table III). Apparently, this high mass accuracy was consistent over the entire mass range analyzed (mass-to-charge ratio [m/z] 100–1,500): accuracies better than 3 ppm were obtained for metabolites at both low m/z values (e.g. 271.0615 for naringenin chalcone) and high m/z values (e.g. 1,314.6005 for the formic acid adducts of the possible isomers lycoperoside G or F or esculeoside A). With the QTOF instrument used, the metAccure script was able to generate appropriate accurate masses for about 10% of the total mass peaks detected in ESI-negative mode. Evidently, this percentage is highly dependent on the dynamic range of accurate mass measurements of the mass spectrometer used, as well as on the concentrations of each metabolite in the samples analyzed.
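The averaging step described for metAccure can be pictured with a small sketch. Since metAccure itself is unpublished, what follows is only an assumed reading of the description: the function names, the intensity window (100 to 2,000 counts), and the scan values are invented for illustration and are not measured data.

```python
from statistics import mean


def mean_accurate_mass(scans, min_counts, max_counts):
    """Average the m/z of the scans whose ion counts fall inside the usable window.

    scans: iterable of (mz, counts) pairs belonging to one aligned mass peak.
    Returns None when no scan lies inside the window.
    """
    usable = [mz for mz, counts in scans if min_counts <= counts <= max_counts]
    return mean(usable) if usable else None


def ppm_deviation(found, theoretical):
    """Deviation of the found mass from the theoretical mass, in ppm."""
    return (found - theoretical) / theoretical * 1e6


# Invented scans: one saturated, two inside the window, one too weak.
scans = [(271.0630, 12000), (271.0616, 800), (271.0614, 600), (271.0580, 20)]
found = mean_accurate_mass(scans, min_counts=100, max_counts=2000)
deviation = ppm_deviation(found, 271.0612)  # theoretical [M-H]- of naringenin chalcone
print(f"{found:.4f} m/z, {deviation:.2f} ppm")
```

Discarding scans outside the window reflects the point made in the text that accurate mass is only reliable within a limited intensity range relative to the lock mass; peaks that never fall inside the window receive no accurate mass at all, which is consistent with only a fraction of the detected peaks being assigned one.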
By changing the lock mass-to-analyte ratio in successive analyses of the same sample it should be possible, in principle, to obtain accurate mass data for a wider range of amplitudes, leading to an expansion of the dynamic range.

The identification of compounds, in particular secondary metabolites, through a metabolomic profiling approach encounters some major difficulties. First, the number of commercially available standards of secondary metabolites reported to be present in a specific plant species or tissue is low. Second, in an automated online sequence of separation, PDA detection, MS measurement, and/or MS/MS fragmentation of mass signals, it is difficult to meet optimized levels for all eluting compounds. Due to overlapping compounds, low-intensity mass signals, or difficulties in the isolation of the mass signal for MS/MS fragmentation, the extraction of usable information for identification purposes can be complicated. Third, the lack of dedicated software and databases that integrate spectroscopic and MS data limits the identification procedure to a manual level. Nevertheless, by these means 43 metabolites could be readily assigned in the tomato fruit extract (Tables III and IV), leaving more to be identified. The total number of compounds detectable by our LC-MS system is difficult to calculate due to the presence of mass signals from isotopes, adducts, and unintended in-source fragmentation.

Using the strategy demonstrated in this study, the assignment of compounds relies on the integration of different sources of information (accurate mass, retention time, fragmentation pattern, and UV/Vis spectra). In addition to experimental data, previous findings and biochemical evidence can complement certain putative assignments. In the MoTo DB we established searching tools to link an observed mass in LC-MS chromatograms to the putative tomato metabolite, through calculating the exact monoisotopic mass of each metabolite for both positive and negative ionization modes.
Identifications can be validated using the retention time intervals, PDA spectra, and MS/MS data so far available. The link with external databases allows searching for similar molecules from other sources.

Some compounds reported in literature appear to occur more than once in our chromatograms, e.g. p-coumaroylhexoside, caffeoylhexoside, and naringenin chalcone-hexoside (Table III). Apparently, these metabolites can exist as different constitutional isomers in tomato fruit. The position and/or nature of the sugar substitution can influence the polarity and therefore the retention time of the compound. From the literature it is often unclear which particular isomer is meant. Three chromatographic peaks corresponding to caffeoylquinic acids were found. According to previous studies with comparable analytical systems (Clifford et al., 2003), the order of elution is likely 5-caffeoylquinic acid, followed by 3-caffeoylquinic acid, and then 4-caffeoylquinic acid (Table III).

Applying the same data analysis strategy, novel derivatives of phenolic acids and flavonoids were putatively assigned, and information on the level of their identification is presented (Table IV). Dicaffeoylquinic acid (three isomers) and tricaffeoylquinic acid (two isomers) were identified in tomato, and novel glycosides of naringenin, naringenin chalcone, and quercetin were detected. The chromatographic separation of several isomers of coumaroyl- and caffeoylhexosides, of which only one has previously been described, also indicates the high resolving power of our LC-MS setup. MS/MS fragmentation can sometimes distinguish between constitutional isomers; in most cases, however, other approaches such as NMR will have to be used to unravel the complete and exact structure of novel compounds. These NMR studies are part of our future activities in tomato metabolomics.
Ideally, the combination of LC/MS/NMR should be performed for the unambiguous structure elucidation of metabolites (Exarchou et al., 2003; Sumner et al., 2003; Wolfender et al., 2003). Organizing all such analytical data into a single database will facilitate the identification of compounds and will further improve the quality and quantity of compound annotation through database searching. By making use of the MoTo DB and the LC-PDA-MS platforms established, extracts from two tissues in tomato fruit, peel and flesh, were compared for relative differences in LC-MS signals in an untargeted manner (Fig. 5). As was expected from previous experiments (e.g. Muir et al., 2001; Bovy et al., 2002) most of the flavonoid species and their glycosides were detected in the extracts of peel tissue, while in the flesh extracts these compounds were hardly or not detectable at all. The specific accumulation of flavonoids in peel is in accordance with the idea that these compounds play a role in the protection against stress, for example by UV light (Winkel-Shirley, 2002). On the other hand, by using this untargeted approach it became clear that tomato flesh contains markedly higher amounts of, among many still unknown metabolites, specific phenolic compounds such as caffeoylhexose II and 3-caffeoylquinic acid, as well as glycosylated alkaloids of the spirosolanol type. A compound uniquely present in the extracts from flesh tissue was identified as a hydroxyfurostanol tetrahexose, which might correspond to tomatoside A (Schelochkova et al., 1980). This molecule has a brassinosteroid-like structure and is structurally related to spirosolanes. Recently, highly active biosynthesis of brassinosteroids has been found in developing tomato fruits (Montoya et al., 2005). As yet, neither the biological functions nor the mechanisms underlying the specific accumulation of these phenolic acids and glycosylated spirosolanols in the flesh of the fruit are known. 
Clearly, further research into the differential distribution of (secondary) metabolites between peel and flesh tissues of tomato fruit, by analyzing these tissues from fruits from several cultivars, may provide novel information on tissue-specific regulation of biochemical pathways.

CONCLUSION

The maturation of metabolomics as the next cornerstone of functional genomics ultimately depends on the establishment of databases (Sumner et al., 2003; Bino et al., 2004). However, at the moment there are no effective database tools to query and/or comprehensively mine LC-MS-based plant metabolomics data through automated database search engines. The generation of such tools depends on the availability of metabolite databases that can be trusted and for which the source of data and its history are maintained and made publicly accessible. Here we present the first step to implement such an open access metabolite database, the MoTo DB dedicated to tomato, which intends to systematize metabolite LC-MS, MS/MS, and absorbance spectra information for common knowledge. The next step is to utilize the validated metabolomic information to study the dynamics of the metabolome, to elucidate mutants and gene functions based on differential metabolic profiles, and to decipher the biological relevance of each metabolite. The combination of information from other omics technologies can lead to a wider view on the systems biology of the plant studied. As a result, the integration of databases from these different disciplines will be inevitable.

MATERIALS AND METHODS

Plant Material

A large pool of tomato (Lycopersicon esculentum, now Solanum lycopersicum) fruit material was prepared by combining fruits from turning, pink, and red ripe stages of development of 96 different tomato cultivars representing the three major types of tomato fruits (i.e. cherry, Dutch beef, and normal round tomatoes).
These plants were grown in an environmentally controlled greenhouse located in Wageningen, The Netherlands, during the summer and autumn of 2003. Plants were grown in rock wool plugs connected to an automatic irrigation system comparable to standard commercial cultivation conditions. For analysis of anthocyanins, purple-colored fruits from offspring of a crossing of two natural mutants, Af × hp-2j (van Tuinen et al., 2005), were harvested at the ripe stage of development. Peel (about 2 mm thickness) was removed from fruits, ground into a fine powder in liquid nitrogen, and stored at −80°C until further analysis. For metabolite profile comparison of peel and flesh, red ripe fruits of cultivar Money Maker were used, of which peel (2 mm thickness) and flesh (rest of fruit) were separated and used as described.

Extraction

Of the frozen tomato powder, 0.5 g fresh weight was weighed and extracted with 1.5 mL pure methanol (final methanol concentration in the extract approximately 75%). Hydrolyzed extracts were prepared by sequentially adding 1 mL of 0.1% tert-butylhydroquinone in methanol solution and 0.4 mL of 6 M HCl to 0.6 g fresh weight tomato material, shaking in a water bath at 90°C to 95°C for 1 h, and adding 2 mL of methanol (Bovy et al., 2002). All samples were sonicated for 15 min, filtered through a 0.2 μm inorganic membrane filter (Anotop 10, Whatman), and analyzed.

Chemicals

Standard compounds p-coumaric acid, protocatechuic acid, salicylic acid, caffeic acid, ferulic acid, cinnamic acid, myricetin, and naringenin were purchased from ICN; p-hydroxybenzoic acid, chlorogenic acid, quercetin, Phe, sinapic acid, and α-tomatine from Sigma; vanillic acid and rutin (quercetin-3-O-rutinoside) from Acros; naringenin chalcone from Apin Chemicals; kaempferol and kaempferol-3-O-rutinoside from Extrasynthese; and tert-butylhydroquinone from Aldrich. Acetonitrile HPLC supragradient and methanol absolute HPLC supragradient were obtained from Biosolve.
Formic acid for synthesis 98% to 100% was from Merck-Schuchardt, HCl 37% for analysis from Acros, and ultrapure water was obtained from an Elga Maxima purification unit (Bucks). Leucine enkephalin was purchased from Sigma.

Chromatographic Conditions

HPLC was carried out using a Waters Alliance 2795 HT system with a column oven. For chromatographic separation, a Luna C18(2) precolumn (2.0 × 4 mm) and analytical column (2.0 × 150 mm, 100 Å, particle size 3 μm) from Phenomenex were used. Five microliters of sample was injected into the system for LC-PDA-MS analysis. Degassed solutions of formic acid:ultrapure water (1:10³, v/v; eluent A) and formic acid:acetonitrile (1:10³, v/v; eluent B) were pumped at 0.19 mL min⁻¹ into the HPLC system. The gradient applied started at 5% B and increased linearly to 35% B in 45 min. Then, for 15 min the column was washed and equilibrated before the next injection. The column temperature was kept at 40°C and the samples at 20°C. The room temperature was maintained at 20°C.

Detection of Metabolites by PDA and MS

The HPLC system was connected online to a Waters 2996 PDA detector, set to acquire data every second from 240 to 600 nm with a resolution of 4.8 nm, and subsequently to a QTOF Ultima V4.00.00 mass spectrometer (Waters-Corporation, MS technologies). An ESI source working either in positive or negative ion mode was used for all MS analyses. Before each series of analyses, the mass spectrometer was calibrated using phosphoric acid:acetonitrile:water (1:10³:10³, v/v) solution. Capillary voltage, collision energy, and desolvation temperature were optimized to obtain a series of phosphoric acid clusters suitable for calibration between m/z 80 and 1,500. During sample analyses, the capillary voltage was set to 2.75 kV and the cone at 35 V. Source and desolvation temperatures were set to 120°C and 250°C, respectively. Cone gas and desolvation gas flows were 50 and 500 L h⁻¹, respectively.
In the positive ion mode, the collision energy was 5 eV while in the negative ion mode it was 10 eV. Resolution was set at 10,000 and during calibration the MS parameters were adjusted to achieve such a resolution. TOF-MS data were acquired in centroid mode. During LC-MS analyses scan durations of 0.9 s and an interscan time of 0.1 s were used. For LC-MS/MS measurements, 10 μL of sample was injected into the system and MS/MS measurements were made with 0.40 s of scan duration and 0.10 s of interscan delay with increasing collision energies according to the following program: 5 (ESI positive) or 10 (ESI negative), 15, 30, and 50 eV. The mass spectrometer was equipped with a lockspray source, allowing online mass correction to obtain high mass accuracy of analytes. Leucine enkephalin, [M+H]+ = 556.2766 and [M-H]− = 554.2620, was used as a lock mass, being continuously sprayed into a second ESI source using an LKB Bromma 2150 HPLC pump, and sampled every 10 s, producing an average intensity of 500 counts/scan in centroid mode (approximately 100 counts/scan in continuum mode).

Data Analysis and Alignment

Acquisition of LC-PDA-MS data was performed under MassLynx 4.0 (Waters). MassLynx was used for visualization and manual processing of LC-PDA-MS/MS data. Mass data were automatically processed by metAlign version 1.0 (www.metalign.nl). MetAlign transforms accurate masses into nominal masses to shorten the calculation time and minimize the number of mass bins. Baseline and noise calculations were performed from scan number 225 to 2,475, corresponding to retention times 4.0 min to 49.3 min. The maximum amplitude was set to 15,000 and peaks below three times the local noise were discarded. The .csv file output containing nominal mass peak intensity data (peak heights, i.e. ion counts/scan at the center of the peak) at aligned retention times (scans) over all samples processed was used for further data processing.
A script called metAccure was used for the calculation of accurate masses from the metAlign-extracted peaks. MetAccure calculates the accurate mass of each mass signal, using only those scans in which signal intensities are within a user-defined window relative to the lock mass intensity. It takes as input the .csv files containing retention time alignments, originating from metAlign analysis, in combination with the original data in NetCDF format, created from MassLynx .raw files by Dbridge (O. Vorst, H.A. Verhoeven, C.H.R. de Vos, C.A. Maliepaard, and R.C.H.J. van Ham, unpublished data). Comparison of extracts from peel and flesh tissues for significant differences in intensity of each aligned mass signal was made using the Student's t-test tool within metAlign (level of significance set at 0.05). The settings for baseline corrections and signal alignment were analogous to those described above.

Annotation of Metabolites

Datasets obtained after metAlign and metAccure treatment were analyzed as (retention time×accurate mass×peak intensity) matrices for metabolite identification. [M+H]+ and [M-H]− values were calculated for metabolites present in Table I and used for sorting with the matrices. Data collected during the first 4.0 min of chromatography were discarded. Novel metabolites were identified by calculating the elemental composition from accurate mass measurements using the MassLynx software. The tolerance was set at 5 ppm, taking into account the correct analyte-lock mass signal ratio. For an observed accurate mass, a list of possible molecular formulas was obtained, selected for the presence of C, H, O, and N. In addition, raw datasets were checked manually in MassLynx for retention time, UV/Vis spectra, and QTOF-MS/MS fragmentation patterns for chromatographically separated peaks, complementing the accurate mass-based elemental formulas.
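The ion masses and the 5 ppm tolerance mentioned above can be illustrated with a short Python sketch. It is not the MassLynx or MoTo DB implementation; the function names and the proton-mass constant are assumptions introduced here, and only singly charged [M+H]+ and [M-H]− ions are considered. The worked example uses the leucine enkephalin lock mass values quoted in the text.

```python
# Minimal sketch (not the authors' software): singly charged ion masses
# and a ppm tolerance check. All names below are hypothetical.

PROTON_MASS = 1.007276  # Da; standard value, not taken from the article


def ion_mz(neutral_mass: float, mode: str) -> float:
    """Return m/z of [M+H]+ ('positive') or [M-H]- ('negative')."""
    if mode == "positive":
        return neutral_mass + PROTON_MASS
    if mode == "negative":
        return neutral_mass - PROTON_MASS
    raise ValueError("mode must be 'positive' or 'negative'")


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed: float, theoretical: float,
                     tol_ppm: float = 5.0) -> bool:
    """True if the observed mass lies within tol_ppm of the theoretical mass."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


# Leucine enkephalin: derive the neutral mass from the quoted [M+H]+ value,
# then predict the negative-mode ion.
neutral = 556.2766 - PROTON_MASS
predicted_negative = ion_mz(neutral, "negative")
```

Under these assumptions, the predicted negative-mode ion agrees with the [M-H]− lock mass given in the text to four decimal places.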
The combination of accurate mass data, retention time (as an indication of polarity), UV/Vis spectra, and MS/MS data allowed a putative identification of metabolites. Best matches were searched in the Dictionary of Natural Products and SciFinder databases for possible structures. The putative identifications were confirmed by published data and with standard compounds, if commercially available. MoTo DB Buildup Based on available literature information about compounds identified in tomato, information acquired from LC-PDA-MS analysis of tomato fruit was used to validate each metabolite: (1) a retention time; (2) accurate mass in the form of monoisotopic mass (neutral) and in the ion forms (M+H)+ and (M-H)−; (3) elemental compositions; (4) MS/MS fragments; and (5) maximum absorbance peaks in UV/Vis. Given a found mass and a Δppm (or ΔmD) that is set by the user, the database can find possible matches. Formic acid, if detected, was also included in the database. The database is implemented in MySQL and running on a Linux cluster. ACKNOWLEDGMENTS We kindly thank Arjen Lommen for providing the software for LC-MS data analysis, Sjef Boeren for assistance in some of the MS/MS measurements, Ageeth van Tuinen for providing the anthocyanin-rich tomatoes, and Robert Hall and Sacco de Vries for carefully reading the manuscript. We thank Roeland van Ham and Velitchka Mihaleva for their useful comments during the construction of the database. We are also grateful to Syngenta Seeds, Seminis, Enza Zaden, Rijk Zwaan, Nickerson-Zwaan, and De Ruiter Seeds for providing the seeds of the 96 tomato cultivars. LITERATURE CITED Bianco G, Schmitt-Kopplin P, De Benedetto G, Kettrup A, Cataldi TR ( 2002 ) Determination of glycoalkaloids and relative aglycones by nonaqueous capillary electrophoresis coupled with electrospray ionization-ion trap mass spectrometry. 
Electrophoresis 23 : 2904 –2912 Bino RJ, de Vos CHR, Lieberman M, Hall RD, Bovy A, Jonker HH, Tikunov Y, Lommen A, Moco S, Levin I ( 2005 ) The light-hyperresponsive high pigment-2dg mutation of tomato: alterations in the fruit metabolome. New Phytol 166 : 427 –438 Bino RJ, Hall RD, Fiehn O, Kopka J, Saito K, Draper J, Nikolau BJ, Mendes P, Roessner-Tunali U, Beale MH, et al ( 2004 ) Potential of metabolomics as a functional genomics tool. Trends Plant Sci 9 : 418 –425 Bovy A, de Vos CHR, Kemper M, Schijlen E, Almenar Pertejo M, Muir S, Collins G, Robinson S, Verhoeyen M, Hughes S, et al ( 2002 ) High-flavonol tomatoes resulting from the heterologous expression of the maize transcription factor genes LC and C1. Plant Cell 14 : 2509 –2526 Buta JG, Spaulding DW ( 1997 ) Endogenous levels of phenolics in tomato fruit during growth and maturation. J Plant Growth Regul 16 : 43 –46 Clifford MN, Johnston KL, Knight S, Kuhnert N ( 2003 ) Hierarchical scheme for LC-MSn identification of chlorogenic acids. J Agric Food Chem 51 : 2900 –2911 Colombo M, Sirtori FR, Rizzo V ( 2004 ) A fully automated method for accurate mass determination using high-performance liquid chromatography with a quadrupole/orthogonal acceleration time-of-flight mass spectrometer. Rapid Commun Mass Spectrom 18 : 511 –517 Crozier A, Lean MEJ, McDonald MS, Black C ( 1997 ) Quantitative analysis of the flavonoid content of commercial tomatoes, onions, lettuce, and celery. J Agric Food Chem 45 : 590 –595 Dixon RA, Strack D ( 2003 ) Phytochemistry meets genome analysis, and beyond. Phytochemistry 62 : 815 –816 Exarchou V, Godejohann M, van Beek TA, Gerothanassis IP, Vervoort J ( 2003 ) LC-UV-solid-phase extraction-NMR-MS combined with a cryogenic flow probe and its application to the identification of compounds present in Greek oregano. Anal Chem 75 : 6288 –6294 Fleuriet A, Macheix JJ ( 1977 ) Effect des blessures sur les composés phénoliques des fruits de tomates ≪cerise≫ (Lycopersicum esculentum var. 
cerasiforme). Physiol Veg 15 : 239 –250 Fleuriet A, Macheix J-J ( 1981 ) Quinyl esters and glucose derivatives of hydroxycinnamic acids during growth and ripening of tomato fruit. Phytochemistry 20 : 667 –671 Friedman M ( 2002 ) Tomato glycoalkaloids: role in the plant and in the diet. J Agric Food Chem 50 : 5751 –5780 Friedman M, Kozukue N, Harden LA ( 1997 ) Structure of the tomato glycoalkaloid tomatidenol-3-beta-lycotetraose (dehydrotomatine). J Agric Food Chem 45 : 1541 –1547 Friedman M, Kozukue N, Harden LA ( 1998 ) Preparation and characterization of acid hydrolysis products of the tomato glycoalkaloid alpha-tomatine. J Agric Food Chem 46 : 2096 –2101 Friedman M, Levin CE, Mcdonald GM ( 1994 ) α-Tomatine determination in tomatoes by HPLC using pulsed amperometric detection. J Agric Food Chem 42 : 1959 –1964 Fujiwara Y, Takaki A, Uehara Y, Ikeda T, Okawa M, Yamauchi K, Ono M, Yoshimitsu H, Nohara T ( 2004 ) Tomato steroidal alkaloid glycosides, esculeosides A and B, from ripe fruits. Tetrahedron 60 : 4915 –4920 Fujiwara Y, Yahara S, Ikeda T, Ono M, Nohara T ( 2003 ) Cytotoxic major saponin from tomato fruits. Chem Pharm Bull (Tokyo) 51 : 234 –235 Hertog MGL, Hollman PCH, Katan MB ( 1992 ) Content of potentially anticarcinogenic flavonoids of 28 vegetables and 9 fruits commonly consumed in the Netherlands. J Agric Food Chem 40 : 2379 –2383 Hunt GM, Baker EA ( 1980 ) Phenolic constituents of tomato fruit cuticles. Phytochemistry 19 : 1415 –1419 Jones CM, Mes P, Myers JR ( 2003 ) Characterization and inheritance of the Anthocyanin fruit (Aft) tomato. J Hered 94 : 449 –456 Justesen U, Knuthsen P, Leth T ( 1998 ) Quantitative analysis of flavonols, flavones, and flavanones in fruits, vegetables and beverages by high-performance liquid chromatography with photo-diode array and mass spectrometric detection. J Chromatogr A 799 : 101 –110 Juvik JA, Stevens MA, Rick CM ( 1982 ) Survey of the genus Lycopersicon for variability in alpha-tomatine content. 
HortScience 17 : 764 –766 Kozukue N, Friedman M ( 2003 ) Tomatine, chlorophyll, beta-carotene and lycopene content in tomatoes during growth and maturation. J Sci Food Agric 83 : 195 –200 Krause M, Galensa R ( 1992 ) Determination of naringenin and naringenin-chalcone in tomato skins by reversed phase HPLC after solid-phase extraction. Z Lebensm Unters Forsch 194 : 29 –32 Le Gall G, Colquhoun IJ, Davis AL, Collins GJ, Verhoeyen ME ( 2003 a) Metabolite profiling of tomato (Lycopersicon esculentum) using 1H NMR spectroscopy as a tool to detect potential unintended effects following a genetic modification. J Agric Food Chem 51 : 2447 –2456 Le Gall G, DuPont MS, Mellon FA, Davis AL, Collins GJ, Verhoeyen ME, Colquhoun IJ ( 2003 b) Characterization and content of flavonoid glycosides in genetically modified tomato (Lycopersicon esculentum) fruits. J Agric Food Chem 51 : 2438 –2446 Martinez-Valverde I, Periago MJ, Provan G, Chesson A ( 2002 ) Phenolic compounds, lycopene and antioxidant activity in commercial varieties of tomato (Lycopersicum esculentum). J Sci Food Agric 82 : 323 –330 Mathews H, Clendennen SK, Caldwell CG, Liu XL, Connors K, Matheis N, Schuster DK, Menasco DJ, Wagoner W, Lightner J, et al ( 2003 ) Activation tagging in tomato identifies a transcriptional regulator of anthocyanin biosynthesis, modification, and transport. Plant Cell 15 : 1689 –1703 Mattila P, Kumpulainen J ( 2002 ) Determination of free and total phenolic acids in plant-derived foods by HPLC with diode-array detection. J Agric Food Chem 50 : 3660 –3667 Minoggio M, Bramati L, Simonetti P, Gardana C, Iemoli L, Santangelo E, Mauri PL, Spigno P, Soressi GP, Pietta PG ( 2003 ) Polyphenol pattern and antioxidant activity of different tomato lines and cultivars. 
Ann Nutr Metab 47 : 64 –69 Montoya T, Nomura T, Yokota T, Farrar K, Harrison K, Jones JG, Kaneta T, Kamiya Y, Szekeres M, Bishop GJ ( 2005 ) Patterns of Dwarf expression and brassinosteroid accumulation in tomato reveal the importance of brassinosteroid synthesis during fruit development. Plant J 42 : 262 –269 Muir SR, Collins GJ, Robinson S, Hughes S, Bovy A, De Vos CHR, van Tunen AJ, Verhoeyen ME ( 2001 ) Overexpression of petunia chalcone isomerase in tomato results in fruit containing increased levels of flavonols. Nat Biotechnol 19 : 470 –474 Petró-Turza M ( 1987 ) Flavor of tomato and tomato products. Food Rev Int 2 : 309 –351 Raffo A, Leonardi C, Fogliano V, Ambrosino P, Salucci M, Gennaro L, Bugianesi R, Giuffrida F, Quaglia G ( 2002 ) Nutritional value of cherry tomatoes (Lycopersicon esculentum cv. Naomi F1) harvested at different ripening stages. J Agric Food Chem 50 : 6550 –6556 Reschke A, Herrmann K ( 1982 ) Vorkommen von 1-O-hydroxycinnamyl-β-D-glucosen im gemüse. 1. Phenolcarbonsäure-verbindungen des gemüses. Z Lebensm-Unters-Forsch 174 : 5 –8 Rosman KJR, Taylor PDP ( 1998 ) Isotopic compositions of the elements 1997. Pure Appl Chem 70 : 217 –235 Sakakibara H, Honda Y, Nakagawa S, Ashida H, Kanazawa K ( 2003 ) Simultaneous determination of all polyphenols in vegetables, fruits, and teas. J Agric Food Chem 51 : 571 –581 Schauer N, Steinhauser D, Strelkov S, Schomburg D, Allison G, Moritz T, Lundgren K, Roessner-Tunali U, Forbes MG, Willmitzer L, et al ( 2005 ) GC-MS libraries for the rapid identification of metabolites in complex biological samples. FEBS Lett 579 : 1332 –1337 Schmidtlein H, Herrmann K ( 1975 ) Über die phenolsäuren des gemüses. II. Hydroxyzimtsäuren und hydroxybenzoesäuren der frucht- und samengemüsearten. Z Lebensm Unters Forsch 159 : 213 –218 Schelochkova AP, Vollerner JS, Koshoev KK ( 1980 ) Tomatoside A from Licopersicum esculentum seeds. 
Khim Prir Soedin 4 : 533 –540 Stewart AJ, Bozonnet S, Mullen W, Jenkins GI, Lean MEJ, Crozier A ( 2000 ) Occurrence of flavonols in tomatoes and tomato-based products. J Agric Food Chem 48 : 2663 –2669 Sumner LW, Mendes P, Dixon RA ( 2003 ) Plant metabolomics: large-scale phytochemistry in the functional genomics era. Phytochemistry 62 : 817 –836 Tikunov Y, Lommen A, de Vos CHR, Verhoeven HA, Bino RJ, Hall RD, Bovy AG ( 2005 ) A novel approach for nontargeted data analysis for metabolomics: large-scale profiling of tomato fruit volatiles. Plant Physiol 139 : 1125 –1137 Tokusoglu O, Unal MK, Yildirim Z ( 2003 ) HPLC-UV and GC-MS characterization of the flavonol aglycons quercetin, kaempferol, and myricetin in tomato pastes and other tomato-based products. Acta Chromatogr 13 : 196 –207 van Tuinen A, de Vos CHR, Hall RD, van der Plas LHW, Bowler C, Bino RJ ( 2005 ) Use of metabolomics for development of tomato mutants with enhanced nutritional value by exploiting natural non-GMO light-hyperresponsive mutants. In P Jaiwal, ed, Plant Genetic Engineering: Improvement of the Nutritional and the Therapeutic Qualities of Plants. Agritech Publications/Agricell Report, Shrub Oak, NY von Roepenack-Lahaye E, Degenkolb T, Zerjeski M, Franz M, Roth U, Wessjohann L, Schmidt J, Scheel D, Clemens S ( 2004 ) Profiling of Arabidopsis secondary metabolites by capillary liquid chromatography coupled to electrospray ionization quadrupole time-of-flight mass spectrometry. Plant Physiol 134 : 548 –559 Vorst O, de Vos CHR, Lommen A, Staps RV, Visser RGF, Bino RJ, Hall RD ( 2005 ) A non-directed approach to the differential analysis of multiple LC-MS derived metabolic profiles. Metabolomics 1 : 169 –180 Willker W, Leibfritz D ( 1992 ) Complete assignment and conformational studies of tomatine and tomatidine. Magn Reson Chem 30 : 645 –650 Winkel-Shirley B ( 2002 ) Biosynthesis of flavonoids and effects of stress. 
Curr Opin Plant Biol 5 : 218 –223 Winter M, Herrmann K ( 1986 ) Esters and glucosides of hydroxycinnamic acids in vegetables. J Agric Food Chem 34 : 616 –620 Wolfender JL, Ndjoko K, Hostettmann K ( 2003 ) Liquid chromatography with ultraviolet absorbance-mass spectrometric detection and with nuclear magnetic resonance spectroscopy: a powerful combination for the on-line structural investigation of plant metabolites. J Chromatogr A 1000 : 437 –455 Yahara S, Uda N, Nohara T ( 1996 ) Lycoperosides A-C, three stereoisomeric 23-acetoxyspirosolan-3 beta-ol beta-lycotetraosides from Lycopersicon esculentum. Phytochemistry 42 : 169 –172 Yahara S, Uda N, Yoshio E, Yae E ( 2004 ) Steroidal alkaloid glycosides from tomato (Lycopersicon esculentum). J Nat Prod 67 : 500 –502 Yoshizaki M, Matsushita S, Fujiwara Y, Ikeda T, Ono M, Nohara T ( 2005 ) Tomato new sapogenols, isoesculeogenin A and esculeogenin B. Chem Pharm Bull (Tokyo) 53 : 839 –840 Author notes 1 This work was supported by the European Community-Access to Research Infrastructure action of the Improving Human Potential Program (grant no. HPRI–CT–1999–00085), the EU RTD project Capillary NMR (grant no. HPRI–CT–1999–50018), and the research programme of the Centre of BioSystems Genomics that is a part of The Netherlands Genomics Initiative/Netherlands Organization for Scientific Research. * Corresponding author; e-mail [email protected]; fax 31–317–484801. The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Sofia Moco ([email protected]). www.plantphysiol.org/cgi/doi/10.1104/pp.106.078428. © 2006 American Society of Plant Biologists This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

Consensus by Democracy. Using Meta-Analyses of Microarray and Genomic Data to Model the Cold Acclimation Signaling Pathway in Arabidopsis

Benedict, Catherine; Geisler, Matt; Trygg, Johan; Huner, Norman; Hurry, Vaughan

2006 Plant Physiology

doi: 10.1104/pp.106.083527 pmid: 16896234

Abstract

The whole-genome response of Arabidopsis (Arabidopsis thaliana) exposed to different types and durations of abiotic stress has now been described by a wealth of publicly available microarray data. When combined with studies of how gene expression is affected in mutant and transgenic Arabidopsis with altered ability to transduce the low temperature signal, these data can be used to test the interactions between various low temperature-associated transcription factors and their regulons. We quantized a collection of Affymetrix microarray data so that each gene in a particular regulon could vote on whether a cis-element found in its promoter conferred induction (+1), repression (−1), or no transcriptional change (0) during cold stress. By statistically comparing these election results with the voting behavior of all genes on the same gene chip, we verified the bioactivity of novel cis-elements and defined whether they were inductive or repressive. Using in silico mutagenesis we identified functional binding consensus variants for the transcription factors studied. Our results suggest that the previously identified ICEr1 (induction of CBF expression region 1) consensus does not correlate with cold gene induction, while the ICEr3/ICEr4 consensuses identified using our algorithms are present in regulons of genes that were induced coordinate with observed ICE1 transcript accumulation and temporally preceding genes containing the dehydration response element. Statistical analysis of overlap and cis-element enrichment in the ICE1, CBF2, ZAT12, HOS9, and PHYA regulons enabled us to construct a regulatory network supported by multiple lines of evidence that can be used for future hypothesis testing.
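The voting scheme summarized in the abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: the log2 fold-change threshold of 1.0, the function names, and the use of a hypergeometric upper tail to compare a regulon with the whole chip are choices made here, since the abstract does not specify the quantization cutoff or the statistical test. The sketch also assumes the regulon genes are a subset of the genes on the chip.

```python
from math import comb

# Minimal sketch of "genes voting" on a cis-element; all names, the threshold,
# and the choice of test are assumptions, not the authors' method.


def vote(log2_ratio, threshold=1.0):
    """Quantize an expression change into +1 (induced), -1 (repressed), or 0."""
    if log2_ratio >= threshold:
        return 1
    if log2_ratio <= -threshold:
        return -1
    return 0


def tally(votes):
    """Count the +1, 0, and -1 votes in a list of votes."""
    return {v: votes.count(v) for v in (1, 0, -1)}


def upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which cast the vote of interest (hypergeometric distribution)."""
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return total / comb(N, n)


def induction_p_value(regulon_votes, chip_votes):
    """Chance that a random gene set of the regulon's size contains at least
    as many +1 votes as the regulon does."""
    k = tally(regulon_votes)[1]
    K = tally(chip_votes)[1]
    return upper_tail(k, K, len(regulon_votes), len(chip_votes))
```

A small p-value from `induction_p_value` would suggest that genes carrying the element are induced more often than genes on the chip in general; swapping the vote of interest to −1 would test for repression in the same way.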
The sequencing of the Arabidopsis (Arabidopsis thaliana) genome and the subsequent development of microarrays supporting near full-genome transcriptomic studies has resulted in the generation of large collections of experimental data describing the transcriptomes of wild-type, mutant, and transgenic plants under a variety of growth conditions, including exposure to biotic and abiotic stress. However, a clear challenge facing plant biologists is to devise methods to uncover connections between different stress regulons using the additional insights provided by the available Arabidopsis genomic map. Near full-genome array experiments lend themselves to promoter-content-based hypothesis testing in that they provide an open experimental architecture: No a priori knowledge of the plant's transcriptional response is needed to select the genomic population of genes sharing a common cis-element (cis-regulon) and almost all of the known genes containing a cis-element of interest will have corresponding expression data. The large number of data points generated by each new microarray experiment allows us to establish whether observed overlaps in regulon characteristics (e.g. gene identities or promoter elements) are occurring at rates significantly less or greater than random chance. Thus, we can statistically establish the links between transcription factors (TFs) identified by reverse genetics (and characterized in microarray studies) and the cis-elements shown to appear at frequencies greater than chance in the promoters of their regulon members. 
The wealth of data also facilitates the establishment of links between cis-elements identified based on binding assays or enrichment, and a statistically verified bioactivity based on patterns of the cis-regulon's expression in wild-type plants during time-course experiments, as well as between stress-associated, bioactive cis-elements and novel pathways/treatments by virtue of their statistically significant over- or underrepresentation in these differentially expressed gene groups of interest. One very active area of current research is the transcriptional response of the model plant Arabidopsis to low temperatures. Recent reviews describing current models of the low temperature transcriptional responses show a linear cascade consisting of a nonphosphorylated form of the c-MYC-like basic helix-loop-helix Inducer of CBF Expression 1 (ICE1) protein that is phosphorylated following exposure to low temperatures and subsequently activates transcription of the CBF3 transcriptional activator (Chinnusamy et al., 2006; Nakashima and Yamaguchi-Shinozaki, 2006). CBF3, along with its low temperature-responsive paralogs CBF1 and CBF2, in turn activate the transcription of members of the CBF regulon, a group of genes containing the dehydration response element (DRE)/C-repeat element in their promoters. The CBF regulon(s) includes the transcriptional activators ZAT10, RAP2.1, and RAP2.6, along with many other well-known COR genes (Fowler and Thomashow, 2002). The ZAT12 transcriptional repressor, which has been placed outside the ICE1-directed cascade (Nakashima and Yamaguchi-Shinozaki, 2006), is responsible for repression of the CBFs and their downstream TFs, in addition to its own regulon (Vogel et al., 2005). HOS9, thought to be an ICE1-independent constitutively expressed transcriptional repressor, also modulates low temperature gene transcription and plant-freezing tolerance (Zhu et al., 2004). 
However, because the ICE1-directed cascade contains at least 52 cold-regulated TFs (Lee et al., 2005) and the ice1 mutation appears to affect plant hormonal homeostasis (Lee et al., 2005), there are still many more components to positionally map within the cascade and connect in terms of their interaction with other up- and downstream TFs. In light of the combinatory nature of this task and the time-consuming lab bench procedures required to test each possible connection, it is unlikely that the entire ICE1-mediated TF cascade will be mapped using conventional methods. However, if we can bioinformatically reduce the number of bench-top verifications needed, this task becomes more feasible. Past studies have defined the gene regulons controlled by different ICE1 cascade TFs based on the subsets of genes that are both cold responsive in wild-type plants and differentially expressed in TF overexpressors/mutants (Chinnusamy et al., 2003; Lee et al., 2005; Vogel et al., 2005). An equally informative and complementary representation of a TF's regulon is the group of genes whose promoters contain the defined consensus binding sequence (cis-element) of that TF (its cis-regulon). Analysis of the behavior of the cis-regulon associated with a TF helps eliminate the biasing effect of its interconnected downstream regulons and is expected to be more robust across various experiments and between different experimenters. Because many of the TFs within the ICE1 cascade have been analyzed in Arabidopsis overexpressor and mutant plants using near full-genome microarrays (Boyce et al., 2003; Chinnusamy et al., 2003; Zhu et al., 2004; Lee et al., 2005; Vogel et al., 2005), many of the connections in the current model of the ICE1 cascade (Fig. 1) can now be tested using cis-regulon analysis. While the role of ICE1 in inducing CBF1 and CBF3 is well known, we were interested in examining the nature of the rest of the ICE1-controlled gene regulon. Specifically, we wanted to know how ICE1 interacts with the other reported low-temperature signaling transcriptional activators and repressors, and if the ICE1-mediated low temperature signaling cascade is a linear chain of transcriptional activations/repressions eventually leading to COR expression, or if the feedback mechanisms observed to occur among members of the CBF TFs also extend to ICE1 transcription. We present the ICE1-mediated transcriptional signaling cascade as a case study for our meta-analysis of microarray and genomic data. Our analysis shows that the previously identified ICEr1 consensus (Chinnusamy et al., 2003; Zarka et al., 2003) does not correlate with cold gene induction, while our bioinformatically identified ICE1-binding consensuses ICEr3 and ICEr4 are present in regulons of genes whose inductions are coordinate with observed ICE1 transcript accumulation and temporally precede that of genes containing the DRE. Additionally, the ICE1-binding consensus ICEr3 is similar to a cis-element previously reported to be phytochrome A (phyA) responsive (Hudson and Quail, 2003), potentially explaining the observed circadian gating and light responsiveness of the CBF-mediated/DRE regulon. We statistically link HOS9, CBF2, ZAT12, NAC072, and PHYA to the ICE1-mediated low temperature signaling pathway. Our set of correlation data was used to model a low temperature signaling framework that can be used for future hypothesis testing.

Figure 1. Schematic representation of low temperature signal transduction assembled from reviews published by Chinnusamy et al. (2006) and Nakashima and Yamaguchi-Shinozaki (2006). Protein TFs are represented by ovals while TF regulons are represented by “REG” boxes. Where reported, the consensus sequences (cis-elements) bound by the upstream TF are shown as color-coded boxes within the promoter space (solid line) preceding the REG box or TF gene coding sequence (blue box, DRE; green box, ICEr1; pink box, ICEr2; and yellow box, EP2). Regulatory connections are mapped using triangular arrowheads to represent gene activation and flattened arrowheads to represent gene repression. Dashed arrows represent regulatory connections with experimental support but accomplished by unknown (and possibly posttranscriptional) mechanisms. Protein phosphorylation events, where known, are indicated by circles attached to the protein TFs. Genes modeled (and corresponding Arabidopsis Genome Initiative no.) are ICE1, AT3G26744; ZAT12, AT5G59820; HOS9, AT2G01500; CBF2, AT4G25470; CBF1, AT4G25490; CBF3, AT4G25480; ZAT10, AT1G27730; RAP2.1, AT1G46768; and RAP2.6, AT1G43160.
RESULTS

The Current ICE1-Signaling Model Lacks Consensus

Our current understanding of the ICE1-mediated low temperature transcriptional cascade has been established through both forward and reverse genetic approaches coupled with microarray analyses. Based on recent reviews (Chinnusamy et al., 2006; Nakashima and Yamaguchi-Shinozaki, 2006) this cascade can be represented as a linear signaling path with several outliers that are known to be involved but whose linkages, if any, to other downstream targets of the cascade are unclear (Fig. 1). The lack of well-defined consensus binding sequences (cis-elements) for many of these TFs makes identifying and mapping their target genes difficult. To date, fewer than a dozen cold-responsive cis-elements have been identified and reported to the PLACE cis-element database (http://www.dna.affrc.go.jp/PLACE). One way to fill in these gaps is to combine existing microarray results deposited into public databases with the genomic promoter data available online in a novel database that cross-references individual gene responsiveness to TF overexpression/mutation and low temperature with gene promoter content (i.e. the presence or absence of cis-elements of interest). However, before we can do this we need a collection of putative cis-element candidates to consider.

Identifying Election Candidates: Inclusive Promoter Sampling of TF Overexpressor/Mutant Regulons to Find Putative cis-Elements

Our first criterion for establishing a regulatory link between any single TF and its preferred cis-element is that Arabidopsis plants overexpressing the TF should induce or repress a regulon of genes containing that cis-element in their 500 bp promoter. Therefore, a motif search of the promoters of all genes differentially induced/repressed in the TF overexpressor should identify this cis-element as one of the nucleic acid consensuses overrepresented in the promoter population.
Because ICE1 is the furthest upstream TF identified in the cold-signaling pathway to date, we began our analysis with the ICE1 microarray data reported by Chinnusamy et al. (2003) and Lee et al. (2005). Because constitutive ICE1 expression was insufficient to activate gene transcription in Arabidopsis (Chinnusamy et al., 2003), and because only a cold treatment induced CBF3-driven reporter and endogenous CBF expression in ICE1-overexpressing Arabidopsis, we submitted the promoters corresponding to the lists of 217 and 109 cold-responsive genes reported by Chinnusamy et al. (2003) and Lee et al. (2005) as less induced or nonresponsive in the ice1 mutants versus wild-type plants at 6 and 3 h cold treatment, respectively, to the Inclusive Motif Sampler (Coessens et al., 2003). These analyses identified CRCGT/CACGT as the potential ICE1-binding consensus core favored in vivo (Supplemental Fig. 1A), as judged by log likelihood score (Supplemental Table I). The ICEr1 element (CACATG) identified by Zarka et al. (2003) as responsible for CBF2 induction and shown to be bound by mutant and wild-type forms of the ICE1 protein in vitro (Chinnusamy et al., 2003) was not enriched in the ICE1-affected gene lists. To obtain sufficient statistical power for our bioinformatic analyses, we required consensus sequences containing a minimum of 6 bp, and therefore extended the novel element CACGT to include the preceding A/C base variation observed in our Inclusive Motif search (Supplemental Fig. 1A). Because the sequences ACACGT and CCACGT represented potentially novel cis-elements in the ICE1 regulon, we flagged them for further bioinformatic analyses. However, PatMatch queries for the list of all Arabidopsis genes containing ACACGT in the 500 bp promoter space returned a list that included genes containing the abscisic acid (ABA) response element (ABRE; GACACGT).
Because ABRE was identified as enriched in the CBF3 regulon in its reverse complementary form (ACGTGTC; see Maruyama et al., 2004), we filtered the results of all further bioinformatic analyses of these novel ICE1 elements to exclude genes where A/CCACGT was preceded by G (the ABRE-containing subset). This was done by defining the new ICE1-related cis-elements as HACACGT and HCCACGT (where H is the IUPAC code for A, C, or T), and we named these elements ICEr3 and ICEr4, respectively. Motif searching also uncovered an enrichment of the DRE cis-element in the 3 and 6 h ice1-affected gene promoter lists (Supplemental Table I), supporting the previous observation that the CBF TFs are ICE1 target genes and that ice1 mutation affects induction of the genes that the CBFs bind (i.e. DRE-containing genes). To test the suggested exclusion of the HOS9 and ZAT12 TFs from the ICE1-directed cascade (Fig. 1), we needed to identify and bioinformatically confirm the bioactivity of their enriched cis-elements. A search of the 138 available promoter sequences for the genes reported by Zhu et al. (2004) to be differentially induced in hos9 mutants in the cold versus wild-type plants identified ACGCG(T), which we named HOS9r1, as the potential HOS9-binding consensus favored in vivo, as judged by log likelihood score (Supplemental Fig. 1B). Similarly, Vogel and colleagues (2005) identified the CATTGA core, which we refer to as ZAT12r1, as common to many genes induced by both cold and ZAT12 overexpression. Motif searches of the CBF2 (Vogel et al., 2005) and CBF3 (Maruyama et al., 2004) regulon promoters by other researchers had previously identified the DRE (RCCGAC) and ABRE (GACACGT) as enriched in their regulons.
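The promoter-content step described above, in which each 500 bp promoter is scored for the presence or absence of each candidate consensus, can be illustrated with a minimal Python sketch. This is not the authors' pipeline (they used the Inclusive Motif Sampler and PatMatch); the function names and the decision to search both strands are assumptions made for illustration. Because ICEr3 and ICEr4 begin with H (A, C, or T), a site preceded by G matches only the ABRE pattern, which mirrors the filtering described in the text.

```python
import re

# IUPAC nucleotide codes needed for the consensuses named in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "H": "[ACT]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Consensus sequences as given in the text.
MOTIFS = {"ICEr3": "HACACGT", "ICEr4": "HCCACGT", "ABRE": "GACACGT", "DRE": "RCCGAC"}


def reverse_complement(seq):
    """Return the reverse complement of a plain A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_regex(consensus):
    """Compile an IUPAC consensus such as 'HACACGT' into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus))


def has_motif(promoter, consensus):
    """True if the consensus occurs on either strand (an assumption) of the promoter."""
    pattern = motif_regex(consensus)
    forward = promoter.upper()
    return bool(pattern.search(forward) or pattern.search(reverse_complement(forward)))


def promoter_content(promoter):
    """Presence/absence of every candidate cis-element in one promoter sequence."""
    return {name: has_motif(promoter, consensus) for name, consensus in MOTIFS.items()}
```

For example, `reverse_complement("GACACGT")` returns `"ACGTGTC"`, the reverse complementary ABRE form mentioned in the text, and a promoter containing `GACACGT` but no other ACACGT site is scored as ABRE-positive and ICEr3-negative.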
Creating the Voter Registry: Assembly of MasterDB

We next needed to create a database (MasterDB) that would cross-reference all available transcriptional response data for each gene with the presence or absence of the identified candidate cis-elements in each gene's promoter. We first assembled in MasterDB the normalized Affymetrix microarray data for the cold-signaling TF overexpressors/mutants ice1 (Chinnusamy et al., 2003; Lee et al., 2005), 35S:CBF2 (Vogel et al., 2005), 35S:ZAT12 (Vogel et al., 2005), hos9 (Zhu et al., 2004), and 35S:NAC072 (Tran et al., 2004), taken from the supplementary material provided for the publications listed, together with the standardized stress treatment time-course data for wild-type plants reported by the AtGenExpress project authors (http://www.weigelworld.org/resources/microarray/AtGenExpress/) and selected mutant and ABA treatment experiments available at the Nottingham Arabidopsis Science Center (NASC) arrays (http://affymetrix.arabidopsis.info/narrays/experimentbrowse.pl). We intended to use MasterDB to determine whether our candidate cis-elements were bioactive by examining if their corresponding cis-regulons responded to cold in wild-type plants at a frequency greater than the chip-wide average. Our expectation was that the systematic selection of subpopulations of genes on the microarray using only genomic promoter data (i.e. cis-element content, not TF or cold responsiveness) would yield cis-regulons in which the distribution between the three bioinformatic states of induced, repressed, and nonresponsive was significantly different from the distribution of the whole array population. To perform this analysis, we needed to quantize all the transcriptomic fold-change comparison data in MasterDB into this set of discrete states.
This allowed us to restrict our attention to the numerical relationships between the categories of genes with and without a particular cis-element in their 500 bp promoter and then mathematically examine their distribution using probability. Effectively, we wanted every gene on the genome chip to independently vote for its preferred transcriptional response to each particular treatment in question, be it induction, repression, or nonresponse. These responses were numerically represented as +1, −1, or 0, respectively, in MasterDB, and genes without corresponding expression data were designated numerically as 2 and omitted from further analyses. Induction/repression calls in MasterDB were assigned using fold-change cutoffs that varied on a sliding scale to account for the fact that for highly expressed genes smaller fold changes represent large changes in absolute transcript numbers (and presumably transcriptional activity), and correspondingly, seemingly large fold changes in sparsely expressed genes represent relatively small changes in absolute transcript numbers and transcriptional activity. The four-tiered sliding scale had set cutoffs of 10-fold for genes whose normalized gene expression levels (NGELs) across all experimental controls were less than 10, 3-fold for genes with NGELs between 10 and 50, 2-fold for genes with NGELs between 50 and 500, and 1.5-fold for genes with NGELs >500 in the Affymetrix DNA-chip system. This meant that the majority of genes (51% or 12,144 of 23,714 genes in the MasterDB) in our sliding scale-rated system were assessed using a 2-fold cutoff (Supplemental Table II).

Tallying the Votes: Testing cis-Regulon Bioactivity and the Temporal Order of Regulon Induction/Repression in Wild-Type Plants

With the construction of the voter registry complete, it was time to tally the votes for the various candidate cis-elements.
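Before the tally, the sliding-scale vote calls described for MasterDB can be made concrete. The following Python sketch is illustrative only: the function names are hypothetical, the tier that owns an NGEL of exactly 10, 50, or 500 is an assumption (the text does not say), and treating a ratio at or below the reciprocal of the cutoff as repression is likewise an assumed convention.

```python
NO_DATA = 2  # code used in the text for genes lacking expression data


def fold_cutoff(ngel):
    """Fold-change cutoff for a gene, given its normalized gene expression level.

    Tiers follow the text: 10-fold (<10), 3-fold (10 to 50),
    2-fold (50 to 500), 1.5-fold (>500). Boundary ownership is assumed.
    """
    if ngel < 10:
        return 10.0
    if ngel < 50:
        return 3.0
    if ngel <= 500:
        return 2.0
    return 1.5


def vote(ngel, control, treated):
    """Quantize one treatment-versus-control comparison into +1, -1, 0, or 2."""
    if ngel is None or control is None or treated is None:
        return NO_DATA
    if control <= 0 or treated <= 0:
        return NO_DATA  # a ratio cannot be formed; treated here as missing (assumption)
    ratio = treated / control
    cutoff = fold_cutoff(ngel)
    if ratio >= cutoff:
        return 1    # induced
    if ratio <= 1.0 / cutoff:
        return -1   # repressed (reciprocal cutoff is an assumed convention)
    return 0        # nonresponsive
```

Under these assumptions a 2.4-fold increase counts as induction for a gene with an NGEL of 100 (2-fold tier), whereas a 5-fold increase does not for a gene with an NGEL of 5 (10-fold tier).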
The voter participation rate for our elections was high, with approximately 70% of all genes encoded by the Arabidopsis genome containing the cis-elements of interest present on the array. When we compared the voting behavior of the ICEr3, DRE, ICEr1, ZAT12r1, HOS9r1, ABRE, and ICEr4 cis-regulons with that of all other genes in MasterDB, statistical analysis revealed that the ICEr1 and ZAT12r1 cis-regulons were no more likely to be induced than a regulon of randomly selected genes from the arrays examined at any point during the cold treatment time course (χ2, P < 0.01; Fig. 2, F and G).

Figure 2. Observed transcriptional response of genes containing the ICEr3 (HACACGT), ICEr4 (HCCACGT), ABRE (GACACGT), HOS9r1 (ACGCGT), DRE (RCCGAC), ICEr1 (CACATG), and ZAT12r1 (CATTGA) elements in their 500 bp promoters to cold (2°C) treatment (using microarray data from WeigelWorld AtGenExpress Database). The number of genes induced (left column) and repressed (right column) by the treatment (black bars) was compared to the expected frequency (white bars), based on the percentage of all present cold-responding genes on the chip. Statistically significant difference was assessed by χ2 test: ** = P < 0.01. A, ICEr3-containing induced genes. B, ICEr4-containing induced genes. C, ABRE-containing induced genes. D, HOS9r1-containing induced genes. E, DRE-containing induced genes. F, ICEr1-containing induced genes. G, ZAT12r1-containing induced genes. H, ICEr3-containing repressed genes. I, ICEr4-containing repressed genes. J, ABRE-containing repressed genes. K, HOS9r1-containing repressed genes. L, DRE-containing repressed genes. M, ICEr1-containing repressed genes. N, ZAT12r1-containing repressed genes.

In contrast, the ICEr3 (Fig. 2A) and ABRE (Fig. 2C) cis-regulons were significantly more likely to be induced at 1 h (and subsequent time points of the cold treatment). This was followed by statistically significant ICEr4 and HOS9r1 cis-regulon induction at 3 h (Fig. 2, B and D), and DRE cis-regulon induction at 6 h (Fig. 2E). Examination of the cold repression of the various cis-regulons revealed a tendency for the ICEr3 (Fig. 2H) and HOS9r1 (Fig. 2K) cis-regulons to be repressed within the first 30 min of cold treatment. This preferential repression disappeared by 1 h for the HOS9r1 cis-regulon but continued for the ICEr3 cis-regulon until 3 h cold treatment. The 3 h time point was also the first interval at which significant repression of the ICEr1 cis-regulon was observed.
The ICEr1 cis-regulon was also more likely to be repressed at 24 h of cold treatment. The temporal sequence of ICEr3, ICEr4, and DRE cis-regulon activity supported the conclusion that ICEr3 binds the ice1-affected transcriptional activator lying furthest upstream of the CBF TFs, while the lack of ICEr1 cis-regulon induction in wild-type plants argued against its role as a binding site for active ICE1. However, because of the sequence similarity between ICEr3 and the ABRE, and their similar cold responsiveness, we repeated the gene voting analysis used to produce Figure 2 over a wider variety of stresses and compared ABRE cis-regulon behavior with that of ICEr3 (as well as ICEr4 and ICEr1). ABRE-containing genes also responded to salt and mannitol in a manner similar to the ICEr3 cis-regulon (Table I).

Table I. Comparison of the stress inducibility of ICEr3-, ICEr4-, ABRE-, and ICEr1-containing genes
χ2 P values for induced gene group enrichment versus predicted gene group induction shown; the P < 0.01 level of significance after Bonferroni correction for multiple comparisons is indicated in bold, P > 0.05 in italic. Induced gene group reduction (versus expected) is indicated with an asterisk.

cis-Regulon / Time (h)    UVB        Methyl Viologen   Drought    Salt       Cold       Mannitol   Wounding
ICEr3-containing genes
  0.5                     1.1E-06    9.0E-01           5.0E-11    6.8E-01    7.7E-01    1.4E-01    1.3E-11
  1                       9.6E-11    4.2E-01           1.1E-17    1.5E-01    4.3E-03    2.2E-09    6.9E-23
  3                       5.6E-05    5.4E-01           1.1E-09    2.8E-21    3.3E-09    4.4E-40    1.7E-08
  6                       5.2E-04    2.3E-01           4.2E-05    9.0E-22    3.1E-20    7.7E-52    9.4E-05
  12                      1.1E-01    1.6E-08           1.5E-08    4.6E-30    1.1E-14    2.1E-47    9.4E-05
  24                      7.9E-03    3.0E-07           3.3E-03    2.3E-26    4.8E-12    1.3E-37    8.7E-05
ICEr4-containing genes
  0.5                     1.5E-01    4.7E-01           1.0E-04    8.5E-02    6.1E-01    6.7E-01    1.2E-02
  1                       1.9E-03    4.6E-01           1.9E-07    6.5E-01    7.9E-02    3.8E-02    5.3E-07
  3                       4.2E-01    9.6E-01           3.9E-04    6.9E-11    8.6E-05    3.8E-16    3.0E-04
  6                       3.0E-03    2.1E-01           1.1E-02    1.1E-11    8.0E-09    2.6E-17    2.2E-01
  12                      6.5E-01    1.8E-02           3.5E-02    3.0E-11    2.9E-09    5.2E-17    2.2E-01
  24                      1.5E-01    1.0E-03           2.0E-01    6.1E-11    1.5E-12    5.7E-18    3.4E-01
ABRE-containing genes
  0.5                     2.0E-01    8.6E-01           3.3E-04    9.2E-01    5.9E-01    1.7E-01    1.4E-01
  1                       3.5E-03    3.7E-01           1.4E-09    3.2E-01    1.4E-03    2.9E-04    1.6E-05
  3                       9.1E-02    1.9E-01           2.2E-02    1.1E-36    9.0E-04    8.8E-46    5.0E-01
  6                       1.9E-03    6.8E-02           1.0E-01    1.7E-25    3.4E-09    2.1E-55    5.3E-01
  12                      9.6E-01    2.3E-01           1.0E+00    3.0E-25    2.3E-08    2.3E-46    5.3E-01
  24                      1.1E-01    1.8E-02           1.3E-01    9.6E-32    3.5E-19    7.8E-42    6.2E-01
ICEr1-containing genes
  0.5                     9.7E-02    2.9E-01           1.7E-02    5.8E-01    5.3E-02    3.1E-01    4.4E-01
  1                       9.7E-02    8.8E-01           4.5E-02    4.9E-01    1.8E-01    2.8E-01    6.1E-03
  3                       2.5E-01    9.0E-01           6.6E-02    7.1E-03    6.8E-01    2.2E-01    2.7E-02
  6                       1.5E-02*   3.2E-01           9.4E-01    1.6E-01    2.4E-01    3.5E-01    8.0E-01
  12                      1.0E-01    7.0E-01           1.3E-01    2.5E-01    2.0E-02    1.1E-01    8.0E-01
  24                      2.9E-01    1.7E-01           8.1E-02    2.1E-01    9.3E-01    2.0E-01    2.3E-01

To determine whether the ABRE and ICEr3 elements acted independently (and therefore, if ICEr3 represented a unique cis-element, and not just an ABRE variant), we ran a principal component analysis (PCA)-partial least squares analysis of the observed minus expected gene counts used to create Table I and showed clear separation of the ICEr3 from the ABRE, ICEr4, and ICEr1 elements (Supplemental Fig. 2).
Furthermore, while ICEr4 and ABRE did not form distinct clusters in our PCA-partial least squares analysis, the differential cold inducibility of the ICEr4 cis-regulon versus the ABRE cis-regulon in the ice1 mutant background (see below, and Fig. 3, B and F) after 3 and 6 h cold treatment led us to conclude that both the ICEr3 and ICEr4 sequences represented novel cis-elements.

Figure 3. Observed transcriptional response of genes containing the ICEr3 (HACACGT), ICEr4 (HCCACGT), HOS9r1 (ACGCGT), DRE (RCCGAC), ICEr1 (CACATG), and ABRE (GACACGT) elements in their 500 bp promoters to cold (0°C) treatment in wild-type and ice1 Arabidopsis backgrounds (using microarray data published in Lee et al., 2005). The number of genes induced by the treatment was compared to the expected frequency, based on the percentage of all present cold-responding genes on the chip. Statistically significant difference was assessed by χ2 test: ** = P < 0.01. A, ICEr3-containing induced genes. B, ICEr4-containing induced genes. C, HOS9r1-containing induced genes. D, DRE-containing induced genes. E, ICEr1-containing induced genes. F, ABRE-containing induced genes.
Assessing Regional Voting Trends: cis-Regulon Responsiveness in the ice1 Mutant Background

After statistically associating single TFs with their corresponding cis-element(s), the next step in constructing a regulatory map is to determine which TFs are predicted to belong to the same transcriptional cascade, and how they are connected within that cascade. To determine which cis-regulons were affected by the ice1 mutation during cold stress (and therefore which regulons were ICE1 dependent), we repeated our analysis of cis-regulon cold inducibility in the ice1 mutant background (Fig. 3). The ice1 mutation had obvious effects on the cold inducibility of the DRE cis-regulon: While DRE cis-regulon inducibility was evident in wild-type plants at 6 and 24 h, regulon inducibility was negligible at 6 h in the ice1 plants (Fig. 3D). Similarly, the significant HOS9r1 cis-regulon induction at 3 and 6 h in wild-type plants was not mirrored in ice1 mutants at these time points (Fig. 3C). The ICEr4 cis-regulon also demonstrated delayed induction in ice1 mutants (at 6 h) versus wild-type plants (Fig. 3B). In contrast, the ICEr1 cis-regulon became significantly cold inducible in ice1 plants (as opposed to noninducible in wild-type plants) at both 3 and 24 h (Fig. 3E). The ICEr3 cis-regulon was cold responsive at 3, 6, and 24 h in both wild-type and ice1 plants, but the enrichment of ICEr3-containing genes in the total induced gene list was stronger in wild-type than in ice1 mutant plants (Fig. 3A). For the ABRE cis-regulon, which was cold responsive at all three time points in wild-type plants, the ice1 mutation only affected regulon inducibility at 6 h (Fig. 3F). These results suggested that the TFs binding to the DRE, HOS9r1, ICEr4, ICEr1, ICEr3, and ABRE cis-elements were part of the ICE1-mediated TF cascade, and demonstrated the functional independence of the ICEr3, ICEr4, and ABRE elements.
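The observed-versus-expected tallies reported for each cis-regulon (Figs. 2 and 3) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the published procedure: the expected count is taken as the chip-wide fraction of induced genes multiplied by the regulon size, the test is a two-category χ2 goodness of fit with 1 degree of freedom, and the function name and data layout (a dictionary of gene votes) are hypothetical.

```python
import math


def regulon_vote_test(votes, regulon, state=1):
    """Compare how often a cis-regulon votes for `state` with the chip-wide rate.

    votes:   dict mapping gene ID -> +1 (induced), -1 (repressed), 0, or 2 (no data)
    regulon: set of gene IDs whose promoter contains the cis-element
    Returns (observed, expected, p_value).
    """
    scored = {gene: v for gene, v in votes.items() if v != 2}  # omit genes without data
    chip_fraction = sum(1 for v in scored.values() if v == state) / len(scored)

    members = [gene for gene in regulon if gene in scored]
    observed = sum(1 for gene in members if scored[gene] == state)
    expected = chip_fraction * len(members)
    if not 0 < expected < len(members):
        raise ValueError("expected count must lie strictly between 0 and regulon size")

    # Two categories (in state, not in state), so both deviations are equal in size.
    deviation_sq = (observed - expected) ** 2
    chi2 = deviation_sq / expected + deviation_sq / (len(members) - expected)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival function, 1 df
    return observed, expected, p_value
```

Running the same function on votes from wild-type and ice1 arrays, with `state=1` for induction or `state=-1` for repression, gives the kind of side-by-side comparison summarized in Figure 3.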
TF Regulon Overlap

When examined in the proper temporal window, downstream TFs belonging to the same transcriptional cascade as upstream TFs should demonstrate statistically significant TF regulon overlap, because upstream TFs should induce a regulon of genes that includes both the downstream TFs and their respective regulons. χ2 comparisons between the 1,475 genes repressed at 3 h cold in the ice1 mutant (versus wild type) and the regulons induced or repressed by the TF or phytochrome mutants/overexpressors hos9_1, phyA, phyB, sfr6, CBF2-OX, and ZAT12-OX in MasterDB were performed to determine which TF regulons demonstrated statistically significant overlap (Table II).

Table II. ICE1 TF regulon overlap with selected low temperature signaling TFs
χ2 P values for induced/repressed gene group enrichment/reduction versus expected are shown; the P < 0.05 level of significance after Bonferroni correction is indicated in bold. For full comparisons, see Supplemental Table III.

First Regulon / Second Regulon    Observed No. Genes    Expected No. Genes    χ2 P Values
ice1 repressed (1475) at 3 h cold (versus wild type)
  hos9_1 induced                  5                     9                     1.94E-01
  hos9_1 repressed                9                     2                     4.96E-06
  CBF2 induced                    85                    78                    4.07E-01
  CBF2 repressed                  142                   59                    2.78E-28
  phyB induced                    107                   235                   6.89E-20
  phyB repressed                  86                    145                   2.48E-07
  phyA induced                    54                    65                    1.66E-01
  phyA repressed                  105                   53                    1.63E-13
  phyA + FR induced               325                   136                   9.92E-65
  phyA + FR repressed             86                    73                    1.33E-01
  NAC072 induced                  13                    31                    1.03E-03
  NAC072 repressed                79                    65                    8.03E-02
ice1 induced (2012) at 3 h cold (versus wild type)
  hos9_1 induced                  41                    12                    6.77E-17
  hos9_1 repressed                15                    3                     5.12E-12
  CBF2 induced                    307                   106                   4.14E-89
  CBF2 repressed                  139                   80                    2.78E-11
  ZAT12 induced                   200                   55                    9.36E-89
  ZAT12 repressed                 414                   659                   2.73E-31
  phyB induced                    156                   321                   9.08E-24
  phyB repressed                  82                    198                   4.33E-18
  phyA induced                    249                   89                    3.87E-68
  phyA repressed                  132                   72                    3.76E-13
  phyA + FR induced               249                   186                   1.09E-06
  phyA + FR repressed             256                   100                   2.15E-57
  NAC072 induced                  27                    42                    1.65E-02
  NAC072 repressed                7                     89                    6.30E-19

TF Regulon cis-Element Enrichment

While inclusive motif sampling of
TF regulons uncovered the most highly overrepresented cis-elements in the individual TF regulons, we wanted to expand our TF regulon analysis to include all of our candidate cis-elements (Table III; note that regulon sizes differ from those reported by the original authors because of the sliding-scale cutoff used in MasterDB).

Table III. Cis-element frequency in cold-signaling induced and repressed TF regulons. Genomic frequency in 500 bp promoters is shown in parentheses. Bold text represents a statistically significant (χ2 P < 0.05) difference between the genomic and regulon cis-element frequencies after Bonferroni correction for multiple comparisons. All element columns are percentages (%).

| Regulon | No. Genes | HOS9r1 (6.5) | ICEr3 (14.2) | ICEr1 (21.3) | DRE (13.9) | ABRE (6.1) | ZAT12 (29.6) | PhyAr1 (8.5) | ICEr4 (10.0) |
|---|---|---|---|---|---|---|---|---|---|
| Wild type induced 3 h | 1,429 | 6.2 | 20.5 | 22.3 | 14.6 | 9.1 | 27.8 | 10.4 | 13.1 |
| Wild type repressed 3 h | 157 | 19.1 | 19.1 | 21.7 | 8.3 | 5.1 | 31.2 | 7.0 | 10.8 |
| hos9 induced | 138 | 19.6 | 17.4 | 15.9 | 15.2 | 6.5 | 29.7 | 7.2 | 14.5 |
| ice1 induced 3 h versus wild type | 2,012 | 3.9 | 16.0 | 22.0 | 11.9 | 7.4 | 28.9 | 7.3 | 7.3 |
| ice1 repressed 3 h versus wild type | 1,475 | 3.3 | 15.5 | 25.6 | 13.9 | 9.6 | 27.1 | 10.2 | 12.1 |
| ice1 induced + wild-type cold repressed | 72 | 1.4 | 27.8 | 20.8 | 6.9 | 4.2 | 29.2 | 12.5 | 15.3 |
| ice1 repressed + wild-type cold induced | 190 | 6.3 | 27.4 | 25.3 | 21.1 | 14.2 | 22.1 | 15.8 | 17.4 |
| CBF2 induced | 1,098 | 2.8 | 17.5 | 21.6 | 23.2 | 7.9 | 27.4 | 9.4 | 12.2 |
| CBF2 repressed | 836 | 3.7 | 15.0 | 25.1 | 10.2 | 5.9 | 26.0 | 5.4 | 11.6 |
| ZAT12 induced | 577 | 3.8 | 13.9 | 19.6 | 13.2 | 6.9 | 28.2 | 6.8 | 10.2 |
| ZAT12 repressed | 6,962 | 3.8 | 13.4 | 17.7 | 14.4 | 7.0 | 27.3 | 8.3 | 10.1 |
| sfr6 induced (light) | 1,323 | 2.6 | 14.2 | 19.8 | 13.5 | 7.5 | 27.1 | 8.8 | 10.7 |
| sfr6 repressed (light) | 892 | 3.9 | 15.8 | 20.4 | 13.2 | 7.5 | 26.1 | 7.3 | 9.3 |
| sfr6 induced (dark) | 1,349 | 2.5 | 14.3 | 20.6 | 13.7 | 7.1 | 28.5 | 8.1 | 9.4 |
| sfr6 repressed (dark) | 838 | 2.6 | 14.8 | 20.5 | 15.5 | 5.0 | 25.1 | 6.9 | 9.8 |
| NAC72 induced | 101 | 2.0 | 23.8 | 17.8 | 14.9 | 7.9 | 21.8 | 11.9 | 17.8 |
| NAC72 repressed | 212 | 3.8 | 19.8 | 24.5 | 14.6 | 9.0 | 22.6 | 11.3 | 13.7 |
| phyA induced | 943 | 2.8 | 15.0 | 20.5 | 12.7 | 5.5 | 29.4 | 5.1 | 11.2 |
| phyA repressed | 763 | 3.8 | 16.8 | 22.8 | 15.7 | 7.6 | 26.3 | 8.7 | 12.5 |
| phyB induced | 1,161 | 2.2 | 13.5 | 22.4 | 13.9 | 5.1 | 26.6 | 5.9 | 9.0 |
| phyB repressed | 715 | 3.5 | 19.9 | 18.3 | 13.6 | 11.2 | 26.0 | 12.3 | 11.0 |
| phyA + FR induced | 1,976 | 2.2 | 14.9 | 25.5 | 12.9 | 6.9 | 27.4 | 7.1 | 10.1 |
| phyA + FR repressed | 1,066 | 4.1 | 17.8 | 17.8 | 16.0 | 7.6 | 25.6 | 9.0 | 13.5 |
| Dusk induced | 117 | 1.7 | 3.4 | 19.7 | 4.3 | 1.7 | 26.5 | 0.9 | 6.8 |
| Dusk repressed | 198 | 3.0 | 24.7 | 27.3 | 13.1 | 13.6 | 22.7 | 17.7 | 15.2 |
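The enrichment calls summarized in Table III can be illustrated with a small calculation. The sketch below assumes a one-degree-of-freedom χ2 goodness-of-fit test that compares the number of element-containing promoters observed in a regulon with the number expected from the genomic frequency, followed by a Bonferroni correction. The exact test layout behind the table is not spelled out in this section, so the function names and the choice of eight comparisons (one per element column) are illustrative assumptions rather than the authors' procedure.

```python
import math


def chi2_enrichment(n_genes, regulon_pct, genomic_pct):
    """Return (chi2, p) for a 1-df goodness-of-fit test.

    n_genes     -- number of genes in the regulon
    regulon_pct -- percentage of regulon promoters containing the element
    genomic_pct -- percentage of all promoters containing the element
    """
    observed_with = n_genes * regulon_pct / 100.0
    observed_without = n_genes - observed_with
    expected_with = n_genes * genomic_pct / 100.0
    expected_without = n_genes - expected_with
    chi2 = ((observed_with - expected_with) ** 2 / expected_with
            + (observed_without - expected_without) ** 2 / expected_without)
    # Survival function of the chi-square distribution with 1 df.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


def bonferroni(p, n_comparisons):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p * n_comparisons)


# Values taken from Table III: hos9 induced regulon (138 genes),
# HOS9r1 column (19.6% vs. 6.5%) and ZAT12 column (29.7% vs. 29.6%).
chi2_hos9r1, p_hos9r1 = chi2_enrichment(138, 19.6, 6.5)
chi2_zat12, p_zat12 = chi2_enrichment(138, 29.7, 29.6)

# Assumption: the eight element columns count as eight comparisons.
print(bonferroni(p_hos9r1, 8) < 0.05)  # True
print(bonferroni(p_zat12, 8) < 0.05)   # False
```

Under these assumptions the HOS9r1 element stands out in the hos9 induced regulon while the ZAT12 element does not, in line with the description in the text.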
The NAC072-OX induced and repressed TF regulons were included as negative controls for this overlap analysis, since NAC072 is strongly down-regulated during the first 5 h of cold treatment (Tran et al., 2004). The ice1 cold-repressed (normally ICE1 cold induced) genes were found to be more likely to be repressed in hos9_1 mutants (i.e. normally HOS9 induced/derepressed), and to be repressed in 35S∷CBF2-OX plants (Table II), suggesting regulatory connection between the ICE1, HOS9, and CBF2 TFs. Furthermore, a statistically significant fraction of the genes belonging to the ice1 cold-repressed regulon were also misregulated in phyA mutants (versus wild type) preconditioned with far-red (FR) light. In contrast, the genes misregulated in phyB mutants (versus wild type) were underrepresented in the ice1 cold repressed regulon. These data indicated cross talk and/or a regulatory connection between the PHYA and ICE1 signaling cascades that were lacking for PHYB. As expected, the NAC072 induced regulon was underrepresented in the ice1 cold-repressed regulon. Analysis of the ice1 cold-induced (normally ICE1 cold repressed) regulon expanded the list of statistically significant ICE1 coregulatory relationships to include the HOS9 induced/repressed regulons, the CBF2 induced/repressed regulons, the ZAT12 induced regulon, the PHYA induced/repressed regulons, and the PHYA + FR induced/repressed regulons. Similar analysis of the regulon overlaps for TFs reported to be downstream of ICE1 (see Supplemental Table III) also showed a strong enrichment for PHYA induced/repressed genes. As reported previously by Vogel et al. (2005), using a uniform 2.5-fold-change cutoff, the CBF2 induced and ZAT12 induced regulons also showed statistically significant overlap (Supplemental Table III). 
Contrary to previous reports (Knight et al., 1999; Boyce et al., 2003), the sfr6 induced and repressed regulons lacked significant overlap with both the CBF2 regulon and the other investigated TF regulons (Supplemental Table III). Considering the substantial experimental evidence for sfr6 effects on DRE inducibility (Knight et al., 1999; Boyce et al., 2003), we interpreted this as an indication that the SFR6 TF regulons examined were either temporally too far removed to demonstrate overlap, or that the sfr6 mutational effect is low temperature dependent.) to support or refute regulatory connections already indicated by earlier bioinformatic analyses. Consistent with the proposed binding of HOS9 to HOS9r1, the HOS9-repressed (hos9 induced) TF regulon was enriched for the HOS9r1 cis-element versus the genomic average but no other cold-responsive element was significantly enriched in this regulon. Similarly, the CBF2 induced TF regulon was enriched for the DRE cis-element as well as ICEr3 and ABRE. The ZAT12 repressed TF regulon was also enriched for the ABRE but not ZAT12r1. ZAT12r1 was also not enriched in the ZAT12 induced TF regulon, once again indicating that ZAT12r1 is unlikely to be the cis-element that ZAT12 binds. While the 3 h ICE1 induced (ice1 repressed) regulon was enriched for ICEr1, ICEr4, ABRE, and PhyAr1 versus their respective genomic average frequencies, the 3 h ICE1 repressed (ice1 induced) TF regulon was enriched for ICEr3 and ABRE. We found the lack of DRE enrichment in the 3 h ICE1-regulated regulons surprising in light of the fact that the ice1 mutation affects the 3 h cold induction of CBF1 and CBF3 (Chinnusamy et al., 2003) and the fact that the DRE was identified by the original inclusive search of the ice1 3 h repressed regulon (Supplemental Table I). 
The expected enrichment of DRE and ICEr3 was observed only when the ICE1 induced and repressed TF regulons were filtered to leave only genes induced by cold in wild type at 3 h (versus wild-type control). Nevertheless, the presence of the ICEr4 consensus variant in the 3 h ice1 repressed (ICE1 induced) list before (and after) filtering suggested that it is the ICEr variant most strongly affected by the ice1 mutation. The NAC072 repressed TF regulon narrowly missed the P < 0.05 significance level for ICEr1 enrichment but this was likely due to the small number of genes in this regulon. Investigation of the SFR6 induced and repressed TF regulons in both the light and dark showed no significant enrichment of any of the cis-elements except the PhyAr1 in the SFR6 repressed (in light) TF regulons. Because reports have identified variable inducibility of DRE-containing genes depending on time of day (Kim et al., 2002; Fowler et al., 2005), we examined the cis-element enrichment in genes responsive to phytochrome A and B mutation as well as the PHYA-dependent FR-conditioned gene expression and the groups of genes differentially induced and repressed at dusk (in a circadian time course) versus expression throughout the rest of the day. We observed a statistically significant reduction in the frequency of the HOS9r1, ICEr3, DRE, ABRE, and PhyAr1 cis-elements in the group of genes induced at dusk. In contrast, ICEr3, ICEr1, ABRE, ICEr4, and PhyAr1 cis-elements were enriched in the promoters of genes repressed at dusk. These data support our hypothesis that time of day affects DRE-cis-regulon inducibility through phytochrome A repression of the ICE1-mediated transcriptional cascade (which includes the DRE-binding CBF TFs).

In Silico Mutagenesis

The most conclusive piece of evidence establishing signaling connectivity is the presence of an upstream TF's preferred cis-element in the 500 bp promoter of a candidate downstream TF.
However, TF binding to cis-elements is often promiscuous, with several permitted consensus variants capable of binding the same TF. We therefore used in silico mutagenesis (Geisler et al., 2006) to compare the behavior of the ICEr3 cis-regulon with the behavior of the populations of genes naturally containing different single-mismatch versions of the ICEr3 consensus in their 500 bp promoter (Fig. 4A). Single-base mutations of ICEr3 at positions 2 (A to C, yielding the ICEr4 element), 5 (C to A or T), and 6 (G to A) did not result in a loss of significant cis-regulon cold repression in the ice1 mutant. All 12 possible combinations of these permitted positional variations were then examined to determine whether their cis-regulons were preferentially induced during cold, ABA, salt, and dehydration treatments in wild-type plants (Table IV; only bioactive variants shown).

Figure 4. In silico mutagenesis of the ICEr3 (HACACGT)-, HOS9r1 (ACGCGT)-, ICEr1 (CACATG)-, and DRE (RCCGAC)-binding consensus sequences. Identity and position of each nucleic acid mutation in the corresponding consensus sequence are indicated. The number of genes induced or repressed in the genetic background/treatment indicated (black bars) was compared to the expected frequency (white bars), based on the percentage of all present and responding genes on the chip. Statistically significant difference was assessed by χ2 test: * = P < 0.05, ** = P < 0.01. A, ICEr3-containing genes repressed in 6 h cold-treated ice1 plants (versus wild type). B, HOS9r1-containing genes repressed in 24 h cold-treated hos9-1 plants (versus wild type). C, ICEr1-containing genes repressed in 6 h cold-treated ice1 plants (versus wild type). D, DRE-containing genes induced in 35S:CBF2 overexpressors at growth temperatures (versus wild type).

Table IV. Frequency and enrichment (versus frequency predicted by chance) of stress-responsive ICE1 and HOS9 consensus variants in the 500 bp promoters of all genes in the Arabidopsis genome. χ2 P values for induced gene group enrichment versus predicted gene group frequency for stress and hormonal treatments are shown in the last four columns. Bold indicates P < 0.05 after Bonferroni correction for multiple comparisons.

| TF | Element | No. Genes | Element Probability | Expected No. Genes | Genomic Enrichment | Cold 3 h | ABA 3 h | Salt 6 h | Dehydration 0.5 h |
|---|---|---|---|---|---|---|---|---|---|
| ICE1 | HACACGT | 4,014 | 1.4E-04 | 3,921 | 1.02 | 4.3E-09 | 6.0E-42 | 4.1E-22 | 1.9E-10 |
| | HCCACGT | 2,805 | 6.4E-05 | 1,910 | 1.47 | 9.9E-05 | 3.6E-15 | 3.6E-12 | 1.3E-04 |
| | HACAAGT | 8,861 | 2.9E-04 | 7,750 | 1.14 | 7.3E-03 | 6.3E-02 | 7.3E-01 | 8.4E-02 |
| HOS9 | ACGCGT | 2,038 | 7.6E-05 | 2,261 | 0.90 | 1.3E-22 | 2.4E-04 | 8.8E-01 | 6.8E-06 |
| | CCGCGT | 1,139 | 3.6E-05 | 1,085 | 1.05 | 6.9E-21 | 3.0E-05 | 8.1E-01 | 1.1E-05 |
| | ACGCGC | 1,273 | 3.6E-05 | 1,085 | 1.17 | 2.3E-07 | 4.8E-01 | 2.5E-01 | 9.8E-02 |
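The "12 possible combinations" follow directly from the permitted substitutions: two choices at position 2, three at position 5, and two at position 6. As a minimal sketch (the dictionary layout and function name are illustrative assumptions, not part of the published workflow), the variants can be enumerated as follows:

```python
from itertools import product

ICER3 = "HACACGT"

# Permitted bases at each 1-based position of ICEr3, as listed in the text:
# position 2 (A or C), position 5 (C, A, or T), position 6 (G or A).
PERMITTED = {2: "AC", 5: "CAT", 6: "GA"}


def permitted_variants(consensus, permitted):
    """Enumerate every combination of the permitted bases."""
    positions = sorted(permitted)
    variants = []
    for bases in product(*(permitted[pos] for pos in positions)):
        seq = list(consensus)
        for pos, base in zip(positions, bases):
            seq[pos - 1] = base  # convert 1-based position to 0-based index
        variants.append("".join(seq))
    return variants


variants = permitted_variants(ICER3, PERMITTED)
print(len(variants))  # 12
print(variants)
```

The list includes the original ICEr3 consensus (HACACGT), the ICEr4 variant (HCCACGT), and HACAAGT, which are the three ICE1 elements reported in Table IV.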
This analysis revealed that the reported ICE1 induction detected by northern blotting at 3 h cold, 3 h ABA (100 μM), and 5 h salt (Chinnusamy et al., 2003) corresponded to the preferential induction of genes containing the consensus HACACGT (ICEr3) and HCCACGT (ICEr4) by 3 h cold, 3 h ABA (100 μM), and 6 h salt. Genes containing these two consensus variants in their 500 bp promoters were all also preferentially induced on Affymetrix microarrays by a 30 min dehydration treatment. Surprisingly, the permitted mutation of the ICEr3 element from G to A at position 6 in ice1 mutant plants (Fig. 4A) did not result in any ICE element variant that was functional during these treatments in wild-type plants, even when combined with all other permitted mutations at positions 2 and 5. This observation was interesting in light of the fact that such a mutation creates an ICEr1-like element [i.e. HACACAT(G)] known to be bound by both mutant and wild-type forms of the ICE1 protein (Chinnusamy et al., 2003). Similar in silico mutagenesis of the palindromic HOS9r1 consensus, using cis-regulon induction in the hos9 genetic background as the baseline, revealed that HOS9r1 could vary at positions 1 (A, C, or G) or 3 (G or T) without loss of cis-regulon induction (Fig. 4B). The cold, ABA, and salt responsiveness of these consensus sequences was also examined (Table IV) and, unlike ICEr3, neither HOS9r1 nor its variant CCGCGT (which we named HOS9r2) was salt responsive. In silico mutagenesis of ICEr1 using cold repression in ice1 mutants as the reference showed that the bioactivity of ICEr1 (Fig. 4C) was unaffected by mutations at positions 4 (A, G, or T), 5 (T or A), or 6 (G or A). DRE mutagenesis reported previously in cold-stressed wild-type plants (Geisler et al., 2006) was repeated in the CBF2 overexpressor background, confirming previous observations of the functionality of the A/GCCGAC variants (Fig. 4D).
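Because HOS9r1 is described as palindromic while the other consensus sequences are not, strand orientation matters when scanning promoters for these elements. The following sketch shows one way to take the reverse complement of a degenerate (IUPAC) consensus and to scan a sequence on both strands. The helper names and the short promoter string are hypothetical and serve only as an illustration; this is not the PATMATCH implementation.

```python
import re

# IUPAC degenerate base codes expressed as regular-expression fragments.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYWSKMBDHVN", "TGCAYRWSMKVHDBN")


def reverse_complement(motif):
    """Reverse complement of a motif written with IUPAC codes."""
    return motif.translate(COMPLEMENT)[::-1]


def find_motif(promoter, motif):
    """Return sorted (position, strand) hits; positions are 0-based
    offsets on the forward strand."""
    hits = set()
    for strand, m in (("+", motif), ("-", reverse_complement(motif))):
        # A lookahead lets overlapping occurrences be reported.
        pattern = "(?=(" + "".join(IUPAC[b] for b in m) + "))"
        for match in re.finditer(pattern, promoter):
            hits.add((match.start(), strand))
    return sorted(hits)


# Hypothetical promoter fragment, for illustration only.
promoter = "TTACGCGTAACATGTGAA"

print(reverse_complement("ACGCGT"))    # ACGCGT (palindromic)
print(reverse_complement("CACATG"))    # CATGTG
print(find_motif(promoter, "ACGCGT"))  # [(2, '+'), (2, '-')]
print(find_motif(promoter, "CACATG"))  # [(10, '-')]
```

A palindromic element such as HOS9r1 is found on both strands at the same position, so a gene-level count should treat the two hits as one site.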
Using TF Promoter Data to Create a Map of Cold-Stress Regulation

With these bioinformatic analyses complete, it was possible to revise the current model of the ICE1 transcriptional cascade (Fig. 1), this time incorporating both the results of our bioinformatic analyses and TF promoter element searches that considered cis-element variants identified by in silico mutagenesis (Fig. 5).

Figure 5. Schematic representation of low temperature signal transduction. Solid arrows represent connections with at least two pieces of bioinformatic support (triangular arrowhead represents gene activation, flattened arrowhead represents gene repression). Dashed arrows represent regulatory connections with experimental/bioinformatic support but accomplished by unknown (and possibly posttranscriptional) mechanisms. Boxes represent cis-elements within the 500 bp promoters of the genes listed (ICE1, AT3G26744; ZAT12, AT5G59820; HOS9, AT2G01500; CBF2, AT4G25470; NAC072, AT4G27410; CBF1, AT4G25490; and CBF3, AT4G25480) and are color coded according to cis-element identity (red box, ICEr3; orange box, ICEr4; blue box, DRE; green box, ICEr1; purple box, HOS9r1; and yellow box, EP2). Protein phosphorylation events, where known, are indicated by circles containing “P” attached to the protein TFs. Unknown types of posttranslational modification are indicated by circles containing “?” attached to the protein TFs. The ICEr4 element present in the CBF3 promoter contains the CCACGT core, but lacks an “H” in the first position.

From this analysis, HOS9 was placed within the ICE1 signaling cascade, downstream of ICE1, based on the following evidence: (1) the induction of the HOS9r1 cis-regulon temporally follows the induction of the ICEr3 cis-regulon (Fig. 2); (2) the induction of the HOS9r1 cis-regulon is reduced and delayed in ice1 mutant plants responding to cold (Fig. 3C); (3) the HOS9 TF regulon shares significant overlap with the ICE1 TF regulon (Table II); and (4) the HOS9 promoter contains elements shown to be affected by ICE1 mutation, namely ICEr1 and HOS9r1 (Fig. 3, C and E). Similarly, though previously thought to lie outside the ICE1 signaling cascade, we include ZAT12 in the ICE1-mediated cascade based on the presence of ICEr3 and ICEr4 sites in its 500 bp promoter (that are significantly affected in their cold induction by the ice1 mutation; Fig. 3), and the significant overlap of both the ice1 induced and CBF2 induced TF regulons with the ZAT12 induced TF regulon (Table II; Supplemental Table V).
Our bioinformatic analyses provide no support for the suggestion that ZAT12 acts through the ZAT12r1 element (CATTGA; see Vogel et al., 2005). The ZAT12r1 element is not enriched in any examined cold-responsive regulons (Table III) nor is the ZAT12r1 cis-regulon cold responsive (Fig. 2, G and N). Because further independent inclusive motif searches failed to uncover any bioactive enriched elements within the ZAT12 induced or repressed TF regulons, we conclude that the ZAT12-binding consensus is either larger than the 8 bp windows examined, or it is not sufficient to confer cold regulation. Although the EP2 element [ACTX(3–4)AGT or AGTX(3–4)ACT; Sakamoto et al., 2004] represents a larger potential binding consensus sequence, our analyses determined that it also lacked both bioactivity and ZAT12 TF regulon enrichment (data not shown). The experimentally established connections between ICE1 and the CBF TFs were supported by our bioinformatics analyses. Elements affected by the ice1 mutation and potentially bound by ICE1 were present in all three CBF promoters. However, the number and identity of the ICEr variants present differed between the CBF paralogs (Fig. 5), possibly explaining why the ice1 mutation differentially affects the cold induction of the different CBF paralogs. The A/C variation at position 2 of our proposed ICE1-binding consensus, yielding ICEr3 and ICEr4, respectively, had important consequences for bioactivity with respect to ICE1 (Fig. 4A). Of the two ICEr variants, ICEr4 was more strongly affected by the ice1 mutation after 3 h cold (Fig. 3, A versus B; Table III) but the presence of additional neighboring elements (Fig. 5) also contributed to paralog inducibility. The cold inducibility of the DRE (RCCGAC), the cis-element bound by CBF1, CBF2, and CBF3, was suppressed by the ice1 mutation (Fig. 3D), and the DRE was underrepresented in the ice1 induced regulon (Table III).
NAC072 was placed within the ICE1 cascade based on the fact that its promoter contains ICEr3 and DRE sites. The NAC072 repressed TF regulon is enriched (χ2, P < 0.06) for ICEr1 (Table III), and the statistically significant tendency for ICEr1 cis-regulon members to be repressed late in the cold treatment time course (Fig. 2M) agrees with the experimental observation that NAC072 transcripts are induced after 10 h of cold treatment (Tran et al., 2004).

DISCUSSION

Near full-genome transcriptomic studies of the response of Arabidopsis to low temperature have generated complex data sets that are difficult to analyze because of their scale and variability. This necessitates the use of mathematical and statistical approaches to resolve questions about the interconnectivity of gene regulatory networks, just as statistical approaches were necessary in Mendel's discovery of the particulate model of genetic inheritance (see Janick, 1989). The advantage of modeling the low temperature transcriptional signaling cascade from a genomic perspective (documenting regulon connectivity) instead of a gene perspective (documenting connectivity gene by gene) is that observing the behavior of an entire cis-regulon of hundreds to thousands of genes in vivo during a particular stress or in a mutant background lends the statistical power to resolve our experimental observations and minimizes the risk that the chosen experimental subject/system represents a behavioral outlier. The large number of data points, much like voting in a democracy, smooths away the effect of individual nonrepresentative genes. Meta-analyses of publicly available microarray data allowed us to construct an ICE1 signaling model (Fig. 5) that predicted regulatory connections already established by conventional bench-top methods (e.g.
ICE1 transcriptional activation of the DRE-binding CBF TFs) and identified previously unknown regulatory connections between ICE1 and HOS9, ZAT12, NAC072, and PHYA that can be tested in future experiments. Evidence generated from our meta-analyses of TF regulon cis-element enrichment (Supplemental Fig. 1; Table III), cis-regulon induction order (Figs. 2 and 3), TF regulon overlap (Table II), and in silico mutagenesis (Fig. 4) provided the anchors we used to create our bioinformatically generated regulatory map. Our observation that the ICEr3 cis-element was both enriched in the promoters of genes affected by the ice1 mutation (Supplemental Fig. 1; Table III) and associated with a cis-regulon whose induction preceded the induction of the DRE cis-regulon in wild-type plants (Fig. 2) supports our conclusion that ICE1 binds ICEr3 (and/or ICEr4) to induce expression of the DRE-binding transcriptional activators CBF1, CBF2, and CBF3 (along with the ZAT12 and NAC072 transcriptional repressors). Conversely, the nonfunctionality of the ICEr1 cis-element in wild-type plants (Fig. 2F) argues against it playing a role in cold gene activation in vivo, though its acquired cis-regulon inducibility in ice1 plants (Fig. 3E) may explain why ICEr1 is enriched in the ice1 3 h cold-repressed TF regulon (Table III). In agreement with previous experiments implicating circadian-gating/phytochrome regulation of CBF inducibility (Kim et al., 2002; Fowler et al., 2005), we were able to detect statistically significant depletion of ICEr3-containing genes (and the downstream signaling elements DRE, HOS9r1, ABRE, and ZAT12r1) from the list of all genes induced at dusk (relative to the rest of the circadian day; Table III).
Although DRE-reporter constructs have been shown to be cold inducible in a phyA mutant background (Kim et al., 2002) and to be unresponsive in phyB mutants (suggesting PHYB mediation of the light signal; Kim et al., 2002), we were unable to establish a link between PHYB and the ICE1-induced regulon through regulon overlap (Table II). In fact, there was a statistically significant exclusion of the ICE1- and CBF2-controlled regulon members from the phyB regulon. In contrast, the regulon of genes differentially regulated in FR light preconditioned phyA mutant plants (versus wild type) shows significant overlap with the ICE1-repressed regulon and is enriched for the ICEr3, DRE, and ICEr4 elements (Table III). While our results cannot exclude the influence of PHYB on the ICE1-mediated low temperature signaling pathway, they clearly implicate PHYA as an ICE1 transduction pathway mediator. This observation is consistent with reports that decreased temperature can modulate the functional relationships between phytochromes, and that PHYA, PHYD, and PHYE play greater roles with respect to PHYB in controlling flowering at lower temperatures (Halliday et al., 2003; Halliday and Whitelam, 2003). The exact nature of the interaction between ICE1 and PHYA needs to be determined, but it is noteworthy that the ICE1 gene promoter does not contain the previously reported PHYA regulatory site PhyAr1 (Hudson and Quail, 2003) and transcript levels of ICE1 are not affected in phyA mutants (as measured by microarray). The PHYA-ICE1 coupling may therefore be caused via direct interaction with ICE1 (plausible, as PHYA is known to interact with basic helix-loop-helix TFs), via PHYA-mediated phosphorylation of ICE1, or through modulation of ICE1 proteolysis via a PHYA-mediated COP1-like ubiquitin-ligase degradation pathway such as HOS1 (Lee et al., 2001).
Because the individual microarray experiments used to construct our model represent snapshots of gene expression at single points in time, when interpreting the results it is necessary to be cognizant of the hidden temporal influences on the data and resulting model. For example, based on current evidence it appears that a phosphorylation event triggered after a drop in temperature converts the constitutively expressed ICE1 protein into a transcriptional activator (Gilmour et al., 1988; Chinnusamy et al., 2003). Looking at the behavior of the ICEr3 cis-regulon at a time point before this phosphorylation event occurred during an experimental time course would introduce the possibility of spurious conclusions being drawn about ICE1 bioactivity and connectivity with downstream TFs. A similar example drawn from our analysis is the bioinformatic and experimental data collected to position NAC072/RD26 in our model (Fig. 5). NAC072 is induced only after a long period of low temperature treatment (10 h according to northern blotting; Tran et al., 2004), its promoter contains the ICEr3 consensus, and NAC072 is known to bind an element containing the ICEr1 consensus core (in its reverse complementary form CATGTG; Tran et al., 2004). Our meta-analysis of microarray data demonstrated that ICEr1-containing genes are repressed at 24 h in the cold (Fig. 2), suggesting that ICE1 may induce NAC072, leading to repression of ICEr1-containing genes. This implicated NAC072 in the longer-term repression of the CBF TFs and their downstream regulons via the ICEr1 sites present in the CBF2 and CBF3 promoters.
We observed no overlap between the NAC072 induced and 3 h ice1 repressed TF regulons (Table II) despite enrichment of the NAC072 TF regulon promoters for the ICEr3 sites (Table III) due to the fact that the ICE1 regulon used for the TF regulon overlap analysis was composed of genes differentially regulated at a time point when NAC072 is not detectable by northern blotting (Tran et al., 2004). When we checked for ICE1 TF regulon overlap with the NAC072 repressed regulon after 24 h of cold treatment, significant enrichment was found (χ2, P < 0.05; Supplemental Table III). The complexity of our proposed low temperature signaling model (Fig. 5) will likely increase as more low temperature signaling TFs are isolated and more time points are added to the microarray databases. Nevertheless, our model highlights several important considerations. First, the ICE-/CBF-mediated low temperature signaling pathway may be ABA independent (Gilmour and Thomashow, 1991), but ABA can act to amplify the ICE-/CBF-mediated signal via ICE1 transcript and ICEr3 cis-regulon induction (Chinnusamy et al., 2003; Knight et al., 2004; Table I). Second, the ICE-/CBF-mediated low temperature signaling pathway contains both positive and negative feedback loops. Expression of the NAC072/RD26 gene, which could positively feed back on ICE1-mediated signaling by increasing ABA sensitivity (Fujita et al., 2004), is activated by CBF TFs via the DRE element in its promoter, and we propose that it feeds back negatively on CBF expression via the ICEr1 site. This proposal is supported by gel-shift studies (Tran et al., 2004) and the agreement between the accumulation of NAC072 transcripts (at 10 h cold; Tran et al., 2004) and time of response for the ICEr1 regulon (24 h cold). 
Third, as suggested by our analysis and the experimental results of other authors (Gong et al., 2002; Kim et al., 2002), light affects the transcriptional activity of ICE1 and CBFs, with the ICEr3 and DRE elements being less inducible than expected at dusk (versus the whole day) and strong interactions between phytochrome- and ICE1-mediated signaling pathways. Finally, while our analysis has focused on the ICE-mediated cold-signaling pathway, the gene-voting principle can be applied to a wide range of biological questions and should facilitate the mapping of bioactive linkages within complex signaling cascades, developing models that can be tested and verified by fewer bench-top experiments.

MATERIALS AND METHODS

Searching for Arabidopsis cis-Regulon Members

The annotated Arabidopsis (Arabidopsis thaliana) genome database (The Arabidopsis Information Resource; www.arabidopsis.org) was searched to determine the number of genes with at least one copy of each candidate element within their 500 bp promoter (5′ upstream region) using the online PATMATCH string-search tool. This tool returns a list that includes gene identity, number of elements, position(s), and element sequence (where degenerate base codes are used). The probability of the cis-element occurring by random chance was calculated based on average promoter nucleotide frequencies (68% AT, 32% GC) in Arabidopsis, assuming that nucleotides are arranged at random in the promoter, and that no evolutionary selection pressure has operated on that sequence. The expected number of genes containing the element was calculated by multiplying the expected frequency by the total number of genes searched. Cis-element enrichment was calculated as the ratio of observed to expected number of times that cis-element occurred in the annotated Arabidopsis genome.
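The chance-probability calculation described above can be sketched in a few lines. The example assumes that the stated 68% AT and 32% GC content is split evenly within each pair (A = T = 0.34, C = G = 0.16) and that a degenerate code contributes the summed frequency of the bases it permits; both points, along with the function names, are assumptions made for illustration. How the per-site probability is scaled to an expected gene count (promoter length, strands, number of genes) is not detailed here, so the expected counts are taken directly from Table IV.

```python
# Assumed per-base frequencies derived from the stated 68% AT / 32% GC.
BASE_FREQ = {"A": 0.34, "T": 0.34, "C": 0.16, "G": 0.16}

# IUPAC degenerate codes and the bases each one permits.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def site_probability(motif):
    """Probability that a random site matches the motif on one strand,
    assuming independent positions."""
    p = 1.0
    for code in motif:
        p *= sum(BASE_FREQ[b] for b in IUPAC_BASES[code])
    return p


def enrichment(observed_genes, expected_genes):
    """Ratio of observed to expected gene counts."""
    return observed_genes / expected_genes


print(f"{site_probability('HACACGT'):.1e}")  # 1.4e-04
print(f"{site_probability('ACGCGT'):.1e}")   # 7.6e-05

# Observed and expected counts for HCCACGT (ICEr4) from Table IV.
print(round(enrichment(2805, 1910), 2))      # 1.47
```

Under these assumptions the per-site probabilities agree with the "Element Probability" column of Table IV, and the ratio of observed to expected counts reproduces the tabulated genomic enrichment.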
cis-Regulon Correlation Analysis We took the list of all Arabidopsis genes belonging to each cis-regulon and identified the behavior of those regulon members in different microarray experiments. We compared the number of induced/suppressed genes in each cis-regulon to the number of genes expected to be induced/suppressed based on the fraction of all genes that responded to that treatment (see “Creating the Voter Registry” section of the “Results” for full cutoff description). Our null hypothesis predicts that a nonfunctional cis-element produces a cis-regulon whose number of induced/suppressed genes is not significantly different than a similarly sized regulon chosen randomly from the genome. When the cis-regulon is significantly more induced or suppressed using a χ2 test (P < 0.01), the corresponding cis-element is deemed to have functionally correlated with regulation. Searching for New Arabidopsis cis-Elements The 500 bp promoters of all genes induced (or separately suppressed) by the specific cold-signaling TF mutation or overexpression in question were assembled and searched using a Gibbs sampling program called MotifSampler2.0, online at http://www.esat.kuleuven.ac.be/∼thijs/Work/MotifSampler.html (Thijs et al., 2002). We used the Arabidopsis background model (in Motifsampler) and looked for motifs eight nucleotides in length on either DNA strand. Prior probability was set to 0.5 and a maximum overlap of two nucleotides was allowed. Microarray Data Sources Normalized microarrays were taken from the NASC (http://affymetrix.arabidopsis.info/narrays/experimentbrowse.pl), The Arabidopsis Information Resource public databases, or WeigelWorld (http://www.weigelworld.org/resources/microarray/AtGenExpress/). Experimental statistics and normalization were carried out by the original authors or the database provider. 
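The χ2 comparison described under cis-Regulon Correlation Analysis can be sketched as follows. The text does not spell out the exact form of the test, so this sketch assumes a two-category goodness-of-fit test (responding versus not responding, 1 degree of freedom), with expected counts taken from the genome-wide fraction of responding genes. The function name and the numbers in the usage lines are hypothetical.

```python
import math


def regulon_chi2(n_regulon, n_responding, genome_fraction):
    """Chi-square goodness-of-fit (1 df) of a regulon's response count
    against the count expected from the genome-wide responding fraction."""
    expected_resp = n_regulon * genome_fraction
    expected_rest = n_regulon - expected_resp
    observed_rest = n_regulon - n_responding
    chi2 = ((n_responding - expected_resp) ** 2 / expected_resp
            + (observed_rest - expected_rest) ** 2 / expected_rest)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical regulon of 200 genes, 40 induced, when 10% of all genes respond.
chi2, p = regulon_chi2(200, 40, 0.10)
print(f"chi2 = {chi2:.2f}, P = {p:.2e}, functional call: {p < 0.01}")
```

Under the null hypothesis in the text, a regulon drawn at random from the genome should give a small χ2 and a P value above the 0.01 cutoff.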
Meta-analysis of cis-element responsiveness in wild-type Arabidopsis to cold (4°C) and other abiotic stresses (UVB, methyl viologen, drought, salt, mannitol, and wounding) was carried out on the WeigelWorld abiotic stress treatment set. These experiments are fully described at the project Web site (URL above). NASC experiments were carried out on the Affymetrix ATH1 Arabidopsis genome array. Ecotype Columbia-0 (Col-0) cell culture was transferred to a high-light growth cabinet (240 μmol photons m−2 s−1; 16-h light/8-h dark) for 24 h prior to start of treatments. After this time, these wild-type/sfr6 plants either remained in this high-light treatment or were subjected to dark treatment by being wrapped in aluminum foil for 3 h before being harvested (sfr6; H. Knight, M. Knight, NASC experiment identification no. 194, 2005). Ler-0 ecotype plants were grown at 20°C (16 h d) and rosette and cauline leaf RNA extracted (phyB, phyAphyBcry1, and cry2; J. Casal, NASC experiment identification number 21, 2002). Col-0 ecotype seedlings on Murashige and Skoog media (no Suc) were preconditioned in the dark or FR light before transfer to continuous white light. Whole-seedling RNA was extracted (phyA and phyA + FR; A. McCormac, M. Terry, NASC experiment identification number 89, 2003). Ecotype Wassilewskija of Arabidopsis-2 was grown in long days and treated with ABA solution for 3 h, whole-plant RNA extracted (ABA; H. Okamoto, M. Knight, NASC experiment identification number 57, 2002). Col-0 seeds were sown on Murashige and Skoog agar plates (3% Suc), imbibed at 4°C for 96 h. Seeds were then entrained for 7 d at 22°C, in cycles of 12 h white light, 12 h darkness. After 7 d they were transferred to constant white light at 22°C (this is time 0 h). Tissue was harvested at 4 h intervals after time 0 (Dusk; K. Edwards, A. Millar, NASC experiment identification number 108, 2004). 
These experiments were entered into MasterDB so that the expression of each gene (in rows) could quickly be read across to find induction/suppression calls for all treatments (in columns). In MasterDB, gene induction was annotated as 1, suppression as −1, no expression change as 0, and no data available as 2. We used fold-change cutoffs for induction/suppression calls that decreased with increasing gene expression level as described in the “Results” section of this article (Supplemental Table II). Our normalized microarray dataset and filtering-statistics formula spreadsheets (in Microsoft Excel) are available upon request (currently a 70 Mb file).

PCA

PCA is a multivariate projection method designed to extract and display the systematic variation in a data matrix X (Jackson, 1991). PCA is often used to get an overview of a data table X, detect clusters, and identify anomalies and outliers. The first two principal components define a plane that approximates all the variation in X. The coordinates of the points projected down onto this hyperplane are called scores (\(t_{an}\), a = 1, 2, 3, …, A). The direction of each dimension in the hyperplane is its loading (\(p_{ak}\), a = 1, 2, 3, …, A). The part of X that is not explained by the model forms the residuals (\(E_{nk}\)). The scores, loadings, and residuals together describe all of the variation in X. \[\text{Model of } X:\quad X = TP^{\mathrm{T}} + E = t_{1}p_{1}^{\mathrm{T}} + t_{2}p_{2}^{\mathrm{T}} + \ldots + E.\] The PCA model calculations and visualization were performed using the SIMCA-P+ 11.0 software from Umetrics AB, Umeå, Sweden (www.umetrics.com).

ACKNOWLEDGMENTS

We thank Professor Detlef Weigel, Max Planck Institute for Developmental Biology, Tübingen, and the AtGenExpress expression atlas project for the use of their microarray data sets, as well as the NASC database contributors listed above. We also thank Dr.
Åsa Strand and Jan Eklöf, Umeå Plant Science Centre, for help revising the manuscript. LITERATURE CITED Boyce JM, Knight H, Deyholos M, Openshaw MR, Galbraith DW, Warren G, Knight MR ( 2003 ) The sfr6 mutant of Arabidopsis is defective in transcriptional activation via CBF/DREB1 and DREB2 and shows sensitivity to osmotic stress. Plant J 34 : 395 –406 Chinnusamy V, Ohta M, Kanrar S, Lee BH, Hong XH, Agarwal M, Zhu JK ( 2003 ) ICE1: a regulator of cold-induced transcriptome and freezing tolerance in Arabidopsis. Genes Dev 17 : 1043 –1054 Chinnusamy V, Zhu J, Zhu J-K ( 2006 ) Gene regulation during cold acclimation in plants. Physiol Plant 126 : 52 –61 Coessens B, Thijs G, Aerts S, Marchal K, De Smet F, Engelen K, Glenisson P, Moreau Y, Mathys J, De Moor B ( 2003 ) INCLUSive: a web portal and service registry for microarray and regulatory sequence analysis. Nucleic Acids Res 31 : 3468 –3470 Fowler S, Thomashow MF ( 2002 ) Arabidopsis transcriptome profiling indicates that multiple regulatory pathways are activated during cold acclimation in addition to the CBF cold response pathway. Plant Cell 14 : 1675 –1690 Fowler SG, Cook D, Thomashow ME ( 2005 ) Low temperature induction of Arabidopsis CBF1, 2, and 3 is gated by the circadian clock. Plant Physiol 137 : 961 –968 Fujita M, Fujita Y, Maruyama K, Seki M, Hiratsu K, Ohme-Takagi M, Tran LSP, Yamaguchi-Shinozaki K, Shinozaki K ( 2004 ) A dehydration-induced NAC protein, RD26, is involved in a novel ABA-dependent stress-signaling pathway. Plant J 39 : 863 –876 Geisler M, Kleczkowski LA, Karpinski S ( 2006 ) A universal algorithm for genome-wide in silico identification of biologically significant gene promoter putative cis-regulatory-elements; identification of new elements for reactive oxygen species and sucrose signaling in Arabidopsis. Plant J 45 : 384 –398 Gilmour SJ, Hajela RK, Thomashow MF ( 1988 ) Cold acclimation in Arabidopsis thaliana. 
Plant Physiol 87 : 745 –750 Gilmour SJ, Thomashow MF ( 1991 ) Cold acclimation and cold regulated gene expression in ABA mutants of Arabidopsis thaliana. Plant Mol Biol 17 : 1233 –1240 Gong ZZ, Lee H, Xiong LM, Jagendorf A, Stevenson B, Zhu JK ( 2002 ) RNA helicase-like protein as an early regulator of transcription factors for plant chilling and freezing tolerance. Proc Natl Acad Sci USA 99 : 11507 –11512 Halliday KJ, Salter MG, Thingnaes E, Whitelam GC ( 2003 ) Phytochrome control of flowering is temperature sensitive and correlates with expression of the floral integrator FT. Plant J 33 : 875 –885 Halliday KJ, Whitelam GC ( 2003 ) Changes in photoperiod or temperature alter the functional relationships between phytochromes and reveal roles for phyD and phyE. Plant Physiol 131 : 1913 –1920 Hudson ME, Quail PH ( 2003 ) Identification of promoter motifs involved in the network of phytochrome A-regulated gene expression by combined analysis of genomic sequence and microarray data. Plant Physiol 133 : 1605 –1616 Jackson JE ( 1991 ) A Users Guide to Principle Components. Wiley, New York Janick J ( 1989 ) Gregor Mendel. In J Janick, ed, Classic Papers in Horticultural Science. Prentice Hall, Englewood Cliffs, NJ, pp 406–412 Kim HJ, Kim YK, Park JY, Kim J ( 2002 ) Light signalling mediated by phytochrome plays an important role in cold-induced gene expression through the C-repeat/dehydration responsive element (C/DRE) in Arabidopsis thaliana. Plant J 29 : 693 –704 Knight H, Veale EL, Warren GJ, Knight MR ( 1999 ) The sfr6 mutation in Arabidopsis suppresses low-temperature induction of genes dependent on the CRT DRE sequence motif. Plant Cell 11 : 875 –886 Knight H, Zarka DG, Okamoto H, Thomashow ME, Knight MR ( 2004 ) Abscisic acid induces CBF gene transcription and subsequent induction of cold-regulated genes via the CRT promoter element. 
Plant Physiol 135 : 1710 –1717 Lee BH, Henderson DA, Zhu JK ( 2005 ) The Arabidopsis cold-responsive transcriptome and its regulation by ICE1. Plant Cell 17 : 3155 –3175 Lee HJ, Xiong LM, Gong ZZ, Ishitani M, Stevenson B, Zhu JK ( 2001 ) The Arabidopsis HOS1 gene negatively regulates cold signal transduction and encodes a RING finger protein that displays cold-regulated nucleo-cytoplasmic partitioning. Genes Dev 15 : 912 –924 Maruyama K, Sakuma Y, Kasuga M, Ito Y, Seki M, Goda H, Shimada Y, Yoshida S, Shinozaki K, Yamaguchi-Shinozaki K ( 2004 ) Identification of cold-inducible downstream genes of the Arabidopsis DREB1A/CBF3 transcriptional factor using two microarray systems. Plant J 38 : 982 –993 Nakashima K, Yamaguchi-Shinozaki K ( 2006 ) Regulons involved in osmotic stress-responsive and cold stress-responsive gene expression in plants. Physiol Plant 126 : 62 –71 Sakamoto H, Maruyama K, Sakuma Y, Meshi T, Iwabuchi M, Shinozaki K, Yamaguchi-Shinozaki K ( 2004 ) Arabidopsis Cys2/His2-type zinc-finger proteins function as transcription repressors under drought, cold, and high-salinity stress conditions. Plant Physiol 136 : 2734 –2746 Thijs G, Moreau Y, De Smet F, Mathys J, Lescot M, Rombauts S, Rouze P, De Moor B, Marchal K ( 2002 ) INCLUSive: INtegrated clustering, upstream of sequence retrieval and motif sampling. Bioinformatics 18 : 331 –332 Tran LSP, Nakashima K, Sakuma Y, Simpson SD, Fujita Y, Maruyama K, Fujita M, Seki M, Shinozaki K, Yamaguchi-Shinozaki K ( 2004 ) Isolation and functional analysis of Arabidopsis stress-inducible NAC transcription factors that bind to a drought-responsive cis-element in the early responsive to dehydration stress 1 promoter. Plant Cell 16 : 2481 –2498 Vogel JT, Zarka DG, Van Buskirk HA, Fowler SG, Thomashow MF ( 2005 ) Roles of the CBF2 and ZAT12 transcription factors in configuring the low temperature transcriptome of Arabidopsis. 
Plant J 41 : 195 –211 Zarka DG, Vogel JT, Cook D, Thomashow MF ( 2003 ) Cold induction of Arabidopsis CBF genes involves multiple ICE (Inducer of CBF expression) promoter elements and a cold-regulatory circuit that is desensitized by low temperature. Plant Physiol 133 : 910 –918 Zhu JH, Shi HZ, Lee BH, Damsz B, Cheng S, Stirm V, Zhu JK, Hasegawa PM, Bressan RA ( 2004 ) An Arabidopsis homeodomain transcription factor gene, HOS9, mediates cold tolerance through a CBF-independent pathway. Proc Natl Acad Sci USA 101 : 9873 –9878 Author notes 1 This work was supported by grants from the Swedish Forestry and Agricultural Research Council (to V.H.). * Corresponding author; e-mail [email protected]; fax 46–90–786–676. The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Catherine Benedict ([email protected]). [W] The online version of this article contains Web-only data. www.plantphysiol.org/cgi/doi/10.1104/pp.106.083527. © 2006 American Society of Plant Biologists This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

journal article

How Far Can a Molecule of Weak Acid Travel in the Apoplast or Xylem?

Kramer, Eric M.

2006 Plant Physiology

doi: 10.1104/pp.106.083790; pmid: 16896235

The plant hormones auxin, abscisic acid (ABA), and the gibberellins (GAs) are all weak acids subject to the ion-trapping mechanism that tends to remove them from the extracellular space and concentrate them in the cytoplasm of plant cells. If a molecule of one of these compounds enters the extracellular space, it can therefore travel only a limited distance before reentering a cell. Influx carriers can only shorten this distance. Here I present a simple but quantitative estimate of this distance, and discuss its relevance for various models of short- and long-range signaling in plants. To review, the weak acids of interest all have one or more carboxyl groups, with dissociation constants between 4 and 5 (Table I). In the weakly acidic apoplast, a fraction of each hormone will be protonated and thus membrane permeable. However, once in the approximately neutral cytoplasm, the molecules dissociate and become membrane-impermeable anions. In the absence of transmembrane efflux carriers, the molecules will accumulate in the cytoplasm (the so-called ion-trapping mechanism). Although the principle of ion trapping has been known for decades (Rubery and Sheldrake, 1973), its contribution to the overall hormone economy has remained vague. The actual accumulation of a weakly acidic plant hormone in any cell is dominated by the activity of influx and efflux carriers, if present, with additional contributions from biosynthesis and metabolism pathways (Davies, 2004). However, the fact remains that a weak acid, upon entering the extracellular space, will tend to be trapped by adjacent cells. How far can we expect a molecule of hormone to travel?

Table I. Decay length for extracellular movement of some plant hormones
Estimates for the decay length of travel through the apoplast (pH = 5.5) and the xylem (pH = 5.5) are given. The diffusion coefficient for all molecules is estimated to be 10% of the aqueous value for auxin, D = D aq/10 = 0.0024 cm2 h−1. Decay lengths in the xylem assume a vessel radius of R = 100 μm. All values for the GAs, except pK a, should be regarded as order-of-magnitude estimates only. The bottom row, an estimate for any hormone subject to the activity of an influx carrier, assumes P eff = 1.0 cm h−1.

| Hormone | pK a (a) | P AH (b) (cm/h) | L apo (μm) if h = 0.1 μm | L apo (μm) if h = 1.0 μm | L xylem (m) if v = 1 m h−1 | L xylem (m) if v = 10 m h−1 |
| Auxin | 4.8 | 0.2 | 13 | 42 | 0.31 | 3.1 |
| ABA | 4.7 | 0.04 | 35 | 110 | 2.2 | 22 |
| GA3 | 4.0 | 0.008 | 150 | 500 | 50 | 500 |
| Early-hydroxylation pathway GAs | | | | | | |
| GA20 | 4.2 | 0.08 | 40 | 150 | 3 | 30 |
| GA1 | 4.0 | 0.006 | 200 | 500 | 50 | 500 |
| Late-hydroxylation pathway GAs | | | | | | |
| GA9 | 4.3 | 4 | 5 | 15 | 0.04 | 0.4 |
| GA4 | 4.2 | 0.4 | 20 | 60 | 0.6 | 6.0 |
| With influx carrier | NA | NA | 2.5 | 8.0 | 0.01 | 0.1 |

a pK a values are provided by the LogD software suite, version 9.0 (ACD Labs). Values generally agree with experiment within ±0.1 (Tidd, 1964; Rubery and Sheldrake, 1973; Kaiser and Hartung, 1981; Tomlin, 2000).
b The permeability of protonated auxin has been discussed previously (Swarup et al., 2005). The permeability of protonated ABA comes from Astle and Rubery (1980) and Baier et al. (1990). GA permeabilities are estimates based on the octanol-water partition coefficient predictions of the LogD software suite. The value for GA1 falls between the experimental values of Drake and Carr (1981) and Nour and Rubery (1984).
Consider the idealized situation shown in Figure 1A. A transmitter cell secretes a pulse of hormone into the apoplast. The hormone then moves through the apoplast between two sink cells. What fraction of the excreted hormone reaches the receiver cell, a distance x away? The answer (derived in the supplemental data) is 10^(−x/L apo), where L apo is a characteristic decay length \[L_{\mathrm{apo}} = 1.63\sqrt{\frac{Dh}{P_{\mathrm{eff}}}},\] (1) where h is the thickness of the wall, D is the diffusion coefficient of the hormone in the wall, and P eff is the effective permeability of the sink cell membranes. The latter depends on a variety of factors, but in the absence of influx carriers it reduces to \[P_{\mathrm{eff}} = P_{\mathrm{AH}}\left(\frac{1}{1 + 10^{\mathrm{pH} - \mathrm{pK}_{\mathrm{a}}}}\right),\] (2) where P AH is the membrane permeability of the protonated form of the hormone and the term in parentheses is the fraction of the hormone that is protonated in the apoplast.

Figure 1. Sketch of the movement of a weak acid (blue circles) in the apoplast. A, Transmitting cell (T) is separated from the receiving cell (R) by a cell wall of length x. B, Transmitting and receiving cells share a common wall of width w. S, Parenchyma cells that act as sinks for the hormone.

There are several important caveats to this discussion. First, Equation 1 is exact only for the simplified geometry shown in Figure 1. Cell walls seldom have uniform thickness, so the parameter h is an approximate width (see supplemental data for additional discussion).
Second, arrangements with more than one layer of sink cells between the transmitter and the receiver will have a shorter decay length due to the presence of more cell surface area available for import. Third, eukaryotic cell membranes are complex structures—inhomogeneous in composition, crowded with membrane proteins, and subject to rapid turnover (Engelman, 2005). The diffusive permeability of protonated hormones may therefore be expected to vary from cell to cell and even between different microdomains of the same membrane. Last, auxin and other hormones can trigger proton secretion, leading to changes in apoplastic pH on a time scale of minutes (Davies, 2004). A decrease in apoplastic pH by 0.5 will decrease L apo by about one-half. For all these reasons, a value for the decay length in plant tissues should be regarded as approximate. However, it should also be noted that the square root in Equation 1 tends to limit the impact of parameter variations. In Table I, I estimate typical values for the decay length of auxin, ABA, and several GAs. These data constrain proposed models of hormone action. The decay length is the distance over which the hormone concentration decreases by a factor of 10. Thus, a distance of 3L apo can be taken as a practical upper bound on the distance between the transmitting and receiving cell. A pulse of hormone can travel much farther than 3L apo only if sink cells export the hormone back into the apoplast via efflux carriers or into the cytoplasm of adjacent cells via the plasmodesmata. Table I thus implies that, in thin-walled meristematic tissue, an apoplastic pulse of auxin or late-hydroxylation pathway GAs can travel just a few cell diameters. In mature tissues with well-developed cell walls, the decay lengths are uniformly larger by a factor of about 3. Note, in particular, GA1 and GA3. These are sufficiently membrane impermeable that an apoplastic signal can travel farther than 1 mm. 
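As a rough check on the apoplastic decay lengths in Table I, Equations 1 and 2 can be evaluated directly. The short Python sketch below is an illustration only: it uses the pK a and P AH values from Table I, the wall diffusion coefficient D = 0.0024 cm2 h−1 and apoplast pH of 5.5 from the table caption, and function names that are hypothetical. Results should land close to the rounded table entries rather than match them exactly.

```python
import math

D_WALL = 0.0024  # cm^2/h, diffusion coefficient in the wall (Table I caption)


def p_eff(p_ah, pka, ph=5.5):
    """Equation 2: effective permeability (cm/h) without influx carriers."""
    return p_ah / (1.0 + 10.0 ** (ph - pka))


def l_apo_um(p_effective, h_um, d=D_WALL):
    """Equation 1: apoplastic decay length in micrometers.
    h_um is the wall thickness in micrometers; p_effective is in cm/h."""
    h_cm = h_um * 1e-4                      # micrometers to centimeters
    return 1.63 * math.sqrt(d * h_cm / p_effective) * 1e4


def fraction_reaching(x, l_apo):
    """Fraction of a hormone pulse that travels a distance x (same units as l_apo)."""
    return 10.0 ** (-x / l_apo)


# (pK a, P AH in cm/h) taken from Table I.
HORMONES = {"Auxin": (4.8, 0.2), "ABA": (4.7, 0.04), "GA9": (4.3, 4.0)}

for name, (pka, p_ah) in HORMONES.items():
    thin = l_apo_um(p_eff(p_ah, pka), 0.1)
    thick = l_apo_um(p_eff(p_ah, pka), 1.0)
    print(f"{name}: L_apo = {thin:.1f} um (h = 0.1 um), {thick:.1f} um (h = 1.0 um)")
```

Because the decay length is the distance over which the concentration falls tenfold, fraction_reaching(3 * L, L) returns 0.001 for any L, which is the basis for treating 3L apo as a practical upper bound.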
Experiments applying radiolabeled GA1 or GA3 to sectioned plant tissues often find transport over distances >10 mm (Phillips and Hartung, 1976; Drake and Carr, 1979), which suggests a role for both apoplastic transport and efflux carriers. The decay length also provides a useful bound on the efficiency of signaling between adjacent cells. Figure 1B shows a sketch of the apoplastic interface between a transmitter cell and a receiver cell, analogous to the synapse between two neurons. If the width w of the interface is large compared to L apo, then most of the hormone secreted into the interface enters the receiver cell (or reenters the transmitting cell). If w = L apo, less than one-half of the hormone secreted by the transmitting cell remains at the interface. The rest diffuses into the adjacent cell walls. For w = 0.1L apo, 98% leaves the interface via the cell walls (see proof in supplemental data). Table I indicates that, for thin-walled cells with no influx carriers, only the late-hydroxylation GAs and possibly auxin would be efficient paracrine signals. Of course, specific influx carriers in the receiver cell membrane would permit efficient paracrine signaling with any hormone. There is good evidence for influx carriers specific for ABA and some GAs (Astle and Rubery, 1983; Nour and Rubery, 1984; Perras et al., 1994; Yamaguchi et al., 2001). For auxin, at least one gene family of influx facilitators is known (Parry et al., 2001; Terasaka et al., 2005). This analysis of interface efficiency is relevant for models of auxin transport and auxin-mediated morphogenesis. Auxin transport is transcellular, which means auxin moving through a file of cells traverses the cell wall between each pair of neighbors in the file (Goldsmith et al., 1981). In addition, auxin transport is often concentrated in a narrow file or a uniseriate layer of cells (Swarup et al., 2005; de Reuille et al., 2006; Jonsson et al., 2006; Smith et al., 2006).
Efficient transport thus requires that the cell interfaces not be too leaky; in other words, w > L apo. Swarup et al. (2005) estimate that influx facilitators in the root epidermis of Arabidopsis thaliana increase membrane permeability by a factor of 15, giving L apo = 3 μm for thin-walled cells. Apoplastic diffusion is not negligible in this system. Regarding auxin-mediated morphogenesis, consider the case of spiral phyllotaxis in the shoot apical meristem. Three recently published computer models of phyllotaxis all couple the auxin flux between cells with cell differentiation triggered by high cytoplasmic auxin concentration (de Reuille et al., 2006; Jonsson et al., 2006; Smith et al., 2006). The results are patterns of primordia initiation that match observations. However, two of these models assume that cell interfaces are 100% efficient (i.e. no apoplastic diffusion; de Reuille et al., 2006; Smith et al., 2006), whereas the third uses a value for the membrane permeability of protonated auxin that is 60 times larger than the measured value in plant cells (Delbarre et al., 1996; Jonsson et al., 2006). From the previous paragraph, it is clear that apoplastic diffusion can be neglected only if the width of a cell interface is large compared to L apo. Even if auxin influx carriers increase the effective membrane permeability by a factor of 15, L apo will still be comparable to the typical interfacial width of 5 μm. Similar considerations apply to the movement of a weak acid in the xylem (Fig. 2). In this case, the decay length (derived in supplemental data) is \[L_{\mathrm{xylem}} = 1.15\frac{Rv}{P_{\mathrm{eff}}},\] (3) where v is the speed of the xylem sap and R is the radius of the xylem vessel. Values for L xylem are estimated in Table I. In the absence of carriers, most of the weak acids have a decay length of approximately 2 m or greater. The only exceptions are the late-hydroxylation GAs and (at low transpiration rates) auxin, due to their relatively high membrane permeabilities. Conversely, GA1 and GA3 again have the longest range, due to their low membrane permeability. This is consistent with autoradiography studies that show considerable movement of exogenously applied GAs in the xylem (Zweig et al., 1961; Couillerot and Bonnemain, 1975). The most thoroughly studied hormone in the xylem is ABA, which is a drought stress signal that moves from the roots to the leaves in the transpiration stream (Sauter et al., 2001). The decay length for ABA is on the order of 10 m, consistent with a role in long-range signaling. Note also that L xylem is proportional to the speed of flow in the xylem. At low transpiration rates, losses to the xylem parenchyma cells are proportionately higher. In this way, the concentration of ABA in the xylem may provide the plant with a local measure of transpiration rates.

Figure 2. Sketch of the movement of a weak acid (blue circles) in a xylem vessel (Xy). Arrow indicates the direction of water movement. All other labels are as in Figure 1. Note that the calculation of L xylem in the text only considers the loss of hormone across plasma membranes that face the xylem vessel. The additional loss of hormone due to diffusion along the radial walls between sink cells is expected to be small.

In this letter, I have discussed the kinetics of acid trapping. There is a general tendency in the literature to regard membrane permeability as an all-or-nothing phenomenon.
However, it is clear from the above discussion that the relative degree of membrane permeability has important consequences for models of hormone transport and signaling.

ACKNOWLEDGMENTS

This article was written while the author was on sabbatical in the Biology Department of the University of Massachusetts, Amherst.

Author notes

1 This work was supported in part by the U.S. National Science Foundation (grant no. 0316876) and by the hospitality of the labs of Tobias I. Baskin and Peter Hepler. Software was purchased with funds provided by the Biotechnology and Biological Sciences Research Council (UK).

2 Present address: Physics Department, Simon's Rock College, 84 Alford Rd., Great Barrington, MA 01230.

* E-mail [email protected]; fax 413–528–7365.

The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Eric M. Kramer ([email protected]).

[W] The online version of this article contains Web-only data.

www.plantphysiol.org/cgi/doi/10.1104/pp.106.083790.

© 2006 American Society of Plant Biologists This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

journal article

LitStream Collection

Molecular and Structural Characterization of Hexameric β-d-Glucosidases in Wheat and Rye

Sue, Masayuki; Yamazaki, Kana; Yajima, Shunsuke; Nomura, Taiji; Matsukawa, Tetsuya; Iwamura, Hajime; Miyamoto, Toru

2006 Plant Physiology

doi: 10.1104/pp.106.077693 pmid: 16751439

Abstract

The wheat (Triticum aestivum) and rye (Secale cereale) β-d-glucosidases hydrolyze hydroxamic acid-glucose conjugates, exist as different types of isozymes, and function as oligomers. In this study, three cDNAs encoding β-d-glucosidases (TaGlu1a, TaGlu1b, and TaGlu1c) were isolated from young wheat shoots. Although the TaGlu1s share very high sequence homology, the mRNA level of Taglu1c was much lower than those of the other two genes in 48- and 96-h-old wheat shoots. The expression ratio of each gene differed between two wheat cultivars. Recombinant TaGlu1b expressed in Escherichia coli was electrophoretically distinct from TaGlu1a and TaGlu1c. Furthermore, coexpression of TaGlu1a and TaGlu1b gave seven bands on a native-PAGE gel, indicating the formation of both homo- and heterohexamers. One distinctive property of the wheat and rye glucosidases is that they function as hexamers but lose activity when dissociated into smaller oligomers or monomers. The crystal structure of hexameric TaGlu1b was determined at a resolution of 1.8 Å. The N-terminal region was located at the dimer-dimer interface and plays a crucial role in hexamer formation. Mutational analyses revealed that the aromatic side chain at position 378, which is located at the entrance to the catalytic center, plays an important role in substrate binding. Additionally, serine-464 and leucine-465 of TaGlu1a were shown to be critical in the relative specificity for DIMBOA-glucose (2-O-β-d-glucopyranosyl-4-hydroxy-7-methoxy-1,4-benzoxazin-3-one) over DIBOA-glucose (7-demethoxy-DIMBOA-glucose).

β-d-Glucosidase (EC 3.2.1.21) is a major member of the family GH1 and GH3 glycoside hydrolases and is responsible for hydrolysis of terminal non-reducing β-d-Glc residues in oligosaccharides (or polysaccharides) or glucoconjugates.
In plants, β-d-glucosidases are involved in various functions, including lignification (Dharmawardhana et al., 1995), regulation of the biological activity of cytokinins (Brzobohatý et al., 1993; Falk and Rask, 1995; Haberer and Kieber, 2002), control of the biosynthesis of indole-3-acetic acid (Ljung et al., 2001; Persans et al., 2001), and chemical defense against pathogens and herbivores (Niemeyer, 1988; Sicker et al., 2000; Zagrobelny et al., 2004). Many secondary products in plants occur as glucoconjugates with one or two Glc units attached to a hydroxy or thiol group. Hydrolysis of the glucosidic linkage in secondary metabolites, such as cyanogenic-, flavonoid-, and hydroxamic acid-glucosides, can drastically alter the biological activity, chemical stability, and water solubility of the molecule. The β-d-glucosidases implicated in the hydrolysis of plant secondary metabolites are members of the family GH1 glycoside hydrolases. The classification system of the glycoside hydrolases is available on the CAZy database at http://afmb.cnrs-mrs.fr/CAZY/. Although β-d-glucosidases possess broad substrate specificity with respect to the aglycone moiety, the preferred aglycone structures vary with each glucosidase, reflecting their wide variety of physiological roles. Indeed, some β-d-glucosidases exhibit strict aglycone specificity. For example, the sorghum (Sorghum bicolor) glucosidase (dhurrinase 1, SbDhr1) acts specifically on its natural substrate, dhurrin. Dhurrin inhibits the activity of the maize (Zea mays) homolog, ZmGlu1, whose amino acid sequence shares about 70% identity with SbDhr1 (Hösel et al., 1987; Cicek and Esen, 1998). Recently, several research groups have investigated aglycone recognition mechanisms by SbDhr1 and ZmGlu1 using a combination of site-directed mutagenesis and x-ray crystallography (Czjzek et al., 2000, 2001; Verdoucq et al., 2003, 2004). 
In previous studies, we purified β-d-glucosidases from the seedlings of wheat (Triticum aestivum) and rye (Secale cereale; Sue et al., 2000a, 2000b). The seedlings accumulate O-β-d-glucosides of hydroxamic acids (Hxs; 2,4-dihydroxy-1,4-benzoxazin-3-one, DIBOA, and its 7-methoxy derivative, DIMBOA; Fig. 1) as defensive compounds against pathogens and herbivores (Niemeyer, 1988). These compounds are thought to be stored in intact plants as glucosides within a different subcellular compartment from the glucosidase. Although the wheat and rye glucosidases hydrolyze DIBOA-Glc and DIMBOA-Glc, the preferred natural substrate for each enzyme is consistent with the predominant Hx found in each respective plant: DIMBOA-Glc in wheat and DIBOA-Glc in rye. These findings raise interesting questions concerning the evolution of this chemical defense mechanism.

Figure 1. Structures of the natural substrates in wheat. DIBOA-Glc, 2-O-β-d-glucopyranosyl-4-hydroxy-1,4-benzoxazin-3-one; DIMBOA-Glc, 2-O-β-d-glucopyranosyl-4-hydroxy-7-methoxy-1,4-benzoxazin-3-one.

Recently, the cloning of a cDNA encoding the rye glucosidase (ScGlu) was reported by Nikus et al. (2003). The primary structure showed about 70% similarity to ZmGlu1 for which the preferred substrate is DIMBOA-Glc. Four hydrophobic amino acids (Trp-378, Phe-198, Phe-205, and Phe-466) of the maize glucosidase sandwich the aromatic aglycone moiety of the substrate (Czjzek et al., 2000, 2001; Zouhar et al., 2001) and are thought to be essential for the recognition of the aglycone moiety. Furthermore, Ala-467 was demonstrated to make contact with the methoxy group of DIMBOA-Glc.
While Trp-378 and Phe-198 are conserved in rye at the corresponding positions, Phe-205, Phe-466, and Ala-467 are substituted by His, Gly, and Ser, respectively. Nikus et al. (2003) proposed that these substitutions allow the rye glucosidase to accept DIBOA-Glc as the preferred substrate. Because wheat is closely related to rye, the primary amino acid sequence of the wheat glucosidase is likely to resemble that of the rye enzyme. Indeed, the N-terminal sequences of the two enzymes are very similar to one another. Thus, the wheat glucosidase, with the preferred substrate DIMBOA-Glc, is an excellent target to investigate the structural factors that determine substrate preference between DIMBOA-Glc and DIBOA-Glc.

The β-d-glucosidases from both plants are thought to exist in an oligomeric form (probably tetrameric or hexameric; Sue et al., 2000a, 2000b). Specifically, the wheat glucosidase comprises 60- and 58-kD subunits and the rye enzyme comprises only 60-kD subunits, as determined by gel filtration and SDS-PAGE analysis. This is different from the maize homolog, which is a homodimer of 60-kD subunits. Furthermore, the two subunits in the wheat enzyme are thought to make up several types of heterooligomers by assembling in different ratios, which is observed in a zymogram as multiple activity bands (Sue et al., 2000b). The N-terminal 12 amino acids of both subunits are identical, suggesting they are closely related isozymes. The zymogram of the maize glucosidase from a hybrid line has been shown to display three activity bands caused by two subunits derived from an allele (Stuber et al., 1977). The situation in wheat is more complicated because it has a hexaploid genome and oligomeric glucosidases. Furthermore, the origin of the subunits in the wheat enzyme, their substrate specificity, the actual quaternary structure, and the influence of oligomerization on activity are not yet known.
The crystal structures of four family GH1 glucosidases from plants have been solved (Barrett et al., 1995; Burmeister et al., 1997; Czjzek et al., 2001; Verdoucq et al., 2004), and biochemical analysis has shown that these enzymes function as dimers. Some bacterial enzymes have shown tetrameric or octameric quaternary structure in the crystal structure (Aguilar et al., 1997; Sanz-Aparicio et al., 1998; Chi et al., 1999; Hakulinen et al., 2000). However, the amino acid sequence of rye glucosidase, which forms an oligomer, shows a higher level of similarity to the plant enzymes than to the bacterial enzymes. Additionally, the monomer-monomer interaction and orientation of the bacterial glucosidases are different from those of the plant enzymes ZmGlu1 and SbDhr1. Thus, it is of interest to elucidate the quaternary structure of wheat glucosidase and investigate the structure-activity relationships of this enzyme.

In this study, we cloned three cDNAs encoding the wheat β-glucosidase monomers (TaGlu1a–TaGlu1c) and characterized the hexameric enzymes of wheat and rye. In addition, we crystallized one of the wheat β-d-glucosidases (N-His-tagged TaGlu1b) as a hexamer and investigated the aglycone binding site using site-directed mutagenesis.

RESULTS

Primary Structure of the Wheat β-d-Glucosidase

Three cDNAs encoding wheat β-d-glucosidases were isolated by screening a cDNA library prepared from 48-h-old wheat shoots (cv Chinese Spring [CS]) and designated Taglu1a, Taglu1b, and Taglu1c (see the supplemental text “Cloning of the wheat glucosidases”). Taglu1a, Taglu1b, and Taglu1c comprised open reading frames of 1,710, 1,710, and 1,713 bp encoding polypeptides of 569, 569, and 570 amino acids, respectively (Supplemental Fig. 1). The deduced amino acid sequence of TaGlu1a shows 91%, 95%, and 95% identity and 95%, 98%, and 97% similarity to ScGlu, TaGlu1b, and TaGlu1c, respectively.
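The reported open reading frame lengths and polypeptide lengths can be cross-checked with simple arithmetic: an ORF of n bp that ends in a stop codon encodes n/3 − 1 amino acids. The Python sketch below is only an illustration of that check. The function name is hypothetical, and it assumes that each reported ORF length includes the stop codon, which the text does not state explicitly.

```python
def encoded_residues(orf_bp: int) -> int:
    """Amino acids encoded by an ORF whose length (in bp) includes the stop codon.

    Assumption for illustration: the reported ORF length counts the stop codon.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1  # the last codon is the stop codon, not a residue


# ORF lengths (bp) reported in the text for the three wheat cDNAs
orf_lengths = {"Taglu1a": 1710, "Taglu1b": 1710, "Taglu1c": 1713}

residues = {gene: encoded_residues(bp) for gene, bp in orf_lengths.items()}
print(residues)
```

Under that assumption the computed lengths agree with the 569, 569, and 570 amino acids given for TaGlu1a, TaGlu1b, and TaGlu1c.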
Each TaGlu1 included the N-terminal sequence of the mature protein as confirmed by sequence analysis of the natural wheat glucosidases (Sue et al., 2000b). The predictive programs ChloroP v.1.1 (Emanuelsson et al., 1999) and TargetP v.1.01 (Emanuelsson et al., 2000) indicate that TaGlu1 may possess a signal peptide for a plastid (Supplemental Fig. 1) similar to the known monocot β-d-glucosidases from maize, rye, sorghum, and oats (Avena sativa).

Transcript Profiles of the Genes Encoding TaGlu1 Isozymes in Young Wheat

In our previous work, we reported that the glucosidase activity changes transiently, peaking 36 to 48 h after imbibition (Sue et al., 2000b). Northern-blot analysis showed that the genes are expressed at a high level 36 to 48 h after imbibition and that expression level then gradually decreases as the plant grows (Fig. 2A). This pattern correlated well with that of glucosidase activity.

Figure 2. Expression level of Taglu1 mRNA. A, Northern-blot analysis. Taglu1, probed with Taglu1a; rRNA, ribosomal RNA stained with ethidium bromide. B, Quantification of Taglu1a to Taglu1c mRNA in 48- and 96-h-old wheat shoots (CS and Ak). The data of each growth stage are described as the ratio of each gene to the sum of Taglu1a to Taglu1c.

However, northern-blot analysis could not discriminate between each of the three Taglu1 genes because of the high level of sequence homology. Therefore, the expression of each glucosidase gene was analyzed by quantitative PCR using primers specific to each gene.
In this experiment, RNA was also prepared from another bread wheat cultivar, Asakazekomugi (Ak), in addition to CS, to examine whether the expression pattern of each gene is conserved between the two cultivars. In CS, Taglu1a was most highly expressed (67%) in 48-h-old wheat, with Taglu1b expressed at about one-half this level (32%; Fig. 2B). The ratio was slightly different in a 96-h-old plant, where Taglu1a expression increased to 85% and Taglu1b decreased to 14%, though the total amount of glucosidase gene expression declined as shown by the northern analysis. In both growth stages, the expression levels of Taglu1c were much lower (0.4% and 0.1% in 48- and 96-h-old plants, respectively) than those of the other glucosidase genes. These data were supported by the results of the cDNA library screening; 37 clones of Taglu1a, 11 clones of Taglu1b, and only one clone of Taglu1c were obtained. In contrast to the results of CS, Taglu1a and Taglu1b (Fig. 2B) were shown to be expressed in almost equal amounts in 48-h-old Ak (Taglu1a, 51%; Taglu1b, 48%). As the plant grows, however, the expression of Taglu1a and Taglu1b changed to a percentage comparable to that in CS. Similarly to CS, Taglu1c was expressed at a low level in both 48- and 96-h-old Ak plants (1.3% and 4.7%, respectively).

Oligomeric Structures of the Wheat and Rye Glucosidases

The differences in the theoretical molecular masses of mature TaGlu1a to TaGlu1c (TaGlu1a, 59,155 D; TaGlu1b, 59,099 D; TaGlu1c, 59,245 D) are insufficient to explain the 2-kD disparity between the two subunits observed by SDS-PAGE of the natural wheat glucosidase (Sue et al., 2000b). Thus, we engineered the three wheat glucosidase genes for heterologous expression in Escherichia coli and examined the SDS-PAGE profiles of the respective recombinant proteins. The wheat glucosidases corresponding to the mature enzymes were overexpressed with and without an N-terminal His-tag in the BL21 CodonPlus(DE3)-RIL strain.
While the native (without His-tag) TaGlu1a and TaGlu1c had similar mobility by SDS-PAGE, they evidently both migrated faster than TaGlu1b (Fig. 3A). Their molecular sizes on the gel were estimated as 58 kD (TaGlu1a and TaGlu1c) and 60 kD (TaGlu1b), which correspond to the two bands of the natural glucosidases purified from wheat shoots (Sue et al., 2000b). The same difference in mobility was evident with the N-His-tagged glucosidases (Fig. 3A). The precise Mr of N-His-tagged TaGlu1a and TaGlu1b was determined by mass spectrometry to make sure that they were not digested during extraction. TaGlu1a had an Mr of 64,141.25 ± 3.25 (theoretical Mr: 64,143), while TaGlu1b had an Mr of 64,085.41 ± 3.70 (theoretical Mr: 64,085).

Figure 3. SDS- and native-PAGE of the wheat and rye glucosidases. A, The glucosidase monomers were analyzed by SDS-PAGE. For the native (without His-tag) glucosidases, the crude E. coli cell extracts were directly subjected to SDS-PAGE. The His-tagged glucosidases were electrophoresed after purification by affinity chromatography on a nickel-charged column. B, The cell extracts were subjected to native-PAGE (on an 8% separating gel for 4 h) and stained with Coomassie Brilliant Blue. The arrowheads indicate the seven types of glucosidase hexamers expressed in coexpression lines. C, The bands with β-glucosidase activity were detected by activity staining. The crude extracts of E. coli cells and 48-h-old shoots were separated under nondenaturing conditions. In each segment, M, a, b, c, a/b, b/c, W, and R indicate marker proteins, TaGlu1a, TaGlu1b, TaGlu1c, coexpressed TaGlu1a and TaGlu1b, coexpressed TaGlu1b and TaGlu1c, wheat shoots, and rye shoots, respectively.

In contrast to the SDS-PAGE results, TaGlu1b showed increased mobility on a native-PAGE gel over the other two isozymes (Fig. 3, B and C). The glucosidases overexpressed in E. coli were verified as constituting active homooligomers because the activity-stainable bands were detected on a native-PAGE gel at a similar position to the naturally occurring wheat glucosidase. However, the mobility of TaGlu1b and TaGlu1a (or TaGlu1c) was different (Fig. 3, B and C).

To examine whether the recombinant glucosidases were able to form heterooligomers, we coexpressed TaGlu1b and TaGlu1a (or TaGlu1c) in E. coli. Each glucosidase isozyme was expressed as the native form (without an N-terminal His-tag). As shown in Figure 3, B and C, both coexpression lines (Glu-1a/b and Glu-1b/c) exhibited seven bands when the cell extracts were subjected to native-PAGE. The uppermost and lowermost bands had the same electrophoretic mobility as the homooligomers. The results suggested that the wheat glucosidase monomers can form homo- and heterohexamers because formation of seven types of oligomers from two kinds of subunits can only be achieved when the subunits aggregate in a ratio from 0:6 to 6:0.
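The counting argument behind the seven bands can be made explicit: an oligomer of n subunits assembled from two subunit types has n + 1 possible compositions (0:n through n:0), so seven bands imply n = 6. The Python sketch below enumerates those compositions. It assumes that each composition migrates as a single band, meaning that different positional arrangements of the same composition are not resolved on the gel. The binomial weights describe a hypothetical random assembly from equal subunit pools and are not data from the study.

```python
from math import comb


def oligomer_compositions(n_subunits: int):
    """All (a, b) subunit counts for an oligomer built from two subunit types."""
    return [(a, n_subunits - a) for a in range(n_subunits + 1)]


compositions = oligomer_compositions(6)
print(len(compositions), "distinct hexamer compositions")

for a, b in compositions:
    # comb(6, a) is the relative weight expected under the illustrative
    # assumption of random assembly from equal pools of the two subunits.
    print(f"{a}:{b}  relative weight {comb(6, a)}")

# For comparison: a tetramer or a dimer would give fewer compositions.
print(len(oligomer_compositions(4)), len(oligomer_compositions(2)))
```

By the same count a tetramer would give five compositions and a dimer three, the latter matching the three activity bands cited earlier for the dimeric maize enzyme.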
The rye glucosidase had a similar mobility on a native-PAGE gel to the wheat glucosidase (Fig. 3C). Together with the gel filtration results (Fig. 4B; Sue et al., 2000a), the rye enzyme appeared to form a hexamer.

Figure 4. Gel filtration of TaGlu1a, ScGlu, and Zm-TaGlu1a. A, The enzyme solution eluted from the nickel column was further purified by gel filtration using a Superdex 200 column. Each fraction volume was 0.5 mL. The eluted protein was monitored by A280. The two protein peaks correspond to hexamer and monomer. B, Solid line, Gel filtration of ScGlu was performed after affinity chromatography. Two peaks of hexamer and monomer were observed. Dashed line, TaGlu1a (wild type) purified by affinity chromatography followed by gel filtration was dialyzed against HEPES buffer without NaCl and subjected to gel filtration analysis. Only a monomeric protein was detected. Dash-dot line, Zm-TaGlu1a purified by affinity chromatography was further purified by gel filtration. While the hexamer peak was not detected, the dimer and monomer peaks were observed.
N-His-tagged TaGlu1a expressed in E. coli was purified by metal chelation chromatography to almost complete homogeneity as judged by SDS-PAGE analysis (Fig. 3A). However, two major protein peaks were detected on the chromatogram of gel filtration corresponding to the hexamer and monomer, and only the hexamer peak showed activity with the natural (DIMBOA-Glc) and artificial (pNP-Glc) substrate (Fig. 4A). During the gel filtration experiments, minor peaks corresponding to the dimer and tetramer were occasionally observed (data not shown). Thus, the wheat glucosidase also can exist as a dimer and a tetramer. However, the fractions containing these oligomers, as well as the monomer, showed only slight activity with DIMBOA-Glc. The activity was so low that we could not eliminate the possibility that the activity was derived from minor contamination with the active hexamer. When the purified hexameric glucosidase was dialyzed against 50 mM HEPES without NaCl, the monomer readily formed (Fig. 4B), resulting in loss of activity.

Among the isozymes, TaGlu1b showed the highest activity with DIBOA-Glc and DIMBOA-Glc (kcat/Km values of 149 and 4,138 s−1/mM, respectively), while TaGlu1a, the major isozyme in 48-h-old CS, showed the lowest activity (4.5-fold lower than TaGlu1b). The lower activity of TaGlu1a may be caused by instability of the active hexamer, as suggested by gel filtration where the amount of hexameric TaGlu1a was lower than that of TaGlu1b or TaGlu1c (data not shown).

Crystal Structure of the Wheat β-d-Glucosidase and the Role of Its N-Terminal Region in Hexamer Formation

The structure of TaGlu1b in complex with DIMBOA was determined at 1.8-Å resolution from a crystal that was soaked in the DIMBOA solution. The final refinement statistics are shown in Table I. The overall structure of TaGlu1b was almost the same as those of known β-d-glucosidases, which have classical (β/α)8 barrel folds.

Table I. Refinement statistics
R factor = Σ||Fobs| − |Fcalc||/Σ|Fobs|. Rfree was calculated with the 5% of reflections set aside randomly throughout the refinement.

Parameter                        TaGlu1b
Space group                      P4132
Cell dimensions a, b, c (Å)      194.6, 194.6, 194.6
Resolution (Å)                   50-1.8
No. reflections                  2,667,849
Unique reflections               217,826
Redundancy                       12.2
Completeness (%)                 98.6
Rwork/Rfree                      19.0/20.4
No. atoms
  Water                          620
Luzzati ESD (obs)                0.19
Luzzati Sigma A (obs)            0.13
Luzzati ESD (Rfree)              0.21
Luzzati Sigma A (Rfree)          0.13
Root mean square deviations
  Bond lengths (Å)               0.005
  Bond angles (°)                1.3
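Two entries in Table I can be reproduced with short calculations. Redundancy is the total number of measured reflections divided by the number of unique reflections, and the R factor compares observed and calculated structure-factor amplitudes. The Python sketch below is a minimal illustration: the function names are hypothetical, the R-factor form is the conventional definition (assumed here to be what the table note intends), and the amplitude lists are made-up toy values, not data from the study.

```python
def redundancy(total_reflections: int, unique_reflections: int) -> float:
    """Average number of times each unique reflection was measured."""
    return total_reflections / unique_reflections


def r_factor(f_obs, f_calc) -> float:
    """Conventional crystallographic R factor: sum(||Fobs| - |Fcalc||) / sum(|Fobs|)."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Reflection counts taken from Table I
table_redundancy = redundancy(2_667_849, 217_826)
print(round(table_redundancy, 1))

# Hypothetical amplitudes, for illustration only (not from the paper)
toy_r = r_factor([10.0, 20.0], [9.0, 22.0])
print(round(toy_r, 3))
```

The first value rounds to the redundancy of 12.2 listed in the table. In practice Rwork and Rfree are computed by the refinement program over the working and test reflection sets, respectively.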
The β-strands and α-helices within each β/α repeat were connected by loops at the top of the barrel. One disulfide bond was observed between Cys-210 and Cys-216, which is conserved among β-glucosidases in sorghum and maize. Although 11 residues at the N terminus and 18 at the C terminus of the mature enzyme were not included in the structure due to the lack of electron density, both termini were modeled three residues longer than those of sorghum or maize structures. The program PROCHECK (Laskowski et al., 1993) was applied to validate the structure; 374 residues were in most favored, 48 in additional allowed, and two (Ala-78 and Trp-463) were in generously allowed regions. When we superposed the TaGlu1b structure with those from sorghum (PDB code 1V03) and maize (PDB code 1E4N), they mostly fitted well, except that slightly different conformations were observed in the region from Val-413 to Pro-422 (loop D; Fig. 5C), probably due to low sequence similarity (Supplemental Fig. 1). The electron density of the amino acid chain was clearly observed, whereas that of the aglycone was not defined. This may have been due to interference caused by several glycerol molecules bound in the active site. The binding of a glycerol molecule at the active site was reported for the maize glucosidase Zm-p60.1 (Zouhar et al., 2001). A sulfate ion derived from Li2SO4 in the crystallization buffer was also observed; it was held in place by Ser-366 and Asp-271, and also by Arg-434 of an adjacent subunit.

Figure 5. Tertiary and quaternary structures of β-d-glucosidases. A, A ribbon diagram representation of the functional TaGlu1b hexamer. The dimers generated by crystallographic 2-fold symmetry operation are displayed as the same color. B, Side view of A (close-up view of dimer-dimer interface). The N-terminal regions of the monomer are positioned at the dimer-dimer interface. The four residues participating in direct hydrogen bonds to the monomer in the adjacent dimer are shown in yellow. These intermolecular hydrogen bonds are formed between T16 (side chain) and S366 (back bone), K17 (back bone) and Q270 (side chain), K19 (side chain) and S272 (side chain), and Q22 (side chain) and D271 (back bone). C, Superimposition of the back bones of TaGlu1b (magenta), ZmGlu1-E191D (PDB code 1E56; green), and SbDhr1 (PDB code 1V03; cyan). The structures of DIBOA-Glc and dhurrin in the substrate binding pockets of ZmGlu1 mutant and SbDhr1, respectively, are shown. D, Close-up view of the aglycone binding site. The crystal structures of TaGlu1b and ZmGlu1-E191D (PDB code 1E56) and the modeled structure of ScGlu are superimposed. A natural substrate, DIMBOA-Glc, bound to the maize enzyme is also shown. Blue, TaGlu1b; magenta, ScGlu; yellow, ZmGlu1-E191D. Identical residues found in TaGlu1b and ScGlu are shown in black. All of the molecular graphics images were produced using the UCSF Chimera package (Pettersen et al., 2004) from the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco (supported by National Institutes of Health P41 RR–01081).

Although the asymmetric unit contained one monomer of the enzyme, the symmetry operations produced a hexamer (Fig. 5A), where the dimer is obtained by the crystallographic 2-fold symmetry operation of the monomer and the hexamer by the 3-fold symmetry operation of the dimer. The two symmetry axes are perpendicular to each other. If we regard the hexamer as three molecules of dimer (equivalent to the maize and sorghum dimers), the N-terminal region of TaGlu1b is located at the interface between the adjacent dimers. With respect to this region, the subunits are linked by four direct hydrogen bonds (Fig. 5B). To examine the role of this region in the formation of hexameric structure, the N-terminal 25 residues of the mature TaGlu1a and TaGlu1b were replaced with the corresponding residues of ZmGlu1 (Supplemental Fig. 1), considering that this enzyme is known to exist as a dimer. The chimeric glucosidases (Zm-TaGlu1a and Zm-TaGlu1b) completely lost their ability to form a hexamer, which was confirmed by gel filtration chromatography (Fig. 4B).
Instead of being hexameric, the dimeric structure was the major component on the chromatogram, suggesting the crucial role of the N-terminal sequence in maintaining the dimer-dimer association. The fractions containing dimeric Zm-TaGlu1a or Zm-TaGlu1b exhibited little activity toward DIMBOA-Glc (data not shown).

Site-Directed Mutagenesis of the Substrate Binding Pocket

The recombinant enzyme with an N-terminal His-tag displayed an activity comparable to the naturally occurring glucosidase; the Vmax value of the natural wheat glucosidase for DIMBOA-Glc was 4,100 nkat/mg protein (Sue et al., 2000b) and that of the recombinant TaGlu1a was 5,200 nkat/mg protein. We therefore used the N-His-tagged enzyme for site-directed mutagenesis of residues at the substrate binding pocket. The amino acid residues involved in the substrate binding pocket are absolutely conserved among TaGlu1a to TaGlu1c (Supplemental Fig. 1), deduced from the structural and the sequence alignments with ZmGlu1-E191D mutant in complex with DIMBOA-Glc (Czjzek et al., 2000). The data of enzyme activity are shown in Tables II and III.

Table II. Kinetic parameters of the TaGlu1s and TaGlu1a mutants
The relative efficiency (Rel. eff.) toward each substrate is the percent ratio of the kcat/Km values to that of TaGlu1a. n.d., Not detected. –, Not determined.

Enzyme       DIBOA-Glc                       DIMBOA-Glc                      pNP-Glc
             Km    kcat   kcat/Km  Rel. eff. Km    kcat   kcat/Km  Rel. eff. Km    kcat   kcat/Km  Rel. eff.
             mM    s−1    s−1/mM   %         mM    s−1    s−1/mM   %         mM    s−1    s−1/mM   %
TaGlu1a      1.40  48.8   34.6     100       0.36  338    939      100       1.85  99.2   53.6     100
TaGlu1b      1.44  214    149      429       0.29  1,201  4,138    441       2.16  128.2  59.3     111
TaGlu1c      1.05  137    131      377       0.39  773    1,979    211       1.75  235.9  135      252
E191A        n.d.  n.d.   –        0         n.d.  n.d.   –        0         n.d.  n.d.   –        0
F198A        27.5  6.7    0.24     0.7       35.7  110    3.1      0.3       9.28  10.7   1.2      2.2
Y378A        6.05  26.4   4.36     12.6      1.19  124    104      11.4      1.33  25.1   18.9     35.3
Y378F        0.41  4.7    11.5     33.2      0.13  85.1   655      69.8      1.23  46.1   37.5     70.0
E407A        n.d.  n.d.   –        0         n.d.  n.d.   –        0         n.d.  n.d.   –        0
S464F        2.89  84.0   29.1     84.1      1.42  73.3   51.6     5.5       2.31  31.2   13.5     25.2
F471Y        4.21  83.5   19.8     57.2      0.70  192    274      29.2      2.93  54.6   18.6     34.7

Table III. Kinetic parameters of ScGlu and its mutants
The relative efficiency (Rel. eff.) toward each substrate is the percent ratio of the kcat/Km values to that of ScGlu. n.d., Not detected.

Enzyme       DIBOA-Glc                       DIMBOA-Glc                      pNP-Glc
             Km    kcat   kcat/Km  Rel. eff. Km    kcat   kcat/Km  Rel. eff. Km    kcat   kcat/Km  Rel. eff.
             mM    s−1    s−1/mM   %         mM    s−1    s−1/mM   %         mM    s−1    s−1/mM   %
ScGlu        0.80  118    148      100       1.30  158    122      100       1.78  22.2   12.5     100
E191A        n.d.  n.d.   –        0         n.d.  n.d.   –        0         n.d.  n.d.   –        0
F198A        44.1  233    5.28     3.6       39.3  194    4.94     4.1       3.05  8.3    2.72     21.8
Y378A        7.99  1,098  137      92.6      1.42  826    582      477       2.02  102    50.5     404
Y378F        0.52  12.0   23.1     15.6      1.07  195    182      149       1.34  63.0   47.0     376
E407A        n.d.  n.d.   –        0         n.d.  n.d.   –        0         n.d.  n.d.   –        0
G464F        1.60  54.6   34.1     23.0      5.12  121    23.6     19.3      1.70  25.5   15.0     120
G464S        1.02  86.9   85.2     57.6      0.94  229    244      200       2.89  124    42.9     343
S465L        0.83  46.6   56.1     37.9      0.76  225    296      243       2.03  75.6   37.2     298
G464S/S465L  1.55  83.5   53.9     36.4      0.59  288    488      400       1.38  75.4   54.6     437
F471Y        2.03  243    120      81.1      1.04  321    309      253       0.92  46.3   50.3     402

The primary structures of the aglycone binding sites of TaGlu1a and ScGlu diverge from each other at the residues Ser-464 and Leu-465 in TaGlu1a and Gly-464 and Ser-465 in ScGlu. While the mutations G464S and S465L of ScGlu decreased the relative efficiency for DIBOA-Glc by 42% and 62%, respectively, they increased the efficiency for DIMBOA-Glc by 100% and 143%, respectively. The effects were enhanced by introduction of the double mutation G464S/S465L. Introduction of Phe at position 464 of both enzymes resulted in decreased efficiency for the natural substrates. The influence was most obvious in TaGlu1a with DIMBOA-Glc (relative efficiency 5.5%).
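The percentage changes quoted for the G464S and S465L mutants follow directly from the kcat/Km columns of Table III. As a minimal sketch of that arithmetic (the dictionary layout and the function names relative_efficiency and percent_change are illustrative assumptions, not part of the original analysis), the calculation can be reproduced as follows:

```python
# kcat/Km values (s^-1 mM^-1) copied from Table III for wild-type ScGlu
# and two of its single mutants. Only two substrates are included here.
KCAT_OVER_KM = {
    "ScGlu": {"DIBOA-Glc": 148.0, "DIMBOA-Glc": 122.0},
    "G464S": {"DIBOA-Glc": 85.2, "DIMBOA-Glc": 244.0},
    "S465L": {"DIBOA-Glc": 56.1, "DIMBOA-Glc": 296.0},
}


def relative_efficiency(variant, substrate, reference="ScGlu"):
    """Percent ratio of the variant's kcat/Km to that of the reference enzyme."""
    return 100.0 * KCAT_OVER_KM[variant][substrate] / KCAT_OVER_KM[reference][substrate]


def percent_change(variant, substrate, reference="ScGlu"):
    """Change in relative efficiency with respect to the reference (100%)."""
    return relative_efficiency(variant, substrate, reference) - 100.0


if __name__ == "__main__":
    for variant in ("G464S", "S465L"):
        for substrate in ("DIBOA-Glc", "DIMBOA-Glc"):
            rel = relative_efficiency(variant, substrate)
            change = percent_change(variant, substrate)
            print(f"{variant} {substrate}: {rel:.1f}% of wild type ({change:+.0f}%)")
```

Run as a script, this prints relative efficiencies that agree with the tabulated values to within rounding; the same two functions apply to Table II if TaGlu1a is supplied as the reference.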
The kcat/Km values of ScGlu-F198A for DIMBOA-Glc and DIBOA-Glc were 24- to 28-fold lower than those of wild-type ScGlu. However, this decrease was solely due to an increased Km; the kcat value actually increased. The TaGlu1a-F198A mutant retained some activity (kcat) toward the natural substrates, although the Km values increased dramatically. The Y378A mutation of ScGlu enhanced the kcat/Km values for DIMBOA-Glc and pNP-Glc by 300% to 380%, whereas the same mutation in TaGlu1a lowered the values to 11% to 35% of wild type for all substrates tested. Although the catalytic efficiency (kcat/Km) of the ScGlu-Y378A mutant for DIBOA-Glc was comparable to that of wild type, both the Km and kcat were about 10-fold greater. Replacing Phe-471 in TaGlu1a with Tyr decreased the catalytic efficiency for all the substrates tested. However, the same mutation in ScGlu increased the kcat for the three substrates by about 100% and decreased the Km for DIMBOA-Glc and pNP-Glc. TaGlu1 and ScGlu contain the TFNEP and ITNEG motifs that are well conserved at the catalytic center of family GH1 glucosidases, and replacement of either of the two Glu residues in the motifs (Glu-191 and Glu-407, respectively) with Ala resulted in a complete loss of enzyme activity.

DISCUSSION

Taglu1a and Taglu1b Genes Are Highly Expressed in Young Wheat Shoots

We isolated three genes encoding family GH1 glucosidases that are responsible for the hydrolysis of Hx-Glcs. The transcript profile of Taglu1 genes, as analyzed by northern hybridization, agreed with the transient occurrence of glucosidase activity and of Hx-Glcs in young plants (Sue et al., 2000b). However, the expression profiles of each gene after 48- and 96-h imbibition do not appear to be synchronized, suggesting that the expression of these three genes is controlled independently of each other.
This is similar to the expression profiles of the three homoeologous genes of each of the five Hx biosynthetic genes, which vary between developmental stages of wheat seedlings (Nomura et al., 2005). The mechanism underlying the different expression profiles of the three Taglu1 genes, as well as of the biosynthetic genes, has not yet been uncovered. The expression ratio of the Taglu1 genes differs between the two wheat cultivars at the same growth stage. At 48 h, CS expressed the Taglu1a gene at a higher level than Taglu1b, whereas Ak expressed both genes almost equally. Given that TaGlu1a moves more slowly than TaGlu1b on a native-PAGE gel, the higher expression level of Taglu1a relative to Taglu1b in CS can explain the different zymogram patterns of the two cultivars; the cathodic bands were more intense than the anodic bands in CS, while the intermediate bands were more intense in Ak (Sue et al., 2000b). The purified glucosidase from 48-h-old Ak showed seven bands on a native-PAGE gel. Because TaGlu1c was considered to be expressed at a much lower level than TaGlu1a and TaGlu1b in the plant, the natural glucosidase would mainly exist as homo- and hetero-oligomers of TaGlu1a and TaGlu1b.

The Wheat and Rye Glucosidases Function as a Hexamer

The recombinant TaGlu1b showed greater electrophoretic mobility than TaGlu1a and TaGlu1c on a native-PAGE gel. The mobility of proteins in native-PAGE is influenced by the shape (folding) and charge of the proteins as well as by their Mr. The results of ion-spray mass spectrometric analyses of recombinant TaGlu1a and TaGlu1b subunits excluded the possibility of proteolytic digestion during cell disruption. Furthermore, the overall folding of TaGlu1b must be almost identical to that of the other isozymes because its primary structure is highly homologous to those of TaGlu1a and TaGlu1c and it has similar enzyme activity and substrate specificity.
Therefore, the greater mobility of TaGlu1b is presumably due to its greater net negative charge compared with TaGlu1a and TaGlu1c. This hypothesis is supported by the lower theoretical pI value of TaGlu1b (pI 5.2) in comparison with those of TaGlu1a and TaGlu1c (both pI 5.8). Nevertheless, the anomalously low mobility of TaGlu1b under denaturing conditions cannot be predicted from the pI value. There may be additional factors that cause the mobility discrepancy in SDS-PAGE. Slight differences in amino acid composition might result in different migration rates even under denaturing conditions. In the previous study, we demonstrated that the wheat glucosidase exhibits activity as an oligomer (Sue et al., 2000b), but the number of subunits constituting the oligomer remained unclear. The results from coexpression of two TaGlu1 isozymes (TaGlu1a and TaGlu1b) and the corresponding gel filtration analysis have established that TaGlu1 exhibits its activity only as a hexamer. Because wheat and rye glucosidases show similar gel filtration and electrophoretic profiles, it is reasonable to assume that the active form of ScGlu is also a hexamer. Although only three bands were detected in the sample of natural ScGlu on a native-PAGE gel (Fig. 3C), they must comprise a larger number of stacked bands because the rye glucosidase isolated from shoots separated into at least six protein peaks by anion-exchange chromatography (Sue et al., 2000a). Presumably, the rye used in our studies contains multiple isozymes whose electrophoretic mobilities are very similar to one another. The maize glucosidase gene (Zmglu1) is known to be highly polymorphic, and a zymogram using hybrid lines shows three bands of β-glucosidase (Stuber et al., 1977), representing two homodimers and one heterodimer consisting of two allozymes. Bread wheat is a hexaploid plant consisting of homoeologous A, B, and D genomes.
Therefore, it is not surprising that the three almost identical Taglu1 genes were isolated from a bread wheat cultivar, CS. However, the chromosomal locations of the Taglu1a to Taglu1c genes are unknown at present. Multimeric forms of some plant β-D-glucosidases have been reported previously. Among monocot glucosidases, the β-D-glucosidases from oat (avenacosidase), sorghum (SbDhr1), and maize (ZmGlu1) have about 75% similarity to TaGlu1 and ScGlu. Although the maize glucosidase can form a large complex mediated by an aggregating factor (Esen and Blanchard, 2000), it primarily exists as a dimer. SbDhr1 was demonstrated to exist as a tetramer in the plant (Hösel et al., 1987), but it does not lose activity even when it dissociates into dimers. The avenacosidase is known to form 300- to 350-kD aggregates and multimers thereof (Gus-Mayer et al., 1994; Kim and Kim, 1998). Gus-Mayer et al. (1994) reported that the avenacosidase (As-p60) dissociates into dimers on treatment with urea or freeze-thawing, accompanied by a loss of activity, although the recombinant T7.Tag-As-Glu2 (an isozyme of As-p60) was shown to form an active dimer (Kim et al., 2000). Thus, TaGlu1 and ScGlu seem to be unique in that they require a hexameric structure to exhibit enzyme activity. When the N-terminal 25 residues were replaced with the corresponding region of ZmGlu1, the wheat glucosidase (Zm-TaGlu1a or Zm-TaGlu1b) formed a dimer rather than a hexamer (Fig. 4B). The amino acid sequences of the N-terminal region are the least conserved among the family GH1 β-D-glucosidases. This result suggests a novel function of the N-terminal region in maintaining the quaternary structure of the β-D-glucosidases. The dimer of Zm-TaGlu1 easily dissociated into monomers (data not shown), suggesting that the monomers of TaGlu1 interact very weakly in comparison with the ZmGlu1 or SbDhr1 monomers.
These characteristics differ from those of the other monocot family GH1 β-D-glucosidases, in which the dimeric forms represent the stable structural units from which the oligo- and multimers are composed. The crystal structure of the TaGlu1b monomer resembles the known structures of the maize and sorghum homologs (Fig. 5C). The two residues in ZmGlu1 (Arg-295 and Asp-342) that were shown to form the intermonomer salt bridges (Czjzek et al., 2001) are conserved in TaGlu1b (Arg-296 and Asp-343). The intramolecular disulfide bond responsible for maintaining the dimeric form of Zm-p60.1 (Zouhar et al., 2001) is also conserved in the wheat glucosidase. Furthermore, the other amino acids mapping to the monomer-monomer interface of ZmGlu1 show high similarity to the corresponding regions of the wheat and rye glucosidases. Thus, the reasons for the weak monomer-monomer interaction within the dimer are unclear at present.

F198A Mutation Reduces the Catalytic Efficiency of TaGlu1a and ScGlu

The crystal structure of a β-glucosidase responsible for hydrolysis of Hx-Glc was first solved for maize (Czjzek et al., 2000). The aglycone moiety of DIMBOA-Glc was shown to be stabilized by four aromatic side chains: Trp-378 on one side and three Phe residues (Phe-198, Phe-205, and Phe-466) on the other side (Czjzek et al., 2001; Zouhar et al., 2001). Although Trp-378 and Phe-198 are conserved among the five monocot β-D-glucosidases (glucosidases from wheat, rye, maize, sorghum, and oats), Phe-205 and Phe-466 are replaced by His and Ser, respectively, in TaGlu1 and by His and Gly, respectively, in ScGlu. These amino acid substitutions change the electrostatic and spatial environment of the aglycone binding pocket, although substrate recognition and binding must resemble that of ZmGlu1 because all three enzymes favor Hx-Glcs.
Assuming the mechanism of substrate recognition is the same for TaGlu1a, ScGlu, and ZmGlu1, the sole aromatic residue (Phe-198) positioned opposite Trp-379 in TaGlu1a and ScGlu is likely to play a significant role in substrate binding. This is supported by the results of the F198A mutants, where the catalytic efficiency (kcat/Km) decreased by more than 95% except for the activity of ScGlu-F198A against pNP-Glc. However, the kcat of ScGlu for DIBOA-Glc and DIMBOA-Glc was enhanced by this mutation, and the kcat of TaGlu1a-F198A toward DIMBOA-Glc was about 30% that of the wild-type protein. These results are somewhat different from those for the maize glucosidase, where replacement of Phe-198 by smaller amino acids caused a more drastic reduction in kcat (Zouhar et al., 2001; Verdoucq et al., 2003). The crystal structure of the ZmGlu1 mutants revealed that the F198V mutation changes the orientation of the side chain of Phe-466, one of the Phe residues constituting the hydrophobic binding pocket, leading to complete loss of activity against pNP-Glc (Verdoucq et al., 2003). In the case of TaGlu1a and ScGlu, the Phe-466 is substituted by Ser and Gly, respectively. Thus, the mutation may alter the environment around the binding pocket to a lesser extent in the wheat and rye glucosidases. Additionally, the Tyr at the entrance to the aglycone binding sites in both glucosidases (Tyr-378) may compensate for the deletion of Phe-198.

Bulky Aromatic Side Chains at the Entrance of the Substrate Binding Site Play a Significant Role in TaGlu1a

One of the greatest differences around the aglycone binding pocket between the wheat and rye glucosidases and the maize glucosidase is Tyr-378 in TaGlu1 and ScGlu (equivalent to Pro-377 in ZmGlu1). The bulky side chain of Tyr-378 in TaGlu1a and ScGlu may obstruct access to the binding pocket by the substrate. Indeed, we initially assumed that replacing the Tyr with a small residue might increase the activity of the enzymes.
Substitution of the Tyr with Ala increased the kcat/Km of ScGlu for DIMBOA-Glc and pNP-Glc 4- to 5-fold. However, the efficiency for all substrates was lower for the TaGlu1a-Y378A mutant, suggesting a different role for the Tyr residue in the wheat enzyme. The Ser-464 and Leu-465 residues in TaGlu1a, which are bulkier than the corresponding residues in ScGlu, may require a lid to maintain the substrate in a favorable position. The higher Km values of the Y378A mutants and the lower Km values of the Y378F mutants indicate the significance of an aromatic residue positioned at the entrance of the binding pockets. In the case of the maize glucosidase, the lack of an aromatic residue at this position may still allow high activity toward DIMBOA-Glc because the binding pocket, comprising four aromatic side chains, can still hold the aromatic aglycone moiety firmly. The lower kcat values of the TaGlu1a- and ScGlu-Y378F mutants compared to those of the wild-type proteins may indicate that the hydroxy group is necessary for placing the glucosidic bond of the substrate at the optimal position for attack by the catalytic residues. However, a more hydrophobic environment seems to be more favorable for enzyme-substrate binding.

The Effects of the Mutations at Positions 464, 465, and 471

The finding that the S465L mutation of ScGlu increased the efficiency toward DIMBOA-Glc is consistent with the results for ZmGlu1, where the maize counterpart of the residue is involved in the recognition of the methoxy group in DIMBOA-Glc (Czjzek et al., 2000). However, the single G464S mutation alone could also raise the specificity for DIMBOA-Glc. Therefore, residue 464, as well as residue 465, plays an important role in distinguishing DIMBOA-Glc from DIBOA-Glc. The substitutions of Gly-464 and Ser-465 of ScGlu by the TaGlu1 counterparts made the substrate preference of ScGlu resemble that of the TaGlu1s.
However, the kcat/Km value toward DIMBOA-Glc was about 30 times larger than that toward DIBOA-Glc in TaGlu1a, whereas in the ScGlu-G464S/S465L mutant the value for DIMBOA-Glc was only about 10 times larger than that for DIBOA-Glc. This suggests that other, unknown factors in TaGlu1a contribute to distinguishing DIMBOA-Glc from DIBOA-Glc. In ZmGlu1, Tyr-473 was shown to form a hydrogen bond with Trp-378, and the mutation of Tyr-473 to Phe caused an increase in efficiency (Verdoucq et al., 2003). The authors suggested that the loss of the hydrogen bond increased the flexibility of Trp-378, resulting in an enhanced catalytic efficiency. Replacing the corresponding residue of TaGlu1a, Phe-471, with Tyr resulted in a decrease in efficiency, which agrees with the results for ZmGlu1. However, the same mutation in ScGlu enhanced the efficiency toward DIMBOA-Glc and pNP-Glc, while it decreased the efficiency toward DIBOA-Glc. It is notable that every ScGlu mutant that showed a higher efficiency toward DIMBOA-Glc than the wild type also exhibited an increased efficiency toward pNP-Glc but not toward DIBOA-Glc, although the detailed mechanisms are not known at present. Because none of the mutations introduced in this study enhanced the efficiency of TaGlu1a, a more precise organization of the substrate binding site is likely to be required to recognize and hydrolyze the substrates in TaGlu1a than in ScGlu. The broader substrate specificity of ScGlu may be due to its wider aglycone binding site compared with those of ZmGlu1 and TaGlu1a. Further structural analyses of TaGlu1 and ScGlu crystals will allow elucidation of the details of substrate recognition and organization of the active hexamers.

MATERIALS AND METHODS

Plant Materials

Two cultivars of bread wheat (Triticum aestivum), cv Ak and cv CS, and rye (Secale cereale) were grown at 25°C as described previously (Sue et al., 2000a, 2000b).
Northern-Blot Analysis and Quantitative PCR

Total RNA was isolated from wheat shoots (Ak) using the RNeasy Plant Mini kit (QIAGEN). The RNA (15 μg) was separated on a 1.2% agarose MOPS-formaldehyde gel and transferred to a nylon membrane, Hybond N+ (Amersham Biosciences). The membrane was probed with an [α-32P]-labeled cDNA fragment from the 3′-RACE (see supplemental text) as described by Church and Gilbert (1984). The hybridization and washing procedures were carried out at 65°C. The membrane was autoradiographed with an Imaging Plate and analyzed by a bio-imaging analyzer, BAS2000 (Fujifilm). The primers used for quantitative PCR are shown in Supplemental Table I. Total RNA was isolated from 48- and 96-h-old wheat shoots (Ak and CS), and cDNA was synthesized using 1 μg of total RNA and Omniscript reverse transcriptase (QIAGEN) by priming with oligo(dT) primer. Prior to performing real-time PCR, the sequences of the DNA fragments amplified with the above primer sets were confirmed to be identical to the corresponding cDNA sequences. The cDNA (2 μL of a 1/50 dilution of the RT sample) was used for real-time PCR on a LightCycler with the LightCycler FastStart DNA Master SYBR Green kit (Roche). Each sample was quantified at least three times with respect to standard DNA (ranging from 10^2 to 10^6 copies/reaction tube) quantified under the same conditions. An annealing temperature of 64°C was employed. The PCR buffer contained 3 mM MgCl2. All other PCR conditions were set according to the manufacturer's instructions. The purity of the DNA products was confirmed by agarose gel electrophoresis after every real-time PCR reaction.

Expression of the β-D-Glucosidases in Escherichia coli

The full-length cDNAs (Taglu1a–Taglu1c) were used as templates for PCR to prepare DNA fragments flanked by NcoI and XhoI recognition sites at the 5′ and 3′ ends, respectively. The sequences of the primers used in the PCR are shown in Supplemental Table I.
Scglu was amplified by RT-PCR using total RNA prepared from 48-h-old rye shoots (cv Haru-ichiban) with the primers given in Supplemental Table I. The primers for Scglu were designed based on the sequence of the rye glucosidase (GenBank accession no. AF293849) reported by Nikus et al. (2003). The reaction was carried out using KOD polymerase (TOYOBO) with denaturation at 98°C, annealing at 58°C, and polymerization at 74°C. The amplified DNA fragments corresponding to the mature glucosidases were digested with NcoI and XhoI and then cloned into pET21d or pET30a. By introducing the DNA fragments into the NcoI and XhoI sites of pET21d and pET30a, we obtained plasmids for native (without His-tag) and N-terminal His-tagged glucosidases, respectively. The ligation products were transformed into BL21-CodonPlus(DE3)-RIL (Stratagene) competent cells. For coexpression of the glucosidase genes, Taglu1a (or Taglu1c) was cloned into pET21d and Taglu1b into pCDFDuet-1 (Novagen). Both plasmids were then introduced into the BL21-CodonPlus(DE3)-RIL strain. The E. coli was cultured in 50 mL of Luria-Bertani broth supplemented with appropriate antibiotics at 37°C with shaking until the OD600 reached approximately 0.5. Heterologous gene expression was then induced by adding 1 mM isopropylthio-β-galactoside, followed by an overnight culture at 20°C. The cells were pelleted by centrifugation (1,500g for 15 min) and then resuspended in 5 mL of 50 mM HEPES, pH 7.2, containing a protease inhibitor cocktail (Sigma). The cells were disrupted by sonication on ice (several 20-s pulses at a power setting of 100 W). The soluble protein fraction was recovered by collecting the supernatant after centrifugation at 15,000g for 15 min. The recombinant His-tagged protein was purified by metal chelation chromatography.
The soluble fraction was applied to a HiTrap Chelating HP column (Amersham Biosciences) charged with Ni2+ and equilibrated in 0.02 M phosphate buffer, pH 7.4, containing 0.5 M NaCl and 10 mM imidazole. After washing the column with the same buffer containing 0.5 M NaCl and 60 mM imidazole, the glucosidase was eluted by increasing the concentration of imidazole to 300 mM. The eluate was concentrated by ultracentrifugation and then subjected to gel filtration chromatography on Superdex 200 (Amersham Biosciences) equilibrated with 50 mM HEPES and 150 mM NaCl, pH 7.2. To estimate the molecular mass on the gel filtration column, the following proteins were used as standards: ferritin (440 kD), human IgG (160 kD), transferrin (81 kD), ovalbumin (43 kD), and myoglobin (17.6 kD).

Electrophoresis and Activity Staining

The protein profile was analyzed by SDS-PAGE and native-PAGE using an 8% gel as described previously (Sue et al., 2000b). The bands corresponding to β-glucosidase were detected on a native-PAGE gel using a chromogenic substrate, 6-bromo-2-naphthyl-β-D-glucopyranoside, as described previously (Sue et al., 2000b).

Mass Analyses of TaGlu1a and TaGlu1b

The purified N-His-tagged TaGlu1a and TaGlu1b were concentrated by ultrafiltration, followed by dilution with MilliQ water to reduce the salt concentration in the buffer. After several rounds of concentration and dilution, acetonitrile and formic acid were added to give final concentrations of 20% and 0.1%, respectively. The protein concentration was adjusted to 1 mg/mL. The molecular masses of the glucosidases were measured using a Perkin-Elmer-Sciex API-165 (ion-spray voltage 5 kV, orifice voltage 30 V, nebulizer gas N2, curtain gas N2). The theoretical Mr and pI were determined by the Compute pI/Mw program on the ExPASy server (http://kr.expasy.org/tools/).

Structure Determination

The recombinant β-glucosidase was expressed, purified, and crystallized as described previously (Sue et al., 2005).
To obtain the complex of TaGlu1b and its substrate aglycone, DIMBOA, crystals were soaked for 15 min in the crystallization buffer containing 0.5 mM DIMBOA and 30% glycerol as a cryoprotectant and then cooled in a nitrogen stream at 100 K. The diffraction data set was collected on beamline BL-6A at the Photon Factory and processed with the program HKL2000 (Otwinowski and Minor, 1997). The initial model was obtained by the molecular replacement method using the program MOLREP (Vagin and Teplyakov, 1997) with the β-glucosidase molecule from sorghum (Sorghum bicolor; PDB code 1V03; Verdoucq et al., 2004) as the search model. The crystal belonged to the space group P4₁32 and contained one monomer in the asymmetric unit. The iterative refinement was performed using the programs CNS (Brunger et al., 1998) and XtalView (McRee, 1992).

Site-Directed Mutagenesis of His-Tagged TaGlu1a and ScGlu

Mutated DNA fragments for expression of TaGlu1a and ScGlu mutants were prepared by PCR-mediated overlap extension. The sequences of the mutagenic PCR primers used in this study are shown in Supplemental Table I. The DNA fragments for Zm-TaGlu1a and Zm-TaGlu1b were amplified by PCR using the primers shown in Supplemental Table I. The amplified fragments were digested with NcoI and XhoI, introduced into pET30a, expressed in E. coli, and purified by affinity chromatography followed by gel filtration chromatography as described above.

Substrate Preparation and Enzyme Assay

DIBOA-Glc and DIMBOA-Glc were isolated from shoots of 48-h-old rye and maize (Zea mays), respectively, according to methods described previously (Sue et al., 2000b). Enzyme activity was measured in 100 mM citrate-200 mM phosphate buffer, pH 5.5, at 30°C. The product was quantified by HPLC (eluent, 30% [v/v] methanol containing 0.1% [v/v] acetic acid; column, Inertsil ODS-3 [4.6 × 150 mm; GL Sciences]; temperature, 40°C).
The amount of protein and reaction time were carefully chosen so that the product did not exceed 5% to 10% of the residual substrate. Kinetic parameters were determined from several independent experiments. The preliminary Km and kcat values were determined from reaction rates at various substrate concentrations ranging from 0.02 mM to 10 mM. Precise kinetic parameters were subsequently obtained by varying the substrate concentration from 20% to 200% of the preliminary Km value. Km and kcat values were calculated by fitting the data to the Michaelis-Menten equation using SigmaPlot 2000 (SYSTAT Software). The protein concentration was measured by the method of Bradford (Bradford, 1976) using bovine serum albumin as a standard.

Modeling of ScGlu

The three-dimensional model of ScGlu was calculated by homology modeling with the MODELLER6v2 program (Sali and Blundell, 1993; Marti-Renom et al., 2000) using the structures of TaGlu1b and ZmGlu1 (PDB code 1E56). First, the primary structures of ScGlu, TaGlu1b, and ZmGlu1 were aligned using ClustalW, and the result was checked based on the actual (TaGlu1b and ZmGlu1) or predicted (ScGlu) secondary structures. Then, five models were calculated by MODELLER6v2 using the above sequence alignment. Since all structures obtained were almost identical to each other, one model was employed as a predicted structure of ScGlu. The model was evaluated by PROCHECK. Although one residue (Lys-251) was in a disallowed region, it was located on the surface of the protein and far from the substrate binding pocket.

Sequence data from this article can be found in the DDBJ/EMBL/GenBank data libraries under the accession numbers AB100035 (Taglu1a), AB236422 (Taglu1b), and AB236423 (Taglu1c). The atomic coordinates of TaGlu1b were deposited in the Protein Data Bank under the code 2DGA.

ACKNOWLEDGMENTS

We thank Dr. Atsushi Ishihara (Division of Applied Life Sciences, Kyoto University) for the mass analyses of the wheat glucosidases.
LITERATURE CITED

Aguilar CF, Sanderson I, Moracci M, Ciaramella M, Nucci R, Rossi M, Pearl LH (1997) Crystal structure of the β-glycosidase from the hyperthermophilic archeon Sulfolobus solfataricus: resilience as a key factor in thermostability. J Mol Biol 271: 789–802
Barrett T, Suresh CG, Tolley SP, Dodson EJ, Hughes MA (1995) The crystal structure of a cyanogenic β-glucosidase from white clover, a family 1 glycosyl hydrolase. Structure 3: 951–960
Bradford MM (1976) A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal Biochem 72: 248–254
Brunger AT, Adams PD, Clore GM, DeLano WL, Gros P, Grosse-Kunstleve RW, Jiang JS, Kuszewski J, Nilges M, Pannu NS, et al (1998) Crystallography & NMR system: a new software suite for macromolecular structure determination. Acta Crystallogr D 54: 905–921
Brzobohatý B, Moore I, Kristoffersen P, Bako L, Campos N, Schell J, Palme K (1993) Release of active cytokinin by a β-glucosidase localized to the maize root meristem. Science 262: 1051–1054
Burmeister WP, Cottaz S, Driguez H, Iori R, Palmieri S, Henrissat B (1997) The crystal structures of Sinapis alba myrosinase and a covalent glycosyl-enzyme intermediate provide insights into the substrate recognition and active-site machinery of an S-glycosidase. Structure 5: 663–675
Chi YI, Martinez-Cruz LA, Jancarik J, Swanson RV, Robertson DE, Kim SH (1999) Crystal structure of the β-glycosidase from the hyperthermophile Thermosphaera aggregans: insights into its activity and thermostability. FEBS Lett 445: 375–383
Church GM, Gilbert W (1984) Genomic sequencing. Proc Natl Acad Sci USA 81: 1991–1995
Cicek M, Esen A (1998) Structure and expression of a dhurrinase (β-glucosidase) from sorghum. Plant Physiol 116: 1469–1478
Czjzek M, Cicek M, Zamboni V, Bevan DR, Henrissat B, Esen A (2000) The mechanism of substrate (aglycone) specificity in β-glucosidases is revealed by crystal structures of mutant maize β-glucosidase-DIMBOA, -DIMBOAGlc, and -dhurrin complexes. Proc Natl Acad Sci USA 97: 13555–13560
Czjzek M, Cicek M, Zamboni V, Burmeister WP, Bevan DR, Henrissat B, Esen A (2001) Crystal structure of a monocotyledon (maize ZMGlu1) β-glucosidase and a model of its complex with p-nitrophenyl β-D-thioglucoside. Biochem J 354: 37–46
Dharmawardhana DP, Ellis BE, Carlson JE (1995) A β-glucosidase from lodgepole pine xylem specific for the lignin precursor coniferin. Plant Physiol 107: 331–339
Emanuelsson O, Nielsen H, Brunak S, von Heijne G (2000) Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J Mol Biol 300: 1005–1016
Emanuelsson O, Nielsen H, von Heijne G (1999) ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites. Protein Sci 8: 978–984
Esen A, Blanchard DJ (2000) A specific β-glucosidase-aggregating factor is responsible for the β-glucosidase null phenotype in maize. Plant Physiol 122: 563–572
Falk A, Rask L (1995) Expression of a zeatin-O-glucoside-degrading β-glucosidase in Brassica napus. Plant Physiol 108: 1369–1377
Gus-Mayer S, Brunner H, Schneider-Poetsch HA, Rudiger W (1994) Avenacosidase from oat: purification, sequence analysis and biochemical characterization of a new member of the BGA family of β-glucosidases. Plant Mol Biol 26: 909–921
Haberer G, Kieber JJ (2002) Cytokinins. New insights into a classic phytohormone. Plant Physiol 128: 354–362
Hakulinen N, Paavilainen S, Korpela T, Rouvinen J (2000) The crystal structure of β-glucosidase from Bacillus circulans sp. alkalophilus: ability to form long polymeric assemblies. J Struct Biol 129: 69–79
Hösel W, Tober I, Eklund SH, Conn EE (1987) Characterization of β-glucosidases with high specificity for the cyanogenic glucoside dhurrin in Sorghum bicolor (L.) moench seedlings. Arch Biochem Biophys 252: 152–162
Kim YW, Kang KS, Kim SY, Kim IS (2000) Formation of fibrillar multimers of oat β-glucosidase isoenzymes is mediated by the As-Glu1 monomer. J Mol Biol 303: 831–842
Kim YW, Kim IS (1998) Subunit composition and oligomer stability of oat β-glucosidase isozymes. Biochim Biophys Acta 1388: 457–464
Laskowski RA, MacArthur MW, Moss DS, Thornton JM (1993) PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Crystallogr 26: 283–291
Ljung K, Ostin A, Lioussanne L, Sandberg G (2001) Developmental regulation of indole-3-acetic acid turnover in Scots pine seedlings. Plant Physiol 125: 464–475
Marti-Renom MA, Stuart AC, Fiser A, Sanchez R, Melo F, Sali A (2000) Comparative protein structure modeling of genes and genomes. Annu Rev Biophys Biomol Struct 29: 291–325
McRee DE (1992) A visual protein crystallographic software system for X11/Xview. J Mol Graph 10: 44–46
Niemeyer H (1988) Hydroxamic acids (4-hydroxy-1,4-benzoxazin-3-ones), defence chemicals in the Gramineae. Phytochemistry 11: 3349–3358
Nikus J, Esen A, Jonsson LMV (2003) Cloning of a plastidic rye (Secale cereale) β-glucosidase cDNA and its expression in Escherichia coli. Physiol Plant 118: 337–345
Nomura T, Ishihara A, Yanagita RC, Endo TR, Iwamura H (2005) Three genomes differentially contribute to the biosynthesis of benzoxazinones in hexaploid wheat. Proc Natl Acad Sci USA 102: 16490–16495
Otwinowski Z, Minor W (1997) Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol 276: 307–326
Persans MW, Wang J, Schuler MA (2001) Characterization of maize cytochrome P450 monooxygenases induced in response to safeners and bacterial pathogens. Plant Physiol 125: 1126–1138
Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE (2004) UCSF Chimera: a visualization system for exploratory research and analysis. J Comput Chem 25: 1605–1612
Sali A, Blundell TL (1993) Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 234: 779–815
Sanz-Aparicio J, Hermoso JA, Martinez-Ripoll M, Lequerica JL, Polaina J (1998) Crystal structure of β-glucosidase A from Bacillus polymyxa: insights into the catalytic activity in family 1 glycosyl hydrolases. J Mol Biol 275: 491–502
Sicker D, Frey M, Schulz M, Gierl A (2000) Role of natural benzoxazinones in the survival strategy of plants. Int Rev Cytol 198: 319–346
Stuber CW, Goodman MM, Johnson FM (1977) Genetic control and racial variation of β-glucosidase isozymes in maize (Zea mays L.). Biochem Genet 15: 383–394
Sue M, Ishihara A, Iwamura H (2000a) Purification and characterization of a β-glucosidase from rye (Secale cereale L.) seedlings. Plant Sci 155: 67–74
Sue M, Ishihara A, Iwamura H (2000b) Purification and characterization of a hydroxamic acid glucoside β-glucosidase from wheat (Triticum aestivum L.) seedlings. Planta 210: 432–438
Sue M, Yamazaki K, Kouyama J-i, Sasaki Y, Ohsawa K, Miyamoto T, Iwamura H, Yajima S (2005) Purification, crystallization and preliminary X-ray analysis of a hexameric β-glucosidase from wheat. Acta Crystallogr Sect F Struct Biol Cryst Commun 61: 864–866
Vagin A, Teplyakov A (1997) MOLREP: an automated program for molecular replacement. J Appl Crystallogr 30: 1022–1025
Verdoucq L, Czjzek M, Moriniere J, Bevan DR, Esen A (2003) Mutational and structural analysis of aglycone specificity in maize and sorghum β-glucosidases. J Biol Chem 278: 25055–25062
Verdoucq L, Moriniere J, Bevan DR, Esen A, Vasella A, Henrissat B, Czjzek M (2004) Structural determinants of substrate specificity in family 1 β-glucosidases: novel insights from the crystal structure of sorghum dhurrinase-1, a plant β-glucosidase with strict specificity, in complex with its natural substrate. J Biol Chem 279: 31796–31803
Zagrobelny M, Bak S, Rasmussen AV, Jorgensen B, Naumann CM, Moller BL (2004) Cyanogenic glucosides and plant-insect interactions. Phytochemistry 65: 293–306
Zouhar J, Vévodová J, Marek J, Damborský J, Su XD, Brzobahatý BE (2001) Insights into the functional architecture of the catalytic center of a maize β-glucosidase Zm-p60.1. Plant Physiol 127: 973–985

Author notes

* Corresponding author; e-mail [email protected]; fax 81–3–5477–2619.
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Masayuki Sue ([email protected]).
[W] The online version of this article contains Web-only data.
Article, publication date, and citation information can be found at www.plantphysiol.org/cgi/doi/10.1104/pp.106.077693.

© 2006 American Society of Plant Biologists
This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)


Abstract

The basic/helix-loop-helix (bHLH) transcription factors and their homologs form a large family in plant and animal genomes. They are known to play important roles in the specification of tissue types in animals. By contrast, few plant bHLH proteins have been studied functionally. The recent completion of whole-genome sequences of the model plants Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) allows genome-wide analysis and comparison of the bHLH family in flowering plants. We have identified 167 bHLH genes in the rice genome, and their phylogenetic analysis indicates that they form well-supported clades, which are defined as subfamilies. In addition, sequence analysis of potential DNA-binding activity, the sequence motifs outside the bHLH domain, and the conservation of intron/exon structural patterns further support the evolutionary relationships among these proteins. The genome distribution of rice bHLH genes strongly supports the hypothesis that genome-wide and tandem duplication contributed to the expansion of the bHLH gene family, consistent with the birth-and-death theory of gene family evolution. Bioinformatics analysis suggests that rice bHLH proteins can potentially participate in a variety of combinatorial interactions, endowing them with the capacity to regulate a multitude of transcriptional programs. In addition, similar expression patterns suggest functional conservation between some rice bHLH genes and their close Arabidopsis homologs.

Since the discovery of the basic/helix-loop-helix (bHLH) motif with DNA-binding and dimerization capabilities (Murre et al., 1989), members of the bHLH protein superfamily have been found to have an ever-increasing number of functions in essential physiological and developmental processes in animals and, to a lesser extent, plants (Quail, 2000; Ledent and Vervoort, 2001; Toledo-Ortiz et al., 2003; Sonnenfeld et al., 2005).
The bHLH domain contains approximately 60 amino acids, with two functionally distinctive regions, the basic region and the HLH region. The basic region is located at the N terminus of the bHLH domain and functions as a DNA-binding motif. It consists of approximately 15 amino acids, which typically include six basic residues (Atchley et al., 1999). The HLH region contains two amphipathic α helices with a linking loop of variable length; the amphipathic α helices of two bHLH proteins can interact, allowing the formation of homodimers or heterodimers (Murre et al., 1989; Ellenberger et al., 1994; Nesi et al., 2000). Some bHLH proteins have been shown to bind to sequences containing a consensus core element called the E box (5′-CANNTG-3′), with the G box (5′-CACGTG-3′) being the most common form. In addition, the nucleotides flanking the core element may also have a role in binding specificity (Atchley et al., 1999; Martinez-Garcia et al., 2000; Massari and Murre, 2000; Robinson et al., 2000).

According to their phylogenetic relationships, DNA-binding motifs, and functional properties, known bHLH proteins from animals have been divided into six main groups (named groups A to F; Atchley and Fitch, 1997). Group A bHLH proteins include Atonal, D, Delilah, dHand, E12, Hen, Lyl, MyoD, and Twist; they can bind to the E-box sequence CAGCTG. In group B, a number of proteins have diverse functions and bind to the G-box sequence CACGTG; examples of this group include Max, Myc, MITF, SREBP, and USF (Henriksson and Luscher, 1996; Facchini and Penn, 1998; Goding, 2000). Members of group C contain an additional protein-protein interaction region (the PAS domain) and bind to sequences (NACGTG or NGCGTG) that are unlike the E box. Proteins in group D have the HLH region but lack the basic region; they can form heterodimers with bHLH proteins and thus are functionally related to typical bHLH proteins (Sun et al., 1991).
Group E includes E(spl), Gridlock, Hairy, and Hey (Ledent and Vervoort, 2001); these proteins have Pro or Gly residues within the basic region and can bind to the sequence CACGNG preferentially (Fisher and Caudy, 1998; Steidl et al., 2000). Group F consists of COE-bHLH proteins; they have divergent sequences compared with other groups and another domain for dimerization and DNA binding (Crozatier et al., 1996; Fisher and Caudy, 1998; Ledent and Vervoort, 2001).

Compared to animals, only a small number of plant bHLH proteins have been characterized functionally. In Arabidopsis (Arabidopsis thaliana), a model for flowering plants (particularly eudicots), 162 bHLH-encoding genes have been identified from the analysis of genome sequences (Bailey et al., 2003; Heim et al., 2003; Toledo-Ortiz et al., 2003). Members of the Arabidopsis bHLH family have been divided into 21 subfamilies by Toledo-Ortiz et al. (2003). Buck and Atchley (2003) also performed a phylogenetic analysis of the bHLH family in plants. Their study found that a total of 295 bHLH genes, including 118 from Arabidopsis, 131 from rice (Oryza sativa), and 46 from other plants, could be grouped into 15 separate clades. Sequence analysis suggests that most of the plant bHLH proteins belong to group B (Atchley and Fitch, 1997). In addition, recent reports have demonstrated that some plant bHLH proteins can interact with proteins that lack a bHLH domain. In particular, protein complexes with MYB, bHLH, and WD40 proteins were proposed to regulate guard cell and root hair differentiation (Ramsay and Glover, 2005).

Rice is one of the most important food crops in the world and has been used as a major model species in plant (especially monocot) functional genomics research because of its relatively small genome size (approximately 390 Mb) and synteny with other cereal genomes (Gale and Devos, 1998).
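The binding sites named in the group descriptions above (the E box CANNTG, the G box CACGTG, the group A site CAGCTG, and the group E site CACGNG) are short degenerate motifs, so they can be located in a DNA sequence with regular expressions. The following is a minimal Python sketch, not part of the original analysis. It assumes that N stands for any of A, C, G, or T and that only the given strand is scanned; the function names and the example sequence are hypothetical and chosen purely for illustration.

```python
import re


def motif_to_regex(motif):
    """Translate a motif such as 'CANNTG' into a compiled regex.

    'N' is expanded to [ACGT]. The pattern is wrapped in a lookahead so that
    overlapping sites are all reported by finditer().
    """
    body = "".join("[ACGT]" if base == "N" else base for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_sites(sequence, motif):
    """Return (0-based start, matched site) pairs for every motif occurrence."""
    pattern = motif_to_regex(motif)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical example sequence (not taken from any genome).
example = "TTCACGTGAACAGCTGTT"

e_box_sites = find_sites(example, "CANNTG")    # general E box
g_box_sites = find_sites(example, "CACGTG")    # G box, a special case of the E box
group_e_sites = find_sites(example, "CACGNG")  # site preferred by group E proteins

print(e_box_sites)
print(g_box_sites)
print(group_e_sites)
```

Because the G box is one instance of the E-box pattern, every G-box hit is also reported as an E-box hit; a real promoter scan would additionally have to consider the flanking nucleotides, which the text notes may influence binding specificity.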
More than 95% of the rice (japonica cultivar Nipponbare) genome has been sequenced by the International Rice Genome Sequencing Project (http://rgp.dna.affrc.go.jp/cgi-bin/statusdb/irgsp-status.cgi; data from April 2006). The rice bHLH gene family (OsbHLH genes) has not been analyzed in detail, and the phylogenetic relationship with other plant bHLH genes remains poorly understood. In this study, we identified 167 OsbHLH genes from the rice genomic sequence and carried out phylogenetic analyses to understand the relationships among these rice genes. Furthermore, we identified some of the duplication events that likely contributed to the expansion of the bHLH family. Phylogenetic analysis of rice and Arabidopsis bHLH genes allowed the identification of both shared and specific subfamilies and an estimate of the number of bHLH genes in the most recent common ancestor (MRCA) of rice and Arabidopsis, as well as of potential gene birth-and-death events. Moreover, analysis of intron number and location provides evidence for numerous independent intron loss events in the bHLH family. Finally, expression studies indicate that bHLH proteins exhibit a variety of expression patterns, suggesting diverse functions.

RESULTS AND DISCUSSION

Identification of 167 OsbHLH Genes

To obtain sequences of bHLH genes in the rice genome, we used the criteria developed by Atchley et al. (1999) to define a bHLH protein. Briefly, the bHLH motif contains 19 conserved amino acids: five in the basic region, five in the first helix, one in the loop, and eight in the second helix (Atchley et al., 1999). Using TBLASTN against the rice genome database, we obtained all putative bHLH proteins that had more than 11 conserved amino acids among the 19 residues.
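The counting rule just described (retain a candidate when more than 11 of the 19 conserved sites are matched) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure actually used: the consensus below is a hypothetical four-site toy and does not reproduce the 19 sites of Atchley et al. (1999), the candidate strings are invented, and the function names are placeholders.

```python
from typing import Dict, Set

# Hypothetical toy consensus: 0-based position in the aligned domain -> allowed
# residues. These four sites are placeholders for illustration only; they are
# NOT the 19 conserved sites defined by Atchley et al. (1999).
TOY_CONSENSUS: Dict[int, Set[str]] = {
    0: {"R", "K"},
    2: {"E"},
    5: {"L", "I", "V"},
    7: {"R"},
}


def count_conserved(domain: str, consensus: Dict[int, Set[str]]) -> int:
    """Count consensus positions at which the aligned domain has an allowed residue."""
    return sum(
        1
        for position, allowed in consensus.items()
        if position < len(domain) and domain[position] in allowed
    )


def passes_filter(domain: str, consensus: Dict[int, Set[str]], min_matches: int) -> bool:
    """Keep a candidate when it matches at least `min_matches` consensus positions."""
    return count_conserved(domain, consensus) >= min_matches


# With a real 19-site definition, "more than 11" would correspond to min_matches=12.
# The toy consensus has only 4 sites, so a threshold of 3 is used here.
candidates = ["RAEQNLKR", "AAEQNLKA"]  # invented sequences
kept = [seq for seq in candidates if passes_filter(seq, TOY_CONSENSUS, min_matches=3)]
print(kept)
```

Keeping the consensus and the threshold as parameters means the same counting function could be reused with a different site definition and cutoff.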
In addition, because members of group C and group D do not have the typical basic region (Murre et al., 1994; Swanson et al., 1995; Bailey et al., 2003) and can interact with some bHLH proteins (Sun et al., 1991), we identified putative HLH proteins that had nine or 10 amino acids that matched the 14 amino acids in the HLH region. Another family, the TCP family, also has a bHLH structure (Kosugi and Ohashi, 1997), but the structure and DNA-binding specificity of the TCP motif are different from those of the bHLH motif. Therefore, the TCP family is not studied in this article. Initially, we used the bHLH domain (64 amino acids) encoded by a putative rice bHLH gene (GenBank number XM_463907) as a BLAST query to identify a large number of candidate bHLH sequences in The Institute for Genomic Research (TIGR) database, because this sequence fit the bHLH motif best among the known rice bHLH proteins. Because of the sequence variation among known bHLH domains, to detect additional possible bHLH domain sequences we used position-specific iterated BLAST to search the database of TIGR (version 4, 2006). Subsequently, TBLASTN was used to remove redundant sequences of candidate bHLH genes according to their corresponding sequencing bacterial artificial chromosome clone serial numbers and their chromosome locations, resulting in 167 OsbHLH genes (Table I

Table I. LOC (location) number of OsbHLH members
The OsbHLH proteins identified in this study are listed according to their OsbHLH numbers determined by the multiple sequence alignment in Supplemental Figure 1. The Osg numbers are designated according to the system of TIGR. *, OsbHLH001 is different from OsbHLH1 (OsbHLH062 in this study), which was identified by Wang et al. (2003). **, The OsbHLH026 protein has two bHLH domains.

OsbHLH No.   LOC No.          OsbHLH No.   LOC No.          OsbHLH No.   LOC No.
OsbHLH001*   LOC_Os01g70310   OsbHLH057    LOC_Os07g35870   OsbHLH113    LOC_Os10g40740
OsbHLH002    LOC_Os11g32100   OsbHLH058    LOC_Os05g38140   OsbHLH114    LOC_Os02g55250
OsbHLH003    LOC_Os03g04310   OsbHLH059    LOC_Os02g02480   OsbHLH115    LOC_Os06g08500
OsbHLH004    LOC_Os10g39750   OsbHLH060    LOC_Os08g04390   OsbHLH116    LOC_Os12g40730
OsbHLH005    LOC_Os02g02820   OsbHLH061    LOC_Os11g38870   OsbHLH117    LOC_Os01g38610
OsbHLH006    LOC_Os04g23550   OsbHLH062    LOC_Os07g43530   OsbHLH118    LOC_Os01g51140
OsbHLH007    LOC_Os04g23440   OsbHLH063    LOC_Os03g26210   OsbHLH119    LOC_Os05g46370
OsbHLH008    LOC_Os01g13460   OsbHLH064    LOC_Os02g23823   OsbHLH120    LOC_Os09g28210
OsbHLH009    LOC_Os10g42430   OsbHLH065    LOC_Os04g41570   OsbHLH121    LOC_Os08g36740
OsbHLH010    LOC_Os01g50940   OsbHLH066    LOC_Os03g55220   OsbHLH122    LOC_Os06g10820
OsbHLH011    LOC_Os08g43070   OsbHLH067    LOC_Os05g42180   OsbHLH123    LOC_Os01g61480
OsbHLH012    LOC_Os01g39480   OsbHLH068    LOC_Os04g53990   OsbHLH124    LOC_Os08g01700
OsbHLH013    LOC_Os04g47080   OsbHLH069    LOC_Os01g57580   OsbHLH125    LOC_Os01g02110
OsbHLH014    LOC_Os11g15210   OsbHLH070    LOC_Os08g08160   OsbHLH126    LOC_Os02g48060
OsbHLH015    LOC_Os04g47040   OsbHLH071    LOC_Os01g01600   OsbHLH127    LOC_Os06g30090
OsbHLH016    LOC_Os04g47059   OsbHLH072    LOC_Os02g17680   OsbHLH128    LOC_Os07g39940
OsbHLH017    LOC_Os07g11020   OsbHLH073    LOC_Os05g14010   OsbHLH129    LOC_Os03g10770
OsbHLH018    LOC_Os03g51580   OsbHLH074    LOC_Os01g13000   OsbHLH130    LOC_Os12g39850
OsbHLH019    LOC_Os03g12760   OsbHLH075    LOC_Os04g47810   OsbHLH131    LOC_Os03g42100
OsbHLH020    LOC_Os03g46860   OsbHLH076    LOC_Os02g45010   OsbHLH132    LOC_Os11g41640
OsbHLH021    LOC_Os12g43620   OsbHLH077    LOC_Os07g28890   OsbHLH133    LOC_Os12g32400
OsbHLH022    LOC_Os03g46790   OsbHLH078    LOC_Os03g17130   OsbHLH134    LOC_Os03g55550
OsbHLH023    LOC_Os10g01530   OsbHLH079    LOC_Os02g47660   OsbHLH135    LOC_Os12g40590
OsbHLH024    LOC_Os01g39330   OsbHLH080    LOC_Os08g42470   OsbHLH136    LOC_Os12g40710
OsbHLH025    LOC_Os01g09990   OsbHLH081    LOC_Os09g33580   OsbHLH137    LOC_Os12g40630
OsbHLH026**  LOC_Os01g09930   OsbHLH082    LOC_Os09g33580   OsbHLH138    LOC_Os03g27390
OsbHLH027    LOC_Os01g09900   OsbHLH083    LOC_Os05g01256   OsbHLH139    LOC_Os02g21090
OsbHLH028    LOC_Os05g11070   OsbHLH084    LOC_Os03g51910   OsbHLH140    LOC_Os03g39432
OsbHLH029    LOC_Os02g12820   OsbHLH085    LOC_Os09g29830   OsbHLH141    LOC_Os04g51070
OsbHLH030    LOC_Os06g37410   OsbHLH086    LOC_Os06g16400   OsbHLH142    LOC_Os01g18870
OsbHLH031    LOC_Os08g38210   OsbHLH087    LOC_Os08g38080   OsbHLH143    LOC_Os09g31300
OsbHLH032    LOC_Os09g29930   OsbHLH088    LOC_Os03g12940   OsbHLH144    LOC_Os04g35010
OsbHLH033    LOC_Os01g65080   OsbHLH089    LOC_Os03g58830   OsbHLH145    LOC_Os04g35000
OsbHLH034    LOC_Os02g49480   OsbHLH090    LOC_Os01g68700   OsbHLH146    LOC_Os02g34320
OsbHLH035    LOC_Os01g06640   OsbHLH091    LOC_Os08g41320   OsbHLH147    LOC_Os02g34370
OsbHLH036    LOC_Os05g07120   OsbHLH092    LOC_Os09g32510   OsbHLH148    LOC_Os03g53020
OsbHLH037    LOC_Os01g11910   OsbHLH093    LOC_Os04g28280   OsbHLH149    LOC_Os01g64560
OsbHLH038    LOC_Os08g33590   OsbHLH094    LOC_Os07g09590   OsbHLH150    LOC_Os12g06330
OsbHLH039    LOC_Os09g24490   OsbHLH095    LOC_Os06g41060   OsbHLH151    LOC_Os11g06010
OsbHLH040    LOC_Os03g15440   OsbHLH096    LOC_Os06g09370   OsbHLH152    LOC_Os03g56950
OsbHLH041    LOC_Os03g59670   OsbHLH097    LOC_Os02g35660   OsbHLH153    LOC_Os03g07540
OsbHLH042    LOC_Os08g37290   OsbHLH098    LOC_Os03g58330   OsbHLH154    LOC_Os04g54900
OsbHLH043    LOC_Os09g28900   OsbHLH099    LOC_Os07g08440   OsbHLH155    LOC_Os06g50900
OsbHLH044    LOC_Os03g08930   OsbHLH100    LOC_Os09g25040   OsbHLH156    LOC_Os04g31290
OsbHLH045    LOC_Os10g23050   OsbHLH101    LOC_Os04g52770   OsbHLH157    LOC_Os02g08220
OsbHLH046    LOC_Os09g29360   OsbHLH102    LOC_Os12g41650   OsbHLH158    LOC_Os06g44320
OsbHLH047    LOC_Os08g37730   OsbHLH103    LOC_Os03g43810   OsbHLH159    LOC_Os05g06520
OsbHLH048    LOC_Os02g52190   OsbHLH104    LOC_Os07g05010   OsbHLH160    LOC_Os11g02054
OsbHLH049    LOC_Os02g46560   OsbHLH105    LOC_Os01g18290   OsbHLH161    LOC_Os12g02020
OsbHLH050    LOC_Os04g50090   OsbHLH106    LOC_Os05g04740   OsbHLH162    LOC_Os05g27090
OsbHLH051    LOC_Os05g50900   OsbHLH107    LOC_Os02g56140   OsbHLH163    LOC_Os07g47960
OsbHLH052    LOC_Os03g03000   OsbHLH108    LOC_Os06g06900   OsbHLH164    LOC_Os07g36460
OsbHLH053    LOC_Os02g15760   OsbHLH109    LOC_Os01g67480   OsbHLH165    LOC_Os01g39580
OsbHLH054
LOC_Os06g33450 OsbHLH110 LOC_Os02g39140 OsbHLH166 LOC_Os03g21970 OsbHLH055 LOC_Os05g51820 OsbHLH111 LOC_Os04g41229 OsbHLH167 LOC_Os09g34330 OsbHLH056 LOC_Os01g72370 OsbHLH112 LOC_Os08g39630 Open in new tab ; Supplemental Table I). The number designation of the OsbHLH genes was based on the order of the multiple sequence alignment (Supplemental Fig. 1) and the synonymy between the names of these 167 OsbHLH genes and the previously reported 131 rice genes by Buck and Atchley (2003) is shown in the Supplemental Table I. Among the reported rice bHLH genes by Buck and Atchley (2003), 45 genes were from the GenBank database, and others were predicted genes of rice with temporary designations (Yu et al., 2002). Compared with other transcription factor gene families in rice and Arabidopsis, the bHLH gene family was one of the largest families whose members were only fewer than the MYB family (Xiong et al., 2005). To verify the reliability of our criteria, we performed simple modular architecture research tool (SMART) analysis of the 167 putative OsbHLH protein sequences and found that 164 proteins had a typical bHLH domain and three, OsbHLH157, OsbHLH160, and OsbHLH161, contained a predicted HLH domain with low confidence values. In addition, the OsbHLH026 protein unexpectedly had two HLH domains predicted by SMART, and their E values were 8.46E-13 and 1.56E-02, respectively. The amino acid sequences of these two bHLH domains were 76% similar, with 14 identical amino acids among the 17 amino acids of the basic region and a predicated binding activity to the G box. To date, five Caenorhabditis elegans proteins have been reported to have two bHLH domains (Ledent et al., 2002), but the degrees of sequence similarity between the two bHLH domains in the same protein are not as high as that in OsbHLH026. The biological functions of this kind of bHLH protein remain to be elucidated. 
Multiple Sequence Alignments, Predicted DNA-Binding Ability, and Conserved Residues

To examine sequence features of these rice bHLH domains, we performed multiple sequence alignment of the 167 rice bHLH amino acid sequences (Supplemental Fig. 1). On average, the basic regions (the N-terminal 17 positions; Supplemental Fig. 1) of OsbHLH domains have 5.7 basic residues, even though 26 of these proteins lack the basic region. Within subsets of OsbHLH domains, nonbasic residues in the basic region are further conserved, as are residues in the two helices and in a C-terminal region of the second helix (Supplemental Fig. 1). In contrast, the loop was the most divergent region in terms of both length (3 to 18 amino acids) and amino acid composition. From the alignment, we identified 19 residues that are identical in at least 50% of the 167 rice bHLH domains (Supplemental Fig. 1, indicated at the bottom of the alignment).

Figure 1. Distribution of amino acids in the bHLH consensus motif. In columns labeled a, percentages refer to the 392 bHLH proteins analyzed by Atchley et al. (1999). In columns labeled b, percentages refer to the 147 AtbHLH proteins identified by Toledo-Ortiz et al. (2003). In columns labeled c, percentages refer to the 167 OsbHLH proteins identified in this study. Residues that were absent from the consensus motif defined in column a but occur at a frequency above 10% in columns b or c are also indicated. The numbers below a, b, and c refer to the positions of the residues in the alignments of the respective studies.

Figure 1 shows the distribution of amino acid residues at the 19 positions of the consensus motif of the bHLH domain, including the results from two previous reports (Atchley et al., 1999; Toledo-Ortiz et al., 2003) and the results for the OsbHLHs from this study. Generally, the distributions of conserved amino acids among the bHLH domains of rice and Arabidopsis were very similar to each other but quite different from that of the animal bHLHs (Fig. 1), as expected from the evolutionary distances among plants and between plants and animals. Eleven of the 19 conserved residues were included in the consensus motif used for identifying the bHLH family members (Glu-13, Arg-14, Arg-16, Asn-21, Leu-27, Lys-39, Leu-55, Ala-58, Ile-59, Tyr-61, and Leu-65 in our alignment [Atchley et al., 1999; Toledo-Ortiz et al., 2003]), whereas the other eight were not (Arg-17, Leu-30, Val-31, Pro-32, Asp-50, Ala-52, Ser-53, and Lys-63 in our alignment). Glu-13 and Arg-17 play very important roles in DNA binding (Atchley and Fitch, 1997). Although bHLH proteins have the potential to form homodimers or heterodimers, little is known about the residues important for dimerization (Shirakata et al., 1993), except in the case of the mammalian Max protein, which requires Leu-27 (Figs. 1 and 2) for dimer formation (Brownlie et al., 1997). Leu-27 was conserved in all 167 OsbHLH proteins, suggesting that this residue is also extremely important for dimerization or other functions of OsbHLH proteins.

Figure 2. Predicted DNA-binding characteristics of the bHLH domain of OsbHLH and AtbHLH proteins. The asterisk (*) indicates that the data for AtbHLHs were from Toledo-Ortiz et al. (2003), and the figure is modeled after table III in Toledo-Ortiz et al. (2003).

The basic region of the bHLH domain has the ability to bind to DNA and is critical for function (Massari and Murre, 2000). Using the criteria described by Massari and Murre (2000), the OsbHLH proteins were divided into several categories according to sequence information in the N-terminal region of the bHLH domain (Fig. 2). The distribution of the predicted DNA-binding properties, as described below, across various phylogenetic subfamilies is indicated with shaped markers in the phylogenetic tree (Fig. 3).

Figure 3. NJ phylogenetic tree of the OsbHLH members. This tree indicates the predicted DNA-binding activities, the intron distribution pattern, and the conserved sequences outside the bHLH domain. The unrooted tree, constructed using MEGA 3.0, summarizes the evolutionary relationships among the 167 members of the OsbHLH protein family. The proteins are named according to OsbHLH numbers (see Supplemental Fig. 1; Table I). The colored dots on the nodes indicate the bootstrap values of the tree built by the maximum parsimony method. The variation rates across the amino acid positions are shown by the branch lengths. The tree shows the 22 phylogenetic subfamilies (A–V) with high predictive value. Bootstrap values lower than 50 are not shown. The markers in front of the OsbHLH numbers indicate the predicted DNA-binding activity of each protein: the round marker indicates putative G-box binders, the square marker indicates putative non-G-box but E-box binders, the triangle marker indicates putative non-E-box binders (i.e. possible DNA-binding capacity but no predicted recognition of an E box), and the upside-down triangle marker indicates putative non-DNA binders (see Fig. 2 for categories). The colors of these markers indicate the numbers and positions of the introns localized in the bHLH domain of each protein and match the colors of the intron patterns shown in Figure 4. The conserved motifs outside the bHLH domain shared by members of the same subfamily are highlighted in white boxes with an assigned number, and the same number refers to the same motif; the bHLH domain and L-ZIP (LZ) are indicated directly in the figure. The best-matching motif sequences are shown in Supplemental Table III. This figure is modeled after figure 4 in Heim et al. (2003).

The OsbHLH proteins were divided into two major groups according to the 17 N-terminal amino acids within the bHLH domain: (1) a large group of 141 bHLH proteins containing five to eight basic residues in the basic region, predicted to bind DNA, and (2) a smaller group of 26 HLH proteins lacking the basic region, predicted to lack DNA-binding ability (Fig. 2), following the approach previously used for Arabidopsis bHLH proteins (Toledo-Ortiz et al., 2003). These HLH proteins might be similar to the animal ID-HLH proteins, functioning as negative regulators of E-box-binding bHLHs through the formation of heterodimers (Fairman et al., 1993). The DNA-binding bHLHs in the first group were further divided into two groups with different predicted target sequences depending on the presence or absence of residues Glu-13 and Arg-16 in the basic region (Figs. 1 and 2). Group (1A) has 114 putative E-box-binding proteins with the conserved Glu-13/Arg-16 residues and Group (1B) has 27 non-E-box-binding proteins lacking these residues (Fig. 2).
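As a minimal sketch of the classification scheme described in this section, the decision rules can be written as a short Python function. The function name, the representation of the basic region as a 17-character aligned string, the treatment of His as a basic residue, and the threshold of five basic residues (taken from the "five to eight" range reported for the DNA-binding group) are assumptions made for illustration, not the authors' procedure. The G-box criterion (His/Lys-9, Glu-13, and Arg-17) follows Massari and Murre (2000). Manually curated exceptions are not handled.

```python
# Hypothetical helper, not the authors' pipeline: applies the residue rules
# from the text to one aligned basic region (alignment positions 1-17).

BASIC_RESIDUES = set("KRH")  # assumption: His counts as basic, like Lys and Arg
MIN_BASIC = 5                # assumption: lower bound of the "five to eight" range


def classify_basic_region(region: str) -> str:
    """Return the predicted DNA-binding category for a 17-position basic region."""
    if len(region) != 17:
        raise ValueError("expected the 17 aligned positions of the basic region")

    def at(position: int) -> str:
        # Alignment positions in the text are 1-based.
        return region[position - 1]

    n_basic = sum(residue in BASIC_RESIDUES for residue in region)
    if n_basic < MIN_BASIC:
        return "non-DNA binder"            # group 2 (HLH proteins)
    if not (at(13) == "E" and at(16) == "R"):
        return "non-E-box binder"          # group 1B
    if at(9) in "HK" and at(17) == "R":
        return "G-box binder"              # group 1A1
    return "non-G-box E-box binder"        # group 1A2


if __name__ == "__main__":
    # Synthetic regions built only to exercise each branch (not real OsbHLH sequences).
    for region in ("KRAAAAAAHAAAEAARR", "KRKAAAAAAAAAEAARR",
                   "KRKAAAAAAAAAAAARR", "AAAAAAAAAAAAEAAAA"):
        print(region, "->", classify_basic_region(region))
```

The checks are ordered from the broadest criterion (any DNA binding) to the narrowest (G-box recognition), mirroring the nested groups 1, 1A, and 1A1 in the text.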
Within Group (1A), OsbHLH062 is an exception because it does not have Arg-16; nevertheless, we placed OsbHLH062 in this group because animal proteins lacking Arg-16, such as SREBP, can also bind to the E box (Hua et al., 1993). Three residues in the basic region of the bHLH domain, His/Lys-9, Glu-13, and Arg-17, are known to constitute the classic G-box-binding region (Massari and Murre, 2000). Therefore, we can subdivide the 114 predicted E-box-binding bHLH proteins of Group (1A) into two subgroups: (1A1) with 95 members predicted to bind G boxes, and (1A2) with 19 members predicted to bind other types of E boxes (non-G-box binders; Fig. 2).

Phylogenetic Analysis of the OsbHLH Genes

To obtain clues about the evolutionary history of the OsbHLH genes, a neighbor-joining (NJ) phylogenetic tree was generated from the multiple sequence alignment of the OsbHLH protein sequences, with bootstrap analysis (1,000 replicates). The position of the bHLH domain and any conserved sequence motifs outside of the bHLH domain are shown in Figure 3. We subdivided the 167 members of the OsbHLH family into 22 subfamilies, designated A to V, according to clades with at least 50% bootstrap support. In addition, most members of the same subfamily shared one or more motifs outside the bHLH domain, further supporting the subfamily definition. A total of 40 motifs outside of the bHLH domain were discovered (Supplemental Table III). However, most of these motifs have not been characterized, except for Leu-ZIP, which is shared by the members of subfamily R and is known to be involved in protein dimerization (Tong et al., 1997; Paris et al., 2003). Similarly, sequence analyses of other transcription factor families such as ERF (Nakano et al., 2006b) and WRKY (Eulgem et al., 2000) also identified motifs outside of their conserved domains.
The low bootstrap support for the internal nodes of this tree is consistent with phylogenetic analyses of bHLH proteins in other organisms and is likely due to the fact that the bHLH domain is relatively short and members within a subfamily are highly conserved, leaving relatively few informative character positions. To further test the reliability of the NJ tree, maximum parsimony analysis was also used to generate a phylogenetic tree (Supplemental Fig. 2), and 93% of the OsbHLH proteins were placed into the same subfamilies as in the NJ tree, indicating that the two methods are in very good agreement. As described below, the subfamilies defined on the basis of the phylogenetic analysis are also supported by additional studies.

Intron/Exon Structure within the OsbHLH Domains

The pattern of intron positions can also provide important evidence to support phylogenetic relationships in a gene family. Among the 167 rice bHLH genes, the number of introns ranged from zero to four, and 87.4% of the genes had one or more introns in the bHLH domain; these genes can be grouped into 10 patterns of intron presence and position (Fig. 4A, I–III, V–X, and XII).

Figure 4. Intron distribution within the bHLH domains of the OsbHLH and AtbHLH proteins. A, Scheme of the intron distribution patterns (color coded and designated I–XII) within the bHLH domains of the OsbHLH proteins. White triangles are used when the position of the intron coincides with the example. Black triangles indicate that the location of the intron within the bHLH domain differs from the example. The numbers above the triangles indicate the splicing phases of the bHLH domain sequences: 0 refers to phase 0, 1 to phase 1, and 2 to phase 2. The markers 1 to 8 beside the triangles show the different positions of the introns. The number of proteins with each pattern is given at right. The correlation between intron distribution patterns and phylogenetic subfamilies is provided in Figure 3 (as differently colored markers in front of the OsbHLH numbers). The positions of introns in the variable loop region have been adjusted by eye to make the result more compact. This figure is modeled after figure 3 in Toledo-Ortiz et al. (2003). B, The intron pattern of bHLH domains in different subfamilies of OsbHLH and AtbHLH proteins. The topology of this tree is based on the phylogenetic tree of Figure 6. The markers 1 to 8 are the same as in Figure 4A.
Among these 10 patterns, the most common ones had one or more introns at three highly conserved positions (indicated by white inverted triangles), accounting for 82.0% of the 167 genes (Fig. 4A, I–III, V, and VI). The remaining patterns had introns at varying positions (patterns VII–X and XII) and were observed in only 5.4% of the 167 genes. Furthermore, we investigated intron phases with respect to codons. An intron was designated as occurring in one of three phases: in phase 1, splicing occurred after the first nucleotide of the codon; in phase 2, splicing occurred after the second nucleotide; and in phase 0, splicing occurred after the third nucleotide of the codon (Sharp, 1981). Figure 4A shows that all of the introns with conserved positions also had identical phases: all of the introns at the three conserved positions (indicated by white inverted triangles) were in phase 0. The other introns, with less conserved positions (black inverted triangles), were in phase 0, 1, or 2 (Fig. 4A, VII–X). Therefore, the splicing phase was highly conserved during the evolution of bHLH genes and supports the subfamily designation here. A similarly conserved splicing phase was also observed in the MYB gene families of rice and Arabidopsis (Jiang et al., 2004). Exons with the same splicing phase at both the 5′ and 3′ ends are called symmetric exons. According to the intron-early theory (Gilbert, 1987), an excess of phase 0 introns and symmetric exons may facilitate exon shuffling by avoiding interruptions of the open reading frame (ORF) and facilitating recombinational fusion and exchange of protein domains (Patthy, 1987). Among the 271 introns analyzed here, 258 were in phase 0, including all introns with conserved positions, whereas only four were in phase 1 and nine in phase 2.
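The phase definition and the notion of a symmetric exon can be made concrete with a short Python sketch. The function names, and the idea of deriving the phase from the number of coding nucleotides that precede the intron, are illustrative assumptions rather than the authors' procedure; the intron counts are those reported above.

```python
def intron_phase(coding_nt_before: int) -> int:
    """Phase of an intron, given the number of coding nucleotides upstream of it.

    0: the intron lies between two codons (after the third nucleotide);
    1: after the first nucleotide of a codon; 2: after the second.
    """
    if coding_nt_before < 0:
        raise ValueError("nucleotide count must be non-negative")
    return coding_nt_before % 3


def is_symmetric_exon(phase_5prime: int, phase_3prime: int) -> bool:
    """An exon is symmetric when its two flanking introns share the same phase."""
    return phase_5prime == phase_3prime


# Intron counts by phase as reported in the text (271 introns in total).
phase_counts = {0: 258, 1: 4, 2: 9}
total_introns = sum(phase_counts.values())
phase0_share = 100 * phase_counts[0] / total_introns

if __name__ == "__main__":
    print(intron_phase(6), intron_phase(7), intron_phase(8))
    print(total_introns, round(phase0_share, 1))
```

Under these counts, phase 0 introns make up roughly 95% of the introns in the bHLH domain.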
Among the 125 exons flanked by introns in the OsbHLH domain, 120 were symmetric with phase 0 introns, and only five were asymmetric, with different splicing phases at the 5′ and 3′ ends. Therefore, the analysis of bHLH genes provides strong support for the intron-early theory.

Genome Distribution of OsbHLH Genes

To determine the genomic distribution of the OsbHLH genes, the DNA sequence of each OsbHLH gene was used to search the rice genome database using BLASTN. Although each of the 12 rice chromosomes contains some OsbHLH genes, the distribution appears to be uneven (Fig. 5).

Figure 5. Chromosomal locations, region duplication, and predicted clusters for OsbHLH genes. Chromosomal positions of the OsbHLH genes are indicated by OsbHLH number (assigned in Table I). The scale is in megabases (Mb). The number below each chromosome name shows the number of OsbHLH genes on that chromosome. The color of the marker in front of each OsbHLH number matches the color of its intron distribution pattern in Figure 4. The letter in front of the colored marker shows the phylogenetic category of the gene (Fig. 3); unclassified members are denoted by a question mark (?). The green bars in the middle of the 12 chromosomes show the approximate positions of the centromeres according to the sequencing results of IRGSP (2005). Each pair of duplicated bHLH genes is connected with a blue line. Connecting lines mark the specific cases in which there is a strong correlation between duplicated genomic regions and the presence of bHLH genes with closely related predicted amino acid sequences (OsbHLH members in the same family). The red lines connect the predicted gene clusters with high sequence similarity and close chromosomal locations. The probable hidden duplicated bHLHs are linked with a green line. The orange and red bars beside the chromosomes indicate the 14 duplication regions predicted in this study. The predicted earlier duplication of region 7 is shown as a yellow bar. This figure is adapted from figure 4 of Toledo-Ortiz et al. (2003).

Relatively high densities of bHLH genes were observed in some chromosomal regions, including the top and bottom of chromosomes 1, 2, and 3, and the bottom of chromosomes 4, 8, and 9. In particular, 17 OsbHLH members are located on the long arm of chromosome 4.
In contrast, several large chromosomal regions lacked bHLH genes, such as the top half of chromosomes 4 and 9 and the central sections of chromosomes 7, 8, 11, and 12. Fourteen OsbHLH gene clusters were identified whose members have high levels of sequence similarity (Fig. 5); for instance, the entire protein sequences of OsbHLH081 and OsbHLH082 share 75% similarity, and OsbHLH013 and OsbHLH015 are 68% similar (Fig. 5, linked with red lines). Genome duplication events are thought to have occurred throughout the process of plant evolution (Kent et al., 2003; Cannon et al., 2004; Mehan et al., 2004). To detect a possible relationship between the OsbHLH genes and potential genome duplications, we identified 40 pairs of OsbHLH genes that are close paralogs in the same subfamily (Fig. 5, blue lines; alignments of these putative duplicated genes are in Supplemental Fig. 4). These genes represent 48% of the OsbHLH family and might have evolved from putative rice genome duplication events, because multiple pairs link each of at least 14 potential chromosomal/segmental duplications (Fig. 5, pairs of bars with numbers 1–14). In contrast, no intrachromosomal duplication event was suggested by the duplication of OsbHLH genes. The putative chromosomal duplications are similar to the predicted segmental duplications of the transcription factor-encoding genes in the rice genome (Guyot and Keller, 2004; Xiong et al., 2005). Interestingly, four pairs of bHLH genes on the long arms of chromosomes 1 and 9 are close paralogs (Fig. 5, linked with green lines) in two regions that had not been proposed to be the result of rice genome duplication. Additionally, 30 OsbHLH genes (18%) might have been involved in local and tandem duplications (Fig. 5, linked with red lines). This is far fewer than the number involved in the polyploidy duplication, suggesting that polyploidy duplication played a key role in the expansion of bHLH genes in rice.
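The proportions quoted above can be checked with simple arithmetic. This sketch assumes that each of the 40 paralog pairs contributes two distinct genes (no gene counted in more than one pair), which the text implies but does not state; the variable names are illustrative.

```python
TOTAL_GENES = 167  # OsbHLH genes identified in the study

paralog_pairs = 40
genes_in_pairs = 2 * paralog_pairs  # assumption: pairs do not share genes
tandem_genes = 30

pct_in_pairs = round(100 * genes_in_pairs / TOTAL_GENES)
pct_tandem = round(100 * tandem_genes / TOTAL_GENES)

if __name__ == "__main__":
    print(f"{genes_in_pairs} of {TOTAL_GENES} genes in paralog pairs: {pct_in_pairs}%")
    print(f"{tandem_genes} of {TOTAL_GENES} genes in local/tandem duplications: {pct_tandem}%")
```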
Additional evidence for a common origin of closely related bHLHs came from the intron position patterns in the bHLH domain. As shown in Figure 5, the genes related by putative duplications shared a conserved intron position pattern. Only a few pairs of the probable duplicated genes in the same subfamily, e.g. OsbHLH104 and OsbHLH152, had different intron distribution patterns, which can be explained by the loss (or gain) of an intron following the duplication event.

Comparative Analyses of Rice and Arabidopsis bHLH Genes

Using the alignment of the bHLH domain amino acid sequences of OsbHLHs and AtbHLHs (Supplemental Fig. 3), a phylogenetic tree was constructed (Fig. 6).

Figure 6. NJ phylogenetic tree of the AtbHLH and OsbHLH domains and expression patterns for Arabidopsis and rice bHLH genes from RT-PCR, microarray, and EST data. The letter R above the column of expression data refers to root, S to stem, L to leaf, and F to flower and seed (silique). The black and white blocks in the right column (length of exon and intron) indicate the DNA sequence length of each bHLH domain: white blocks indicate introns and black blocks indicate exons. The bar above the column indicates the length of the sequences. The colors of the markers in front of the bHLH numbers, which correspond to those in Figure 4, also indicate the numbers and positions of the introns localized in the bHLH domain of each protein. The names of the subfamilies defined by Buck and Atchley (2003) are also listed after the clade names of this study. The AtbHLH protein names are abbreviated as follows: BEE, Brassinosteroid enhanced expression; PIF, phytochrome-interacting factor; TT8, transparent testa8; GL3, GLABRA3; EGL3, enhancer of GLABRA3; AMS, aborted microspores; ICE1, inducer of CBF expression1; HFR1, long hypocotyl in far red1. OsbHLH110 has three introns, but only two can be seen in the figure because the third predicted intron is too short (only 9 bp) to display. The third intron (5,273 bp) of the bHLH domain of OsbHLH076 and the single intron (5,571 bp) of the bHLH domain of OsbHLH064 are too long to show at full length, so part of each block is replaced by an ellipsis (……).
Because of the large number of taxa and relatively small number of characters, the bootstrap values of internal nodes were low; nevertheless, the outer nodes had more credible bootstrap values, allowing the bHLH genes of rice and Arabidopsis to be clustered into 25 subfamilies (A–Y). In addition, our analysis of the OsbHLH gene family (Fig. 3, A–V) and the AtbHLH results (1–21) of Toledo-Ortiz et al. (2003) were also taken into consideration in the subfamily classification (Fig. 6). For example, subfamily 19 in Arabidopsis (Toledo-Ortiz et al., 2003) was divided into subfamilies A and B in this study because of the bootstrap values. In the phylogenetic study of 131 rice genes by Buck and Atchley (2003), 75 were grouped into 15 subfamilies, with 32 shown in their NJ tree, but the remaining 56 genes were not included in the phylogenetic result because of low statistical support. To compare our results with theirs, we labeled the clades they defined in the tree shown in Figure 6. Most of the large subfamilies, i.e. A to F, M, N, P, R to V, X, and Y, are also supported by the work of Buck and Atchley (2003) and had good bootstrap values (Fig. 6); some of these subfamilies contain new genes that we have included because of new rice genome sequence information. On the other hand, some small subfamilies, i.e. G to L, O, Q, and W, did not appear in the NJ tree reported by Buck and Atchley (2003). These new subfamilies include rice sequences that were revealed after the completion of the rice genome sequence. Moreover, intron position patterns of the OsbHLHs were also consistent with the phylogenetic subfamilies defined in Figure 3. For instance, all members of subfamily A had the same intron distribution pattern, as did the members of each of subfamilies G, I, K, L, N, O, P, Q, S, U, and V (Fig. 4B). Members of other subfamilies had the same intron distribution pattern with one or two exceptions.
In addition, the intron/exon position pattern shown in Figure 4A agreed with the evolutionary relationship between OsbHLHs and AtbHLHs (see below; Fig. 6). There were 12 different intron position patterns among the OsbHLH and AtbHLH domains. Nine of the patterns are shared by genes from both rice and Arabidopsis, whereas patterns IV and XI are found only among Arabidopsis genes, and pattern VII was present in only one rice gene. The nine patterns shared by AtbHLHs and OsbHLHs indicate that most of the intron patterns already existed in the ancestor of monocots and eudicots. The percentages of each pattern in AtbHLHs and OsbHLHs were quite close; e.g. pattern I was found in 32.3% of OsbHLH members and 28.6% of AtbHLH members. The Arabidopsis bHLH introns also had the same splicing phases as those of the corresponding subfamily members among the OsbHLH domains. The gene structures, in terms of intron position and length, are also displayed in Figure 6 to provide further clues about the evolutionary relationships among OsbHLHs and AtbHLHs. Most members of the same subfamily had similar intron/exon structures. For example, the members of subfamily P each had only one intron, and these introns were of similar length. The fact that these genes had not only similar coding sequences but also very similar intron/exon structures supports their close evolutionary relationship and membership in the same subfamily. We also examined intron sizes and found that, in addition to sharing intron patterns, some members of the same bHLH subfamily had introns of similar size; e.g. the single intron of the subfamily K members is 99 bp (OsbHLH149), 123 bp (OsbHLH150), and 130 bp (OsbHLH151) long (Fig. 6). Approximately 73% of the introns within the bHLH domains of OsbHLHs were shorter than 300 bp; the remaining introns were longer than 300 bp.
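The intron-size comparison above amounts to a simple tally against the 300-bp cutoff. The following Python sketch illustrates that tally using only the six intron lengths quoted in the text and in the Figure 6 legend; it is a small illustrative subset, so its result is not expected to reproduce the approximately 73% reported for the full data set, and the function name is hypothetical.

```python
# Intron lengths (bp) quoted in the text and the Figure 6 legend. This is an
# illustrative subset only, not the full OsbHLH data set behind the ~73% value.
intron_lengths_bp = {
    "OsbHLH149": 99,
    "OsbHLH150": 123,
    "OsbHLH151": 130,
    "OsbHLH110 (third intron)": 9,
    "OsbHLH076 (third intron)": 5273,
    "OsbHLH064": 5571,
}

SHORT_INTRON_CUTOFF_BP = 300  # threshold used in the text


def fraction_short(lengths, cutoff=SHORT_INTRON_CUTOFF_BP):
    """Return the fraction of introns strictly shorter than `cutoff` bp."""
    values = list(lengths)
    if not values:
        raise ValueError("no intron lengths given")
    return sum(1 for n in values if n < cutoff) / len(values)


short_fraction = fraction_short(intron_lengths_bp.values())
print(f"{short_fraction:.0%} of the listed introns are shorter than 300 bp")
```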
Although rice and Arabidopsis bHLH genes share many similarities, there were a few differences in intron patterns between the OsbHLH and AtbHLH domains. In two OsbHLHs (OsbHLH064 and OsbHLH076), one intron within the bHLH domain was longer than 5 kb (Fig. 6), whereas no AtbHLH gene had such a long intron. This might have resulted from insertion(s) into introns of OsbHLH family members after the divergence of monocots and eudicots. A similar explanation would be reasonable for the fact that a protein with two bHLH domains was observed in the rice genome, but not in Arabidopsis. Overall, combining the bHLH domain intron patterns and the bHLH subfamilies, we can recognize two major categories of intron patterns that correspond to two major groups of subfamilies. The first category includes subfamilies A to K and W, whose members mainly have intron pattern I, which contains introns 1, 2, and 3 (Fig. 4B). The other intron patterns found among members of these subfamilies can be explained by the loss of specific introns in different lineages, starting from pattern I. For example, in subfamily B, intron pattern II (introns 1 and 2), pattern III (introns 1 and 3), pattern IV (introns 2 and 3), and pattern V (intron 1 only) might have arisen by the loss of one or two introns. The other category, which includes subfamilies L to V, X, and Y, mainly consists of members with intron pattern VI, which contains only intron 2 (Fig. 4B). Members of some subfamilies that lack an intron in the bHLH domain, such as subfamilies A, W, H, I, and N, might have lost all introns. The intron position patterns in the bHLH family strongly support the hypothesis that introns have been lost independently multiple times. We also observed that some members of different bHLH subfamilies were located within the same small chromosomal region, whereas some members of the same subfamily were distributed in different chromosomal regions, suggesting that bHLH genes were distributed widely in the genome of the common ancestor of monocots and eudicots.
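The intron-loss scenario described above for subfamily B can be expressed as set arithmetic: a pattern can be derived from pattern I by loss alone only if its introns form a subset of introns 1, 2, and 3. The Python sketch below encodes the six patterns whose intron content is stated in the text; the function name is hypothetical and the code is an illustration of the reasoning, not the procedure used in this study.

```python
# Intron position patterns whose content is stated in the text, written as
# sets of intron positions within the bHLH domain.
PATTERNS = {
    "I": frozenset({1, 2, 3}),
    "II": frozenset({1, 2}),
    "III": frozenset({1, 3}),
    "IV": frozenset({2, 3}),
    "V": frozenset({1}),
    "VI": frozenset({2}),
}


def introns_lost(ancestral, derived):
    """Return the sorted introns that must be lost to turn `ancestral` into
    `derived`, or None if `derived` would require gaining an intron."""
    if not derived <= ancestral:
        return None
    return sorted(ancestral - derived)


# Patterns II to V, as discussed for subfamily B, derived from pattern I.
for name in ("II", "III", "IV", "V"):
    lost = introns_lost(PATTERNS["I"], PATTERNS[name])
    print(f"pattern I -> pattern {name}: lose intron(s) {lost}")
```

Each of patterns II to IV requires a single loss, and pattern V requires two, matching the statement that these patterns might have arisen by losing one or two introns.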
The phylogenetic tree of Arabidopsis and rice bHLH genes (Fig. 6) provides a way to estimate the number of bHLH genes in the MRCA. There were 45 branches with bootstrap values of 50 or greater that included both Arabidopsis and rice bHLH members, 11 branches that had only Arabidopsis members, and 10 branches that had only rice members. This result suggests that there were at least 66 bHLH genes in the MRCA of monocots and eudicots. Furthermore, the phylogenetic analysis provides evidence for birth-and-death evolution (Nei et al., 1997, 2000) in the flowering plant bHLH gene family. Two other models of multigene family evolution have been proposed: divergent evolution and concerted evolution (Nei and Rooney, 2005). The divergent evolution model fits the hemoglobin gene family well (Ingram, 1961), but it does not fit a gene family like bHLH, because sequence similarities among members of the rice bHLH family (Fig. 6) are higher than those between rice and Arabidopsis. Concerted evolution was proposed to explain the evolution of rRNA genes, in which a large number of tandemly repeated genes were found (Brown et al., 1972). However, the bHLH family includes some pseudogenes that have a stop codon in the bHLH domain (excluded from our study; data not shown) and genes whose expression signals cannot be detected (Fig. 6), which cannot be explained by concerted evolution. In contrast, the phylogenetic tree of the bHLH family fits well with the model of birth-and-death evolution. The branches with more than one AtbHLH or OsbHLH gene had likely experienced gene birth through gene duplication events; for example, OsbHLH116, OsbHLH135, OsbHLH136, and OsbHLH137 were probably created by tandem duplication, and OsbHLH035 and OsbHLH036 were probably involved in polyploidy duplication. The branches with only Arabidopsis or only rice bHLH members, on the other hand, probably experienced gene death. Our results indicate that the birth rate has been greater than the death rate in the flowering plant bHLH family.
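The lower bound on the ancestral gene number follows from counting one ancestral gene per well-supported branch, whether the branch retains members from both species or from only one. A minimal Python sketch of this arithmetic, using the branch counts given above (the function name is hypothetical):

```python
def min_ancestral_genes(shared, arabidopsis_only, rice_only):
    """Lower bound on the number of genes in the common ancestor.

    Each well-supported branch is assumed to descend from at least one
    ancestral gene; single-species branches are counted on the assumption
    that the other lineage lost its copy.
    """
    return shared + arabidopsis_only + rice_only


# Branch counts reported in the text (bootstrap values of 50 or greater).
estimate = min_ancestral_genes(shared=45, arabidopsis_only=11, rice_only=10)
print(estimate)  # 66
```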
Similarly, other large transcription factor families, such as the MADS-box gene family in both rice and Arabidopsis, have often experienced birth-and-death evolution (Nam et al., 2003, 2004).

The Expression Pattern of OsbHLH and AtbHLH Genes

Because the expression pattern of a gene is often correlated with its function, we examined the expression of OsbHLHs and AtbHLHs using reverse transcription (RT)-PCR analysis, microarray experiments, expressed sequence tag (EST) data from the National Center for Biotechnology Information (NCBI), and massively parallel signature sequencing (MPSS) data. We first analyzed the expression of the OsbHLHs using RT-PCR with RNA from rice root, leaf, stem, flower, and seed (Fig. 6). The RT-PCR products of a number of OsbHLHs were confirmed by sequencing, providing a measure of the reliability of the RT-PCR results. In addition, we searched for information on OsbHLHs in the EST data from NCBI and in the MPSS expression data. Even though the EST information was incomplete, we found EST data for 61 OsbHLHs (May, 2005). Forty-seven of these OsbHLHs with EST data had positive RT-PCR results (Fig. 6), whereas a few OsbHLHs had EST data but were not detected by RT-PCR (Fig. 6, represented by the boxes with italic lines). Expression information from the MPSS database showed that 93 OsbHLH genes are expressed (Fig. 6), but 38 bHLH genes with positive RT-PCR signals were not detected by MPSS. In total, after integrating these data, 33 OsbHLHs, such as OsbHLH012 and OsbHLH028, were not detectably expressed according to RT-PCR, EST, and MPSS data (Fig. 6, boxes with X). These genes might be pseudogenes, or they might be expressed only at specific developmental stages or under special conditions. Furthermore, we summarized the expression of AtbHLHs from RT-PCR analysis (Heim et al., 2003), microarray data (Zhang et al., 2005), and the NCBI EST database (May, 2005; Fig. 6).
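Integrating the rice RT-PCR, EST, and MPSS evidence as described above is, in essence, a set union followed by set differences. The Python sketch below shows the idea on a toy example; the gene names and detection calls are invented for illustration and do not reproduce any result of this study, and the function names are hypothetical.

```python
# Toy data: gene names and detection calls are invented for illustration.
all_genes = {"geneA", "geneB", "geneC", "geneD", "geneE"}
evidence = {
    "RT-PCR": {"geneA", "geneB"},
    "EST": {"geneA", "geneC"},
    "MPSS": {"geneB", "geneC", "geneD"},
}


def detected_anywhere(evidence_by_source):
    """Union of the genes detected by at least one data source."""
    return set().union(*evidence_by_source.values())


def undetected(genes, evidence_by_source):
    """Genes with no expression evidence in any source."""
    return genes - detected_anywhere(evidence_by_source)


def detected_by_first_not_second(evidence_by_source, first, second):
    """Genes detected by `first` but missed by `second`."""
    return evidence_by_source[first] - evidence_by_source[second]


print(sorted(undetected(all_genes, evidence)))
print(sorted(detected_by_first_not_second(evidence, "RT-PCR", "MPSS")))
```

Genes returned by `undetected` correspond to the category marked with an X in Figure 6, that is, candidates for pseudogenes or for genes expressed only under particular conditions.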
Expression of 87 AtbHLHs was detected using RT-PCR, 112 by microarray analysis, and 115 in the MPSS database, and 85 had matching ESTs. Only 14 AtbHLH genes had no expression signal (Fig. 6). According to Figure 6, 72 bHLH genes were expressed in all four tissues tested, suggesting that many bHLHs play regulatory roles at multiple developmental stages in rice and Arabidopsis. For example, both rice (OsbHLH031, OsbHLH032, and OsbHLH033) and Arabidopsis (AtbHLH046 and AtbHLH102) members of subfamily L are expressed in all four organs. It is possible that some of these members are preferentially expressed in specific tissues or cells within these organs. Some bHLH genes show preferential expression, including 10 rice and Arabidopsis bHLH genes expressed in the root, one in the stem, nine in the leaf, and 30 in the flower and seed. This result indicates that members of this large family might take part in different biological processes in rice. This might be a common feature of large transcription factor families, such as the MYB family (Martin and Paz-Ares, 1997). In particular, the subfamily Y members OsbHLH142, AtbHLH091, OsbHLH141, AtbHLH010, and AtbHLH089 had similar expression patterns in the flower and seed, supporting the hypothesis that these genes function in rice and Arabidopsis reproductive development. Thus, genes with related sequences also tend to function in similar structures during development. Eleven rice bHLH genes have been characterized. For example, LAX (OsbHLH122) regulates shoot branching (Komatsu et al., 2003) and Udt1 (OsbHLH164) is critical for tapetum development (Jung et al., 2005). OSB1 (OsbHLH013) and OSB2 (OsbHLH016) are involved in anthocyanin biosynthesis (Sakamoto et al., 2001). Several genes are important for stress responses, including OsbHLH1 (i.e.
OsbHLH062 in this study) in cold response (Wang et al., 2003), OsPTF1 (OsbHLH096) in tolerance to phosphate starvation (Yi et al., 2005), and RERJ1 (OsbHLH006) in wound and drought responses (Kiribuchi et al., 2004, 2005). Also, OsBP-5 (OsbHLH102) is involved in transcriptional regulation of the rice Wx gene (Zhu et al., 2003), and Ra (OsbHLH013) and Rb (OsbHLH165; Hu et al., 1996) are homologs of the maize (Zea mays) Lc gene; OsMYC (OsbHLH009; Zhu et al., 2005) is a homolog of AtMYC2. Although the function of most rice bHLH genes is unknown, the phylogenetic and expression analyses provide a solid foundation for future functional studies in both rice and Arabidopsis. Identification of putative orthologs in different species will benefit the study of gene function; examples include AtMYC2 (Abe et al., 2003) and OsMYC (Zhu et al., 2005), as well as AMS (AtbHLH021; Sorensen et al., 2003) and TAPETUM DEGENERATION RETARDATION (TDR; OsbHLH005; N. Li, D. Zhang, X. Li, H. Liu, C. Yin, Z. Yuan, H. Chu, T. Wen, H. Huang, D. Luo, H. Ma, and D. Zhang, unpublished data). The identity and similarity of the full-length AMS and TDR sequences were about 32% and 42%, respectively, and 70% and 76% within the two bHLH domains. AMS is critical for tapetal cell differentiation and likely regulates a postmeiotic transcriptional program supporting microspore development (Sorensen et al., 2003). Similar to ams, loss of function of TDR seems to result in delayed tapetal cell degradation and causes complete male sterility.

CONCLUSION

We have performed extensive analyses of the rice bHLH genes and compared them with Arabidopsis bHLH genes. We found that the rice and Arabidopsis bHLH genes form 25 subfamilies that are supported by phylogeny, additional protein motifs, and intron/exon structures.
This phylogenetic analysis is in good agreement with previous results; at the same time, it adds new members to some existing subfamilies and defines new subfamilies by including additional bHLH genes from the completed rice genome sequence. The fact that the majority of subfamilies contain members from both rice and Arabidopsis suggests that the functions of most bHLH genes have possibly been conserved during angiosperm evolution. In addition, we estimate that the MRCA of monocots and eudicots had at least 66 bHLH genes. Phylogenetic analysis also suggests that there have been numerous gene birth events in this gene family during the evolution of flowering plants, in part due to putative genome duplications, compared with relatively few gene death events. The analysis of intron/exon structures revealed that most introns have conserved positions and phases, providing evidence for the intron-early theory, and that multiple independent intron loss events have likely occurred during the evolution of flowering plants. Extensive expression data and available functional data support the hypothesis that bHLH genes in plants perform a variety of functions in different tissues at multiple developmental stages; the expression data summarized here also provide a convenient reference for readers. In short, our studies indicate that the ancient bHLH gene family has likely expanded considerably during flowering plant evolution to include many relatively young members, allowing both the conservation and divergence of gene function. Our results have established a solid foundation for future studies using molecular genetic, biochemical, physiological, and developmental approaches, which will likely reveal the functional significance of this dynamic and fascinating gene family.
MATERIALS AND METHODS

Database Search for Rice bHLH Genes

To find assembled ESTs as candidate bHLH genes, we ran TBLASTN (Altschul et al., 1997), one of the BLAST programs (Altschul et al., 1990), as provided by NCBI (http://www.ncbi.nlm.nih.gov) and TIGR (http://tigrblast.tigr.org/tgi). The default parameters were used for the TBLASTN program of TIGR, and wordsize 2, existence 10, and extension 11 were used for NCBI, to retrieve as many similar sequences as possible. We used the bHLH domain of a rice (Oryza sativa) bHLH gene (GenBank number XM_463907) as the query sequence for TBLASTN. Each sequence obtained was then used as a query to perform PSI-BLAST (Altschul et al., 1997) against release 4 of the TIGR rice pseudomolecules (http://www.tigr.org/tdb/e2k1/osa1). Redundant sequences with different identification numbers but the same chromosome locus were removed from our data set. In addition, we obtained from the TIGR database the sequences annotated with the Pfam number Pfam00010, which contain the predicted HLH domain. Based on the results of BLASTN searches in the NCBI rice genome database using the predicted cDNA sequences of the rice bHLH genes, we obtained the chromosome locations of these genes. We also obtained the intron distribution patterns and intron/exon boundaries of these bHLH genes from both the BLASTN results and the TIGR annotation database. The data sets retrieved from the TBLASTN search and the annotation database were combined to form our rice data set. To further confirm the cDNA sequences obtained, the nucleotide sequences were translated into amino acid sequences, which were then examined for the bHLH domain using the hidden Markov model of the SMART tool (http://smart.embl-heidelberg.de/; Schultz et al., 1998; Letunic et al., 2004).
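The removal of redundant sequences that share a chromosome locus but carry different identification numbers can be pictured as keeping one record per locus. The following Python sketch makes the simplifying assumption that two hits belong to the same locus only when their chromosome and coordinates are identical; the identifiers, coordinates, and function name are hypothetical, and the actual criteria used for the data set may have differed.

```python
# Hypothetical search hits: (sequence identifier, chromosome, start, end).
hits = [
    ("seq_0001", "chr1", 1_000, 2_500),
    ("seq_0002", "chr1", 1_000, 2_500),  # same locus, different identifier
    ("seq_0003", "chr4", 8_200, 9_900),
]


def remove_redundant(records):
    """Keep the first identifier seen for each (chromosome, start, end) locus.

    Assumes that identical coordinates define the same locus; overlapping
    but non-identical hits are treated as distinct in this sketch.
    """
    by_locus = {}
    for seq_id, chrom, start, end in records:
        by_locus.setdefault((chrom, start, end), seq_id)
    return by_locus


unique = remove_redundant(hits)
for locus, seq_id in unique.items():
    print(locus, seq_id)
```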
Multiple Sequence Alignments

Multiple sequence alignments were performed with the Multalin tool (http://prodes.toulouse.inra.fr/multalin/; Corpet, 1988), using the default parameters, on the sequences of the OsbHLH domains and of the amino acids flanking the bHLH domains of the predicted bHLH proteins. The alignment was then adjusted manually according to the location of the corresponding amino acids in the bHLH motif, and similar amino acids were highlighted using GeneDoc (version 2.6.002) software (Pittsburgh Supercomputing Center, http://www.psc.edu/biomed/genedoc/; Nicholas et al., 1997). The amino acid sequences adjacent to the bHLH domain were then added to the aligned sequences, and these sequences were aligned again to obtain an alignment that included the motifs adjacent to the bHLH domain. We used ClustalX (http://www-igbmc.u-strasbg.fr/BioInfo/; Thompson et al., 1997) as a second method to align the sequences and to recheck the result. This alignment was also adjusted manually using GeneDoc to align the motifs common to OsbHLH members. To compare the evolutionary relationships of rice and Arabidopsis (Arabidopsis thaliana) bHLHs, we also ran Multalin on our OsbHLH domains and the 147 AtbHLH domains predicted by Toledo-Ortiz et al. (2003), and the combined OsbHLH and AtbHLH phylogenetic tree was obtained after manual adjustment of the alignment. To obtain information on intron/exon structure, the cDNA alignment of the bHLH domain sequences was built according to the amino acid sequence alignment, and the intron distribution patterns and intron splicing phases were derived from the aligned cDNA sequences. In addition, to search for other motifs shared by OsbHLH members, we used the multiple EM for motif elicitation tool (version 3.0; http://bioweb.pasteur.fr/seqanal/motif/meme/; Bailey and Elkan, 1994) to find similar sequences among OsbHLH members.
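Deriving intron position and splicing phase from a codon-based cDNA alignment, as outlined above, reduces to integer division by three. The Python sketch below assumes that the alignment was built from the protein alignment, so gaps occur in multiples of three and the phase computed from aligned columns equals the true phase; the column numbers in the example are hypothetical, and the function names are illustrative rather than part of any tool used in this study.

```python
def intron_site(aligned_coding_bases_before):
    """Return (codon_index, phase) for an intron.

    `aligned_coding_bases_before` counts the aligned coding columns that
    precede the intron. Phase 0 means the intron lies between two codons,
    phase 1 after the first base of a codon, and phase 2 after the second.
    """
    if aligned_coding_bases_before < 0:
        raise ValueError("number of coding bases cannot be negative")
    codon_index, phase = divmod(aligned_coding_bases_before, 3)
    return codon_index, phase


def same_intron_position(columns_a, columns_b):
    """Treat two introns as conserved if they share codon index and phase."""
    return intron_site(columns_a) == intron_site(columns_b)


# Hypothetical example: introns after 31 aligned coding bases in two genes.
print(intron_site(31))
print(same_intron_position(31, 31), same_intron_position(31, 32))
```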
Tree Building

A phylogenetic tree was constructed from the aligned rice bHLH protein sequences using MEGA (version 3.0; http://www.megasoftware.net/index.html; Kumar et al., 2004) with the NJ method and the following parameters: Poisson correction, pairwise deletion, and bootstrap (1,000 replicates; random seed). The amino acid variation rates were also obtained. In addition, the maximum parsimony method of the software PHYLIP (version 3.6; http://evolution.genetics.washington.edu/phylip.html; Felsenstein, 1989) was used with 1,000 bootstrap replicates to create a second phylogenetic tree to validate the results of the NJ method. The phylogenetic tree of the AtbHLH and OsbHLH domains was built using PHYLIP, and the resulting clades were assessed by 1,000 bootstrap replicates. The Dayhoff PAM matrix in the protein distance algorithm and the NJ method were employed to construct the unrooted tree. The resulting tree file was visualized with TreeView 1.6.6 (http://taxonomy.zoology.gla.ac.uk/rod/rod.html).

OsbHLH Locations and Segmental Duplication

Following verification of the locations of the OsbHLH genes, the distribution of OsbHLH family members throughout the rice genome was drawn using the software MapInspect (http://www.dpw.wau.nl/pv/pub/MapComp/). For the detection of large segmental duplications, we consulted the map of duplicated blocks provided by Xiong et al. (2005). On this map, each of the bHLH genes was localized on the corresponding chromosome using the coordinates from the genome sequence data (August, 2002 version). The software BioEdit (version 5.0.9; Hall, 1999) was used to analyze the similarity of homologs on the NJ phylogenetic tree of these rice bHLH genes. Using ClustalX, we compared the protein sequences of 40 pairs of OsbHLH genes involved in the potential genome duplication events.

Expression Analysis of AtbHLHs and OsbHLHs

We used RT-PCR to detect the expression patterns of the OsbHLHs.
The PCR primers were designed to avoid the conserved region and to amplify products 150 to 400 bp long. Primer sequences are given in Supplemental Table II. RNA was isolated from the roots, leaves, stems, and flowers of plants of the rice japonica cultivar Nipponbare bearing 8- to 10-cm inflorescences. The plants were grown in the greenhouse under long-day conditions. Total RNA was isolated as described (Chomczynski and Sacchi, 1987) and treated with DNaseI (Promega). Two micrograms of RNA was used for RT in a 20-μL reaction volume with M-MuLV reverse transcriptase (Fermentas) and an oligodT(18) primer, according to the manufacturer's recommendations. Thirty-two cycles of PCR amplification were performed. Each PCR pattern was verified in three replicate experiments; for each gene, a reaction without template served as the negative control and an Actin DNA fragment (551 bp) as the positive control. DNA bands of the expected size were scored as positive signals. To confirm this, 20 samples (OsbHLH numbers 001, 005, 006, 009, 032, 043, 056, 065, 073, 084, 090, 091, 092, 095, 104, 113, 138, 150, 152, and 153) were randomly selected for sequencing (by Invitrogen). EST data came from UniGene of NCBI (Wheeler et al., 2003; http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unigene). We also searched the expression data in the MPSS database (Nakano et al., 2006a; http://mpss.udel.edu). The LOC (location) numbers of the OsbHLHs (shown in Table I) and AtbHLHs (Toledo-Ortiz et al., 2003) were used to query the MPSS database, which contains the signature information of the bHLH genes.

ACKNOWLEDGMENTS

We thank Haisheng Liu and Huayong Xu for useful suggestions at the beginning of this work, Mingjiao Chen for supply of the rice material, and Professor Mingsheng Chen for helpful suggestions on phylogenetic analysis.
LITERATURE CITED

Abe H, Urao T, Ito T, Seki M, Shinozaki K, Yamaguchi-Shinozaki K (2003) Arabidopsis AtMYC2 (bHLH) and AtMYB2 (MYB) function as transcriptional activators in abscisic acid signaling. Plant Cell 15: 63–78
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215: 403–410
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402
Atchley WR, Fitch WM (1997) A natural classification of the basic helix-loop-helix class of transcription factors. Proc Natl Acad Sci USA 94: 5172–5176
Atchley WR, Terhalle W, Dress A (1999) Positional dependence, cliques, and predictive motifs in the bHLH protein domain. J Mol Evol 48: 501–516
Bailey PC, Martin C, Toledo-Ortiz G, Quail PH, Huq E, Heim MA, Jakoby M, Werber M, Weisshaar B (2003) Update on the basic helix-loop-helix transcription factor gene family in Arabidopsis thaliana. Plant Cell 15: 2497–2502
Bailey TL, Elkan C (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 2: 28–36
Brown DD, Wensink PC, Jordan E (1972) A comparison of the ribosomal DNA's of Xenopus laevis and Xenopus mulleri: the evolution of tandem genes. J Mol Biol 63: 57–73
Brownlie P, Ceska T, Lamers M, Romier C, Stier G, Teo H, Suck D (1997) The crystal structure of an intact human Max-DNA complex: new insights into mechanisms of transcriptional control. Structure 5: 509–520
Buck MJ, Atchley WR (2003) Phylogenetic analysis of plant basic helix-loop-helix proteins. J Mol Evol 56: 742–750
Cannon SB, Mitra A, Baumgarten A, Young ND, May G (2004) The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC Plant Biol 4: 10
Chomczynski P, Sacchi N (1987) Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction. Anal Biochem 162: 156–159
Corpet F (1988) Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res 16: 10881–10890
Crozatier M, Valle D, Dubois L, Ibnsouda S, Vincent A (1996) Collier, a novel regulator of Drosophila head development, is expressed in a single mitotic domain. Curr Biol 6: 707–718
Ellenberger T, Fass D, Arnaud M, Harrison SC (1994) Crystal structure of transcription factor E47: E-box recognition by a basic region helix-loop-helix dimer. Genes Dev 8: 970–980
Eulgem T, Rushton PJ, Robatzek S, Somssich IE (2000) The WRKY superfamily of plant transcription factors. Trends Plant Sci 5: 199–206
Facchini LM, Penn LZ (1998) The molecular role of Myc in growth and transformation: recent discoveries lead to new insights. FASEB J 12: 633–651
Fairman R, Beran-Steed RK, Anthony-Cahill SJ, Lear JD, Stafford WF III, DeGrado WF, Benfield PA, Brenner SL (1993) Multiple oligomeric states regulate the DNA binding of helix-loop-helix peptides. Proc Natl Acad Sci USA 90: 10429–10433
Felsenstein J (1989) PHYLIP: Phylogeny Inference Package. Cladistics 5: 164–166
Fisher A, Caudy M (1998) The function of hairy-related bHLH repressor proteins in cell fate decisions. Bioessays 20: 298–306
Gale MD, Devos KM (1998) Comparative genetics in the grasses. Proc Natl Acad Sci USA 95: 1971–1974
Gilbert W (1987) The exon theory of genes. Cold Spring Harb Symp Quant Biol 52: 901–905
Goding CR (2000) Mitf from neural crest to melanoma: signal transduction and transcription in the melanocyte lineage. Genes Dev 14: 1712–1728
Guyot R, Keller B (2004) Ancestral genome duplication in rice. Genome 47: 610–614
Hall TA (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41: 95–98
Heim MA, Jakoby M, Werber M, Martin C, Weisshaar B, Bailey PC (2003) The basic helix-loop-helix transcription factor family in plants: a genome-wide study of protein structure and functional diversity. Mol Biol Evol 20: 735–747
Henriksson M, Luscher B (1996) Proteins of the Myc network: essential regulators of cell growth and differentiation. Adv Cancer Res 68: 109–182
Hu J, Anderson B, Wessler SR (1996) Isolation and characterization of rice R genes: evidence for distinct evolutionary paths in rice and maize. Genetics 142: 1021–1031
Hua X, Yokoyama C, Wu J, Briggs MR, Brown MS, Goldstein JL, Wang X (1993) SREBP-2, a second basic-helix-loop-helix-leucine zipper protein that stimulates transcription by binding to a sterol regulatory element. Proc Natl Acad Sci USA 90: 11603–11607
Ingram VM (1961) Gene evolution and the haemoglobins. Nature 189: 704–708
IRGSP (2005) The map-based sequence of the rice genome. Nature 436: 793–800
Jiang C, Gu X, Peterson T (2004) Identification of conserved gene structures and carboxy-terminal motifs in the Myb gene family of Arabidopsis and Oryza sativa L. ssp. indica. Genome Biol 5: R46
Jung KH, Han MJ, Lee YS, Kim YW, Hwang I, Kim MJ, Kim YK, Nahm BH, An G (2005) Rice Undeveloped Tapetum1 is a major regulator of early tapetum development. Plant Cell 17: 2705–2722
Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D (2003) Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci USA 100: 11484–11489
Kiribuchi K, Jikumaru Y, Kaku H, Minami E, Hasegawa M, Kodama O, Seto H, Okada K, Nojiri H, Yamane H (2005) Involvement of the basic helix-loop-helix transcription factor RERJ1 in wounding and drought stress responses in rice plants. Biosci Biotechnol Biochem 69: 1042–1044
Kiribuchi K, Sugimori M, Takeda M, Otani T, Okada K, Onodera H, Ugaki M, Tanaka Y, Tomiyama-Akimoto C, Yamaguchi T, et al (2004) RERJ1, a jasmonic acid-responsive gene from rice, encodes a basic helix-loop-helix protein. Biochem Biophys Res Commun 325: 857–863
Komatsu K, Maekawa M, Ujiie S, Satake Y, Furutani I, Okamoto H, Shimamoto K, Kyozuka J (2003) LAX and SPA: major regulators of shoot branching in rice. Proc Natl Acad Sci USA 100: 11765–11770
Kosugi S, Ohashi Y (1997) PCF1 and PCF2 specifically bind to cis elements in the rice proliferating cell nuclear antigen gene. Plant Cell 9: 1607–1619
Kumar S, Tamura K, Nei M (2004) MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform 5: 150–163
Ledent V, Paquet O, Vervoort M (2002) Phylogenetic analysis of the human basic helix-loop-helix proteins. Genome Biol 3: RESEARCH0030
Ledent V, Vervoort M (2001) The basic helix-loop-helix protein family: comparative genomics and phylogenetic analysis. Genome Res 11: 754–770
Letunic I, Copley RR, Schmidt S, Ciccarelli FD, Doerks T, Schultz J, Ponting CP, Bork P (2004) SMART 4.0: towards genomic data integration. Nucleic Acids Res 32: D142–D144
Martin C, Paz-Ares J (1997) MYB transcription factors in plants. Trends Genet 13: 67–73
Martinez-Garcia JF, Huq E, Quail PH (2000) Direct targeting of light signals to a promoter element-bound transcription factor. Science 288: 859–863
Massari ME, Murre C (2000) Helix-loop-helix proteins: regulators of transcription in eucaryotic organisms. Mol Cell Biol 20: 429–440
Mehan MR, Freimer NB, Ophoff RA (2004) A genome-wide survey of segmental duplications that mediate common human genetic variation of chromosomal architecture. Hum Genomics 1: 335–344
Murre C, Bain G, van Dijk MA, Engel I, Furnari BA, Massari ME, Matthews JR, Quong MW, Rivera RR, Stuiver MH (1994) Structure and function of helix-loop-helix proteins. Biochim Biophys Acta 1218: 129–135
Murre C, McCaw PS, Baltimore D (1989) A new DNA binding and dimerization motif in immunoglobulin enhancer binding, daughterless, MyoD, and myc proteins. Cell 56: 777–783
Nakano M, Nobuta K, Vemaraju K, Tej SS, Skogen JW, Meyers BC (2006a) Plant MPSS databases: signature-based transcriptional resources for analyses of mRNA and small RNA. Nucleic Acids Res 34: D731–D735
Nakano T, Suzuki K, Fujimura T, Shinshi H (2006b) Genome-wide analysis of the ERF gene family in Arabidopsis and rice. Plant Physiol 140: 411–432
Nam J, dePamphilis CW, Ma H, Nei M (2003) Antiquity and evolution of the MADS-box gene family controlling flower development in plants. Mol Biol Evol 20: 1435–1447
Nam J, Kim J, Lee S, An G, Ma H, Nei M (2004) Type I MADS-box genes have experienced faster birth-and-death evolution than type II MADS-box genes in angiosperms. Proc Natl Acad Sci USA 101: 1910–1915
Nei M, Gu X, Sitnikova T (1997) Evolution by the birth-and-death process in multigene families of the vertebrate immune system. Proc Natl Acad Sci USA 94: 7799–7806
Nei M, Rogozin IB, Piontkivska H (2000) Purifying selection and birth-and-death evolution in the ubiquitin gene family. Proc Natl Acad Sci USA 97: 10866–10871
Nei M, Rooney AP (2005) Concerted and birth-and-death evolution of multigene families. Annu Rev Genet 39: 121–152
Nesi N, Debeaujon I, Jond C, Pelletier G, Caboche M, Lepiniec L (2000) The TT8 gene encodes a basic helix-loop-helix domain protein required for expression of DFR and BAN genes in Arabidopsis siliques. Plant Cell 12: 1863–1878
Nicholas KB, Nicholas HBJ, Deerfield DWI (1997) Genedoc: analysis and visualization of genetic variation. Embnew News 4: 14
Paris S, Longhi R, Santambrogio P, de Curtis I (2003) Leucine-zipper-mediated homo- and hetero-dimerization of GIT family p95-ARF GTPase-activating protein, PIX-, paxillin-interacting proteins 1 and 2. Biochem J 372: 391–398
Patthy L (1987) Intron-dependent evolution: preferred types of exons and introns. FEBS Lett 214: 1–7
Quail PH (2000) Phytochrome-interacting factors. Semin Cell Dev Biol 11: 457–466
Ramsay NA, Glover BJ (2005) MYB-bHLH-WD40 protein complex and the evolution of cellular diversity. Trends Plant Sci 10: 63–70
Robinson KA, Koepke JI, Kharodawala M, Lopes JM (2000) A network of yeast basic helix-loop-helix interactions. Nucleic Acids Res 28: 4460–4466
Sakamoto W, Ohmori T, Kageyama K, Miyazaki C, Saito A, Murata M, Noda K, Maekawa M (2001) The Purple leaf (Pl) locus of rice: the Pl(w) allele has a complex organization and includes two genes encoding basic helix-loop-helix proteins involved in anthocyanin biosynthesis. Plant Cell Physiol 42: 982–991
Schultz J, Milpetz F, Bork P, Ponting CP (1998) SMART, a simple modular architecture research tool: identification of signaling domains. Proc Natl Acad Sci USA 95: 5857–5864
Sharp PA (1981) Speculations on RNA splicing. Cell 23: 643–646
Shirakata M, Friedman FK, Wei Q, Paterson BM (1993) Dimerization specificity of myogenic helix-loop-helix DNA-binding factors directed by nonconserved hydrophilic residues. Genes Dev 7: 2456–2470
Sonnenfeld MJ, Delvecchio C, Sun X (2005) Analysis of the transcriptional activation domain of the Drosophila tango bHLH-PAS transcription factor. Dev Genes Evol 215: 221–229
Sorensen AM, Krober S, Unte US, Huijser P, Dekker K, Saedler H (2003) The Arabidopsis ABORTED MICROSPORES (AMS) gene encodes a MYC class transcription factor. Plant J 33: 413–423
Steidl C, Leimeister C, Klamt B, Maier M, Nanda I, Dixon M, Clarke R, Schmid M, Gessler M (2000) Characterization of the human and mouse HEY1, HEY2, and HEYL genes: cloning, mapping, and mutation screening of a new bHLH gene family. Genomics 66: 195–203
Sun XH, Copeland NG, Jenkins NA, Baltimore D (1991) Id proteins Id1 and Id2 selectively inhibit DNA binding by one class of helix-loop-helix proteins. Mol Cell Biol 11: 5603–5611
Swanson HI, Chan WK, Bradfield CA (1995) DNA binding specificities and pairing rules of the Ah receptor, ARNT, and SIM proteins. J Biol Chem 270: 26292–26302
Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG (1997) The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25: 4876–4882
Toledo-Ortiz G, Huq E, Quail PH (2003) The Arabidopsis basic/helix-loop-helix transcription factor family. Plant Cell 15: 1749–1770
Tong Q, Xing S, Jhiang SM (1997) Leucine zipper-mediated dimerization is essential for the PTC1 oncogenic activity. J Biol Chem 272: 9043–9047
Wang YJ, Zhang ZG, He XJ, Zhou HL, Wen YX, Dai JX, Zhang JS, Chen SY (2003) A rice transcription factor OsbHLH1 is involved in cold stress response. Theor Appl Genet 107: 1402–1409
Wheeler DL, Church DM, Federhen S, Lash AE, Madden TL, Pontius JU, Schuler GD, Schriml LM, Sequeira E, Tatusova TA, et al (2003) Database resources of the National Center for Biotechnology. Nucleic Acids Res 31: 28–33
Xiong Y, Liu T, Tian C, Sun S, Li J, Chen M (2005) Transcription factors in rice: a genome-wide comparative analysis between monocots and eudicots. Plant Mol Biol 59: 191–203
Yi K, Wu Z, Zhou J, Du L, Guo L, Wu Y, Wu P (2005) OsPTF1, a novel transcription factor involved in tolerance to phosphate starvation in rice. Plant Physiol 138: 2087–2096
Yu J, Hu S, Wang J, Wong GK, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, et al (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296: 79–92
Zhang X, Feng B, Zhang Q, Zhang D, Altman N, Ma H (2005) Genome-wide expression profiling and identification of gene activities during early flower development in Arabidopsis. Plant Mol Biol 58: 401–419
Zhu Y, Cai XL, Wang ZY, Hong MM (2003) An interaction between a MYC protein and an EREBP protein is involved in transcriptional regulation of the rice Wx gene. J Biol Chem 278: 47803–47811
Zhu ZF, Sun CQ, Fu YC, Qian XY, Yang JS, Wang XK (2005) Isolation and analysis of a novel MYC gene from rice. Yi Chuan Xue Bao 32: 393–398

Author notes

1 This work was supported by the funds from the National Key Basic Research Developments Program of the Ministry of Science and Technology, People's Republic of China (2001CB109002 and 2005CB120802), National 863 High-Tech Project (2005AA2710330), Shanghai Municipal Committee of Science and Technology (03JC14061), the Program for New Century Excellent Talents in University (NCET–04–0403), the Shuguang Scholarship (04SG15), the Shanghai Institutes of Biological Sciences (Reproductive Development Project), and the U.S. Department of Energy (DE–FG02–02ER15332).
2 These authors contributed equally to the paper.
* Corresponding author; e-mail [email protected]; fax 86–21–34204869.
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Dabing Zhang ([email protected]).
[W] The online version of this article contains Web-only data.
www.plantphysiol.org/cgi/doi/10.1104/pp.106.080580.
© 2006 American Society of Plant Biologists. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
