Identification of plant microRNA homologs

Bioinformatics, Vol. 22 no. 3 2006, pages 359–360
APPLICATIONS NOTE
doi:10.1093/bioinformatics/bti802
Sequence analysis

Tobias Dezulian (1,*), Michael Remmert (1), Javier F. Palatnik (2), Detlef Weigel (2) and Daniel H. Huson
(1) Center for Bioinformatics Tübingen, Tübingen University, Germany and (2) Max Planck Institute for Developmental Biology, Tübingen, Germany

Received on August 12, 2005; revised on November 10, 2005; accepted on November 24, 2005
Advance Access publication November 29, 2005
Associate Editor: Charlie Hodgman

* To whom correspondence should be addressed.

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

ABSTRACT

Summary: MicroRNAs (miRNAs) are a recently discovered class of non-coding RNAs that regulate gene and protein expression in plants and animals. MiRNAs have so far been identified mostly by specific cloning of small RNA molecules, complemented by computational methods. We present a computational identification approach that is able to identify candidate miRNA homologs in any set of sequences, given a query miRNA. The approach is based on a sequence similarity search step followed by a set of structural filters.

Availability: microHARVESTER is offered as a web-service and additionally as source code upon request at http://www-ab.informatik.uni-tuebingen.de/software/microHARVESTER

Contact: dezulian@informatik.uni-tuebingen.de

MicroRNAs (miRNAs) are small RNAs 20–24 nt in length. They perform important regulatory roles in both plants and animals. The miRNA biogenesis and effector pathways share components with those for another class of small RNAs, short interfering RNAs (siRNAs), and both are currently under intense scrutiny (Susi et al., 2004). Biogenesis of miRNAs starts with the synthesis of a large primary transcript (Bartel, 2004; He and Hannon, 2004), which contains a double-stranded miRNA precursor that adopts a fold-back structure by complementary base pairing. In plants, the miRNA precursor is degraded in the nucleus by the RNase III enzyme DICER-LIKE1, which releases a short RNA duplex. This duplex is formed by the miRNA along with the complementary fragment, called miRNA*, from the other arm of the precursor. The miRNA and the miRNA* are offset by 2 nt owing to the staggered cuts of DICER-LIKE1. Finally, mature miRNAs are selected from the RNA duplex and incorporated into RNA induced silencing complexes (RISCs), to which they provide sequence specificity.

MiRNAs recognize completely or partially complementary sequences in target mRNAs and guide them to cleavage or translational arrest. Animal miRNAs typically recognize several target sequences located in the 3′-UTR and inhibit their translation, whereas plant miRNAs usually recognize one motif in the coding region of their targets and affect their stability. It is thought that the better complementarity between plant miRNAs and their targets favors the latter mechanism. In plants, miRNAs regulate diverse genes and pathways, such as development, hormone signaling, stress response and trans-acting siRNAs (Allen et al., 2005).

MiRNA genes in plants are grouped into families that yield (almost) identical miRNAs. Currently, 43 families containing 513 miRNA genes across 7 plant species are listed in release 7.0 of the MicroRNA registry (Griffiths-Jones, 2004, http://microrna.sanger.ac.uk), which is responsible for name assignment of published miRNAs.

Here, we present an approach and implementation ('microHARVESTER') that can identify candidate miRNA homologs based on a query miRNA, with excellent sensitivity and specificity. The microHARVESTER takes advantage of the conservation pattern typical for miRNA genes: the (mature) miRNA is most conserved since its sequence is crucial for target-interaction; the miRNA* is less conserved but restricted by the need to extensively base-pair with the miRNA; the rest of the miRNA gene can be less conserved. Our approach uses a BLAST sequence similarity search to first generate a set of candidates, which is then rigorously refined by a series of filters that exploit structural features specific to plant miRNAs to achieve specificity. The output of the tool consists of a PDF overview document that is generated for each miRNA query. It presents candidate miRNA homologs along with figures of their predicted structure and a color-coded alignment.

Given a known miRNA (miRNA precursor sequence plus mature miRNA sequence) as input for our search, we use the precursor as a query for a sequence similarity search against a set of sequences (e.g. a set of EST sequences or reads from a new plant genome) to generate a set of candidate homologs. Since the (mature) miRNA sequence is very much conserved across large evolutionary distances (Axtell and Bartel, 2005), using BLAST (Altschul et al., 1997) with the very large E-value cutoff of 10 and minimal word size of 7, one can generate a hit for almost all miRNA homologs at the price of many false positives. In the first filter step, we discard those sequences of the candidate set whose aligned segments do not span most of the mature segment of the query. In a second filter step, we apply a modified Smith–Waterman pairwise alignment algorithm (Smith and Waterman, 1981) to precisely determine the mature sequence in the candidate precursor from the optimal alignment of the query mature sequence against the corresponding segment of the BLAST hit. We discard a candidate if the length of the mature sequences differs by >2 nt. In a third filter step, we predict the minimal free energy structure of the candidate sequence using RNAfold (Hofacker et al., 1994) and determine its putative miRNA* sequence. We discard a candidate if more than six nucleotides of its miRNA* are not predicted to form bonds with its mature miRNA (keeping in mind the 2 nt offset between miRNA and miRNA*).

From a selection of all candidates that pass each filter we construct a multiple sequence alignment, using T-Coffee (Notredame et al., 2000), of a region that includes the miRNA, the miRNA* and the 'loop' sequence in between the miRNA and the miRNA*. The reliability of each position of this multiple alignment is visualized using a color scheme. An overview PDF document is generated, which contains this multiple sequence alignment. In addition, it provides for each putative miRNA homolog a figure of its minimal free energy structure with the miRNA and the miRNA* highlighted in dark and light shades, respectively, along with its database accession (Fig. 1).

Fig. 1. (A) The multiple sequence alignment shows the reliability of each alignment position. Darker colors indicate better alignment scores. Dark and light frames mark the positions coding for the miRNA and miRNA*, respectively. (B) The minimal free energy structure for an EST harboring a miRNA homolog candidate is depicted; in the enlarged section (C), miRNA and miRNA* are marked on the right and left hand side, respectively.

In order to assess sensitivity and specificity of this approach, we applied the microHARVESTER to the fully sequenced dicot Arabidopsis thaliana (Ath) genome using a set of query sequences from the monocot Zea mays (Zma). For each of the currently available (MicroRNA registry release 7.0) 18 miRNA families shared by Ath and Zma we selected one Zma miRNA gene at random. Using this query set, the microHARVESTER identified 67 of the 75 Ath miRNA genes of these families (at least one in each family) at the price of five false positives.

MicroHARVESTER is available as a web-service at www-ab.informatik.uni-tuebingen.de/software/microHARVESTER. Up to five miRNA queries may be submitted, upon which a job id and URL will be issued, and the resulting PDFs will be downloadable after job completion. Source code for the microHARVESTER is also available from the authors upon request. In order to run this standalone version on a standard Linux operating system, the following free software is additionally needed: Java 1.5, NCBI BLAST, RNAfold, T-Coffee plus a standard LaTeX installation. Results can optionally be stored in a MySQL database. Note that when constructing the BLAST database, large input sequences are split into overlapping fragments for better retrieval efficiency.

MicroHARVESTER is able to identify plant miRNA homologs with good sensitivity and specificity in any set of sequences, for a given query miRNA. Using an EST database as the sequence pool offers the additional assurance that the predicted miRNA homologs are actually expressed (Zhang et al., 2005). Nevertheless, this approach has also proven useful on databases of genomic DNA.

Successful approaches for plant miRNA homolog identification have previously been described (Maher et al., 2004; Adai et al., 2005). However, microHARVESTER is the first such tool that is available through a web interface. It complements a very recently published animal miRNA homolog identification approach (Wang et al., 2005). In addition to the original purpose of miRNA homolog identification, microHARVESTER can be effectively used to screen candidate miRNA sets derived from comparative approaches to identify representatives of new miRNA families. In this setting, each candidate miRNA is used as the query, and the number and divergence pattern of resulting putative homologs as well as their structure provide clues to the miRNA-likeness of the query.

Conflict of Interest: none declared.

REFERENCES

Adai,A., Johnson,C., Mlotshwa,S., Archer-Evans,S., Manocha,V., Vance,V. and Sundaresan,V. (2005) Computational prediction of miRNAs in Arabidopsis thaliana. Genome Res., 15, 78–91.

Allen,E., Xie,Z., Gustafson,A.M. and Carrington,J.C. (2005) microRNA-directed phasing during trans-acting siRNA biogenesis in plants. Cell, 121, 207–221.

Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.

Axtell,M.J. and Bartel,D.P. (2005) Antiquity of MicroRNAs and their targets in land plants. Plant Cell, 17, 1658–1673.

Bartel,D.P. (2004) MicroRNAs: genomics, biogenesis, mechanism, and function. Cell, 116, 281–297.

Griffiths-Jones,S. (2004) The microRNA Registry. Nucleic Acids Res., 32, D109–D111.

He,L. and Hannon,G.J. (2004) MicroRNAs: small RNAs with a big role in gene regulation. Nat. Rev. Genet., 5, 522–531.

Hofacker,I.L., Fontana,W., Stadler,P.F., Bonhoeffer,L.S., Tacker,M. and Schuster,P. (1994) Fast folding and comparison of RNA secondary structures. Monatsh. Chem., 125, 167–188.

Maher,C., Timmermans,M., Stein,L. and Ware,D. (2004) Identifying MicroRNAs in Plant Genomes. Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference (CSB 2004) Stanford, CA, pp. 718–723.

Notredame,C., Higgins,D.G. and Heringa,J. (2000) T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302, 205–217.

Smith,T.F. and Waterman,M.S. (1981) Identification of common molecular subsequences. J. Mol. Biol., 147, 195–197.

Susi,P., Hohkuri,M., Wahlroos,T. and Kilby,N.J. (2004) Characteristics of RNA silencing in plants: similarities and differences across kingdoms. Plant Mol. Biol., 54, 157–174.

Wang,X., Zhang,J., Li,F., Gu,J., He,T., Zhang,X. and Li,Y. (2005) MicroRNA identification based on sequence and structure alignment. Bioinformatics, 21, 3610–3614.

Zhang,B.H., Pan,X.P., Wang,Q.L., Cobb,G.P. and Anderson,T.A. (2005) Identification and characterization of new plant microRNAs using EST analysis. Cell Res., 15, 336–360.
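
The three filter steps described in the text (coverage of the mature segment, a mature-length difference of at most 2 nt, and at most six unpaired nucleotides) can be illustrated with a minimal Python sketch. This is not the microHARVESTER source code, which is only available from the authors. The `Candidate` record, all function names and the 0.8 coverage fraction are assumptions made for illustration, since the text only says "most" of the mature segment. The third filter is also simplified: it counts unpaired positions of the mature miRNA in a dot-bracket structure (the notation RNAfold emits) as a proxy, rather than deriving the miRNA* with its 2 nt offset.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One BLAST hit after alignment and folding (hypothetical record)."""
    name: str
    hit_start: int      # query coordinates covered by the BLAST alignment
    hit_end: int        # (0-based, end exclusive)
    mature: str         # mature miRNA located by the pairwise alignment
    mature_start: int   # offset of the mature miRNA in the candidate
    structure: str      # dot-bracket structure of the candidate


def covers_mature(c, q_start, q_end, min_fraction=0.8):
    """Filter 1: the hit must span most of the query's mature segment.

    min_fraction=0.8 is an assumed value, not taken from the paper.
    """
    overlap = min(c.hit_end, q_end) - max(c.hit_start, q_start)
    return overlap >= min_fraction * (q_end - q_start)


def length_ok(c, query_mature, max_diff=2):
    """Filter 2: mature lengths may differ by at most 2 nt."""
    return abs(len(c.mature) - len(query_mature)) <= max_diff


def unpaired_in_mature(c):
    """Count mature-miRNA positions that are unpaired ('.') in the fold."""
    end = c.mature_start + len(c.mature)
    return c.structure[c.mature_start:end].count(".")


def pairing_ok(c, max_unpaired=6):
    """Filter 3 (simplified proxy): at most six unpaired positions."""
    return unpaired_in_mature(c) <= max_unpaired


def passes_filters(c, query_mature, q_start, q_end):
    """Apply the three filters in the order given in the text."""
    return (covers_mature(c, q_start, q_end)
            and length_ok(c, query_mature)
            and pairing_ok(c))
```

A candidate is kept only if all three checks succeed. Running the cheap coverage test first mirrors the order in the text and avoids aligning or folding hits that are discarded anyway.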


References (14)

Publisher
Oxford University Press
Copyright
© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
ISSN
1367-4803
eISSN
1460-2059
DOI
10.1093/bioinformatics/bti802
pmid
16317073


Journal

Bioinformatics, Oxford University Press

Published: Nov 29, 2005
