Constrained pattern matching

Yongwook Choi, J. Craig Venter Institute
Wojciech Szpankowski, Purdue University

Publisher
Association for Computing Machinery
Copyright
Copyright © 2011 by ACM Inc.
ISSN
1549-6325
DOI
10.1145/1921659.1921671

Abstract

Constrained sequences are strings satisfying certain additional structural restrictions (e.g., some patterns are forbidden). They find applications in communication, digital recording, and biology. In this article, we restrict our attention to the so-called (d, k) constrained binary sequences, in which any run of zeros must be of length at least d and at most k, where 0 ≤ d < k. In many applications, one needs to know the number of occurrences of a given pattern w in such sequences, for which we coin the term constrained pattern matching. For a given word w, we first estimate the mean and the variance of the number of occurrences of w in a (d, k) sequence generated by a memoryless source. Then we present the central limit theorem and large deviations results. As a by-product, we enumerate asymptotically the number of (d, k) sequences with exactly r occurrences of w, and compute the Shannon entropy of (d, k) sequences with a given number of occurrences of w. We also apply our results to detect under- and overrepresented patterns in neuronal data (spike
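The definitions in the abstract can be made concrete with a small brute-force sketch. It is not the authors' method, which is analytic and asymptotic; it only enumerates short (d, k) sequences and tallies how many contain exactly r occurrences of w. The function names are hypothetical, occurrences are counted with overlaps, and the treatment of boundary runs is an assumption: runs of zeros between two ones must have length in [d, k], while a leading or trailing run only needs length at most k.

```python
from collections import Counter
from itertools import product


def is_dk(s: str, d: int, k: int) -> bool:
    """Check the (d, k) run-length constraint on a binary string.

    Assumption: only runs of zeros enclosed by ones must be >= d long;
    leading and trailing runs are only required to be <= k long.
    """
    runs = [len(run) for run in s.split("1")]
    if len(runs) == 1:  # the string contains no ones at all
        return runs[0] <= k
    first, *inner, last = runs
    return first <= k and last <= k and all(d <= r <= k for r in inner)


def count_occurrences(s: str, w: str) -> int:
    """Count occurrences of w in s, allowing overlaps."""
    return sum(1 for i in range(len(s) - len(w) + 1) if s.startswith(w, i))


def occurrence_histogram(n: int, d: int, k: int, w: str) -> Counter:
    """Map r -> number of length-n (d, k) sequences with exactly r occurrences of w.

    Exhaustive over all 2**n binary strings, so usable only for small n.
    """
    hist = Counter()
    for bits in product("01", repeat=n):
        s = "".join(bits)
        if is_dk(s, d, k):
            hist[count_occurrences(s, w)] += 1
    return hist


if __name__ == "__main__":
    # Length-4 (1, 2) sequences, tallied by occurrences of the pattern "01".
    print(dict(occurrence_histogram(4, 1, 2, "01")))
```

Every sequence is counted with equal weight here. Under a memoryless source with unequal symbol probabilities, each sequence would instead contribute its own probability to the tally.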

Journal

ACM Transactions on Algorithms (TALG), Association for Computing Machinery

Published: Mar 1, 2011
