Spaces, Trees, and Colors: The Algorithmic Landscape of Document Retrieval on Sequences

Gonzalo Navarro, Department of Computer Science, University of Chile

Document retrieval is one of the best-established information retrieval activities since the '60s, pervading all search engines. Its aim is to obtain, from a collection of text documents, those most relevant to a pattern query. Current technology is mostly oriented to "natural language" text collections, where inverted indexes are the preferred solution. As successful as this paradigm has been, it fails to properly handle various East Asian languages and other scenarios where the "natural language" assumptions do not hold. In this survey, we cover the recent research in extending the document retrieval techniques to a broader class of sequence collections, which has applications in bioinformatics, data and web mining, chemoinformatics, software engineering, multimedia information retrieval, and many other fields. We focus on the algorithmic aspects of the techniques, uncovering a rich world of relations between document retrieval challenges and fundamental problems on trees, strings, range queries, discrete geometry, and other areas.

Categories and Subject Descriptors: E.1 [Data structures]; E.2 [Data storage representations]; E.4 [Coding and information theory]: Data compaction and compression; F.2.2 [Analysis of algorithms and problem
ACM Computing Surveys (CSUR) – Association for Computing Machinery
Published: Mar 1, 2014