Spaces, Trees, and Colors: The algorithmic landscape of document retrieval on sequences

ACM Computing Surveys (CSUR), Volume 46 (4) – Mar 1, 2014

Publisher
Association for Computing Machinery
Copyright
Copyright © 2014 by ACM Inc.
ISSN
0360-0300
DOI
10.1145/2535933

Abstract

Gonzalo Navarro, Department of Computer Science, University of Chile

Document retrieval is one of the best-established information retrieval activities since the '60s, pervading all search engines. Its aim is to obtain, from a collection of text documents, those most relevant to a pattern query. Current technology is mostly oriented to "natural language" text collections, where inverted indexes are the preferred solution. As successful as this paradigm has been, it fails to properly handle various East Asian languages and other scenarios where the "natural language" assumptions do not hold. In this survey, we cover the recent research in extending the document retrieval techniques to a broader class of sequence collections, which has applications in bioinformatics, data and web mining, chemoinformatics, software engineering, multimedia information retrieval, and many other fields. We focus on the algorithmic aspects of the techniques, uncovering a rich world of relations between document retrieval challenges and fundamental problems on trees, strings, range queries, discrete geometry, and other areas.

Categories and Subject Descriptors: E.1 [Data structures]; E.2 [Data storage representations]; E.4 [Coding and information theory]: Data compaction and compression; F.2.2 [Analysis of algorithms and problem

Journal

ACM Computing Surveys (CSUR), Association for Computing Machinery

Published: Mar 1, 2014
