journal article
LitStream Collection
doi: 10.1002/asi.20348; pmid: N/A
In contrast to traditional information retrieval systems, which return ranked lists of documents that users must browse manually, a question answering system attempts to answer natural language questions posed by the user directly. Although such systems possess language‐processing capabilities, they still rely on traditional document retrieval techniques to generate an initial candidate set of documents. In this article, the authors argue that document retrieval for question answering is a different task from retrieving documents in response to more general retrospective information needs. Thus, to guide future system development, specialized question answering test collections must be constructed. The authors show that current evaluation resources have major shortcomings; to remedy the situation, they have manually created a small, reusable question answering test collection for research purposes. They describe their methodology for building this test collection and discuss the issues they encountered regarding the notion of "answer correctness."
Paling, Stephen; Nilan, Michael
doi: 10.1002/asi.20345; pmid: N/A
Producers in creative genres are frequently motivated by goals that put them in opposition to popular culture and marketplace pressures. Whether those goals reflect values that belong specifically to print culture, and whether those values will continue to motivate producers in creative genres after the introduction of online technology, are questions that have not been answered empirically. Previous studies of genre change have focused on the ability of human actors to use information technology to alter genres as social structures. However, these studies examined generic artifacts rather than the creative values that motivated the creation of those artifacts. Editors of small literary magazines (generally referred to as little magazines) make ideal subjects for this study: creative values play an important role in their decisions, and they frequently publish poetry, fiction, and other work that stands in opposition to popular culture and literature. This study proposed and evaluated a conceptual framework for anticipating whether editors of little magazines will use online technologies to reinforce or alter the values characteristic of their genre. The study found that the values posited in the conceptual framework fit the goals expressed by little magazine editors, although not all editors held those values equally. These findings suggest that producers in creative genres can use online technology in ways that actually intensify those values. The concept of intensifying use of technology (IUT) was posited to explain the differences.
Lau, Annie Y.S.; Coiera, Enrico W.
doi: 10.1002/asi.20377; pmid: N/A
This study aimed to develop a model for predicting the impact of information access through Web searches on human decision making. Models were constructed using a database of the search behaviors and decisions of 75 clinicians, who answered questions about eight scenarios within 80 minutes in a controlled setting at a university computer laboratory. Bayesian models were developed with and without bias factors to account for anchoring, primacy, recency, exposure, and reinforcement decision biases. Prior probabilities were estimated from the population prior; from a personal prior calculated from presearch answers and confidence ratings provided by the participants; from an overall measure of willingness to switch belief before and after searching; and from a willingness to switch belief calculated for each individual scenario. The optimal Bayes model predicted user answers in 73.3% (95% CI: 68.71 to 77.35%) of cases; it incorporated participants' willingness to switch belief before and after searching for each scenario, as well as the decision biases they encounter during the search journey. In most cases, it is possible to predict the impact of a sequence of documents retrieved by a Web search engine on a decision task without reference to the content or structure of the documents, relying instead solely on a simple Bayesian model of belief revision.
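The idea of belief revision driven only by the sequence of retrieved documents, not their content, can be illustrated with a minimal sketch. It assumes a yes/no decision, a prior taken from a presearch answer and confidence rating, a single hypothetical likelihood ratio shared by all documents, and optional per-document weights standing in for order effects such as primacy or recency. The names `posterior_yes`, `predict_answer`, `likelihood_ratio`, and `weights` are illustrative assumptions; this is not the authors' fitted model.

```python
import math
from typing import Optional, Sequence


def posterior_yes(prior_yes: float,
                  doc_evidence: Sequence[int],
                  likelihood_ratio: float = 2.0,
                  weights: Optional[Sequence[float]] = None) -> float:
    """Sequential Bayesian belief revision in log-odds form (hypothetical sketch).

    prior_yes: prior probability that the answer is "yes", e.g. derived from a
        presearch answer and confidence rating (the mapping is an assumption).
    doc_evidence: +1 if a document supports "yes", -1 if it supports "no".
    likelihood_ratio: assumed P(supports yes | yes) / P(supports yes | no),
        one constant for every document.
    weights: optional per-document weights, a hypothetical stand-in for
        primacy/recency bias factors (1.0 means an unbiased update).
    """
    if not 0.0 < prior_yes < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(doc_evidence)
    if len(weights) != len(doc_evidence):
        raise ValueError("one weight per document is required")
    log_odds = math.log(prior_yes / (1.0 - prior_yes))
    step = math.log(likelihood_ratio)
    for evidence, weight in zip(doc_evidence, weights):
        # Each document shifts the log-odds by +/- log(LR), scaled by its weight.
        log_odds += weight * evidence * step
    return 1.0 / (1.0 + math.exp(-log_odds))


def predict_answer(prior_yes: float, doc_evidence: Sequence[int], **kwargs) -> str:
    """Predict the post-search answer as the more probable option."""
    return "yes" if posterior_yes(prior_yes, doc_evidence, **kwargs) >= 0.5 else "no"
```

For example, a searcher who leans "no" (prior 0.3) and then sees two supporting documents ends with posterior odds of (3/7) × 4 = 12/7, so `predict_answer(0.3, [1, 1])` returns `"yes"`. Working in log-odds makes each document an additive step, which is where bias factors can be attached as weights.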
doi: 10.1002/asi.20350; pmid: N/A
The authors report on an experimental study of the differences between spoken and written queries. Users generated a set of written queries and a set of spontaneous spoken queries from written topics. The two sets are compared qualitatively and in terms of their retrieval effectiveness. Written and spoken queries are compared in terms of length, duration, and part of speech. In addition, assuming perfect transcription of the spoken queries, written and spoken queries are compared in terms of their ability to describe relevant documents. The retrieval effectiveness of spoken and written queries is compared using three different information retrieval models. The results show that using speech to formulate one's information need provides a more natural way to express it and encourages the formulation of longer queries. However, the longer spoken queries do not appear to improve retrieval effectiveness significantly compared with written queries.
doi: 10.1002/asi.20351; pmid: N/A
The authors propose a method for automatically generating Japanese–English bilingual thesauri based on bilingual corpora. The term bilingual thesaurus refers to a set of bilingual equivalent words and their synonyms. Most of the methods proposed so far for extracting bilingual equivalent word clusters from bilingual corpora depend heavily on word frequency and are not effective for dealing with low‐frequency clusters. These low‐frequency bilingual clusters are worth extracting because they contain many newly coined terms that are in demand but are not listed in existing bilingual thesauri. Assuming that single language‐pair‐independent methods such as frequency‐based ones have reached their limitations and that a language‐pair‐dependent method used in combination with other methods shows promise, the authors propose the following approach: (a) Extract translation pairs based on transliteration patterns; (b) remove the pairs from among the candidate words; (c) extract translation pairs based on word frequency from the remaining candidate words; and (d) generate bilingual clusters based on the extracted pairs using a graph‐theoretic method. The proposed method has been found to be significantly more effective than other methods.
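Step (d), turning extracted translation pairs into bilingual clusters, can be sketched as follows. The sketch assumes the graph-theoretic method is approximated by connected components of a bipartite graph whose nodes are words (tagged by language) and whose edges are the pairs from steps (a) to (c); the article's actual clustering criterion may differ, and `bilingual_clusters` is a hypothetical name.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def bilingual_clusters(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[List[str], List[str]]]:
    """Group (japanese, english) translation pairs into bilingual clusters.

    Connected components of the pair graph are used as a simple stand-in
    for the article's graph-theoretic method (an assumption).
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Tag nodes by language so identical strings in both languages stay distinct.
    for ja_word, en_word in pairs:
        union(("ja", ja_word), ("en", en_word))

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)

    clusters = []
    for members in groups.values():
        ja_words = sorted(word for lang, word in members if lang == "ja")
        en_words = sorted(word for lang, word in members if lang == "en")
        clusters.append((ja_words, en_words))
    return sorted(clusters)
```

Given pairs linking コンピュータ, コンピューター, and 計算機 to "computer", the three Japanese variants end up in one cluster with their shared English equivalent. Connected components merge transitively, so a single noisy pair can join two unrelated clusters; a stricter graph criterion would trade recall for precision.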
doi: 10.1002/asi.20352; pmid: N/A
The application of thesauri in networked environments is seriously hampered by the difficulty of introducing new concepts and terminology into the formal controlled vocabulary, a process that is critical for enhancing a thesaurus's retrieval capability. The author describes an automated process for adding new terms to thesauri as entry vocabulary by analyzing the association between words/phrases extracted from bibliographic titles and the subject descriptors in the metadata record (subject descriptors are terms assigned from the controlled vocabulary of a thesaurus to describe the subjects of the objects [e.g., books, articles] represented by the metadata records). The investigated approach uses a corpus of metadata for scientific and technical (S&T) publications in which the titles contain substantive words for key topics. The three steps of the method are (a) extracting words and phrases from the title field of the metadata; (b) identifying and selecting specific, meaningful keywords based on the associated controlled vocabulary terms from the thesaurus used to catalog the objects; and (c) inserting the selected keywords into the thesaurus as new terms (most of them in hierarchical relationships with existing concepts), thereby updating the thesaurus with new terminology that is being used in the literature. The effectiveness of the method was demonstrated by an experiment with the Chinese Classification Thesaurus (CCT) and bibliographic data in China Machine‐Readable Cataloging Record (MARC) format (CNMARC) provided by Peking University Library. This approach is equally effective in large‐scale collections and in other languages.
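Step (b), selecting title keywords by their association with assigned descriptors, can be illustrated with a small co-occurrence sketch. It assumes title terms have already been extracted (step a), scores each term by the conditional probability P(descriptor | term), and proposes the term under its most strongly associated descriptor. The measure, the thresholds `min_count` and `min_assoc`, and the name `propose_entry_terms` are illustrative assumptions, not the article's published procedure.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def propose_entry_terms(records: Iterable[Tuple[Iterable[str], Iterable[str]]],
                        min_count: int = 2,
                        min_assoc: float = 0.6,
                        known_terms: Set[str] = frozenset()) -> Dict[str, str]:
    """Suggest title terms as entry vocabulary under thesaurus descriptors.

    records: (title_terms, descriptors) per metadata record.
    A term is proposed under descriptor d when it occurs in at least
    `min_count` records and P(d | term) >= min_assoc (assumed thresholds).
    Terms already in the thesaurus (`known_terms`) are skipped.
    """
    term_freq = Counter()               # records containing each title term
    co_freq = defaultdict(Counter)      # term -> descriptor co-occurrence counts
    for title_terms, descriptors in records:
        descs = set(descriptors)
        for term in set(title_terms):
            term_freq[term] += 1
            for desc in descs:
                co_freq[term][desc] += 1

    proposals = {}
    for term, n in term_freq.items():
        if term in known_terms or n < min_count or not co_freq[term]:
            continue
        best_desc, co = co_freq[term].most_common(1)[0]
        if co / n >= min_assoc:
            proposals[term] = best_desc
    return proposals
```

In a toy corpus where "grid computing" appears in three titles, two of them cataloged under "distributed computing", the term is proposed under that descriptor (2/3 ≥ 0.6), while a term split evenly across two descriptors is not. The minimum-count filter guards against proposing rare words on the strength of a single record.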
Conrad, Jack G.; Schriber, Cindy P.
doi: 10.1002/asi.20363; pmid: N/A
As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. Few users wish to retrieve search results consisting of sets of duplicate documents, whether identical duplicates or close variants. The goal of this work is to facilitate (a) investigations into the phenomenon of near duplicates and (b) algorithmic approaches to minimizing its deleterious effect on search results. Harnessing the expertise of both client‐users and professional searchers, we establish principled methods to generate a test collection for identifying and handling nonidentical duplicate documents. We subsequently examine a flexible method of characterizing and comparing documents to permit the identification of near duplicates. This method has produced promising results following an extensive evaluation using a production‐based test collection created by domain experts.
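To make the near-duplicate task concrete, here is a minimal sketch using a common, swapped-in technique: word k-gram shingles compared with Jaccard similarity. It is not the characterization method evaluated in the article, and the names `shingles`, `jaccard`, and `near_duplicate_pairs`, as well as the default `k` and `threshold`, are illustrative assumptions.

```python
import re
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def shingles(text: str, k: int = 3) -> FrozenSet[Tuple[str, ...]]:
    """Return the set of k-word shingles of a lowercased, tokenized text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return frozenset([tuple(words)]) if words else frozenset()
    return frozenset(tuple(words[i:i + k]) for i in range(len(words) - k + 1))


def jaccard(a: FrozenSet, b: FrozenSet) -> float:
    """Jaccard similarity |A & B| / |A | B| (two empty sets count as identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_pairs(docs: Dict[str, str], k: int = 3,
                         threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Return (id1, id2, score) for document pairs at or above `threshold`.

    All-pairs comparison is fine for a small test collection but quadratic;
    large collections usually need an approximation such as MinHash.
    """
    sigs = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    results = []
    for a, b in combinations(sorted(sigs), 2):
        score = jaccard(sigs[a], sigs[b])
        if score >= threshold:
            results.append((a, b, score))
    return results
```

A document and a copy with two extra trailing words share 6 of 8 trigram shingles, giving a score of 0.75, so the pair is flagged while an unrelated document is not. Expert-judged test collections such as the one described above are what allow a threshold like this to be tuned rather than guessed.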
doi: 10.1002/asi.20365; pmid: N/A
Advances in search technology mean that search systems can now offer assistance to users beyond simply retrieving a set of documents. For example, search systems are now capable of inferring user interests by observing their interaction, suggesting terms that could be used in a query, or reorganizing search results to make exploration of retrieved material more effective. When providing new search functionality, system designers must decide how the new functionality should be offered to users. One major choice is between (a) automatic features that require little human input but give little human control, and (b) interactive features that allow human control over how the feature is used but often give little guidance on how best to use it. This article presents a study in which we empirically investigate the issue of control through an experiment in which participants interacted with three experimental systems that varied the degree of control they had in creating queries, indicating which results are relevant, and making search decisions. We use our findings to discuss why and how the control users want over search decisions can vary depending on the nature of the decisions and their impact on the user's search.
Bailón‐Moreno, Rafael; Jurado‐Alameda, Encarnación; Ruiz‐Baños, Rosario
doi: 10.1002/asi.20362; pmid: N/A
The scientific network of surfactants and related subjects has been analyzed with the CoPalRed© knowledge system. The actors studied were countries, research centers and laboratories, researchers, and journals, and a thematic map of the major research areas was established. Most of the research areas, including those with the greatest representation in terms of number of documents, are related to physics and chemistry. However, biochemistry and cell biology, medicine (pediatrics and pulmonary physiology), and, to a lesser extent, veterinary medicine and food science and technology are also noteworthy in the field of surfactants, which therefore presents a markedly multidisciplinary profile.