Bodoff, David; Wong, Samuel Po‐Shing
doi: 10.1002/asi.20378
The view of documents and/or queries as random variables is gaining importance in the theory of information retrieval. We argue that traditional probabilistic models consider documents and queries as random variables, but that newer models such as language modeling and our unified model take this one step further. The additional step is called error in predictors. Such models consider that we don't observe the document and query random variables that are modeled to predict relevance probabilistically. Rather, there are additional random variables, which are the observed documents and queries. We discuss some important implications of this idea for parameter estimation, relevance prediction, and even test‐collection construction. By clarifying the positions of various probabilistic models on this question, and presenting in one place many of its implications, this article aims to deepen our common understanding of the theories behind traditional probabilistic models, and to strengthen the theoretical basis for further development of more recent approaches such as language modeling.
Ng, Kwong Bor; Kantor, Paul; Strzalkowski, Tomek; Wacholder, Nina; Tang, Rong; Bai, Bing; Rittman, Robert; Song, Peng; Sun, Ying
doi: 10.1002/asi.20393
The authors report on a series of experiments to automate the assessment of document qualities such as depth and objectivity. The primary purpose is to develop a quality‐sensitive functionality, orthogonal to relevance, to select documents for an interactive question‐answering system. The study consisted of two stages. In the classifier construction stage, nine document qualities deemed important by information professionals were identified and classifiers were developed to predict their values. In the confirmative evaluation stage, the performance of the developed methods was checked using a different document collection. The quality prediction methods worked well in the second stage. The results strongly suggest that the best way to predict document qualities automatically is to construct classifiers on a person‐by‐person basis.
doi: 10.1002/asi.20396
In the first part of this article the author defines the n‐overlap vector whose coordinates consist of the fraction of the objects (e.g., books, N‐grams, etc.) that belong to 1, 2, …, n sets (more generally: families) (e.g., libraries, databases, etc.). With the aid of the Lorenz concentration theory, a theory of n‐overlap similarity is conceived together with corresponding measures, such as the generalized Jaccard index (generalizing the well‐known Jaccard index in case n = 2). Next, the distributional form of the n‐overlap vector is determined assuming certain distributions of the object's and of the set (family) sizes. In this section the decreasing power law and decreasing exponential distribution are explained for the n‐overlap vector. Both item (token) n‐overlap and source (type) n‐overlap are studied. The n‐overlap properties of objects indexed by a hierarchical system (e.g., books indexed by numbers from a UDC or Dewey system or by N‐grams) are presented in the final section. The author shows how the results given in the previous section can be applied as well as how the Lorenz order of the n‐overlap vector is respected by an increase or a decrease of the level of refinement in the hierarchical system (e.g., the value N in N‐grams).
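As a rough illustration of the n‐overlap vector described in this abstract, the sketch below computes, for a small collection of sets, the fraction of distinct objects that belong to exactly 1, 2, …, n of them. Reading "belong to k sets" as "exactly k sets" is an assumption, and the function name and sample data are hypothetical; this is not the article's own code.

```python
from collections import Counter
from fractions import Fraction


def n_overlap_vector(sets):
    """Return [x_1, ..., x_n], where x_k is the fraction of distinct
    objects that belong to exactly k of the n given sets (assumed reading)."""
    n = len(sets)
    membership = Counter()
    for s in sets:
        # Deduplicate so each set counts an object at most once.
        membership.update(set(s))
    total = len(membership)
    if total == 0:
        raise ValueError("at least one set must be non-empty")
    # Number of objects for each membership count k.
    by_count = Counter(membership.values())
    return [Fraction(by_count.get(k, 0), total) for k in range(1, n + 1)]


# Hypothetical example: three libraries holding five distinct books.
libraries = [
    {"book_a", "book_b", "book_c"},
    {"book_b", "book_c", "book_d"},
    {"book_c", "book_e"},
]
vector = n_overlap_vector(libraries)
print(vector)  # [Fraction(3, 5), Fraction(1, 5), Fraction(1, 5)]
```

Under this reading, for n = 2 the last coordinate equals the classical Jaccard index |A ∩ B| / |A ∪ B|. How the article builds its generalized Jaccard index from the vector is not reproduced here.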
doi: 10.1002/asi.20398
The study examined Web co‐links to Canadian university Web sites. Multidimensional scaling (MDS) was used to analyze and visualize co‐link data as was done in co‐citation analysis. Co‐link data were collected in ways that would reflect three different views, the global view, the French Canada view, and the English Canada view. Mapping results of the three data sets accurately reflected the ways Canadians see the universities and clearly showed the linguistic and cultural differences within Canadian society. This shows that Web co‐linking is not a random phenomenon and that co‐link data contain useful information for Web data mining. It is proposed that the method developed in the study can be applied to other contexts such as analyzing relationships of different organizations or countries. This kind of research is promising because of the dynamics and the diversity of the Web.
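To make the notion of co‐link data concrete, here is a minimal sketch that counts, for each pair of target sites, how many source pages link to both. It assumes the usual analogy with co‐citation counts. The function name and sample data are hypothetical, and the study's actual collection method and MDS step are not reproduced.

```python
from collections import Counter
from itertools import combinations


def colink_counts(outlinks):
    """Map each unordered pair of target sites to the number of source
    pages that link to both of them."""
    counts = Counter()
    for targets in outlinks.values():
        # Deduplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(targets)), 2):
            counts[pair] += 1
    return counts


# Hypothetical data: source page -> university sites it links to.
outlinks = {
    "page1": ["uni_a", "uni_b", "uni_c"],
    "page2": ["uni_a", "uni_b"],
    "page3": ["uni_b", "uni_c"],
}
counts = colink_counts(outlinks)
print(counts[("uni_a", "uni_b")])  # 2
```

A symmetric matrix built from such counts is the kind of input a scaling method such as MDS would take once the counts are converted to dissimilarities.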
doi: 10.1002/asi.20399
Reading and writing book reviews for learned journals plays an important part in academic life but little is known about how academics carry out these tasks. The aim of this research was to explore these activities with academics from the arts and humanities, the social sciences, and the natural sciences. An electronic questionnaire was used to ascertain (a) how often the respondents read and wrote book reviews, (b) how useful they found them, and (c) what features they thought important in book reviews. Fifty‐two academics in the arts, 53 in the social sciences, and 51 in the sciences replied. There were few disciplinary differences. Most respondents reported reading between one and five book reviews a month and writing between one and two a year. There was high overall agreement between what the respondents thought were important features of book reviews, but there were also wide individual differences between them. This agreement across the disciplines supports the notion that book reviews can be seen as an academic genre with measurable features. This has implications for how they are written, and how authors might be taught to write them better. A potential checklist for authors is suggested.
Yi, Kwan; Beheshti, Jamshid; Cole, Charles; Leide, John E.; Large, Andrew
doi: 10.1002/asi.20401
The authors report the findings of a study that analyzes and compares the query logs of PsycINFO for psychology and the two history databases of ABC‐Clio: Historical Abstracts and America: History and Life to establish the sociological nature of information need, searching, and seeking in history versus psychology. Two problems are addressed: (a) What level of query log analysis—by individual query terms, by co‐occurrence of word pairs, or by multiword terms (MWTs)—best serves as data for categorizing the queries to these two subject‐bound databases; and (b) how can the differences in the nature of the queries to history versus psychology databases aid in our understanding of user search behavior and the information needs of their respective users. The authors conclude that MWTs provide the most effective snapshot of user searching behavior for query categorization. The MWTs to ABC‐Clio indicate specific instances of historical events, people, and regions, whereas the MWTs to PsycINFO indicate concepts roughly equivalent to descriptors used by PsycINFO's own classification scheme. The average length of queries is 3.16 terms for PsycINFO and 3.42 for ABC‐Clio, which breaks from findings for other reference and scholarly search engine studies, bringing query length closer in line to findings for general Web search engines like Excite.
Rantanen, Esa M.; Palmer, Brent O.; Wiegmann, Douglas A.; Musiorski, Kevin M.
doi: 10.1002/asi.20412
One of the main factors in all aviation accidents is human error. Therefore, the National Aeronautics and Space Administration (NASA) Aviation Safety Program (AvSP) has identified several human factors safety technologies to address this problem. Some technologies directly address human error either by attempting to reduce the occurrence of errors or by mitigating the negative consequences of errors. However, new technologies and system changes may also introduce new error opportunities or even induce different types of errors. Consequently, a thorough understanding of the relationship between error classes and technology “fixes” is crucial for the evaluation of intervention strategies outlined in the AvSP so that resources can be effectively directed to maximize the benefit to flight safety. This article summarizes efforts to map intervention technologies onto error categories and describes creation of a conceptual framework, identification of applicable taxonomies for each dimension of the framework, and construction of a usable prototype database. The framework consists of a three‐dimensional matrix with axes for the human operator, the task, and the environment. Human errors and technologies cohabit molecules in the matrix linking them. The database allows for taxonomic development in all three areas pertaining to human performance by keeping the taxonomies dynamic.
doi: 10.1002/asi.20415
It has been 61 years since the 1945 Memex article, and so much has changed since then that we might well wonder whether the article is still worth looking at. It certainly inspired some of the leading figures in information technology, but now it seems to be cited either for things it did not really say, or because everything it proposed has been pretty much accomplished, albeit with alternate technology. If we take another look at the Memex description, though, there are a few key ideas that can still be goals in terms of an easy‐to‐use personal collection that is a supplement to one's own memory. Perhaps in today's terms, the device would be a combination of the iPod design and a tablet computer. As such, it could function as a handy information pod, with certain Memex features, serving as an extended personal memory.
Davis, Philip M.; Price, Jason S.
doi: 10.1002/asi.20405
The design of a publisher's electronic interface can have a measurable effect on electronic journal usage statistics. A study of journal usage from six COUNTER‐compliant publishers at 32 research institutions in the United States, the United Kingdom, and Sweden indicates that the ratio of PDF to HTML views is not consistent across publisher interfaces, even after controlling for differences in publisher content. The number of full‐text downloads may be artificially inflated when publishers require users to view HTML versions before accessing PDF versions or when linking mechanisms, such as CrossRef, direct users to the full text rather than the abstract of each article. These results suggest that usage reports from COUNTER‐compliant publishers are not directly comparable in their current form. One solution may be to modify publisher numbers with “adjustment factors” deemed to be representative of the benefit or disadvantage due to its interface. Standardization of some interface and linking protocols may obviate these differences and allow for more accurate cross‐publisher comparisons.