Relationships between terms and features are an essential component of thesauri, ontologies, and a range of controlled vocabularies. In this article, we describe ways to identify important concepts in documents using the relationships in a thesaurus or other vocabulary structure. We introduce a methodology for analyzing and modeling the indexing process based on a weighted random walk algorithm. The primary goal of this research is to analyze the contribution of thesaurus structure to the indexing process. The resulting models are evaluated in the context of automatic subject indexing using four collections of documents, each pre-indexed with a different thesaurus: AGROVOC (UN Food and Agriculture Organization), the high-energy physics taxonomy (HEP), the National Agricultural Library Thesaurus (NALT), and Medical Subject Headings (MeSH). We also introduce a thesaurus-centric matching algorithm intended to improve the quality of candidate concepts. In all cases, the weighted random walk improves automatic indexing performance over matching alone, with an increase in average precision (AP) of 9% for HEP, 11% for MeSH, 35% for NALT, and 37% for AGROVOC. The results of the analysis support our hypothesis that subject indexing is in part a browsing process, and that using the vocabulary and its structure in a thesaurus contributes to the indexing process. The size of the contribution made by the vocabulary structure was found to differ among the four thesauri, possibly because of differences in their vocabularies and in the structural relationships between their terms. Each of the thesauri and the manual indexing associated with it is characterized using the methods developed here.
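The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of the general idea of a weighted random walk over thesaurus relationships, written as a generic random walk with restart. The relation weights, the toy thesaurus, the match counts, and the names `RELATION_WEIGHTS` and `weighted_random_walk` are all illustrative assumptions and are not taken from the article.

```python
from collections import defaultdict

# Hypothetical relation weights (assumptions, not values from the article).
RELATION_WEIGHTS = {"broader": 1.0, "narrower": 1.0, "related": 0.5}


def weighted_random_walk(edges, seeds, damping=0.85, iterations=50):
    """Score concepts with a random walk with restart over a thesaurus graph.

    edges: iterable of (source, target, relation) triples.
    seeds: dict mapping concepts matched in the document to match counts.
    """
    # Build weighted adjacency lists from the relation triples.
    out = defaultdict(dict)
    nodes = set(seeds)
    for src, dst, rel in edges:
        out[src][dst] = out[src].get(dst, 0.0) + RELATION_WEIGHTS[rel]
        nodes.update((src, dst))

    # The walker restarts at matched concepts in proportion to their counts.
    total = float(sum(seeds.values()))
    restart = {n: seeds.get(n, 0.0) / total for n in nodes}
    scores = dict(restart)

    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            weight_sum = sum(out[n].values())
            if weight_sum == 0.0:
                # A concept with no outgoing relations returns its mass
                # to the matched concepts.
                for m in nodes:
                    nxt[m] += damping * scores[n] * restart[m]
                continue
            # Otherwise follow each relation in proportion to its weight.
            for m, w in out[n].items():
                nxt[m] += damping * scores[n] * w / weight_sum
        scores = nxt
    return scores


# Toy thesaurus fragment (invented for illustration).
edges = [
    ("maize", "cereals", "broader"), ("cereals", "maize", "narrower"),
    ("wheat", "cereals", "broader"), ("cereals", "wheat", "narrower"),
    ("cereals", "crops", "broader"), ("crops", "cereals", "narrower"),
    ("tractors", "machinery", "broader"), ("machinery", "tractors", "narrower"),
]
# Concepts matched in a hypothetical document, with match counts.
seeds = {"maize": 2, "wheat": 1}

scores = weighted_random_walk(edges, seeds)
for concept, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{concept:10s} {score:.3f}")
```

In this toy setup, "cereals" is never matched in the document but still receives a score because both matched concepts point to it, while "machinery" and "tractors" stay at zero because no walk from the matched concepts can reach them. That loosely mirrors the browsing intuition described above: structure around the matched terms can surface concepts that matching alone would miss.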
Journal of the Association for Information Science and Technology – Wiley
Published: Jul 1, 2013