About the author: Arash Joorabchi is currently a Postdoctoral Researcher in the Department of Electronic and Computer Engineering, University of Limerick, Ireland.
Purpose – This paper reports on the design and development of a new approach for automatic classification and subject indexing of research documents in scientific digital libraries and repositories (DLR) according to library controlled vocabularies such as DDC and FAST.
Design/methodology/approach – The proposed concept matching‐based approach (CMA) detects key Wikipedia concepts occurring in a document and searches the OPACs of conventional libraries by querying the WorldCat database to retrieve a set of MARC records that share one or more of the detected key concepts. The semantic similarity of each retrieved MARC record to the document is then measured and, using an inference algorithm, the DDC classes and FAST subjects of the MARC records most similar to the document are assigned to it.
Findings – The accuracy of the DDC classes and FAST subjects automatically assigned to a set of research documents is evaluated using the standard information retrieval measures of precision, recall, and F1. The authors demonstrate that the proposed approach is more accurate than a similar system currently deployed in a large-scale scientific search engine.
Originality/value – The proposed approach enables the development of a new type of subject classification system for DLR and addresses some of the problems from which similar systems suffer, such as the imbalanced training data encountered by machine learning‐based systems and the word‐sense ambiguity encountered by string matching‐based systems.
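The last two stages described in the abstract (scoring retrieved records against the document, then inferring labels and evaluating them) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the similarity measure or the inference algorithm, so Jaccard overlap of concept sets and similarity-weighted voting are used here as stand-ins. The function names, the `top_k` and `min_share` parameters, and the toy records and labels are all hypothetical, and the WorldCat query step is replaced by a hard-coded list.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard overlap of two concept sets (an assumed stand-in similarity)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def assign_labels(doc_concepts, records, top_k=2, min_share=0.5):
    """Keep the top_k most similar records and return every label whose
    share of the total similarity weight is at least min_share.
    (Hypothetical voting rule, not the paper's inference algorithm.)"""
    scored = sorted(
        ((jaccard(doc_concepts, rec["concepts"]), rec) for rec in records),
        key=lambda pair: pair[0],
        reverse=True,
    )[:top_k]
    votes = defaultdict(float)
    total = 0.0
    for sim, rec in scored:
        total += sim
        for label in rec["labels"]:
            votes[label] += sim
    if total == 0:
        return set()
    return {label for label, weight in votes.items() if weight / total >= min_share}


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 for one document."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data: concepts detected in one document and three retrieved records.
# All values are illustrative, not taken from the paper or from WorldCat.
doc_concepts = {"Automatic indexing", "Wikipedia", "Digital library"}
records = [
    {"concepts": {"Automatic indexing", "Digital library"},
     "labels": {"025.04", "Digital libraries"}},
    {"concepts": {"Wikipedia", "Digital library", "Metadata"},
     "labels": {"025.04", "Metadata"}},
    {"concepts": {"Genetic algorithm"},
     "labels": {"006.3"}},
]
gold = {"025.04", "Digital libraries", "Information retrieval"}

assigned = assign_labels(doc_concepts, records)
precision, recall, f1 = precision_recall_f1(assigned, gold)
print(sorted(assigned), round(precision, 3), round(recall, 3), round(f1, 3))
```

In this sketch a label survives only if the records carrying it account for at least half of the similarity weight among the top-ranked records, so a label that appears on a single weakly matching record is dropped. Raising `top_k` or lowering `min_share` would trade precision for recall.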
Library Hi Tech – Emerald Publishing
Published: Nov 15, 2013
Keywords: Libraries; Information retrieval; Concept matching; Subject indexing; WorldCat; Wikipedia; Scientific digital libraries and repositories; Metadata generation; Subject metadata; Dewey Decimal Classification (DDC); FAST subject headings; Automatic classification