Classification of scientific publications according to library controlled vocabularies: A new concept matching‐based approach

Arash Joorabchi; Abdulhussain E. Mahdi

doi:10.1108/LHT-03-2013-0030


Publisher: Emerald Publishing
Copyright: © 2013 Emerald Group Publishing Limited. All rights reserved.
ISSN: 0737-8831

Abstract

Purpose – This paper reports on the design and development of a new approach for automatic classification and subject indexing of research documents in scientific digital libraries and repositories (DLR) according to library controlled vocabularies such as the Dewey Decimal Classification (DDC) and FAST subject headings.

Design/methodology/approach – The proposed concept matching‐based approach (CMA) detects key Wikipedia concepts occurring in a document and searches the OPACs of conventional libraries, by querying the WorldCat database, to retrieve a set of MARC records that share one or more of the detected key concepts. The semantic similarity of each retrieved MARC record to the document is then measured, and an inference algorithm assigns to the document the DDC classes and FAST subjects of the MARC records most similar to it.

Findings – The accuracy of the DDC classes and FAST subjects automatically assigned to a set of research documents is evaluated using the standard information retrieval measures of precision, recall and F1. The authors show that the proposed approach achieves higher accuracy than a similar system currently deployed in a large-scale scientific search engine.

Originality/value – The proposed approach enables the development of a new type of subject classification system for DLR and addresses some of the problems that similar systems suffer from, such as the imbalanced training data encountered by machine learning‐based systems and the word‐sense ambiguity encountered by string matching‐based systems.
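The abstract describes a retrieve-then-infer pipeline but does not specify the similarity measure or the inference algorithm. The Python sketch below illustrates the general idea under loudly stated assumptions: the document and each MARC record are reduced to sets of key concepts, Jaccard overlap stands in for the paper's semantic-similarity measure, and the inference step is approximated by similarity-weighted voting over the `top_k` most similar records. `MarcRecord`, `infer_labels`, `top_k` and the toy data (class numbers included) are hypothetical and illustrative only; Wikipedia concept detection and WorldCat querying are not shown. It also computes a per-document, set-based form of the precision, recall and F1 measures named in the abstract; how the paper averages them across documents is not stated here.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MarcRecord:
    """Simplified stand-in for a retrieved MARC record (hypothetical structure)."""
    concepts: set[str]                 # key concepts matched in the record
    ddc_classes: set[str]              # DDC numbers assigned by catalogers
    fast_subjects: set[str] = field(default_factory=set)


def jaccard(a: set[str], b: set[str]) -> float:
    """Concept-set overlap; a placeholder, NOT the paper's similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def infer_labels(doc_concepts: set[str], records: list[MarcRecord],
                 top_k: int = 3, attr: str = "ddc_classes") -> set[str]:
    """Return the label(s) with the most similarity-weighted votes among the
    top_k candidate records closest to the document (assumed voting scheme)."""
    # Mirror the retrieval step: only records sharing a key concept are candidates.
    candidates = [r for r in records if r.concepts & doc_concepts]
    # Stable sort: ties keep their retrieval order.
    ranked = sorted(candidates, key=lambda r: jaccard(doc_concepts, r.concepts),
                    reverse=True)[:top_k]
    votes: dict[str, float] = defaultdict(float)
    for record in ranked:
        weight = jaccard(doc_concepts, record.concepts)
        for label in getattr(record, attr):   # DDC classes or FAST subjects
            votes[label] += weight
    if not votes:
        return set()
    best = max(votes.values())
    return {label for label, score in votes.items() if score == best}


def precision_recall_f1(assigned: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Set-based precision, recall and F1 of assigned labels against gold labels."""
    tp = len(assigned & gold)
    precision = tp / len(assigned) if assigned else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data (hypothetical concepts and illustrative class numbers).
doc = {"Machine learning", "Text classification", "Dewey Decimal Classification"}
records = [
    MarcRecord({"Machine learning", "Text classification"}, {"006.31"}),
    MarcRecord({"Dewey Decimal Classification", "Cataloging"}, {"025.431"}),
    MarcRecord({"Machine learning", "Statistics"}, {"006.31"}),
    MarcRecord({"Cooking"}, {"641.5"}),  # shares no concept, so never a candidate
]

if __name__ == "__main__":
    assigned = infer_labels(doc, records)
    print(assigned, precision_recall_f1(assigned, {"006.31", "025.431"}))
```

Weighting votes by similarity, rather than simply counting records, lets one very close record outweigh several loosely related ones; whether the paper's inference algorithm behaves this way is not stated in the abstract.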

Journal

Library Hi Tech – Emerald Publishing

Published: Nov 15, 2013

Keywords: Libraries; Information retrieval; Concept matching; Subject indexing; WorldCat; Wikipedia; Scientific digital libraries and repositories; Metadata generation; Subject metadata; Dewey Decimal Classification (DDC); FAST subject headings; Automatic classification
