Artificial Intelligence in Medicine 55 (2012) 197–207
Screening nonrandomized studies for medical systematic reviews: A
comparative study of classifiers
Tanja Bekhuis a,∗, Dina Demner-Fushman b,1
a Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
b Communications Engineering Branch, Lister Hill National Center for Biomedical Communications, US National Library of Medicine, Bethesda, MD, USA
article info
Article history:
Received 13 December 2010
Received in revised form 29 December 2011
Accepted 13 May 2012
Keywords:
Medical informatics
Clinical research informatics
Text mining
Document classification
Systematic reviews
abstract
Objectives: To investigate whether (1) machine learning classifiers can help identify nonrandomized
studies eligible for full-text screening by systematic reviewers; (2) classifier performance varies with
optimization; and (3) the number of citations to screen can be reduced.
Methods: We used an open-source, data-mining suite to process and classify biomedical citations that
point to mostly nonrandomized studies from 2 systematic reviews. We built training and test sets for
citation portions and compared classifier performance by considering the value of indexing, various
feature sets, and optimization. We conducted our experiments in 2 phases. The design of phase I with
no optimization was: 4 classifiers × 3 feature sets × 3 citation portions. Classifiers included k-nearest
neighbor, naïve Bayes, complement naïve Bayes, and evolutionary support vector machine. Feature sets
included bag of words, and 2- and 3-term n-grams. Citation portions included titles, titles and abstracts,
and full citations with metadata. Phase II with optimization involved a subset of the classifiers, as well as
features extracted from full citations, and full citations with overweighted titles. We optimized features
and classifier parameters by manually setting information gain thresholds outside of a process for iterative
grid optimization with 10-fold cross-validations. We independently tested models on data reserved for
that purpose and statistically compared classifier performance on 2 types of feature sets. We estimated
the number of citations needed to screen by reviewers during a second pass through a reduced set of
citations.
Results: In phase I, the evolutionary support vector machine returned the best recall for bag of words
extracted from full citations; the best classifier with respect to overall performance was k-nearest
neighbor. No classifier attained good enough recall for this task without optimization. In phase II, we boosted
performance with optimization for evolutionary support vector machine and complement naïve Bayes
classifiers. Generalization performance was better for the latter in the independent tests. For evolutionary
support vector machine and complement naïve Bayes classifiers, the initial retrieval set was reduced
by 46% and 35%, respectively.
Conclusions: Machine learning classifiers can help identify nonrandomized studies eligible for full-text
screening by systematic reviewers. Optimization can markedly improve performance of classifiers. However,
generalizability varies with the classifier. The number of citations to screen during a second
independent pass through the citations can be substantially reduced.
© 2012 Elsevier B.V. All rights reserved.
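As a rough illustration of the design summarized in the abstract, the following Python sketch enumerates the phase I conditions (4 classifiers × 3 feature sets × 3 citation portions), shows one plausible reading of 2- and 3-term n-grams, and converts the reported reductions of 46% and 35% into citation counts. It is not the data-mining suite used in the study. The function names, the contiguous-word n-gram definition, and the initial retrieval set of 1000 citations are assumptions made for illustration only.

```python
from itertools import product

# Phase I factors as named in the abstract.
CLASSIFIERS = [
    "k-nearest neighbor",
    "naive Bayes",
    "complement naive Bayes",
    "evolutionary support vector machine",
]
FEATURE_SETS = ["bag of words", "2-term n-grams", "3-term n-grams"]
CITATION_PORTIONS = [
    "titles",
    "titles and abstracts",
    "full citations with metadata",
]


def phase_one_conditions():
    """Enumerate every classifier x feature set x citation portion cell."""
    return list(product(CLASSIFIERS, FEATURE_SETS, CITATION_PORTIONS))


def word_ngrams(tokens, n):
    """Return contiguous n-term word sequences.

    Assumption: an n-gram is n adjacent tokens; the study's exact
    tokenization and n-gram definition are not given in the abstract.
    """
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def citations_left(initial_count, reduction):
    """Citations remaining for a second pass after a fractional reduction.

    `initial_count` is hypothetical; `reduction` is a fraction such as 0.46.
    """
    return round(initial_count * (1 - reduction))
```

For example, `len(phase_one_conditions())` gives 36 conditions, and with the hypothetical 1000-citation retrieval set, `citations_left(1000, 0.46)` leaves 540 citations and `citations_left(1000, 0.35)` leaves 650 for the second pass.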
∗ Corresponding author at: University of Pittsburgh School of Medicine, Department of Biomedical Informatics, UPMC Cancer Pavilion, Suite 301-338, 5150 Centre Avenue, Pittsburgh, PA 15232, USA. Tel.: +1 412 647 6705.
E-mail addresses: tcb24@pitt.edu (T. Bekhuis), ddemner@mail.nih.gov (D. Demner-Fushman).
1 US National Library of Medicine, Lister Hill National Center for Biomedical Communications, Building 38A, Office 10S-1022, 8600 Rockville Pike, Bethesda, MD 20894, USA. Tel.: +1 301 435 5320.
1. Introduction
Translation of biomedical research into practice depends in part
on the production of systematic reviews that synthesize available
evidence for clinicians, researchers, and policymakers. Unfortunately,
remarkable growth in the number of reviews has not kept
pace with growth in the number of medical trials, which are sources
of evidence [1]. The problem is even more serious because most
reviews are traditional rather than systematic. What is needed is
streamlined production of the latter [1,2] to better control known
threats to validity [3] while promoting transparent and reproducible
science.
http://dx.doi.org/10.1016/j.artmed.2012.05.002