Quality & Quantity 34: 259–274, 2000.
© 2000 Kluwer Academic Publishers. Printed in the Netherlands.
A Conceptual Framework
for Quantitative Text Analysis
On Joining Probabilities and Substantive Inferences about Texts
CARL W. ROBERTS
Department of Sociology and Department of Statistics, Iowa State University, Ames, IA 50011, U.S.A.
Abstract. Quantitative text analysis refers to the application of one or more methods for drawing
statistical inferences from text populations. After briefly distinguishing quantitative text analysis
from linguistics, computational linguistics, and qualitative text analysis, issues raised during the
1955 Allerton House Conference are used as a vehicle for characterizing classical text analysis as
an instrumental-thematic method. Quantitative text analysis methods are then depicted according
to a 2 × 3 conceptual framework in which texts are interpreted either instrumentally (according
to the researcher’s conceptual framework) or representationally (according to the texts’ sources’
perspectives), as well as in which variables are thematic (counts of word/phrase occurrences), se-
mantic (themes within a semantic grammar), or network-related (theme- or relation-positions within
a conceptual network). Common methodological errors associated with each method are discussed.
The paper concludes with a delineation of the universe of substantive answers that quantitative text
analysis is able to provide to social science researchers.
Key words: content analysis, text analysis, semantic grammar, network, instrumental versus
representational, quantitative methods.
Painted with broad strokes, formal analyses of linguistic data are pursued from
three academic orientations: linguistics, computer science, and the social sciences.
Linguists’ interests are primarily in describing text structure: as surface forms pro-
duced by an innate human capacity to generate linguistic expressions (Chomsky,
1965), as sequences of functional forms expressed by goal-oriented humans ac-
cording to discrete narrative grammars (Greimas, 1984; Halliday, 1994),
as patterns of symbols uttered by fallible native speakers in ways (in)consistent
with some prescribed standard (Honey, 1983), etc. With recent developments in
computer technology, linguists have begun to evaluate their theories by developing
corresponding text-parsing software (cf. Rosner and Johnson, 1992). This work
forms the academic branch of computational linguistics; alongside it, a
more applied, commercial branch has developed with the objective of quickly
“understanding” user input and producing the output that users expect
(Grishman, 1986; McEnery, 1992). Finally, social scientists conduct formal ana-