Plant Molecular Biology 42: 25–43, 2000.
© 2000 Kluwer Academic Publishers. Printed in the Netherlands.
Examining rates and patterns of nucleotide substitution in plants
Spencer V. Muse
Program in Statistical Genetics, Department of Statistics, North Carolina State University, Raleigh,
NC 27695–8203, USA (fax 919-515-1909; e-mail: firstname.lastname@example.org)
Key words: chloroplast genes, maximum likelihood, molecular clocks, nucleotide substitution models, rates of
Driven by rapid improvements in affordable computing power and by the even faster accumulation of genomic
data, the statistical analysis of molecular sequence data has become an active area of interdisciplinary research.
Maximum likelihood methods have become mainstream because of their desirable properties and, more impor-
tantly, their potential for providing statistically sound solutions in complex data analysis settings. In this chapter, a
review of recent literature focusing on rates and patterns of nucleotide substitution in the nuclear, chloroplast,
and mitochondrial genomes of plants demonstrates the power and flexibility of these new methods. The emerging
picture of the nucleotide substitution process in plants is a complex one. Evolutionary rates are seen to be quite
variable, both among genes and among plant lineages. However, there are hints, particularly in the chloroplast, that
individual factors can have important effects on many genes simultaneously.
Statistical methods for molecular evolutionary
Analyzing the growing collection of molecular se-
quence data presents some of the most exciting chal-
lenges to today’s biologists and statisticians. Sequence
data have become a ubiquitous component of many
diverse biological disciplines, and there are numerous
opportunities for innovative data analysis applications.
Molecular sequences are used to address questions
about evolutionary relationships, population biology,
paternity and identity, protein structure, and gene
function. At the same time, the field of statistics is
experiencing its own revolution of sorts. The remarkable
gains in affordable computing power are leading to
new statistical methodologies that are more accurate,
powerful, and flexible than their predecessors. In the
presence of these changes, close interactions between
lab scientists and statisticians allow for the identification
of interesting and important biological questions,
and at the same time ensure that correct and powerful
analytical tools will be available for data analysis.
In this paper, I focus on the study of nucleotide
substitution rates in plants, so in the following para-
graphs I present some core elements dealing with the
statistical analysis of substitution rates. The methods
discussed here are a mixture of modern and more
traditional procedures. The emergence of likelihood
methodologies as mainstream techniques for the study
of molecular evolution (for recent reviews, see [41,
20, 21]) has been especially important in recent years,
so I provide a rather extensive discussion of these
After discussing a variety of statistical tools used
in molecular evolutionary studies, I close with a brief
survey of studies of nucleotide substitution rates in
plants. The works surveyed make extensive use of the
methods described in the first sections.
Likelihood methods in statistics
Methods based on the likelihood function represent the
most well developed body of statistical inference pro-
cedures. In spite of their generally desirable theoretical
properties, computational burden has made many like-
lihood methods prohibitively slow. However, gains in