Plant Molecular Biology 52: 627–642, 2003.
© 2003 Kluwer Academic Publishers. Printed in the Netherlands.
Excess non-synonymous substitutions suggest that positive selection
episodes occurred during the evolution of DNA-binding domains in the
Arabidopsis R2R3-MYB gene family
, Michael T. Clegg
and Tao Jiang
Department of Computer Science and
Department of Botany and Plant Science, University of California,
Riverside, CA 92521, USA (author for correspondence; e-mail firstname.lastname@example.org)
Received 8 August 2002; accepted in revised form 11 February 2003
Key words: Arabidopsis, molecular evolution, non-synonymous substitution, positive selection, R2R3-MYB gene,
It has been suggested that evolutionary changes in regulatory genes may be the predominant molecular mechanism
governing both physiological and morphological evolution. R2R3-AtMYB is one of the largest transcription factor
gene families in Arabidopsis. Using inferred ancestral sequences we show that several lineages in the R2R3-
AtMYB phylogeny experienced excess non-synonymous nucleotide substitution upon gene duplication, indicating
episodes of positive selection driving adaptive shifts early in the evolution of this gene family. A noise reduction
technique was then used to determine individual sites in DNA-binding domains (R2 domain and R3 domain) of
R2R3-AtMYB protein sequence that were favored by frequent non-synonymous substitutions. The analyses reveal
that the first helix (helix1) and the second helix (helix2) in both R2 and R3 domains are characterized by more
frequent non-synonymous substitutions, and thus experienced significantly higher positive selection pressure than
the third helix (helix3) in both domains. Previous MYB protein structure studies have suggested that helix1 and
helix2 in both R2 and R3 domains are involved in the characteristic packing of R2R3-AtMYB DNA-binding
domains. This suggests that excess non-synonymous substitutions in these helices could have resulted in MYB
recognition of novel gene target sites.
Comparative genomics is providing a powerful tool for
understanding how gene families arise and how gene
families expand to create new biological activities
and specificities. Moreover, comparisons between and
within genomes promise to reveal specific nucleotide
sites involved in historical adaptive changes. This
will in turn provide a major guide for the systematic
investigation of protein structure and function.
Arabidopsis thaliana offers important advantages
for basic research in comparative genomics because
the nucleotide sequence of its entire genome is
now available (Arabidopsis Genome Initiative, 2000).
Systematic analyses of the Arabidopsis genome are
creating a basis for detailed studies of genome structure
and evolution (Blanc et al., 2000; Castresana,
2001; Sreekumar et al., 2001; Ball and Cherry, 2001).
Comparative analyses of Arabidopsis paralogous gene
family members should give us a better understanding
of the historical dynamics of gene duplication, and
these analyses may reveal how different evolutionary
lineages in a gene family have evolved new functions.
One of the major mechanisms for the evolution of
novel gene functions is gene duplication followed by
the functional divergence of duplicated genes (Ohno,
1967, 1970; Nei, 1969; Zhang et al., 1998). It has been
reported that more than a third of the protein-coding
capacity of the eukaryotic genomes completely sequenced
to date consists of duplicated genes. After
duplication, new gene copies often experience relaxed
evolutionary constraints. This promotes functional