Plant Molecular Biology 42: 703–717, 2000.
© 2000 Kluwer Academic Publishers. Printed in the Netherlands.
Organization and structural evolution of four multigene families in
Arabidopsis thaliana: AtLCAD, AtLGT, AtMYST and AtHD-GL2
Raquel Tavares, Sébastien Aubourg, Alain Lecharny and Martin Kreis
Institut de Biotechnologie des Plantes, Laboratoire de Biologie du Développement des Plantes, Université de
Paris-Sud, UMR CNRS 8618, 91405 Orsay Cedex, France (author for correspondence)
Received 15 July 1999; accepted in revised form 28 December 1999
Key words: gene duplication, gene structure, genomics, intron distribution, retrotranscription
The Arabidopsis Genome Initiative has so far released more than 80% of the genome sequence of Arabidopsis
thaliana. About 70% of the identified genes have at least one paralogue. To understand the biological
function of individual genes, it is essential to study the structure, expression and organization of the entire multigene
family. A systematic analysis of multigene families, made possible by the amount of genomic sequence data
available, provides important clues for understanding genome evolution and plasticity. In this paper, four
multigene families of A. thaliana are characterized, namely LCAD, HD-GL2, LGT and MYST. Members of HD-GL2
and LCAD have already been reported in plants. The LGT genes specify proteins containing glycosyl
transferase motifs. No plant genes similar to the LGT genes have been reported to date. The novel MYST family, most
likely plant-specific, encodes proteins with no identified function. Sequencing and in silico analysis led to the
characterization of 29 novel genes belonging to these four gene families. The organization, structure and evolution
of all the members of the four families are discussed, as well as their chromosome location. Expression data for
some of the paralogues of each family are also presented.
The systematic sequencing programmes of genomes
from prokaryotes, yeasts, nematodes and plants are
beginning to reveal the importance of gene families,
both in number and in size, for the organization and
evolution of the genomes. Gene families arise from
the duplication of ancestral genes, followed by the
divergence of both copies, leading to functional and
structural specialization (Ohno, 1970; Fryxell, 1996).
The resulting genes, with newly acquired specificities,
altered recognition properties or modified functions,
provide opportunities for the evolution of new anatomical
structures or physiological pathways (Henikoff
et al., 1997b). Proteins belonging to the same family
have the same biochemical function, but not necessarily
the same biological role. Adaptive evolution
has played an important role in driving divergence
following gene duplication events (Clegg et al., 1997).
During plant evolution, increased combinatorial possibilities
of interaction between gene family products,
corresponding to an increased complexity of the families,
correlate with increasingly complex body plans
and anatomical structures (Graham, 1995).

The nucleotide sequence data reported will appear in the EMBL,
GenBank and DDBJ Nucleotide Sequence Databases under the
accession numbers AJ224338, AJ243015 and Y16848.

Understanding the biological function of a
gene requires the analysis of the complete gene family.
Indeed, to associate orthologues between
different model organisms, and hence to assign
biological functions to these genes, the classification
of every paralogue from each genome is an essential
step (Tatusov et al., 1997).
Starting from a key gene, homologous genes of
an organism of interest may be found by scrutinizing
the sequence databases (Henikoff et al., 1997b).
An exhaustive and recurrent screening of known sequences
and their systematic analysis are now made
possible by the rapid expansion of sequence databases.
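
The idea of a recurrent screen starting from a key gene can be illustrated with a minimal Python sketch. It is not the procedure used in this work: the similarity measure (a Jaccard index over shared 3-mers) is a toy stand-in for a real alignment score, and the function names, the threshold and the sequences in TOY_DB are all hypothetical. The sketch only shows why screening is repeated: each newly found member is used as a query in turn, so that distant paralogues missed by the key gene can still be reached through intermediate members.

```python
from typing import Dict, Set


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard index of shared k-mers (toy stand-in for an alignment score)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def recurrent_screen(seed_id: str, database: Dict[str, str],
                     threshold: float = 0.3) -> Set[str]:
    """Collect a candidate family by re-querying with every new member.

    Assumption: a hit is any database entry whose similarity to the
    current query reaches `threshold` (an arbitrary illustrative value).
    """
    family = {seed_id}
    queries = [seed_id]          # members not yet used as a query
    while queries:               # stops when a round adds nothing new
        query = queries.pop()
        for cand_id, cand_seq in database.items():
            if cand_id in family:
                continue
            if similarity(database[query], cand_seq) >= threshold:
                family.add(cand_id)
                queries.append(cand_id)
    return family


# Hypothetical toy sequences: geneC resembles geneB but not the key gene.
TOY_DB = {
    "geneA": "ACDEFGHIKL",
    "geneB": "EFGHIKLMNP",
    "geneC": "IKLMNPQRST",
    "geneD": "WWYYVVWWYY",
}

if __name__ == "__main__":
    print(sorted(recurrent_screen("geneA", TOY_DB)))
```

With these toy data, a single query with geneA would reach only geneB. The repeated screen also recovers geneC through geneB, while the unrelated geneD is never included.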