Russian Journal of Genetics, Vol. 38, No. 6, 2002, pp. 659–663. From Genetika, Vol. 38, No. 6, 2002, pp. 793–798.
Original English Text Copyright © 2002 by Trifonov.
Reducing complex systems to their building blocks is
one natural way to establish the rules of their functioning
and, in the case of developing systems, of their evolution.
Genomes, ranging in size from about 300 bases for
viroids to several billion base pairs for higher
eukaryotes, are likewise likely to be built from
elementary structures that are not immediately obvious in
the macroscopically long genomic sequences. Genomes, as
genetically and physically linked sets of genes, probably
appeared through the fusion of previously autonomous
genes at some early stage of evolution. The early
genes, therefore, would be the most likely units of
genome structure. A scheme of distinct stages in
protein evolution has recently been suggested [3, 4].
Two of the stages are thought to originate from an
important polymer-statistical property, ring closure [5,
6], of polypeptide chains in one case and of duplex
DNA [2, 8] in the other. The optimal size of the rings
(loops) formed by returning polymer chain trajectories
depends on the chain flexibility and equals 25–30
amino acid residues and about 400 base pairs,
respectively. The latter figure corresponds, at three
base pairs per codon, to the typical protein fold
(domain) size of 100–150 amino acids [9, 10].
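The correspondence between the 400 bp ring size and the fold size can be
checked with a short calculation. The sketch below is only an illustration:
it assumes three base pairs per codon and ignores stop codons and any
non-coding flanks, and the names bp_to_residues and FOLD_SIZE_RANGE are
hypothetical, introduced here and not taken from the article.

```python
# Illustrative sketch: convert a DNA length to the number of encoded residues.
# Assumption: 3 base pairs per codon; stop codons and non-coding DNA ignored.
BASES_PER_CODON = 3

OPTIMAL_RING_BP = 400           # optimal DNA ring closure size quoted in the text
FOLD_SIZE_RANGE = (100, 150)    # typical fold (domain) size quoted in the text


def bp_to_residues(base_pairs: float) -> float:
    """Return the number of amino acid residues encoded by a DNA segment."""
    return base_pairs / BASES_PER_CODON


encoded_residues = bp_to_residues(OPTIMAL_RING_BP)
within_fold_range = FOLD_SIZE_RANGE[0] <= encoded_residues <= FOLD_SIZE_RANGE[1]

print(f"{OPTIMAL_RING_BP} bp encode about {encoded_residues:.0f} residues")
print(f"Within the 100-150 residue fold range: {within_fold_range}")
```

Under these assumptions a 400 bp segment encodes roughly 133 residues,
which lies inside the quoted fold size range.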
Historically, the first indication that proteins are
probably built of units of a certain standard size was the
observation by T. Svedberg, as early as 1929,
that “the proteins…can, with regard to molecular
weight, be divided into four subgroups. The molecular
masses characteristic of the three higher subgroups
are—as a ﬁrst approximation—derived from the
molecular mass of the first subgroup by multiplying by
the integers”. The first estimate of the size of the
unit, about 160 amino acids, followed soon after. The
optimal DNA ring estimate given above for the protein unit
size is indistinguishable from the estimate by Svedberg,
considering the low accuracy of both.
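To make the comparison concrete, the two estimates can be put on the same
scale. The following sketch is a rough illustration only: it assumes three
base pairs per codon, and the function residues_to_bp and the variable
names are hypothetical and not part of the original work.

```python
# Illustrative sketch: express the 160-residue unit in base pairs and
# compare it with the 400 bp optimal DNA ring size.
# Assumption: 3 base pairs per codon, no non-coding DNA.
BASES_PER_CODON = 3

SVEDBERG_UNIT_RESIDUES = 160    # early estimate of the protein unit size
DNA_RING_BP = 400               # optimal DNA ring closure size


def residues_to_bp(residues: int) -> int:
    """Return the coding length in base pairs for a given number of residues."""
    return residues * BASES_PER_CODON


svedberg_unit_bp = residues_to_bp(SVEDBERG_UNIT_RESIDUES)
relative_gap = abs(svedberg_unit_bp - DNA_RING_BP) / DNA_RING_BP

print(f"Unit of {SVEDBERG_UNIT_RESIDUES} residues = {svedberg_unit_bp} bp")
print(f"Relative difference from {DNA_RING_BP} bp: {relative_gap:.0%}")
```

With these numbers the two estimates differ by about 20%, a gap that is
plausibly within the uncertainty of either figure.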
Thus, originally circular fragments of DNA
encoding proteins of typical fold size are good
candidates for the genome units from which modern
genomes were probably formed, eventually, by
combinatorial fusion. In this review, various
manifestations of genome segmentation in protein
and nucleotide sequences are discussed, and
estimates of the genome unit size are presented.
RING (LOOP) CLOSURE
A flexible polymer chain in solution may adopt
a wide spectrum of shapes (trajectories) due to thermal
motion. Occasionally it returns to itself, making a loop,
or circle, if the ends are held together by some
interaction that closes the loop. The loops can be of various
sizes, but the extremes are disfavored, if not excluded.
Indeed, the statistical weight of loops of small contour
length is limited by steric constraints. Large
loops, on the other hand, form more rarely, since
distant points come into contact less frequently. As
a result, there exists a certain optimal, most frequent
loop size that, according to theory, depends on the
chain flexibility, usually measured by the so-called
persistence length [5, 6]. Theoretical estimates for
double-stranded DNA give about 400 bp for the optimal ring
closure size. This is in full agreement with
experimental determinations in which DNA fragments of
various lengths were covalently circularized by a ligase
reaction. Important experiments have been con-
Elementary Units of Genome Structure
E. N. Trifonov
Department of Structural Biology, Weizmann Institute of Science, Rehovot, 76100 Israel;
fax: 972-8-934-2653, e-mail: firstname.lastname@example.org
Genome Diversity Center, Institute of Evolution, University of Haifa, Haifa, 31905 Israel;
fax: 972-4-824-6554, e-mail: email@example.com
Received December 4, 2001
Abstract. Numerous observations, measurements, and calculations strongly indicate that both eukaryotic and
prokaryotic genomes are built as linear arrays of units of rather uniform size, about 400 base pairs. The units are likely
to correspond to early individual genes that presumably existed in the form of DNA circles. Their combinatorial fusion
eventually resulted in the formation of the early segmented genomes. The segmented structure of the genomes is
apparently still maintained by some structural selection pressures. Some of the units can be recognized by characteristic
sequence motifs at their borders. Identification and characterization of the units, and their mapping on the
genomes, should become an important prerequisite for genome comparisons and genome evolution studies.
This article was submitted by the author in English.