ISSN 1022-7954, Russian Journal of Genetics, 2016, Vol. 52, No. 2, pp. 125–140. © Pleiades Publishing, Inc., 2016.
Published in Russian in Genetika, 2016, Vol. 52, No. 2, pp. 146–163.
Defining the term “gene” has become a rather complicated task in modern science. One approach has been to introduce terms such as the “matreshka” gene, the “nested” gene, and “overlapping open reading frames.” Historically, new data have repeatedly required revision of the previously proposed concept of the gene. However, new concepts have not entirely supplanted earlier definitions, so several concepts of the gene coexist at the same time [1, 2].
In the 1970s–1980s, the definition of the gene limited it to the region corresponding to a mature messenger RNA (mRNA) and, in particular, to its open reading frame (ORF): a potentially translated sequence that consists of a string of sense codons in one frame, beginning with a start codon and ending with a stop codon. The intergenic space in eukaryotes was considered nonfunctional sequence that could not be transcribed. This gene concept was based on the assumption that transcription is limited to the known protein-coding genes (PCGs) and a few genes encoding so-called structural RNAs such as tRNA and rRNA.
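Because the ORF definition above is essentially a scanning rule, it can be made concrete in a few lines. The Python sketch below is illustrative only: the function names `find_orfs` and `overlapping_pairs` are hypothetical, and the sketch assumes the standard genetic code (stop codons UAA, UAG, UGA), recognizes only the canonical AUG start codon, scans only the three forward frames of the given strand, and opens each ORF at the first in-frame AUG. It also shows how ORFs read in different frames of one mRNA can overlap, the arrangement that underlies “matreshka” genes.

```python
from typing import List, Optional, Tuple

# Assumption: standard genetic code; only AUG (ATG) is treated as a start codon.
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}

# (frame, start, end): 0-based start of the start codon; end is exclusive and includes the stop codon.
Orf = Tuple[int, int, int]


def find_orfs(seq: str, min_codons: int = 2) -> List[Orf]:
    """Hypothetical helper: scan the three forward frames of seq for start...stop ORFs."""
    seq = seq.upper().replace("U", "T")  # accept mRNA (U) or DNA (T) letters
    orfs: List[Orf] = []
    for frame in range(3):
        start: Optional[int] = None
        # Step through complete codons only.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # the first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next start codon in this frame
        # An ORF still open here has no stop codon and is not reported.
    return orfs


def overlapping_pairs(orfs: List[Orf]) -> List[Tuple[Orf, Orf]]:
    """Hypothetical helper: pairs of ORFs in different frames whose intervals overlap."""
    pairs = []
    for a in range(len(orfs)):
        for b in range(a + 1, len(orfs)):
            fa, sa, ea = orfs[a]
            fb, sb, eb = orfs[b]
            if fa != fb and sa < eb and sb < ea:
                pairs.append((orfs[a], orfs[b]))
    return pairs


if __name__ == "__main__":
    mrna = "AUGAAAUGCCCUAGGUAA"  # toy sequence, not a real transcript
    orfs = find_orfs(mrna)
    print(orfs)                     # [(0, 0, 18), (2, 5, 14)]
    print(overlapping_pairs(orfs))  # [((0, 0, 18), (2, 5, 14))]
```

On the toy mRNA, the scan finds a six-codon ORF in frame 0 and a three-codon ORF (AUG CCC UAG) in frame 2 that lies entirely inside it. Since each ORF opens at the first in-frame AUG, downstream in-frame AUGs that could yield N-terminally truncated isoforms are not reported separately, and noncanonical start codons such as CUG would require an extended start-codon set.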
The discovery of introns and the use of sensitive methods of transcriptome analysis changed the understanding of the nature of the gene. Microchips with overlapping DNA probes (tiling microarrays) and deep sequencing of RNA revealed a myriad of transcripts covering almost the entire human genome.
The article was translated by the authors.
In addition, it turned out that the diversity of gene expression products is determined not only by the number of PCGs but also by alternative transcription initiation sites, alternative splicing, and transcript editing [9, 10]. Even long intergenic regions that were originally considered to lack any function can be transcribed. Almost every nucleotide of the human genome is included in at least one RNA transcript. Moreover, the synthesis of each full-length sense transcript is accompanied by no fewer than 100 short, abortively synthesized RNAs [11, 12]. The recognition of the abundance and diversity of noncoding transcripts is reflected in the term “dark matter,” borrowed from astrophysics.
Thus, the great variety of RNA transcripts that do not encode proteins led to modifications of the gene concept. One definition is “a locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions”. Another definition of the gene is “a union of genomic sequences encoding a coherent set of potentially overlapping functional products”.
The development of molecular genetics has changed the understanding of what a PCG is, and the estimated number of PCGs continues to decline.
In the first phase of the human whole-genome sequencing project, approximately 100,000 PCGs
“Matreshka” Genes with Alternative Reading Frames
E. V. Sheshukova, A. V. Shindyapina, T. V. Komarova, and Yu. L. Dorokhov
Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991 Russia
Belozersky Institute of Physico-Chemical Biology, Moscow State University, Moscow, 119991 Russia
Received September 2, 2015
Abstract: Although a relatively small part of the human genome contains protein-encoding genes, the latest data on the discovery of alternative open reading frames (ORFs) in conventional mRNAs have highlighted the expanded coding potential of these genes. Until recently, it was believed that each mRNA transcript encodes a single protein. Recent proteogenomics data indicate that there are exceptions to this rule, which substantially changes the usual meaning of the term “gene.” The topology of a gene with overlapping ORFs resembles a Russian “matreshka” nesting doll. There are two levels of “matreshka” genetic systems. The first is the chromosomal level, at which the “nested” gene is located within introns and exons of the main chromosomal gene, in either sense or antisense orientation relative to the external gene. The second level is a mature mRNA molecule containing overlapping ORFs or an ORF with an alternative start codon. In this review, we focus on the properties of “matreshka” genes of the second type and on methods for their detection and verification. Particular attention is paid to the biological properties of the polypeptides encoded by these genes.
Keywords: gene, open reading frame, alternative start codon, noncanonical start codon, “matreshka” gene
REVIEWS AND THEORETICAL ARTICLES