1022-7954/05/4107- © 2005 Pleiades Publishing, Inc.
Russian Journal of Genetics, Vol. 41, No. 7, 2005, pp. 814–821. Translated from Genetika, Vol. 41, No. 7, 2005, pp. 997–1005.
Original Russian Text Copyright © 2005 by Tarasov, Mustafaev, Abilev, Mel’nik.
Basic approaches to the quantitative study of the
relationship between the structure and activity of chemicals
(QSAR study) were developed in the early 1960s
by Hansch and Fujita and by Free and Wilson.
These approaches are based on correlation analysis of
the relationship between the biological activities of
derivatives of a drug and their physicochemical
properties, including partition coefficients in the
octanol–water system, substituent structures, etc.
Since the 1990s, QSAR models have been applied to
relationships between the mutagenic and carcinogenic
properties of chemicals and their structure. This became
possible with the construction of databases of
genotoxicity and carcinogenicity test results
and the development of special software, including DEREK,
CASE, Multicase [5, 6], TOPKAT, etc.
These programs used both physicochemical properties
and structural fragments as descriptors. The most efficient
approach was developed by Klopman and Rosenkranz
and implemented in the well-known program
CASE [8–12]. It involves generating all possible
structural features (descriptors) of chemicals and
statistically recognizing those that determine mutagenic
or carcinogenic activity. This approach allows identification
of structural alerts related not only to known
mechanisms of mutagenic or carcinogenic action but
also to unknown ones. However, the prediction of the
mutagenic and carcinogenic activity of chemicals by
any QSAR model in use is still not very efficient. For any
program, the concordance does not exceed 60–70%.
This is much lower than the corresponding values
for pharmacological activity. To make the prediction of
genotoxicity more efficient, the program Multicase
searches for additional descriptors, called modulators
[10, 12]. These modulators can be either physicochemical
traits or structural features. Thus, the properties
of a chemical are described with more than
one descriptor: the main descriptor and modulators.
However, even this approach does not sufficiently increase
the efficiency of QSAR analysis. We have developed
another approach, involving generation of so-called
compound descriptors. Double or triple combinations
of primary structural descriptors are formed, and
those markedly related to the genotoxicity of chemicals
are selected from the resulting compound descriptors.
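The concordance figures quoted for existing programs can be made concrete with a minimal sketch. It assumes that concordance means the fraction of compounds whose predicted class matches the experimentally observed class; the function name and the toy data are hypothetical and are not taken from any of the programs named here.

```python
def concordance(observed, predicted):
    """Fraction of compounds whose predicted class equals the observed class."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one compound is required")
    # Count positions where prediction and experiment agree.
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


# Hypothetical toy data: True = mutagen, False = nonmutagen.
observed = [True, True, False, False, True]
predicted = [True, False, False, True, True]
print(concordance(observed, predicted))  # 3 of 5 predictions agree
```

With these toy values, three of five predictions agree, which corresponds to the lower end of the 60–70% range mentioned in the text.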
In this paper, we illustrate the efficiency of describing
the mutagenic activity of chemicals with the use
of univocal double compound descriptors, which occur
solely in mutagenically active compounds (biophores) or
solely in inactive compounds (biophobes).
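The notion of a univocal double compound descriptor can be illustrated with a short sketch. It is not the authors' software: it assumes that each compound has already been reduced to a set of primary fragment labels, it forms every unordered pair of fragments without checking that the fragments are structurally unrelated, and it applies no statistical significance test. All fragment names and function names are hypothetical.

```python
from itertools import combinations


def double_descriptors(fragments):
    """Return all unordered pairs of primary fragments found in one compound."""
    return set(combinations(sorted(fragments), 2))


def univocal_descriptors(compounds):
    """Split double descriptors into biophores and biophobes.

    compounds: iterable of (fragment_set, is_mutagen) pairs.
    A biophore occurs only in mutagens, a biophobe only in nonmutagens.
    """
    in_active, in_inactive = set(), set()
    for fragments, is_mutagen in compounds:
        target = in_active if is_mutagen else in_inactive
        target.update(double_descriptors(fragments))
    biophores = in_active - in_inactive
    biophobes = in_inactive - in_active
    return biophores, biophobes


# Hypothetical toy sample; the fragment labels are illustrative only.
compounds = [
    ({"nitro", "aromatic ring"}, True),
    ({"nitro", "aromatic ring", "hydroxyl"}, True),
    ({"aromatic ring", "hydroxyl"}, False),
    ({"hydroxyl", "carboxyl"}, False),
]
biophores, biophobes = univocal_descriptors(compounds)
print(sorted(biophores))
print(sorted(biophobes))
```

In this toy sample the pair (aromatic ring, hydroxyl) occurs in both groups, so it is not univocal and is excluded from both sets.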
MATERIALS AND METHODS
The database for choosing the mutagen
and nonmutagen groups stores the results of tests performed
in germline cells of mice and rats, scoring heritable
translocations, gene mutations at specific loci,
and chromosome aberrations. A substance was considered
a mutagen if it displayed a mutagenic effect in at
least one of the tests.
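The classification rule stated above can be written as a one-line check. The sketch below assumes that the results for one substance are stored as a mapping from test name to a positive or negative outcome, and that tests not performed are simply absent from the mapping; the function name and the example records are hypothetical.

```python
def is_mutagen(test_results):
    """True if the substance was positive in at least one germline test."""
    return any(test_results.values())


# Hypothetical records: test name -> True (mutagenic effect) or False.
substance_a = {
    "heritable translocations": False,
    "specific-locus mutations": True,
    "chromosome aberrations": False,
}
substance_b = {
    "heritable translocations": False,
    "chromosome aberrations": False,
}
print(is_mutagen(substance_a), is_mutagen(substance_b))
```

Under this rule a single positive result is enough to place a substance in the mutagen group, whereas the nonmutagen group contains only substances negative in every test performed.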
The data for constructing the sample were retrieved
from the databases GAP (http://www.epa.gov), NTP
(http://ntp-server.niehs.nih.gov), Gene-Tox
(http://toxnet.nlm.nih.gov), and the database developed
at the Laboratory of Genome Instability, Institute of General
Genetics (Table 1).
Use of Compound Structural Descriptors
for Increasing the Efficiency of QSAR Study
V. A. Tarasov, O. N. Mustafaev, S. K. Abilev, and V. A. Mel’nik
Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991 Russia;
fax: (095)135-12-89; e-mail: firstname.lastname@example.org
Received October 12, 2004
Abstract. A new concept for describing the dependence of the mutagenic activity of a chemical substance on
its structure (QSAR analysis) is presented. It involves compound descriptors, which are combinations of unrelated
fragments of molecular structure. Software has been developed to generate various structural fragments
of molecules and their combinations (ensembles) and to select compound descriptors that are statistically significant for
the biological activity of a chemical. With univocal compound descriptors that consist of two structural
fragments and are present only in active or only in inactive compounds taken as examples, it has been shown that the efficiency
of QSAR study can be increased fourfold or more. The approach has been applied to a set of 105 compounds
whose mutagenic effect on rodent germline cells is known.
MODELS AND METHODS