Russian Journal of Genetics, Vol. 38, No. 11, 2002, pp. 1339–1342. Translated from Genetika, Vol. 38, No. 11, 2002, pp. 1580–1584.
Original Russian Text Copyright © 2002 by Ruanet, Badaeva.
Species identification based on karyotype is important for
genetic, molecular biological, and phylogenetic studies.
C-banding and in situ hybridization of chromosomes with
various markers make it possible to obtain a pattern of
heterochromatic (HC) chromosome regions that is specific for
each species. However, deciphering this pattern, i.e.,
identifying a species from the amount, distribution, and
staining intensity of HC blocks, is a difficult problem. We
studied the possibility of using artificial neural network
(ANN) technologies to optimize and automate this process.
These technologies are based on teaching a computer program
with a set of tasks that have known solutions, which gives
them an advantage over programs based on rigid algorithms [2–4].
ANNs are effective for classification and prediction when
the relationships involved are markedly nonlinear [2–4].
This matters here because the formation of the system of
heredity, like all natural processes, is complicated, and
the regular patterns are subject to exceptions. The user
collects a representative sample and then runs the learning
algorithm, which automatically perceives the data structure.
Note that all of the ANN's information on the problem is
contained in the set of problems serving as examples.
Therefore, the ANN's learning ability directly depends on
the number of examples in the teaching sample and on how
comprehensively these examples describe the main problem
that the ANN is to solve [4, 6].
ANNs do not require such detailed formalization of data as,
e.g., statistical methods do. In general, any parameters
that could influence the solution of a given problem may be
used as input fields [7, 8].
The material of the study comprised one diploid and
six polyploid species of genus
genomes. On the one hand, these species have large
chromosomes, and their phylogeny is well studied. On
the other hand, C-banding reveals diverse and specific
patterns of the distribution of HC regions in the karyotypes
of different species of this genus; the variation is
accounted for by differences in the staining intensity,
number, and location of the bands. This is related to
amplification, deletions, and redistribution of DNA repeats
and to chromosome rearrangements that have occurred in the
course of evolution [9–11].
Data formalization and ANN teaching are the most difficult
and sophisticated stages of work involving ANNs.
Schematically, the learning proceeds as follows. The
researcher has a database (a set of teaching pairs). The
database is divided into two unequal parts: the greater part
serves as a teaching database, and the smaller part, as a
testing database. The researcher inputs the teaching
database into the ANN and receives an answer, which is
compared with the known solutions. If the error is small,
the ANN learning is considered completed. If the error is
large, the procedure is repeated. The quality of the
learning is then checked using the testing database.
Successful testing on a set of problems with known answers
that are absent from the teaching sample is regarded as the
criterion that the goal has been attained, i.e., that the
expert system has been created [2, 3].
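The teach-until-the-error-is-small loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure implemented in any particular ANN package: a single logistic unit stands in for the network, the update rule is plain gradient descent on the cross-entropy loss, and the toy two-parameter data, the `error_threshold`, the `max_epochs` cap, and the learning `rate` are all hypothetical values chosen for the example.

```python
import math


def predict(weights, bias, x):
    """Logistic output of a single unit (a simple stand-in for the ANN)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def mean_squared_error(weights, bias, pairs):
    """Average squared difference between the answers and the known solutions."""
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in pairs) / len(pairs)


def train(pairs, error_threshold=0.01, max_epochs=10_000, rate=0.5):
    """Present the teaching pairs repeatedly until the error is small."""
    weights, bias = [0.0] * len(pairs[0][0]), 0.0
    for _ in range(max_epochs):
        if mean_squared_error(weights, bias, pairs) < error_threshold:
            break  # error is small: learning is considered completed
        for x, y in pairs:  # error is large: repeat the procedure
            grad = predict(weights, bias, x) - y  # cross-entropy gradient w.r.t. z
            weights = [w - rate * grad * xi for w, xi in zip(weights, x)]
            bias -= rate * grad
    return weights, bias


# Hypothetical toy database: the class is 1 when the first parameter is large.
teaching = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([0.2, 0.5], 0),
            ([1.0, 0.0], 1), ([1.0, 1.0], 1), ([0.8, 0.5], 1)]
testing = [([0.1, 0.5], 0), ([0.9, 0.5], 1)]  # absent from the teaching set

weights, bias = train(teaching)
teaching_error = mean_squared_error(weights, bias, teaching)
testing_error = mean_squared_error(weights, bias, testing)
print(f"teaching error: {teaching_error:.4f}, testing error: {testing_error:.4f}")
```

The testing pairs never take part in the weight updates, so the testing error reflects how well the learned rule generalizes rather than how well the teaching pairs were memorized.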
We used the same technique, with a total of
36 examples of problems. The teaching and testing
databases comprised 30 and 6 problems, respectively.
Each genome was described by 43 parameters. Then,
the set of problems that had not been included in the
teaching database was presented to the taught ANN.
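The 30/6 division of the 36 examples, each described by 43 parameters, can be sketched as follows. The text does not say how the six testing problems were chosen, so the seeded random shuffle here is an assumption, and the parameter values and the `class_k` labels are placeholders rather than data from the study.

```python
import random

N_EXAMPLES = 36     # total number of example problems (from the text)
N_TEACHING = 30     # size of the teaching database (from the text)
N_PARAMETERS = 43   # parameters describing each genome (from the text)

rng = random.Random(0)  # fixed seed (assumption) so the split is reproducible

# Placeholder examples: random parameter vectors with hypothetical labels.
examples = [
    ([rng.random() for _ in range(N_PARAMETERS)], f"class_{i % 6}")
    for i in range(N_EXAMPLES)
]


def split(examples, n_teaching, rng):
    """Shuffle a copy of the examples and cut it into teaching and testing parts."""
    shuffled = examples[:]
    rng.shuffle(shuffled)
    return shuffled[:n_teaching], shuffled[n_teaching:]


teaching_db, testing_db = split(examples, N_TEACHING, rng)
print(len(teaching_db), len(testing_db))
```

With so few examples, a split that leaves some class out of the teaching part would make that class impossible to learn, so in practice the split may need to be checked or stratified by class.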
The examples of problems for the teaching database
were idiograms of C-banded chromosomes of the
genomes of three tetraploid
), and three hexaploid species, includ-
) (Fig. 1). As a tool for solving
the problem, we chose a multilayer perceptron (MLP),
the type of ANN architecture that is currently most
widely used [4, 12–14]. We used the NeuroPro 0.25
software (a freeware product by V.G. Tsaregorodtsev,
Institute of Computing Mathematics, Siberian Divi-
sion, Russian Academy of Sciences). When optimizing
The Use of Artificial Neural Networks for Species Identification
Based on Analysis of
V. V. Ruanet and E. D. Badaeva
Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991 Russia;
fax: (095) 135-12-89; e-mail: firstname.lastname@example.org
Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, 119991 Russia
Received May 20, 2002
Abstract: The possibility of using artificial neural networks (ANNs) to identify some polyploid
species of genus
based on the idiograms of their
genomes were demonstrated.