ISSN 0032-9460, Problems of Information Transmission, 2007, Vol. 43, No. 3, pp. 167–189. © Pleiades Publishing, Inc., 2007.
Original Russian Text © J.-F. Coeurjolly, R. Drouilhet, J.-F. Robineau, 2007, published in Problemy Peredachi Informatsii, 2007, Vol. 43, No. 3.
Normalized Information-Based Divergences
J.-F. Coeurjolly, R. Drouilhet, and J.-F. Robineau
Université Pierre Mendès-France, Grenoble, France
Received April 11, 2006; in final form, May 16, 2007
Abstract—This paper is devoted to the mathematical study of some divergences based on
mutual information which are well suited to categorical random vectors. These divergences are
generalizations of the “entropy distance” and “information distance.” Their main characteristic
is that they combine a complexity term and the mutual information. We then introduce the
notion of (normalized) information-based divergence, propose several examples, and discuss
their mathematical properties, in particular in a prediction framework.
Shannon’s information theory, usually just called information theory, was introduced in 1948.
The theory is aimed at providing means for measuring information. More precisely, the amount of
information in an object may be measured by its entropy and may be interpreted as the length of
the description of the object under a given encoding. In the Shannon approach, the objects to be
encoded are assumed to be outcomes of a known source. Shannon’s theory also provides the notion
of mutual information (related to two objects), which plays a central role in many applications,
from lossy compression to machine learning methods.
Several authors noted that it would be useful to modify the mutual information such that the
resulting quantity becomes a metric in a strict sense. As a first example, [2, 3] introduced the
entropy distance, defined as the sum of the two conditional entropies. Other interesting measures are
the information distance and its normalized version, the similarity metric, both introduced in the
context of Kolmogorov complexity theory. More precisely, the information distance is defined
as the maximum of the two conditional Kolmogorov complexities. The similarity metric is universal
in the sense defined by its authors and is not computable, since it is based on the uncomputable
notion of Kolmogorov complexity.
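For concreteness, these quantities admit the following standard expressions (the notation here is ours and serves only as a reminder, not as the definitions adopted later in the paper): with $H$ the Shannon entropy, $I$ the mutual information, and $K$ the Kolmogorov complexity,
$$ D(X,Y) = H(X \mid Y) + H(Y \mid X) = H(X,Y) - I(X;Y) \quad \text{(entropy distance)}, $$
$$ E(x,y) = \max\{K(x \mid y),\, K(y \mid x)\} \quad \text{(information distance)}, \qquad \mathrm{NID}(x,y) = \frac{E(x,y)}{\max\{K(x),\, K(y)\}} \quad \text{(similarity metric)}. $$
The first identity makes explicit the structure announced in the abstract: a complexity term (here, the joint entropy) penalized by the mutual information.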
Recent papers have demonstrated that applications of suitable versions of the similarity metric
are of use in areas as diverse as genomics, virology, languages, literature, music, handwritten digits,
and astronomy. To apply the metric to real data, the authors have to replace the use of the non-
computable Kolmogorov complexity with an approximation obtained by using standard real-world
compressors. GenCompress for genomics, the Normalized Compression Distance (NCD) for music
clustering, and the Normalized Google Distance (NGD) for automatic meaning discovery are
examples of effective practical approximations. To include the information distance and similarity metric in
a framework based on information theory concepts, we make use of the principle that the expected
Kolmogorov complexity equals the Shannon entropy; the interested reader is referred to [10–12] for
further details.
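As an illustration of the compressor-based approximation mentioned above, the following is a minimal sketch of the Normalized Compression Distance computed with a generic off-the-shelf compressor (zlib, chosen here only for availability; the works cited use domain-specific compressors such as GenCompress):

import zlib

def compressed_size(data: bytes) -> int:
    # Length of the compressed representation, used as a computable
    # stand-in for the (uncomputable) Kolmogorov complexity K(data).
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    # Normalized Compression Distance:
    #   NCD(x, y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)},
    # where C denotes the compressed size.
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

if __name__ == "__main__":
    a = b"the quick brown fox jumps over the lazy dog " * 20
    b = b"the quick brown fox leaps over the lazy cat " * 20
    c = bytes(range(256)) * 4
    print(f"NCD(a, b) = {ncd(a, b):.3f}")  # similar sequences: small distance
    print(f"NCD(a, c) = {ncd(a, c):.3f}")  # unrelated data: larger distance

Here the compressed length C(·) plays the role of the Kolmogorov complexity K(·) in the similarity metric; inputs that compress well jointly are driven toward distance 0, while unrelated inputs are driven toward 1.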