The Mutual information reference article from the English Wikipedia on 24-Jul-2004
(provided by Fixed Reference: snapshots of Wikipedia from wikipedia.org)

Mutual information

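In probability theory, the mutual information between two discrete random variables X and Y is given by

$$I(X,Y) = \sum_{x,y} P(x,y) \log \frac{P(x,y)}{P(x)\,P(y)}$$

where P(X,Y) is the joint probability distribution of X and Y, and P(X) and P(Y) are the marginal probability distributions of X and Y. The base of the logarithm sets the unit; base 2 gives bits.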
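
As a minimal sketch of this definition, the following Python function sums P(x,y) log [P(x,y) / (P(x) P(y))] over a discrete joint distribution. The function name `mutual_information`, the nested-list layout (rows index x, columns index y), the example table, and the base-2 logarithm are assumptions made for illustration; the article does not prescribe any of them. Cells with P(x,y) = 0 are skipped, following the usual convention 0 log 0 = 0.

```python
import math

def mutual_information(joint):
    """Mutual information in bits for a table joint[x][y] = P(x, y)."""
    px = [sum(row) for row in joint]          # marginal P(x): row sums
    py = [sum(col) for col in zip(*joint)]    # marginal P(y): column sums
    total = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:                       # convention: 0 log 0 = 0
                total += pxy * math.log2(pxy / (px[i] * py[j]))
    return total

# Hypothetical example: two fair bits that are always equal.
perfectly_correlated = [[0.5, 0.0],
                        [0.0, 0.5]]
mi_correlated = mutual_information(perfectly_correlated)  # 1.0 bit
```

In this example knowing Y determines X completely, so the mutual information equals the full one bit of uncertainty in X.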

Table of contents
1 Properties of mutual information
2 Relation to other quantities
3 References

Properties of mutual information

If X and Y are independent, then I(X,Y) = 0, since P(X,Y) = P(X) P(Y) in that case and every logarithm in the defining sum is log 1 = 0.

Mutual information is symmetric: I(X,Y) = I(Y,X).

Mutual information is nonnegative: I(X,Y) ≥ 0.
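
The three properties above can be spot-checked numerically. The sketch below is only an illustration on two hypothetical 2×2 tables, not a proof; the helper `mutual_information`, the table values, and the base-2 logarithm are assumptions. Swapping the roles of X and Y corresponds to transposing the joint table.

```python
import math

def mutual_information(joint):
    """I(X,Y) in bits for a table joint[x][y] = P(x, y)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )

# Independent case: the joint table is the product of its marginals.
independent = [[a * b for b in (0.4, 0.6)] for a in (0.3, 0.7)]
mi_independent = mutual_information(independent)  # zero up to rounding error

# Dependent case, and the same table with X and Y swapped (transposed).
dependent = [[0.4, 0.1],
             [0.2, 0.3]]
swapped = [list(col) for col in zip(*dependent)]
mi_xy = mutual_information(dependent)
mi_yx = mutual_information(swapped)   # equal to mi_xy by symmetry
```

Because of floating-point rounding, the independent case may come out as a tiny nonzero number rather than exactly 0, so comparisons should use a small tolerance.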

Relation to other quantities

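The mutual information can be equivalently expressed as

$$I(X,Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)$$

where H(X) and H(X|Y) are the unconditional and conditional entropy of X, and likewise H(Y) and H(Y|X) are the unconditional and conditional entropy of Y, with

$$H(X) = -\sum_{x} P(x) \log P(x)$$

and

$$H(X|Y) = -\sum_{x,y} P(x,y) \log P(x|y).$$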

Since H(X) ≥ H(X|Y) (conditioning cannot increase entropy), this gives the nonnegativity property stated above.
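
The entropy form can be checked on a small example. The following sketch computes H(X) - H(X|Y) and H(Y) - H(Y|X) for a hypothetical 2×2 joint table and confirms that the two agree; the helper names `entropy` and `conditional_entropy`, the table values, and the base-2 logarithm are assumptions made for illustration.

```python
import math

def entropy(p):
    """H = -sum p log2 p, skipping zero-probability outcomes."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def conditional_entropy(joint):
    """H(A|B) in bits for a table joint[a][b] = P(a, b)."""
    pb = [sum(col) for col in zip(*joint)]     # marginal of the conditioning variable
    h = 0.0
    for row in joint:
        for j, pab in enumerate(row):
            if pab > 0:
                h -= pab * math.log2(pab / pb[j])   # P(a|b) = P(a,b) / P(b)
    return h

joint = [[0.4, 0.1],
         [0.2, 0.3]]                            # joint[x][y] = P(x, y)
transposed = [list(col) for col in zip(*joint)] # transposed[y][x] = P(x, y)

px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]

i_from_x = entropy(px) - conditional_entropy(joint)       # H(X) - H(X|Y)
i_from_y = entropy(py) - conditional_entropy(transposed)  # H(Y) - H(Y|X)
```

Both expressions give the same value (roughly 0.12 bits for this table), and that value is nonnegative, as expected.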

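Mutual information can also be expressed in terms of the Kullback-Leibler divergence. Note that

$$I(X,Y) = \sum_{y} P(y) \sum_{x} P(x|y) \log \frac{P(x|y)}{P(x)} = \sum_{y} P(y)\, D_{KL}\big(P(X|Y=y) \,\|\, P(X)\big).$$

Thus mutual information can be understood as a weighted Kullback-Leibler divergence, with weights P(y): the more the conditional distributions P(X|Y=y) differ from P(X), the greater the information gain.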
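
The weighted-divergence reading can be sketched in code: for each value y, compute the Kullback-Leibler divergence from P(X) to P(X|Y=y), then average with weights P(y). The helper `kl_divergence`, the example table, and the base-2 logarithm are assumptions made for illustration; the result is compared against the direct sum over P(x,y) log [P(x,y) / (P(x) P(y))].

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in bits; assumes q[k] > 0 wherever p[k] > 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

joint = [[0.4, 0.1],
         [0.2, 0.3]]                       # joint[x][y] = P(x, y)
px = [sum(row) for row in joint]           # marginal P(x)
py = [sum(col) for col in zip(*joint)]     # marginal P(y)

# Average the divergence of P(X | Y=y) from P(X), weighted by P(y).
weighted_kl = 0.0
for j, column in enumerate(zip(*joint)):
    conditional = [pxy / py[j] for pxy in column]   # P(x | Y=y)
    weighted_kl += py[j] * kl_divergence(conditional, px)

# Direct evaluation of the defining double sum, for comparison.
direct = sum(
    pxy * math.log2(pxy / (px[i] * py[j]))
    for i, row in enumerate(joint)
    for j, pxy in enumerate(row)
    if pxy > 0
)
```

The two quantities agree up to floating-point rounding, which is the identity the text describes.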

References

Athanasios Papoulis. Probability, Random Variables, and Stochastic Processes, second edition. New York: McGraw-Hill, 1984. (See Chapter 15.)