The Self-information reference article from the English Wikipedia on 24-Jul-2004
(provided by Fixed Reference: snapshots of Wikipedia from wikipedia.org)

Self-information

Within the context of information theory, self-information is defined as the amount of information that knowledge about (the outcome of) a certain event adds to someone's overall knowledge. The amount of self-information is expressed in the unit of information, the bit.

By definition, the amount of self-information contained in a probabilistic event depends only on the probability that the event happens. More specifically: the smaller this probability is, the larger the self-information associated with learning that the event indeed occurred.

Further, by definition, the measure of self-information has the following property. If an event C is composed of two mutually independent events A and B (that is, C occurs when both A and B occur), then the amount of information in the proclamation that C has happened equals the sum of the amounts of information in the proclamations of event A and event B respectively.

Taking into account these properties, the self-information H(A) associated with an event A that has probability Pr(A) is defined as:

H(A) = log2 (1/Pr(A)) bits.

This definition, using the binary logarithm function, complies with the above conditions.
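
As a brief check of both conditions, assume the definition H(A) = log2 (1/Pr(A)) and take C to be the joint occurrence of the independent events A and B, so that Pr(C) = Pr(A) Pr(B). The quantity 1/Pr(A) grows as Pr(A) shrinks and log2 is increasing, so a less probable event carries more self-information. Additivity follows because the logarithm turns a product into a sum:

```latex
\begin{align*}
H(C) &= \log_2 \frac{1}{\Pr(C)}
      = \log_2 \frac{1}{\Pr(A)\,\Pr(B)} \\
     &= \log_2 \frac{1}{\Pr(A)} + \log_2 \frac{1}{\Pr(B)}
      = H(A) + H(B)
\end{align*}
```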

This definition can be rewritten as:

H(A) = -log2 Pr(A) bits.

Examples

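On tossing a fair coin, the probability of 'tail' is 0.5. When it is proclaimed that 'tail' occurred, this amounts to H('tail') = log2 (1/0.5) = log2 2 = 1 bit of information.

When throwing a fair die, the probability of 'four' is 1/6. When it is proclaimed that 'four' has been thrown, the amount of self-information is H('four') = log2 (1/(1/6)) = log2 (6) = 2.585 bits.

When two dice are thrown independently, the amount of information associated with {throw 1 = 'two' & throw 2 = 'four'} is H('throw 1 is two & throw 2 is four') = log2 (1/Pr(throw 1 = 'two' & throw 2 = 'four')) = log2 (1/(1/36)) = log2 (36) = 5.170 bits.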
This outcome equals the sum of the individual amounts of self-information associated with {throw 1 = 'two'} and {throw 2 = 'four'}; namely 2.585 + 2.585 = 5.170 bits.
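
The figures in these examples can be reproduced numerically. The following is a minimal Python sketch, assuming a fair coin and fair, independent dice; the function name self_information and the variable names are illustrative and do not come from the article.

```python
import math


def self_information(p: float) -> float:
    """Return the self-information, in bits, of an event with probability p."""
    # Assumption: p is a valid probability of an event that can occur.
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must lie in the interval (0, 1]")
    return math.log2(1.0 / p)


# Fair coin: Pr('tail') = 0.5
h_tail = self_information(0.5)

# Fair die: Pr('four') = 1/6
h_four = self_information(1 / 6)

# Two independent throws: Pr(throw 1 = 'two' & throw 2 = 'four') = 1/36
h_two_then_four = self_information(1 / 36)

print(round(h_tail, 3))           # 1.0
print(round(h_four, 3))           # 2.585
print(round(h_two_then_four, 3))  # 5.17

# Additivity for independent events, up to floating-point rounding:
# each single throw outcome has probability 1/6.
print(math.isclose(h_two_then_four, h_four + h_four))  # True
```

The additivity comparison uses math.isclose rather than exact equality, because 1/6 and 1/36 are not represented exactly in binary floating point.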