The Likelihood principle reference article from the English Wikipedia on 24-Jul-2004
(provided by Fixed Reference: snapshots of Wikipedia)

Likelihood principle

In statistics, the likelihood principle is a controversial principle of statistical inference which asserts that all of the information in a sample is contained in the likelihood function.

A likelihood function is a conditional probability distribution considered as a function of its second argument, holding the first fixed. For example, consider a model which gives the probability density function (in the discrete case, a probability mass function) of observable random variables X as a function of a parameter θ. Then for a specific value of x, the function L(θ) = P(X = x | θ) is a likelihood function of θ. Two likelihood functions are considered equivalent if either is a scalar multiple of the other; the likelihood principle says that all information relevant to inferences about the value of θ is found in the equivalence class.
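To make the definition concrete, here is a minimal Python sketch (standard library only; the function names and the particular numbers are illustrative, not from the article) of a binomial likelihood: the same expression P(X = x | θ), read as a function of θ with the observation x held fixed.

```python
from math import comb

def binomial_pmf(x, n, theta):
    """P(X = x | theta) for X ~ Binomial(n, theta)."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

# Holding the observation fixed (x = 3 successes in n = 5 trials),
# the same expression read as a function of theta is the likelihood L(theta).
def likelihood(theta, x=3, n=5):
    return binomial_pmf(x, n, theta)

print(likelihood(0.5))  # 0.3125
print(likelihood(0.6))  # ≈ 0.3456
```

Note that L(θ) is not a probability distribution in θ: it need not sum or integrate to 1, which is one reason only its equivalence class under scalar multiples carries meaning.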

Table of contents
1 Example
2 The law of likelihood
3 Historical remarks
4 Arguments for and against the likelihood principle
5 References
6 External links



Example

Suppose that X is the number of successes in five independent Bernoulli trials, each with probability θ of success, so that X has a binomial distribution; and suppose that Y is the number of independent Bernoulli trials needed to obtain three successes, so that Y has a negative binomial distribution.

Then the observation that X = 3 induces the likelihood function

L(θ) = P(X = 3 | θ) = C(5,3) θ³(1 − θ)² = 10 θ³(1 − θ)²,

and the observation that Y = 5 induces the likelihood function

L(θ) = P(Y = 5 | θ) = C(4,2) θ³(1 − θ)² = 6 θ³(1 − θ)².

These are equivalent because each is a scalar multiple of the other. The likelihood principle therefore says the inferences drawn about the value of θ should be the same in both cases.
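This equivalence can be checked numerically; a quick Python sketch (standard library only, illustrative function names):

```python
from math import comb

def binom_lik(theta):       # X = 3 successes out of n = 5 fixed trials
    return comb(5, 3) * theta**3 * (1 - theta)**2

def negbinom_lik(theta):    # Y = 5 trials needed to reach 3 successes
    return comb(4, 2) * theta**3 * (1 - theta)**2

# The ratio is the constant 10/6 for every theta: the two functions are
# scalar multiples of each other, hence equivalent.
for theta in (0.2, 0.5, 0.8):
    print(binom_lik(theta) / negbinom_lik(theta))  # ≈ 1.6667 each time
```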

The difference between observing X = 3 and observing Y = 5 is only in the design of the experiment: in one case, one has decided in advance to try five times; in the other, to keep trying until three successes are observed. The outcome is the same in both cases. Therefore the likelihood principle is sometimes stated by saying:

The inference should depend only on the outcome of the experiment, and not on the design of the experiment.

The law of likelihood

A related concept is the law of likelihood, the notion that the extent to which the evidence supports one parameter value or hypothesis against another is equal to the ratio of their likelihoods. That is, P(X | a)/P(X | b) is the degree to which the data X support parameter value or hypothesis a against b. If this ratio is 1, the evidence is indifferent, and if greater or less than 1, the evidence supports a against b or vice versa.
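As a sketch of the law of likelihood in Python (the data, 7 heads in 10 tosses, and the two parameter values are chosen for illustration):

```python
from math import comb

# Likelihood of theta given 7 heads in 10 tosses.
def lik(theta, heads=7, tosses=10):
    return comb(tosses, heads) * theta**heads * (1 - theta)**(tosses - heads)

# Law of likelihood: the ratio measures the support for a against b.
ratio = lik(0.7) / lik(0.5)
print(ratio > 1)  # True: the data support theta = 0.7 over theta = 0.5
```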

Combining the likelihood principle with the law of likelihood yields the consequence that the parameter value which maximizes the likelihood function is the value most strongly supported by the evidence. This is the basis for the widely used method of maximum likelihood.
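A minimal sketch of maximum likelihood for a binomial model (grid search, standard library only; the data are illustrative, and for a binomial sample the analytic maximizer is the success fraction x/n):

```python
from math import comb

def lik(theta, x=7, n=10):
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

# Grid search over (0, 1); the analytic maximizer here is x/n = 7/10.
grid = [i / 1000 for i in range(1, 1000)]
mle = max(grid, key=lik)
print(mle)  # 0.7
```

Note that because equivalent likelihoods differ only by a positive constant, every member of an equivalence class has the same maximizer, so maximum likelihood respects the likelihood principle.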

Historical remarks

The likelihood principle was first identified by that name in print in 1962 (Barnard et al., Birnbaum, and Savage et al.), but unnamed arguments for the same principle, and its use in applications, go back to the works of R.A. Fisher in the 1920s. The law of likelihood was identified by that name by I. Hacking (1965). More recently the likelihood principle as a general principle of inference has been championed by Anthony W.F. Edwards. The likelihood principle has been applied to the philosophy of science by R. Royall.

Arguments for and against the likelihood principle

The likelihood principle is not universally accepted. Some widely used methods of conventional statistics, for example many significance tests, are not consistent with the likelihood principle. By contrast, a likelihood-ratio test is based on the principle. Let us briefly consider some of the arguments for and against the likelihood principle.

Arguments in favor of the likelihood principle

From a Bayesian point of view, the likelihood principle is a direct consequence of Bayes' theorem. An observation A enters the formula

P(θ | A) = P(A | θ) P(θ) / P(A)

only through the likelihood function P(A | θ). In general, observations come into play through the likelihood function, and only through the likelihood function; no other mechanism is needed.
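A discrete sketch of this point in Python (the grid of θ values, the uniform prior, and the data are all chosen for illustration): the observation enters only through the likelihood, and rescaling the likelihood by a constant cancels in the normalization, so only the equivalence class of the likelihood affects the posterior.

```python
from math import comb

thetas = [i / 10 for i in range(1, 10)]       # illustrative parameter grid
prior = {t: 1 / len(thetas) for t in thetas}  # uniform prior

def lik(t):  # likelihood of an illustrative observation: 7 heads in 10 tosses
    return comb(10, 7) * t**7 * (1 - t)**3

z = sum(lik(t) * prior[t] for t in thetas)             # normalizer P(A)
posterior = {t: lik(t) * prior[t] / z for t in thetas}

# Scaling the likelihood by any positive constant c leaves the posterior
# unchanged: c cancels between numerator and normalizer.
c = 42.0
z2 = sum(c * lik(t) * prior[t] for t in thetas)
posterior2 = {t: c * lik(t) * prior[t] / z2 for t in thetas}
```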

Arguments against the likelihood principle

The likelihood principle implies that any event that did not happen has no effect on an inference, since if an unrealized event does affect an inference then there is some information not contained in the likelihood function. However, unrealized events do play a role in some common statistical methods. For example, the result of a significance test depends on the probability of a result as extreme or more extreme than the observation. Thus, to the extent that such methods are accepted, the likelihood principle is denied.
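For instance, a one-sided binomial p-value for 7 heads in 10 tosses of a fair coin (illustrative numbers) sums the probabilities of the unrealized outcomes 8, 9, and 10 heads along with the observed one:

```python
from math import comb

def p_value(heads, tosses, theta=0.5):
    # Probability of a result as extreme or more extreme than the observation:
    # this depends on outcomes (8, 9, 10 heads) that never happened.
    return sum(comb(tosses, k) * theta**k * (1 - theta)**(tosses - k)
               for k in range(heads, tosses + 1))

print(p_value(7, 10))  # 0.171875 (= 176/1024)
```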

The likelihood principle also yields results which seem paradoxical to some people. A commonly cited example is the optional stopping problem. Suppose I tell you that I tossed a coin 10 times and observed 7 heads. You might make some inference about the probability of heads. Suppose now I tell you that I tossed the coin until I observed 7 heads, and that it took 10 tosses. Will you now make a different inference?

The likelihood function is the same in both cases: it is proportional to

θ⁷ (1 − θ)³.

According to the likelihood principle, the inference should be the same in either case. But this may seem fishy; it might seem possible to argue to a foregone conclusion simply by tossing the coin long enough. Apparently paradoxical results of this kind are considered by some to be evidence against the likelihood principle.
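The proportionality claim can be checked directly in Python (standard library only; function names illustrative):

```python
from math import comb

# Fixed design: 10 tosses decided in advance, 7 heads observed (binomial).
def binom_lik(theta):
    return comb(10, 7) * theta**7 * (1 - theta)**3

# Sequential design: toss until the 7th head, which arrived on toss 10,
# so the first 9 tosses contained exactly 6 heads (negative binomial).
def negbin_lik(theta):
    return comb(9, 6) * theta**7 * (1 - theta)**3

# Both are scalar multiples of theta^7 (1 - theta)^3; the ratio is the
# constant 120/84 = 10/7 regardless of theta.
for theta in (0.3, 0.5, 0.7):
    print(round(binom_lik(theta) / negbin_lik(theta), 6))  # 1.428571
```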

