# Likelihood principle

In statistics, the **likelihood principle** is a controversial principle of statistical inference which asserts that all of the information in a sample is contained in the likelihood function.

A likelihood function is a conditional probability distribution considered as a function of its second argument, holding the first fixed. For example, consider a model which gives the probability density function (in the discrete case, a probability mass function) of observable random variables *X* as a function of a parameter θ.
Then for a specific value of *x*, the function L(θ) = P(*X* = *x* | θ) is a likelihood function of θ. Two likelihood functions are considered equivalent if either is a scalar multiple of the other; the likelihood principle says that all information relevant to inferences about the value of θ is found in the equivalence class.


## Example

Suppose that *X* is the number of successes in five independent Bernoulli trials with probability θ of success on each trial, and *Y* is the number of independent Bernoulli trials needed to get three successes, again with probability θ of success on each trial.

Observing *X* = 3 induces the likelihood function

L(θ) = P(*X* = 3 | θ) = C(5, 3) θ³(1 − θ)² = 10 θ³(1 − θ)²,

while observing *Y* = 5 induces the likelihood function

L(θ) = P(*Y* = 5 | θ) = C(4, 2) θ³(1 − θ)² = 6 θ³(1 − θ)²,

where C(n, k) denotes the binomial coefficient. The two are scalar multiples of each other, so by the likelihood principle they are equivalent.

The difference between observing *X* = 3 and observing *Y* = 5 is only in the design of the experiment: in one case, one has decided in advance to try five times; in the other, to keep trying until three successes are observed. The *outcome* is the same in both cases. Therefore the likelihood principle is sometimes stated by saying:
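This equivalence is easy to check numerically. The sketch below (plain Python, no external libraries; the function names are mine) evaluates both likelihood functions at several values of θ and confirms that their ratio is the constant 10/6 everywhere:

```python
from math import comb

def binomial_lik(theta, n=5, k=3):
    # P(X = k | theta): k successes in n trials, with n fixed in advance
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

def neg_binomial_lik(theta, r=3, y=5):
    # P(Y = y | theta): the r-th success arrives exactly on trial y
    return comb(y - 1, r - 1) * theta**r * (1 - theta)**(y - r)

for theta in (0.2, 0.5, 0.8):
    # the ratio is 10/6 for every theta, so the functions are equivalent
    print(theta, binomial_lik(theta) / neg_binomial_lik(theta))
```

Because the ratio does not depend on θ, any inference procedure that respects the likelihood principle treats the two observations identically.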

*The inference should depend* **only on the outcome** of the experiment, and **not on the design** of the experiment.

## The law of likelihood

A related concept is the **law of likelihood**, the notion that the extent to which the evidence supports one parameter value or hypothesis against another is equal to the ratio of their likelihoods.
That is, P(*X* | *a*)/P(*X* | *b*) is the degree to which the data *X* support parameter value or hypothesis *a* against *b*.
If this ratio is 1, the evidence is indifferent,
and if greater or less than 1, the evidence supports *a* against *b* or vice versa.

Combining the likelihood principle with the law of likelihood yields the consequence that the parameter value which maximizes the likelihood function is the value which is most strongly supported by the evidence. This is the basis for the widely-used method of maximum likelihood.
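For instance, a grid search (a minimal sketch in plain Python; the grid resolution is an arbitrary choice of mine) shows that both likelihood functions from the example above are maximized at the same value, θ = 3/5:

```python
from math import comb

def lik_binomial(theta):
    # L(theta) from observing X = 3 successes in 5 trials
    return comb(5, 3) * theta**3 * (1 - theta)**2

def lik_neg_binomial(theta):
    # L(theta) from observing Y = 5 trials to reach 3 successes
    return comb(4, 2) * theta**3 * (1 - theta)**2

grid = [i / 1000 for i in range(1001)]
mle_b = max(grid, key=lik_binomial)
mle_nb = max(grid, key=lik_neg_binomial)
print(mle_b, mle_nb)  # 0.6 0.6
```

Equivalent likelihood functions differ only by a constant factor, so they always share the same maximizer; this is why maximum likelihood estimation conforms to the likelihood principle.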

## Historical remarks

The likelihood principle was first identified by that name in print in 1962 (Barnard et al., Birnbaum, and Savage et al.), but arguments for the same principle, unnamed, and the use of the principle in applications go back to the works of R.A. Fisher in the 1920s. The law of likelihood was identified by that name by I. Hacking (1965). More recently the likelihood principle as a general principle of inference has been championed by Anthony W.F. Edwards. The likelihood principle has been applied to the philosophy of science by R. Royall.

## Arguments for and against the likelihood principle

The likelihood principle is not universally accepted. Some widely-used methods of conventional statistics, for example many significance tests, are not consistent with the likelihood principle. By contrast, a likelihood-ratio test is based on the principle. Let us briefly consider some of the arguments for and against the likelihood principle.

### Arguments in favor of the likelihood principle

From a Bayesian point of view, the likelihood principle is a consequence that falls out of Bayes' theorem. An observation *A* enters the posterior

P(θ | *A*) = P(*A* | θ) P(θ) / P(*A*)

only through the likelihood function P(*A* | θ). Since P(*A*) does not depend on θ, equivalent likelihood functions combined with the same prior always yield the same posterior distribution, so Bayesian inference automatically obeys the principle.
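To illustrate with the binomial/negative-binomial example from earlier (a sketch under the assumption of a uniform prior; plain Python, with normalization over a θ grid standing in for the exact posterior): the constant factors 10 and 6 cancel in the normalization, leaving identical posteriors.

```python
from math import comb

grid = [(i + 0.5) / 200 for i in range(200)]  # theta values in (0, 1)

def posterior(lik):
    # uniform prior: the posterior is just the normalized likelihood
    weights = [lik(t) for t in grid]
    total = sum(weights)
    return [w / total for w in weights]

post_binomial = posterior(lambda t: comb(5, 3) * t**3 * (1 - t)**2)
post_negbinom = posterior(lambda t: comb(4, 2) * t**3 * (1 - t)**2)

# the two posteriors agree up to floating-point rounding
print(max(abs(a - b) for a, b in zip(post_binomial, post_negbinom)))
```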

### Arguments against the likelihood principle

The likelihood principle implies that any event that did not happen has no effect on an inference, since if an unrealized event does affect an inference then there is some information not contained in the likelihood function. However, unrealized events do play a role in some common statistical methods. For example, the result of a significance test depends on the probability of a result as extreme or more extreme than the observation. Thus, to the extent that such methods are accepted, the likelihood principle is denied.
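As a concrete sketch (plain Python; the helper name is mine): the one-sided p-value for the earlier example of 3 successes in 5 trials, testing θ = 1/2, sums the probabilities of the outcomes 3, 4, and 5 successes, two of which never actually occurred.

```python
from math import comb

def p_value_at_least(k, n, theta=0.5):
    # one-sided p-value: total probability of k *or more* successes,
    # i.e. a sum over outcomes as extreme or more extreme than observed
    return sum(comb(n, j) * theta**j * (1 - theta)**(n - j)
               for j in range(k, n + 1))

print(p_value_at_least(3, 5))  # 0.5
```

The terms for j = 4 and j = 5 are probabilities of unrealized events, yet they change the p-value; a procedure driven only by the likelihood function would ignore them.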

The likelihood principle also yields results which seem paradoxical to some people. A commonly cited example is the optional stopping problem. Suppose I tell you that I tossed a coin 10 times and observed 7 heads. You might make some inference about the probability of heads. Suppose instead I tell you that I tossed the coin until I observed 7 heads, and that it took 10 tosses. Will you now make a different inference?

The likelihood function is the same in both cases: it is proportional to

θ⁷(1 − θ)³.

According to the likelihood principle, the inference about θ should therefore be the same. Yet critics object that the second design seems to let an experimenter who keeps tossing until a desired count of heads appears *argue to a foregone conclusion* by simply tossing the coin enough. Apparently-paradoxical results of this kind are considered evidence against the likelihood principle.
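The proportionality claim itself can be checked directly (plain Python; function names are mine): dividing each design's probability of the observed data by the kernel θ⁷(1 − θ)³ leaves a constant that does not depend on θ.

```python
from math import comb

def fixed_n_pmf(theta):
    # P(7 heads in 10 tosses | theta), with the number of tosses fixed in advance
    return comb(10, 7) * theta**7 * (1 - theta)**3

def stop_at_7_heads_pmf(theta):
    # P(the 7th head arrives exactly on toss 10 | theta)
    return comb(9, 6) * theta**7 * (1 - theta)**3

for theta in (0.3, 0.5, 0.7):
    kernel = theta**7 * (1 - theta)**3
    # constants 120 and 84, independent of theta
    print(fixed_n_pmf(theta) / kernel, stop_at_7_heads_pmf(theta) / kernel)
```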

## References

- G.A. Barnard, G.M. Jenkins, and C.B. Winsten. "Likelihood Inference and Time Series". *J. Royal Statistical Society*, series A, 125:321–372, 1962.
- Allan Birnbaum. "On the foundations of statistical inference". *J. Amer. Statist. Assoc.* 57(298):269–326, 1962. *(With discussion.)*
- Anthony W.F. Edwards. *Likelihood*. 1st edition 1972 (Cambridge University Press); 2nd edition 1992 (Johns Hopkins University Press).
- Anthony W.F. Edwards. "The history of likelihood". *Int. Statist. Rev.* 42:9–15, 1974.
- Ronald A. Fisher. "On the Mathematical Foundations of Theoretical Statistics". *Phil. Trans. Royal Soc.*, series A, 222:309–368, 1922.
- Ian Hacking. *Logic of Statistical Inference*. Cambridge University Press, 1965.
- Richard M. Royall. *Statistical Evidence: A Likelihood Paradigm*. London: Chapman & Hall, 1997.
- Leonard J. Savage et al. *The Foundations of Statistical Inference*. 1962.
