Confidence interval
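In statistics, confidence intervals are the most prevalent form of interval estimation. If U and V are statistics (i.e., "observable" random variables) whose probability distribution depends on some unobservable parameter θ, and the relation
$$\Pr(U < \theta < V) = 0.9$$
holds, then the random interval (U, V) is a 90% confidence interval for θ. Intervals at other confidence levels are defined analogously.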
How to misunderstand confidence intervals
It is very tempting to misunderstand this statement in the following way. We used capital letters U and V for random variables; it is conventional to use lower-case letters u and v for their observed values in a particular instance. The misunderstanding is the conclusion that
$$\Pr(u < \theta < v) = 0.9,$$
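so that after the data have been observed, a conditional probability distribution of θ, given the data, is inferred.
For example, suppose X is normally distributed with expected value θ and variance 1. (It is grossly unrealistic to take the variance to be known while the expected value must be inferred from the data, but it makes the example simple.) The random variable X is observable. (The random variable X − θ is an example of one that is not observable, since its value depends on θ.) Then X − θ is normally distributed with expectation 0 and variance 1; therefore
$$\Pr(-1.645 < X - \theta < 1.645) = 0.9.$$
Consequently
$$\Pr(X - 1.645 < \theta < X + 1.645) = 0.9,$$
and the interval from X − 1.645 to X + 1.645 is a 90% confidence interval for θ. Now suppose θ is a mass and the observed value is X = 82. The statement Pr(82 − 1.645 < θ < 82 + 1.645) = 0.9 does not follow from the laws of probability, because θ is not a random variable: no probability distribution has been assigned to it.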
But if probabilities are construed as degrees of belief rather than as relative frequencies of occurrence of random events, i.e., if we are Bayesians rather than frequentists, can we then say we are 90% sure that the mass is between 82 − 1.645 and 82 + 1.645? Many answers to this question have been proposed, and are philosophically controversial. The answer will not be a mathematical theorem, but a philosophical tenet.
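The frequentist reading of the 90% figure in the normal example can be checked by simulation. The following is only a minimal sketch: the function name `coverage`, the number of trials, and the seed are illustrative choices and not part of the original example. It draws X repeatedly with θ held fixed and counts how often the random interval from X − 1.645 to X + 1.645 contains θ.

```python
import random


def coverage(theta, trials=100_000, half_width=1.645, seed=0):
    """Fraction of trials in which (X - half_width, X + half_width) contains theta.

    theta is fixed throughout; only X is random, as in the frequentist setup.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.gauss(theta, 1.0)  # X ~ Normal(mean theta, variance 1)
        if x - half_width < theta < x + half_width:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # The fraction should be close to 0.9 whatever value theta takes.
    for theta in (0.0, 82.0, -5.0):
        print(theta, coverage(theta))
```

The fraction describes the procedure over many repetitions. It says nothing about the probability that θ lies in any single realized interval, such as the one from 82 − 1.645 to 82 + 1.645.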
For users of frequentist methods, the explanation of a confidence interval can amount to something like: "The confidence interval represents values for the population parameter for which the difference between the parameter and the observed estimate is not statistically significant at the 10% level". Critics of frequentist methods suggest that this hides the real and, to the critics, incomprehensible frequentist interpretation, which might be expressed as: "If the population parameter in fact lies within the confidence interval, then the probability that the estimator either will be the estimate actually observed, or will be closer to the parameter, is less than or equal to 90%". Users of Bayesian methods, if they produced a confidence interval, might by contrast say "My degree of belief that the parameter is in fact in the confidence interval is 90%".
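Concrete practical examples
Here is one of the most familiar realistic examples. Suppose X₁, ..., Xₙ are an independent sample from a normally distributed population with mean μ and variance σ². Let
$$\bar{X} = \frac{X_1 + \cdots + X_n}{n}, \qquad S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$
Then
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has a Student's t-distribution with n − 1 degrees of freedom. Note that this distribution does not depend on the values of the unobservable parameters μ and σ²; i.e., T is a pivotal quantity. If c is the 95th percentile of this distribution, then
$$\Pr(-c < T < c) = 0.9.$$
Consequently
$$\Pr\left(\bar{X} - \frac{cS}{\sqrt{n}} < \mu < \bar{X} + \frac{cS}{\sqrt{n}}\right) = 0.9,$$
and we have a 90% confidence interval for μ.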
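As an illustration, the interval for μ can be computed from a sample with a few lines of Python. This is a sketch under stated assumptions: the names `t_interval` and `coverage` are hypothetical, the sample size is fixed at n = 10, and the constant 1.833 is the tabulated 95th percentile of Student's t-distribution with 9 degrees of freedom, hard-coded here instead of being computed.

```python
import math
import random
import statistics

# Assumption: tabulated 95th percentile of Student's t with 9 degrees of
# freedom, i.e. the value of c for samples of size n = 10.
T_95_DF9 = 1.833


def t_interval(sample, c):
    """Return (lower, upper) = sample mean -/+ c * S / sqrt(n)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, divisor n - 1
    half_width = c * s / math.sqrt(n)
    return mean - half_width, mean + half_width


def coverage(mu, sigma, n=10, c=T_95_DF9, trials=20_000, seed=1):
    """Fraction of simulated normal samples whose interval contains mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lower, upper = t_interval(sample, c)
        if lower < mu < upper:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(t_interval([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], T_95_DF9))
    # Coverage should be near 0.9 for any mu and sigma, since T is pivotal.
    print(coverage(mu=3.0, sigma=2.0))
```

For other sample sizes the constant c has to be replaced by the 95th percentile for n − 1 degrees of freedom, taken from a table or a statistics library.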