Pearson's chi-square test
Pearson's chi-square test (χ²) is one of a variety of chi-square tests: statistical procedures whose results are evaluated by reference to the chi-square distribution. It tests a null hypothesis that the relative frequencies of occurrence of observed events follow a specified frequency distribution. The events must be mutually exclusive. One of the simplest examples is the hypothesis that an ordinary six-sided die is "fair", i.e., all six outcomes occur equally often. Chi-square is calculated by finding the difference between each observed and theoretical frequency, squaring each difference, dividing it by the corresponding theoretical frequency, and summing the results over all categories:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
where:
- O_i = an observed frequency
- E_i = an expected (theoretical) frequency, asserted by the null hypothesis
- k = the number of categories (possible outcomes)
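The formula above translates directly into code. The following is a minimal Python sketch for the fair-die example; the function name `pearson_chi_square` and the observed counts are hypothetical, made up for illustration. It also reports k − 1 degrees of freedom, the usual choice for a goodness-of-fit test when the null probabilities are fully specified rather than estimated from the data.

```python
def pearson_chi_square(observed, expected):
    """Return Pearson's chi-square statistic: the sum of (O - E)**2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts from 60 rolls of a six-sided die (made-up data).
observed = [5, 8, 9, 8, 10, 20]
n = sum(observed)
# Under the "fair die" null hypothesis each face is expected n / 6 = 10 times.
expected = [n / 6] * 6

stat = pearson_chi_square(observed, expected)
df = len(observed) - 1  # k - 1: no parameters were estimated from the data
print(f"chi-square = {stat:.2f}, degrees of freedom = {df}")
```

Here the statistic is (25 + 4 + 1 + 4 + 0 + 100) / 10 = 13.4, almost all of it contributed by the sixth face.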
Pearson's chi-square is used to assess two types of comparison: tests of goodness of fit and tests of independence. A test of goodness of fit establishes whether or not an observed frequency distribution differs from a theoretical distribution. A test of independence assesses whether paired observations on two variables are independent of each other; for example, whether people from different regions differ in how frequently they report supporting a political candidate.
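For a test of independence, the expected frequencies are not given in advance but derived from the table's margins: under independence, the expected count in row r and column c is (row total × column total) / grand total, and an r × c table has (r − 1)(c − 1) degrees of freedom. The sketch below assumes a small hypothetical survey table and hypothetical helper names (`expected_counts`, `chi_square_independence`); it applies no continuity correction and assumes every row and column total is nonzero.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (Pearson statistic, degrees of freedom) for an r x c contingency table."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical survey: rows are regions, columns are (support, do not support).
table = [
    [30, 20],  # region A
    [20, 30],  # region B
]
stat, df = chi_square_independence(table)
print(f"chi-square = {stat:.2f}, degrees of freedom = {df}")
```

With equal margins every expected count is 25, so each of the four cells contributes 25 / 25 = 1 and the statistic is 4.0 on 1 degree of freedom.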
Pearson's chi-square is the original and most widely used chi-square test.
The null distribution of the Pearson statistic is only approximately a chi-square distribution. The approximation arises because, under the null hypothesis, the observed frequency O_i of each category i follows a binomial distribution whose mean is the expected frequency E_i:
$$O_i \sim \mathrm{Bin}(n, p_i), \qquad E_i = n p_i$$
where:
- p_i = probability of category i, under the null hypothesis
- n = number of observations (sample size)
When the Pearson test statistic is compared against a chi-square distribution, this binomial distribution is approximated as a Gaussian (normal) distribution:
$$\mathrm{Bin}(n, p_i) \approx \mathcal{N}\big(n p_i,\; n p_i (1 - p_i)\big)$$
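Because the chi-square reference distribution is only an approximation, one way to check it is by simulation. The sketch below (hypothetical function names, Python standard library only) rolls a fair die n = 60 times, so each count is Bin(60, 1/6) with E_i = 10, and estimates how often the Pearson statistic exceeds 11.07, the tabulated upper 5% point of the chi-square distribution with 5 degrees of freedom. If the approximation is adequate, the estimated rejection rate should be close to 0.05.

```python
import random


def pearson_chi_square(observed, expected):
    """Sum of (O - E)**2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulated_rejection_rate(n_rolls=60, trials=10_000, critical=11.07, seed=1):
    """Estimate P(statistic > critical) when a fair die is rolled n_rolls times.

    11.07 is the tabulated upper 5% point of the chi-square distribution
    with 5 degrees of freedom, so a rate near 0.05 means the approximation
    works well for this sample size.
    """
    rng = random.Random(seed)
    probs = [1 / 6] * 6                        # null hypothesis: fair die
    expected = [n_rolls * p for p in probs]    # E_i = n * p_i
    rejections = 0
    for _ in range(trials):
        counts = [0] * 6
        for face in rng.choices(range(6), k=n_rolls):
            counts[face] += 1                  # each count is Bin(n_rolls, 1/6)
        if pearson_chi_square(counts, expected) > critical:
            rejections += 1
    return rejections / trials


print(f"simulated rejection rate: {simulated_rejection_rate():.3f}")
```

Rerunning with a much smaller `n_rolls` (so that each E_i is small) is a simple way to see the binomial-to-Gaussian approximation degrade.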
See also Yates' correction for continuity, median test.