Pearson's Chi-squared Test
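Pearson's chi-squared test is a statistical test that compares observed categorical frequencies with the frequencies expected under a null hypothesis. It has several applications.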
Contents
Test for Goodness-of-fit
Test for Homogeneity
Test for Independence
The test can be used to assess whether two categorical measures are independent. For example:
. webuse auto
(1978 automobile data)

. recode mpg (0/15=1 "<15")(16/20=2 "16-20")(21/25=3 "21-25")(26/.=4 "26+"), gen(r_mpg)
(74 differences between mpg and r_mpg)

. tab r_mpg foreign, chi2

 RECODE of |
       mpg |
  (Mileage |      Car origin
    (mpg)) |  Domestic    Foreign |     Total
-----------+----------------------+----------
       <15 |         9          1 |        10
     16-20 |        24          4 |        28
     21-25 |        12         10 |        22
       26+ |         7          7 |        14
-----------+----------------------+----------
     Total |        52         22 |        74

          Pearson chi2(3) =  10.4175   Pr = 0.015
The null hypothesis is that the measures are independent. Therefore, each cell has an expected frequency based on the marginal frequencies.
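- 'Domestic' has 52 of the 74 observations (70.3%)
- '<15' has 10 of the 74 observations (13.5%)
The intersection is expected to hold 70.3% * 13.5% = 9.5% of the 74 observations, which is about 7.03 (exactly 52*10/74). The observed count in that cell is 9.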
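The same calculation can be repeated for every cell. The following is a minimal sketch in Python rather than Stata, using only the counts from the tabulation above; the variable names are illustrative. With unrounded marginal proportions, the expected count for the '<15' and 'Domestic' cell comes out at about 7.03.

```python
# Observed counts copied from the tabulation above.
# Rows are the mpg bands; columns are (Domestic, Foreign).
observed = [
    [9, 1],    # <15
    [24, 4],   # 16-20
    [12, 10],  # 21-25
    [7, 7],    # 26+
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Under independence: expected count = row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(["<15", "16-20", "21-25", "26+"], expected):
    print(label, [round(e, 2) for e in row])
```

Multiplying the totals directly avoids the rounding error that creeps in when the percentages are rounded first.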
The chi-squared statistic is a sum over cells. Each cell contributes (o-e)^2/e, where o is the observed frequency and e is the expected frequency. To calculate the probability of a given chi-squared statistic under the null hypothesis, the degrees of freedom are also needed. For a table with r rows and c columns, the degrees of freedom are (r-1)(c-1); in the example above, (4-1)(2-1) = 3.
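As an illustration, the sum over cells can be written out directly. This is a sketch in Python, not Stata's own implementation, and it assumes only the observed counts from the tabulation above. It should reproduce the reported statistic up to rounding.

```python
# Observed counts copied from the tabulation above.
# Rows are the mpg bands; columns are (Domestic, Foreign).
observed = [
    [9, 1],    # <15
    [24, 4],   # 16-20
    [12, 10],  # 21-25
    [7, 7],    # 26+
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (o - e)^2 / e over every cell of the table.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # expected count under independence
        chi2 += (o - e) ** 2 / e

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi2, 4), df)
```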
In the above example, the chi-squared statistic is 10.42 with 3 degrees of freedom, and the corresponding probability is 0.015. At a significance level of, for example, 0.05, the null hypothesis would be rejected and the two measures would be judged dependent.
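The probability can be checked by hand for this particular case. The sketch below is in Python and uses a closed form for the upper tail of the chi-squared distribution that holds only for 3 degrees of freedom; the function name chi2_sf_df3 is hypothetical. Other degrees of freedom need a general routine, such as Stata's chi2tail() function.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-squared variable with 3 df.

    Valid for 3 degrees of freedom only:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


statistic = 10.4175  # Pearson chi2(3) reported in the tabulation above
alpha = 0.05         # example significance level

p = chi2_sf_df3(statistic)
reject_null = p < alpha

print(round(p, 3), reject_null)
```

Comparing p with alpha is the decision rule described above: a probability below the chosen level leads to rejecting independence.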