Pearson's Chi-squared Test

Pearson's chi-squared test is a statistical test with several applications.


Test for Goodness-of-fit


Test for Homogeneity


Test for Independence

The test can be used to assess whether two categorical measures are independent. For example:

. webuse auto
(1978 automobile data)

. recode mpg (0/15=1 "<15")(16/20=2 "16-20")(21/25=3 "21-25")(26/.=4 "26+"), gen(r_mpg)
(74 differences between mpg and r_mpg)

. tab r_mpg foreign, chi2

 RECODE of |
       mpg |
  (Mileage |      Car origin
    (mpg)) |  Domestic    Foreign |     Total
-----------+----------------------+----------
       <15 |         9          1 |        10 
     16-20 |        24          4 |        28 
     21-25 |        12         10 |        22 
       26+ |         7          7 |        14 
-----------+----------------------+----------
     Total |        52         22 |        74 

          Pearson chi2(3) =  10.4175   Pr = 0.015

The null hypothesis is that the measures are independent. Under that hypothesis, each cell has an expected frequency determined by the marginal frequencies: the row total multiplied by the column total, divided by the grand total.
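
As a rough illustration, the expected frequencies for the cross-tabulation above can be recomputed outside of Stata. The following Python sketch is not part of the original session; the variable names are illustrative, and the counts are copied by hand from the table.

```python
# Observed counts copied from the cross-tabulation above.
# Rows: mpg groups (<15, 16-20, 21-25, 26+); columns: Domestic, Foreign.
observed = [
    [9, 1],
    [24, 4],
    [12, 10],
    [7, 7],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

for row in expected:
    print([round(e, 2) for e in row])
```

For instance, the expected count of domestic cars in the <15 group is 10 * 52 / 74, or about 7.03, compared with an observed count of 9.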

The chi-squared statistic is a sum over all cells. Each cell contributes (o-e)^2/e, where o is the observed frequency and e is the expected frequency. To calculate the probability of a given chi-squared statistic under the null hypothesis, the degrees of freedom are also needed. For a table with r rows and c columns, the degrees of freedom are (r-1)(c-1).
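
The calculation described above can be sketched in Python as a check on the Stata output. This is a minimal illustration using only the counts from the table; the variable names are assumptions made for the example.

```python
# Observed counts copied from the cross-tabulation above.
# Rows: mpg groups (<15, 16-20, 21-25, 26+); columns: Domestic, Foreign.
observed = [
    [9, 1],
    [24, 4],
    [12, 10],
    [7, 7],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum each cell's contribution (o - e)^2 / e, where e is the expected
# count under independence (row total * column total / grand total).
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        chi2 += (o - e) ** 2 / e

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi2, 4), df)
```

The result should agree with the Pearson chi2(3) value that Stata reports, up to rounding.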

In the above example, the chi-squared statistic is 10.42 with (4-1)(2-1) = 3 degrees of freedom. The corresponding p-value is 0.015. At a significance level of, e.g., 0.05, the null hypothesis is rejected: the data give evidence that the two measures are not independent.
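
The reported probability can be approximately reproduced from the statistic itself. The sketch below assumes exactly 3 degrees of freedom, for which the upper-tail probability of the chi-squared distribution has a closed form in terms of the complementary error function; it is not a general-purpose routine, and the function name is hypothetical. The significance level of 0.05 is the illustrative value from the text.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom.

    Valid for df = 3 only:
    P(X > x) = erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

chi2 = 10.4175   # statistic reported by Stata above
alpha = 0.05     # illustrative significance level

p_value = chi2_sf_df3(chi2)
reject_null = p_value < alpha

print(round(p_value, 3), reject_null)
```

For other degrees of freedom, a statistics library's chi-squared distribution function would be the more appropriate tool.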



Statistics/PearsonsChiSquaredTest (last edited 2025-07-01 00:58:22 by DominicRicottone)