Collinearity Test

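Multicollinearity, meaning strong linear dependence among the independent variables, undermines a regression model: perfect collinearity makes the coefficients impossible to identify, and near collinearity inflates their variances. A collinearity test helps determine whether a model is appropriately specified.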


Variance Inflation Factor

Every parameter in a regression model can be evaluated by its variance inflation factor (VIF). This is the ratio of a parameter's variance under the full model to its variance under a model including only that parameter. It can be interpreted as how much the variance is inflated by collinearity, that is, relative to the variance the parameter would have if it had no collinearity with the other parameters.
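
The ratio can be written out explicitly. The notation below is not from the original page and is only a sketch: x_j is the j-th independent variable, sigma^2 is the error variance (assumed constant and held fixed across both models), and R_j^2 is the coefficient of determination from regressing x_j on the other independent variables.

```latex
\operatorname{Var}(\hat\beta_j)
  = \underbrace{\frac{\sigma^2}{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}}_{\text{variance with no collinearity}}
    \cdot
    \underbrace{\frac{1}{1 - R_j^2}}_{\mathrm{VIF}_j}
```

When x_j is uncorrelated with the other independent variables, R_j^2 = 0 and the second factor is 1. As R_j^2 approaches 1, the factor grows without bound.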

As a result, VIFs measure multicollinearity in a regression model. A VIF of 1 indicates no collinearity, and higher values indicate greater collinearity. A threshold of 10 is a commonly used benchmark.

VIF is calculated as the inverse of tolerance. (It is therefore also possible to test multicollinearity with tolerance directly, but tolerance is not as easily interpreted.)

Tolerance for a parameter is calculated as 1 - R2, where R2 is the coefficient of determination from regressing the variable in question on all other independent variables.
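
The calculation can be sketched in plain Python with no third-party packages. The function names (solve_linear_system, r_squared, tolerance_and_vif) and the toy data are hypothetical and chosen only for illustration; this is a minimal unweighted sketch, not a replacement for the packages listed under Usage.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so row operations apply to both sides at once.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: perfect collinearity")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(y, predictors):
    """R2 from an OLS regression of y on the predictors plus an intercept."""
    n = len(y)
    rows = [[1.0] + [p[i] for p in predictors] for i in range(n)]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def tolerance_and_vif(columns):
    """Map each independent variable name to (tolerance, VIF)."""
    result = {}
    for name, values in columns.items():
        # Regress this variable on all of the other independent variables.
        others = [v for other, v in columns.items() if other != name]
        tol = 1.0 - r_squared(values, others)
        result[name] = (tol, 1.0 / tol)
    return result


# Toy data (made up): x2 is nearly a copy of x1, x3 is unrelated noise.
data = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [1.1, 1.9, 3.2, 3.9, 5.1, 6.0],
    "x3": [2.0, -1.0, 0.5, 3.0, -2.0, 1.0],
}
for name, (tol, vif) in tolerance_and_vif(data).items():
    print(f"{name}: tolerance = {tol:.4f}, VIF = {vif:.2f}")
```

Note that the dependent variable of the model of interest never enters the calculation: VIF depends only on the independent variables.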


Example

Copied from here:

. use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear

. generate rw = read*write  /* create interaction of read and write */

. svyset [pw=math], strata(ses)

            pweight: math
          VCE: linearized
  Single unit: missing
     Strata 1: ses
         SU 1: 
        FPC 1: 

. svy: regress rw write read
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =         3                  Number of obs      =       200
Number of PSUs     =       200                  Population size    =     10529
                                                Design df          =       197
                                                F(   2,    196)    =   5258.91
                                                Prob > F           =    0.0000
                                                R-squared          =    0.9916

------------------------------------------------------------------------------
             |             Linearized
          rw |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       write |   49.77855    .948907    52.46   0.000     47.90723    51.64987
        read |    55.3573   .9117403    60.72   0.000     53.55928    57.15533
       _cons |  -2703.949   55.95981   -48.32   0.000    -2814.306   -2593.591
------------------------------------------------------------------------------

. display "tolerance = " 1-e(r2) " VIF = " 1/(1-e(r2))
tolerance = .00843133 VIF = 118.60521
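
As a rough aid to reading that result, the arithmetic can be repeated in a few lines of Python. The only input is the tolerance displayed above; the square root of VIF is the factor by which the standard error is inflated, which follows from VIF being a ratio of variances.

```python
import math

tolerance = 0.00843133  # value displayed by Stata above
vif = 1.0 / tolerance
# VIF is a ratio of variances, so standard errors scale with its square root.
se_inflation = math.sqrt(vif)
print(f"VIF = {vif:.2f}, standard error inflation = {se_inflation:.2f}x")
```

The VIF here is far above the benchmark of 10, which is expected because rw is constructed directly from read and write.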


Usage

R

The olsrr package is recommended:

> library(olsrr)
> model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
> ols_vif_tol(model)
  variables Tolerance      VIF
1      disp 0.1252279 7.985439
2        hp 0.1935450 5.166758
3        wt 0.1445726 6.916942
4      qsec 0.3191708 3.133119

SAS

Use the vif and tol options on the MODEL statement of PROC REG.

proc reg data=LIBREF.TABLE;
  model DEPVAR = INDEPVARLIST / vif tol;
run;

Stata

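The example above demonstrates the generic computation in Stata syntax: regress the variable in question on the other independent variables, then derive tolerance and VIF from e(r2). The same computation applies whether the model of interest is fit with logit or regress. A linear model is ordinarily flawed for a binary outcome variable, but here the model is only being used for its coefficient of determination.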

There is also estat vif, though it does not support all types of regressions and does not support weighted analysis at all.
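
For weighted data, the same manual computation can be sketched in plain Python using weighted least squares for the auxiliary regression. This is a sketch under assumptions: the function names and toy data are hypothetical, the weighted R2 definition used (one minus weighted residual sum of squares over weighted total sum of squares) is one common choice, and it has not been checked to match Stata's e(r2) under svy exactly.

```python
def solve(a, b):
    """Solve a x = b with Gauss-Jordan elimination (assumes a is nonsingular)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [v - factor * p for v, p in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def weighted_vif(target, others, weights):
    """Tolerance and VIF of `target` from a weighted auxiliary regression."""
    rows = [[1.0] + [col[i] for col in others] for i in range(len(target))]
    k = len(rows[0])
    # Weighted normal equations: (X'WX) beta = X'Wy
    xtwx = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * r[i] * y for w, r, y in zip(weights, rows, target))
            for i in range(k)]
    beta = solve(xtwx, xtwy)
    fitted = [sum(c * v for c, v in zip(beta, r)) for r in rows]
    mean = sum(w * y for w, y in zip(weights, target)) / sum(weights)
    ss_res = sum(w * (y - f) ** 2 for w, y, f in zip(weights, target, fitted))
    ss_tot = sum(w * (y - mean) ** 2 for w, y in zip(weights, target))
    tolerance = ss_res / ss_tot  # equals 1 - weighted R2
    return tolerance, 1.0 / tolerance


# Made-up values for illustration only; these are not the hsb2 data.
write = [3.0, 4.0, 5.0, 6.0, 7.0]
read = [4.0, 5.0, 7.0, 8.0, 10.0]
rw = [r * w for r, w in zip(read, write)]  # interaction term
pw = [2.0, 1.0, 1.0, 3.0, 1.0]             # hypothetical sampling weights
tol, vif = weighted_vif(rw, [write, read], pw)
print(f"tolerance = {tol:.5f} VIF = {vif:.2f}")
```

With all weights equal, this reduces to the ordinary unweighted tolerance and VIF.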



Econometrics/CollinearityTest (last edited 2024-04-25 15:45:47 by DominicRicottone)