Collinearity Test
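Multicollinearity undermines a regression model. Perfect collinearity violates the assumption that no predictor is an exact linear combination of the others, and high collinearity inflates the variance of the coefficient estimates. A collinearity test is useful for determining whether a model is appropriately specified.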
Variance Inflation Factor
Every parameter in a regression model can be evaluated by its variance inflation factor (VIF). This is the ratio of the variance of the parameter's estimate under the full model to its variance under a model including only that parameter (with the error variance held fixed). It can be interpreted as how much the variance is inflated by collinearity, that is, compared to how large it would be if the parameter had zero collinearity with the other parameters.
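Under ordinary least squares, with the error variance σ² treated as the same in both models, this ratio has a closed form. In the standard OLS identity below, R_j² is the R² from regressing predictor x_j on all the other predictors:

```latex
\operatorname{Var}(\hat\beta_j)_{\text{full}} = \frac{\sigma^2}{(1 - R_j^2)\sum_{i=1}^{n}(x_{ij} - \bar x_j)^2},
\qquad
\operatorname{Var}(\hat\beta_j)_{\text{alone}} = \frac{\sigma^2}{\sum_{i=1}^{n}(x_{ij} - \bar x_j)^2},
\qquad
\mathrm{VIF}_j = \frac{\operatorname{Var}(\hat\beta_j)_{\text{full}}}{\operatorname{Var}(\hat\beta_j)_{\text{alone}}} = \frac{1}{1 - R_j^2}.
```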
As a result, VIFs measure multicollinearity in a regression model: higher values indicate greater collinearity, and a VIF of 1 indicates none. A VIF above 10 is a common rule-of-thumb threshold for problematic collinearity.
VIF is calculated as the inverse of tolerance. (It is therefore also possible to test for multicollinearity using tolerance directly instead of VIF, but tolerance is less easily interpreted.)
Tolerance for a parameter is calculated as 1 - R², where R² is the coefficient of determination from regressing the variable in question on all other independent variables.
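The tolerance-and-VIF recipe above can be sketched in plain Python using only the standard library. This is a minimal illustration, not how olsrr, SAS or Stata implement it: each auxiliary regression is fit by solving the normal equations with Gaussian elimination, the function names (`solve`, `ols_r_squared`, `vif_table`, `flag_collinear`) are hypothetical, and `flag_collinear` applies the rule-of-thumb threshold of 10. The response variable never appears, because the VIF depends only on the predictors.

```python
# Sketch only: hypothetical helper names, not any statistics package's API.
from typing import Dict, List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: perfect collinearity among predictors")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - tail) / aug[r][r]
    return x


def ols_r_squared(y: Sequence[float], xs: List[Sequence[float]]) -> float:
    """R^2 from an OLS regression of y on the columns in xs plus an intercept."""
    n = len(y)
    rows = [[1.0] + [float(col[i]) for col in xs] for i in range(n)]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * v for bj, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif_table(predictors: Dict[str, Sequence[float]]) -> Dict[str, Tuple[float, float]]:
    """Map each predictor name to (tolerance, VIF) using auxiliary regressions."""
    result = {}
    for name, column in predictors.items():
        # Regress this predictor on all the other predictors (the outcome is never used).
        others = [col for other, col in predictors.items() if other != name]
        tolerance = 1.0 - ols_r_squared(column, others)
        result[name] = (tolerance, 1.0 / tolerance)
    return result


def flag_collinear(table: Dict[str, Tuple[float, float]], threshold: float = 10.0) -> List[str]:
    """Names of predictors whose VIF exceeds the rule-of-thumb threshold."""
    return [name for name, (_, vif) in table.items() if vif > threshold]
```

For example, with made-up data, `vif_table({"x1": [1, -1, 1, -1], "x2": [1, 1, -1, -1]})` gives a VIF of 1 for both predictors because they are uncorrelated, while the nearly proportional pair `[1, 2, 3, 4, 5, 6]` and `[2, 4, 6, 8, 10, 13]` gives a VIF of 169.75 for each, which `flag_collinear` reports. Solving the normal equations directly is fine for a handful of predictors; a QR decomposition would be numerically safer for larger or nearly singular problems.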
Example
Copied from here:
. use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear

. generate rw = read*write   /* create interaction of read and write */

. svyset [pw=math], strata(ses)

      pweight: math
          VCE: linearized
  Single unit: missing
     Strata 1: ses
         SU 1:
        FPC 1:

. svy: regress rw write read
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =         3                  Number of obs     =        200
Number of PSUs     =       200                  Population size   =      10529
                                                Design df         =        197
                                                F(   2,    196)   =    5258.91
                                                Prob > F          =     0.0000
                                                R-squared         =     0.9916

------------------------------------------------------------------------------
             |             Linearized
          rw |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       write |   49.77855    .948907    52.46   0.000     47.90723    51.64987
        read |    55.3573   .9117403    60.72   0.000     53.55928    57.15533
       _cons |  -2703.949   55.95981   -48.32   0.000    -2814.306   -2593.591
------------------------------------------------------------------------------

. display "tolerance = " 1-e(r2) " VIF = " 1/(1-e(r2))
tolerance = .00843133 VIF = 118.60521
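As a worked check of the final display line: because rw is regressed on the other two predictors, this is the auxiliary regression for rw, so e(r2) (here a weighted R², because of the svyset pweights) yields rw's tolerance and VIF directly. The R-squared of 0.9916 in the header is rounded.

```latex
\text{tolerance}_{rw} = 1 - R^2_{rw} = 0.00843133,
\qquad
\mathrm{VIF}_{rw} = \frac{1}{\text{tolerance}_{rw}} = \frac{1}{0.00843133} \approx 118.6
```

This is far above the benchmark of 10, which is expected: a product term such as read*write is usually highly correlated with its uncentered components.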
Usage
R
The olsrr package is recommended; its ols_vif_tol() function reports both tolerance and VIF:
> library(olsrr)
> model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
> ols_vif_tol(model)
  variables Tolerance      VIF
1      disp 0.1252279 7.985439
2        hp 0.1935450 5.166758
3        wt 0.1445726 6.916942
4      qsec 0.3191708 3.133119
SAS
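Use the VIF option (and optionally TOL) on the MODEL statement of PROC REG. LIBREF.TABLE, DEPVAR and INDEPVARLIST are placeholders.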
proc reg data=LIBREF.TABLE;
  model DEPVAR = INDEPVARLIST / vif tol;
run;
Stata
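The example demonstrates a generic computation in Stata syntax: regress the predictor in question (here rw) on the other predictors, then convert the resulting R² into a tolerance and a VIF. Because the VIF depends only on the predictors, the same computation applies whether the model of interest is fit with logit or regress. If the predictor in question is binary, a linear model would ordinarily be inappropriate for it, but here the auxiliary model is used only for its coefficient of determination.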
Stata also provides estat vif, though it is available only after regress (not after other estimation commands) and does not support weighted analyses.