= Collinearity Test =

'''Multicollinearity''' (strong linear dependence among the independent variables) is a fundamental problem for a regression model. Perfect collinearity violates the full-rank assumption, and even imperfect collinearity inflates the variance of the estimated coefficients. A '''collinearity test''' helps determine whether a model is appropriately specified.

<<TableOfContents>>

----



== Variance Inflation Factor ==

Every coefficient in a regression model can be evaluated by its '''variance inflation factor''' ('''VIF'''). This is the ratio of the coefficient's variance in the full model to its variance in a model that includes only the corresponding independent variable. It can be interpreted as how much the variance is inflated by collinearity, that is, compared to what it would be if that variable were uncorrelated with every other independent variable.

VIFs are therefore a measure of multicollinearity in a regression model. Higher values indicate greater collinearity, and a VIF above 10 is a common benchmark for problematic collinearity.

VIF is the reciprocal of '''tolerance'''. (It is therefore also possible to test for multicollinearity with tolerance directly, but tolerance is less easily interpreted.) The tolerance for a variable is ''1 - R^2^'', where ''R^2^'' is the coefficient of determination from regressing that variable on all of the other independent variables.

----



== Example ==

The following example is copied from [[https://stats.oarc.ucla.edu/stata/faq/how-can-i-check-for-collinearity-in-survey-regression/|here]]. It tests the interaction term `rw` by regressing it on the other independent variables (`write` and `read`) and then computing tolerance and VIF from that regression's ''R^2^''.

{{{
. use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear

. generate rw = read*write /* create interaction of read and write */

. svyset [pw=math], strata(ses)

      pweight: math
          VCE: linearized
  Single unit: missing
     Strata 1: ses
         SU 1: <observations>
        FPC 1: <zero>

. svy: regress rw write read
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =         3                  Number of obs     =        200
Number of PSUs     =       200                  Population size   =      10529
                                                Design df         =        197
                                                F(   2,    196)   =    5258.91
                                                Prob > F          =     0.0000
                                                R-squared         =     0.9916

------------------------------------------------------------------------------
             |             Linearized
          rw |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       write |   49.77855    .948907    52.46   0.000     47.90723    51.64987
        read |    55.3573   .9117403    60.72   0.000     53.55928    57.15533
       _cons |  -2703.949   55.95981   -48.32   0.000    -2814.306   -2593.591
------------------------------------------------------------------------------

. display "tolerance = " 1-e(r2) " VIF = " 1/(1-e(r2))
tolerance = .00843133 VIF = 118.60521
}}}

A VIF of about 118.6 is far above the benchmark of 10. This shows that `rw` is highly collinear with `read` and `write`, as expected for a term constructed as their product.

----



== Usage ==

=== R ===

The [[R/OlsRr|olsrr]] package is recommended. Its `ols_vif_tol` function reports both tolerance and VIF:

{{{
> library(olsrr)
> model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
> ols_vif_tol(model)
  variables Tolerance      VIF
1      disp 0.1252279 7.985439
2        hp 0.1935450 5.166758
3        wt 0.1445726 6.916942
4      qsec 0.3191708 3.133119
}}}



=== SAS ===

Use the `vif` option, and the `tol` option for tolerance, on the `model` statement of [[SAS/Reg|PROC REG]].

{{{
proc reg data=LIBREF.TABLE;
  model DEPVAR = INDEPVARLIST / vif tol;
run;
}}}



=== Stata ===

The example above shows a generic computation in [[Stata]] syntax. The same computation works whether the model of interest is estimated with [[Stata/Logit|logit]] or with [[Stata/Regress|regress]]. If the variable being tested is binary, a linear model would ordinarily be a poor choice for it. Here, however, the auxiliary regression is used only for its coefficient of determination.

There is also [[Stata/Estat|estat vif]]. However, it is not available after every estimation command, and it does not support weighted analysis at all.

----
CategoryRicottone
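The tolerance and VIF definitions in the Variance Inflation Factor section can also be illustrated without any statistics package. The following is a minimal sketch in Python (a language not otherwise used on this page), assuming small, well-conditioned data. Each independent variable is regressed on all of the others plus an intercept. Tolerance is ''1 - R^2^'' of that auxiliary regression, and VIF is its reciprocal. The function names (`tolerance_and_vif`, `r_squared`, `_solve`) and the toy data are hypothetical. The hand-written Gaussian elimination stands in for a proper least-squares routine.

```python
"""Tolerance and VIF via auxiliary regressions: a minimal pure-Python sketch.

All names and the toy data below are hypothetical illustrations.
"""
from typing import Dict, List, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations update both sides together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations: the other variables are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def r_squared(y: List[float], predictors: List[List[float]]) -> float:
    """R^2 from an OLS regression of y on the given predictors plus an intercept."""
    n = len(y)
    # Design matrix rows: [1, x1_i, x2_i, ...].
    rows = [[1.0] + [p[i] for p in predictors] for i in range(n)]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def tolerance_and_vif(columns: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Regress each independent variable on all the others.

    tolerance = 1 - R^2 of that auxiliary regression; VIF = 1 / tolerance.
    """
    result = {}
    for name, target in columns.items():
        others = [col for other, col in columns.items() if other != name]
        tol = 1.0 - r_squared(target, others)
        # Perfect collinearity gives tolerance 0 and an unbounded VIF.
        result[name] = (tol, float("inf") if tol <= 0.0 else 1.0 / tol)
    return result


if __name__ == "__main__":
    # Hypothetical toy data: x and z have correlation 0.5, so each auxiliary R^2 is 0.25.
    data = {"x": [1.0, 2.0, 3.0], "z": [1.0, 3.0, 2.0]}
    for name, (tol, vif) in tolerance_and_vif(data).items():
        print(f"{name}: tolerance = {tol:.4f}, VIF = {vif:.4f}")
    # Prints:
    # x: tolerance = 0.7500, VIF = 1.3333
    # z: tolerance = 0.7500, VIF = 1.3333
```

For the toy data, `x` and `z` have a correlation of 0.5. Each auxiliary ''R^2^'' is therefore 0.25, each tolerance is 0.75, and each VIF is 4/3. With only two independent variables, VIF reduces to ''1 / (1 - r^2^)'', where ''r'' is their correlation. With more variables, every other independent variable enters each auxiliary regression.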