= Statistical Modeling: The Two Cultures =

'''Statistical Modeling: The Two Cultures''' (DOI: [[https://doi.org/10.1214/ss/1009213726]]) was written by Leo Breiman. It was published in ''Statistical Science'' (2001, vol. 16, no. 3).

The author describes the field of statistics as having two approaches to problem-solving:

 1. The '''data modeling culture''', which matches a phenomenon to a data-generating model, then tries to fit the model (i.e., estimate and then interpret the parameters' coefficients) using measurements
  * "Every article started with: Assume that the data are generated by the following model: ..."
 1. The '''algorithmic modeling culture''', which estimates many models and optimizes for predictive accuracy

Broadly speaking, the author criticizes how ill-fitting data models are used to inappropriately claim significance of findings. By making fewer assumptions about data generation, the latter 'culture' leads to more robust predictions.

 * Generally, the latter's models still assume the data are i.i.d.
 * "At one point, some years ago, I set up a simulated regression problem in seven dimensions with a controlled amount of nonlinearity. Standard tests of goodness-of-fit did not reject linearity until the nonlinearity was extreme. Recent theory supports this conclusion. Work by Bickel, Ritov and Stoker (2001) shows that goodness-of-fit tests have very little power unless the direction of the alternative is precisely specified. The implication is that omnibus goodness-of-fit tests, which test in many directions simultaneously, have little power, and will not reject until the lack of fit is extreme."
  * TODO: sounds like a replicable experiment! (A replication sketch is included at the bottom of this page.)
 * Residual analysis is not a plausibly falsifiable test with more than 2 or 3 dimensions

The author predicts that complicated Bayesian models will become more popular as the former 'culture' runs into more problems that do not fit the classical parametric data models.

'''Rashomon effect''': a crowding of 'good' models leads to '''instability'''. A common practice is to apply feature selection, [[Stata/Stepwise|stepwise predictor omission]], etc., to create a more interpretable or parsimonious parametric model. The fitted model maximizes ''R^2^'' (equivalently, minimizes the residual sum of squares), but often there are many different models (i.e., different subsets of predictors were retained) that come very close. As a result, a small change in the training data leads to selection of a very different model without much change in stated significance. The latter 'culture' has solved this problem by aggregating models, e.g. [[Statistics/Bagging|bagging]]. (The second sketch at the bottom of this page demonstrates the instability.)

The author argues that predictive power and interpretability are at natural odds. Parametric models are not robust to high '''dimensionality''', whereas several non-parametric models (e.g. [[Statistics/SupportVectorMachines|support-vector networks]]) only converge given high dimensionality.
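The goodness-of-fit experiment can be replicated along these lines. This is a minimal sketch, not Breiman's actual setup: the paper does not specify the predictors, the form of the nonlinearity, or the test, so this assumes standard-normal predictors, a scaled quadratic interaction as the controlled nonlinearity, and Ramsey's RESET test standing in for an omnibus goodness-of-fit test.

{{{#!python
import numpy as np
from scipy import stats

def rss(X, y):
    """Residual sum of squares and parameter count of an OLS fit with intercept."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    r = y - Xc @ beta
    return float(r @ r), Xc.shape[1]

def reset_pvalue(X, y):
    """Ramsey RESET: F-test of adding the squared fitted values to the linear model."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    yhat = Xc @ beta
    rss0, k0 = rss(X, y)
    rss1, k1 = rss(np.column_stack([X, yhat ** 2]), y)
    f = ((rss0 - rss1) / (k1 - k0)) / (rss1 / (len(y) - k1))
    return stats.f.sf(f, k1 - k0, len(y) - k1)

rng = np.random.default_rng(0)
n, p = 200, 7
for lam in [0.0, 0.5, 1.0, 2.0, 5.0]:        # controlled amount of nonlinearity
    X = rng.normal(size=(n, p))
    y = X @ np.ones(p) + lam * X[:, 0] * X[:, 1] + rng.normal(size=n)
    print(f"lam={lam}: RESET p-value = {reset_pvalue(X, y):.3f}")
}}}

Sweeping `lam` makes it easy to check at what point the omnibus test starts rejecting linearity.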
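The Rashomon effect can be demonstrated similarly. This sketch also rests on assumed conditions: correlated predictors, greedy forward selection by in-sample ''R^2^'' standing in for stepwise omission, and bootstrap resampling as the 'small change' to the training data.

{{{#!python
import numpy as np

def r2(X, y, cols):
    """In-sample R^2 of an OLS fit on the given predictor columns."""
    Xc = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    r = y - Xc @ beta
    return 1.0 - (r @ r) / ((y - y.mean()) @ (y - y.mean()))

def forward_select(X, y, k=3):
    """Greedy forward selection of k predictors by in-sample R^2."""
    chosen = []
    while len(chosen) < k:
        rest = [c for c in range(X.shape[1]) if c not in chosen]
        chosen.append(max(rest, key=lambda c: r2(X, y, chosen + [c])))
    return tuple(sorted(chosen)), r2(X, y, chosen)

rng = np.random.default_rng(1)
n, p = 100, 8
z = rng.normal(size=(n, 1))                  # shared factor -> correlated predictors
X = 0.7 * z + 0.7 * rng.normal(size=(n, p))
y = X[:, :4].sum(axis=1) + rng.normal(size=n)

for trial in range(5):
    idx = rng.integers(0, n, size=n)         # resample = small change in training data
    subset, score = forward_select(X[idx], y[idx])
    print(f"resample {trial}: retained {subset}, R^2 = {score:.3f}")
}}}

With correlated predictors, different resamples will often retain different subsets while reporting nearly identical ''R^2^'', which is the crowding described above; averaging over the resampled fits (bagging) stabilizes the predictions.

----
CategoryRicottone CategoryTodoReplication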