= Statistical Modeling: The Two Cultures =

'''Statistical Modeling: The Two Cultures''' (DOI: [[https://doi.org/10.1214/ss/1009213726]]) was written by Leo Breiman. It was published in ''Statistical Science'' (2001, vol. 16, no. 3).

The author describes the field of statistics as having two approaches to problem-solving:

 1. The '''data modeling culture''', which matches a phenomenon to a data-generating model, then tries to fit the model (i.e., estimate and then interpret the parameters' coefficients) using measurements
  * "Every article started with: Assume that the data are generated by the following model: ..."
 1. The '''algorithmic modeling culture''', which estimates many models and optimizes for predictive accuracy

Broadly speaking, the author criticizes how ill-fitting data models are used to inappropriately claim significance of findings. By making fewer assumptions about data generation, the latter 'culture' leads to more robust predictions.

 * Generally, the latter's models still assume the data are i.i.d.
 * "At one point, some years ago, I set up a simulated regression problem in seven dimensions with a controlled amount of nonlinearity. Standard tests of goodness-of-fit did not reject linearity until the nonlinearity was extreme. Recent theory supports this conclusion. Work by Bickel, Ritov and Stoker (2001) shows that goodness-of-fit tests have very little power unless the direction of the alternative is precisely specified. The implication is that omnibus goodness-of-fit tests, which test in many directions simultaneously, have little power, and will not reject until the lack of fit is extreme."
  * TODO: sounds like a replicable experiment! (A replication sketch is included at the bottom of this page.)
 * Residual analysis is not a plausibly falsifiable test with more than 2 or 3 dimensions

The author predicts that complicated Bayesian models will become more popular as the former 'culture' runs into more problems that do not fit the classical parametric data models.

'''Rashomon effect''': a crowding of 'good' models leads to '''instability'''. A common practice is to apply feature selection, [[Stata/Stepwise|stepwise predictor omission]], etc., to create a more interpretable or parsimonious parametric model. The fitted model maximizes ''R^2^'' (equivalently, minimizes the residual sum of squares), but often there are many different models (i.e., different subsets of predictors were retained) that come very close. As a result, a small change in the training data leads to selection of a very different model without much change in stated significance. The latter 'culture' has solved this problem by aggregating models, e.g. [[Statistics/Bagging|bagging]]. (The second sketch at the bottom of this page demonstrates the instability.)

The author argues that predictive power and interpretability are at natural odds. Parametric models are not robust to high '''dimensionality''', whereas several non-parametric models (e.g. [[Statistics/SupportVectorMachines|support-vector networks]]) only converge given high dimensionality.
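The goodness-of-fit experiment can be replicated along these lines. This is a minimal sketch, not Breiman's actual setup: the paper does not specify the predictors, the form of the nonlinearity, or the test, so this assumes standard-normal predictors, a scaled quadratic interaction as the controlled nonlinearity, and Ramsey's RESET test standing in for an omnibus goodness-of-fit test.

{{{#!python
import numpy as np
from scipy import stats

def rss(X, y):
    """Residual sum of squares and parameter count of an OLS fit with intercept."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    r = y - Xc @ beta
    return float(r @ r), Xc.shape[1]

def reset_pvalue(X, y):
    """Ramsey RESET: F-test of adding the squared fitted values to the linear model."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    yhat = Xc @ beta
    rss0, k0 = rss(X, y)
    rss1, k1 = rss(np.column_stack([X, yhat ** 2]), y)
    f = ((rss0 - rss1) / (k1 - k0)) / (rss1 / (len(y) - k1))
    return stats.f.sf(f, k1 - k0, len(y) - k1)

rng = np.random.default_rng(0)
n, p = 200, 7
for lam in [0.0, 0.5, 1.0, 2.0, 5.0]:        # controlled amount of nonlinearity
    X = rng.normal(size=(n, p))
    y = X @ np.ones(p) + lam * X[:, 0] * X[:, 1] + rng.normal(size=n)
    print(f"lam={lam}: RESET p-value = {reset_pvalue(X, y):.3f}")
}}}

Sweeping `lam` makes it easy to check at what point the omnibus test starts rejecting linearity.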
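The Rashomon effect can be demonstrated similarly. This sketch also rests on assumed conditions: correlated predictors, greedy forward selection by in-sample ''R^2^'' standing in for stepwise omission, and bootstrap resampling as the 'small change' to the training data.

{{{#!python
import numpy as np

def r2(X, y, cols):
    """In-sample R^2 of an OLS fit on the given predictor columns."""
    Xc = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    r = y - Xc @ beta
    return 1.0 - (r @ r) / ((y - y.mean()) @ (y - y.mean()))

def forward_select(X, y, k=3):
    """Greedy forward selection of k predictors by in-sample R^2."""
    chosen = []
    while len(chosen) < k:
        rest = [c for c in range(X.shape[1]) if c not in chosen]
        chosen.append(max(rest, key=lambda c: r2(X, y, chosen + [c])))
    return tuple(sorted(chosen)), r2(X, y, chosen)

rng = np.random.default_rng(1)
n, p = 100, 8
z = rng.normal(size=(n, 1))                  # shared factor -> correlated predictors
X = 0.7 * z + 0.7 * rng.normal(size=(n, p))
y = X[:, :4].sum(axis=1) + rng.normal(size=n)

for trial in range(5):
    idx = rng.integers(0, n, size=n)         # resample = small change in training data
    subset, score = forward_select(X[idx], y[idx])
    print(f"resample {trial}: retained {subset}, R^2 = {score:.3f}")
}}}

With correlated predictors, different resamples will often retain different subsets while reporting nearly identical ''R^2^'', which is the crowding described above; averaging over the resampled fits (bagging) stabilizes the predictions.

----
CategoryRicottone CategoryTodoReplication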