Probit Model
A probit model is a generalized linear model for a binary outcome.
Design
A probit model is appropriate for fitting a binary outcome (i.e., 0 or 1) with a linear predictor (as Xb), which is mapped to a probability as P(y = 1) = Φ(Xb).
The probit function is extremely similar to the logit function. See the following comparison, noting that red is logit and blue is probit.
So refer to the logit model documentation, and bear in mind that the important difference is that the link function is instead the inverse of the c.d.f. of the normal distribution (i.e., Φ⁻¹(.)), so that fitted probabilities are given by Φ(.).
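As a minimal sketch of how the normal c.d.f. connects the linear predictor to a probability, the following uses only the Python standard library. The function names, the example row `x_row`, and the coefficients `b` are hypothetical and chosen for illustration; they are not estimates from any data.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def probit_link(p):
    """Map a probability in (0, 1) to the linear-predictor scale (inverse c.d.f.)."""
    return STANDARD_NORMAL.inv_cdf(p)


def inverse_probit_link(eta):
    """Map a linear-predictor value to a probability (normal c.d.f.)."""
    return STANDARD_NORMAL.cdf(eta)


def predicted_probability(x_row, b):
    """Compute P(y = 1) for one observation as the c.d.f. of the dot product x_row . b."""
    eta = sum(x * coef for x, coef in zip(x_row, b))
    return inverse_probit_link(eta)


# Hypothetical example: an intercept term plus one covariate.
x_row = [1.0, 2.0]
b = [-1.0, 0.5]
p = predicted_probability(x_row, b)  # the linear predictor is 0 here
print(p)
```

Applying `probit_link` to the result recovers the linear predictor, which is one way to check that the two functions are inverses of each other.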
Description
Fitting a model with both logistic and probit regression will usually lead to the same substantive conclusions either way. The important differences are in the theoretical foundations and in how the coefficients are interpreted.
The inverse of the link function here is the c.d.f. of the normal distribution. This implies that the underlying outcome is continuous and normally distributed, and that the binary outcome under analysis is a categorization of that underlying outcome according to some threshold on the distribution. In some cases, this is exactly what has been done to collect data, e.g. classifying individuals as high or low income according to the population mean.
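The latent-outcome reading can be illustrated with a small simulation. This is a sketch under stated assumptions: the threshold is fixed at 0, the error is standard normal, and the linear-predictor value `eta`, the sample size, and the seed are arbitrary illustrative choices.

```python
import random
from statistics import NormalDist


def simulate_binary_outcomes(linear_predictor, n, seed=0):
    """Draw n binary outcomes by thresholding a continuous latent outcome at 0."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n):
        # Latent outcome: linear predictor plus a standard normal error.
        latent = linear_predictor + rng.gauss(0.0, 1.0)
        outcomes.append(1 if latent > 0 else 0)
    return outcomes


eta = 0.5  # hypothetical value of the linear predictor
ys = simulate_binary_outcomes(eta, n=100_000)
observed_share = sum(ys) / len(ys)

# Under the latent-variable reading, P(y = 1) equals the normal c.d.f. at eta.
expected_share = NormalDist().cdf(eta)
print(observed_share, expected_share)
```

The simulated share of ones should sit close to the normal c.d.f. evaluated at the linear predictor, up to sampling noise.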
A logistic regression has coefficients interpreted in terms of log odds, whereas the coefficients don't have any inherent meaning in a probit regression. They are related, however: the logistic model will estimate coefficients roughly 1.6 to 1.8 times those of a probit model. This goes back to the fact that a probit model assumes an error variance of 1 whereas a logistic model assumes a variance of π²/3, whose square root is about 1.81. See this discussion.
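The scaling between the two models can be checked numerically. The sketch below computes the ratio of the standard deviations, π/√3, and measures how far the logistic curve is from the normal c.d.f. before and after rescaling its argument by that ratio. The grid range of [-4, 4] and the helper names are illustrative assumptions.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def logistic_cdf(z):
    """C.d.f. of the standard logistic distribution (variance pi^2 / 3)."""
    return 1.0 / (1.0 + math.exp(-z))


def max_gap(scale, grid_points=801):
    """Largest absolute gap between the normal c.d.f. at z and the logistic c.d.f. at scale * z."""
    largest = 0.0
    for i in range(grid_points):
        z = -4.0 + 8.0 * i / (grid_points - 1)  # evenly spaced grid on [-4, 4]
        gap = abs(STANDARD_NORMAL.cdf(z) - logistic_cdf(scale * z))
        largest = max(largest, gap)
    return largest


# Ratio of standard deviations: sqrt(pi^2 / 3) divided by 1.
sd_ratio = math.pi / math.sqrt(3.0)

print(sd_ratio)           # about 1.81
print(max_gap(1.0))       # gap with no rescaling
print(max_gap(sd_ratio))  # gap after rescaling by the ratio
```

The rescaled curves are much closer together than the unscaled ones, which is why a coefficient on the logistic scale is roughly the probit coefficient multiplied by a factor in this range.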
