Statistical model validation

Why do we need to validate statistical models? Does the scope of this validation change depending on the intended use of the model? In general we can build models for two primary purposes: statistical inference and predictive modeling. Is the model validation process the same for both of these purposes? If not, how does it differ?

Best answer

At the most basic level, we validate statistical models to determine how well they fit the data. We want to be confident in a model and its performance before applying it to a data set. Sometimes unknown influences on model A may appear and cause error in model B. Problems also arise when the regressors differ between the building of the model and the use of the model.
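One way to read the point about regressors differing between building and use is that new inputs may fall outside the range the model was fitted on. Here is a minimal sketch of such a check, assuming a single numeric regressor; the function name `out_of_range_fraction` and the values are hypothetical, chosen only for illustration.

```python
def out_of_range_fraction(build_values, use_values):
    """Fraction of use-time regressor values outside the range seen at build time."""
    lo, hi = min(build_values), max(build_values)
    outside = [v for v in use_values if v < lo or v > hi]
    return len(outside) / len(use_values)

# Hypothetical regressor values, for illustration only.
x_build = [1.0, 2.5, 3.0, 4.2, 5.0]   # values available when the model was built
x_use = [0.5, 2.0, 4.9, 6.1]          # values the model is later applied to

frac = out_of_range_fraction(x_build, x_use)
print(f"{frac:.0%} of use-time values fall outside the build range")
```

A high fraction suggests the model is being asked to extrapolate, which is one situation where a fit that looked fine at build time can fail in use.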

The scope does change depending on the intended use of the model. The model should be tested in the environment for which it was developed before being passed along to the user. Also, if a model was developed for inference on an existing data set, it will likely do poorly at predicting new values. Another consideration is the end user: most often the model developer has little control over the model’s final use, which means there is potential for misuse.

The model validation process will differ depending on its final intended use. One of the main differences between statistical inference and predictive modeling is that for inference you are making assumptions about the probabilistic structure of the model, so if the data do not conform to these assumptions the inference will be incorrect. For predictive modeling, you are validating the model for its ability to predict new observations, which can be done by testing against a new data set or by splitting the original data.
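The two kinds of validation can be sketched side by side. This is a rough illustration under stated assumptions, not a prescribed procedure: the data are synthetic, the model is a simple least-squares line, and the helper names (`fit_line`, `rmse`) are made up for the example. The predictive check scores the model on held-out rows. The inference check looks at residuals on the fitting data, here with a crude comparison of residual spread for low and high `x` as a stand-in for a constant-variance diagnostic.

```python
import math
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def rmse(xs, ys, a, b):
    """Root mean squared error of the fitted line on (xs, ys)."""
    sq_errors = [(y - (a + b * x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))

# Synthetic data, for illustration only: y = 2 + 3x + Gaussian noise (sd 1).
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]

# Predictive validation: hold out 25% of the rows and score only on those.
idx = list(range(len(xs)))
rng.shuffle(idx)
cut = int(0.75 * len(idx))
train, test = idx[:cut], idx[cut:]
a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
test_rmse = rmse([xs[i] for i in test], [ys[i] for i in test], a, b)

# Inference validation: check residuals on the fitting data against the
# model's assumptions. Here: is the residual spread similar for low and high x?
by_x = sorted(train, key=lambda i: xs[i])
half = len(by_x) // 2
resid = lambda i: ys[i] - (a + b * xs[i])
sd_low = statistics.stdev(resid(i) for i in by_x[:half])
sd_high = statistics.stdev(resid(i) for i in by_x[half:])
spread_ratio = sd_high / sd_low

print(f"held-out RMSE: {test_rmse:.2f}")
print(f"residual sd ratio (high x / low x): {spread_ratio:.2f}")
```

A held-out RMSE close to the noise level suggests the model predicts new observations about as well as it can. A spread ratio far from 1 would hint that the constant-variance assumption behind the usual standard errors does not hold, which matters for inference even if predictions look acceptable.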