Goodness-of-fit of a statistical model


What actions constitute an analysis of goodness-of-fit for a regression model? What are the typical graphical displays used in this analysis of goodness-of-fit? Why do we assess the goodness-of-fit of a regression model? What can go wrong in the interpretation of the results and the use of a regression model that would be deemed to “fit poorly”?

Each of these questions is much much deeper than an answer here on SE can provide. I'll present a basic discussion, and if you want to go deeper, I suggest that you find a book or a course (there are quite a few online courses out there) on the subject.

To start, I think the most important thing to understand about inference and statistics in general is that there is no such thing as a correct model, and there is no fixed "set of actions" that constitutes an analysis. To make this harsh statement clear, recall that the reason for doing statistics in the first place is that you don't have all the information about the system you are trying to understand, or there is too much of it to digest. For example, suppose I try to predict the weight of an animal from its height by sampling a bunch of animals. What intrinsically causes an animal to weigh as much as it does involves a ridiculous number of degrees of freedom: its nutrition, genetics, daily events that shift energy consumption or feeding, and so on. And even these are just simplifications of more fundamental and more complex information, such as metabolic pathways and the exact number of molecules the animal has eaten; this can go deeper still. But obviously such data are pretty much useless to work with, and you can probably get away with a fairly decent result just by taking 20 or so animals that live close by and doing the regression.

So you always work with less information than what generates the data points you eventually observe. This means that whatever your analysis is, you estimate its value using the same information-space you used to produce it in the first place (which is why splitting the data into subsets such as "training data" and "test data" is often useful and even required), and that information is inherently partial. So no matter how many tests you apply to the model, it eventually boils down to whether or not you believe your results. And the important thing to remember here is that tests don't give you answers, only better reasons to believe or discard your model.
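The train/test idea can be sketched minimally with plain NumPy (the data here are hypothetical, generated just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a noisy linear relationship.
x = rng.uniform(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, 200)

# Hold out 50 of the 200 points: fit on the rest, score on the holdout.
idx = rng.permutation(200)
train, test = idx[50:], idx[:50]

coeffs = np.polyfit(x[train], y[train], 1)   # fit only on the training data
pred = np.polyval(coeffs, x[test])           # predict the held-out points

# R^2 computed on data the model never saw during fitting.
ss_res = np.sum((y[test] - pred) ** 2)
ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
r2_test = 1 - ss_res / ss_tot
```

The point is not the particular numbers but the separation: the score is computed only from points that played no part in the fit.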

Now we can address your questions. Goodness-of-fit is a measure of how closely the data "agree" with your model. There are many tests, and the important thing is to always remember what each test achieves and what its limits are. A common member of this class of tests and coefficients is $R^2$, the coefficient of determination. It basically measures how much of the variance is explained by your model, but it doesn't tell you whether your model makes sense. A good sanity check is to look at the residuals and see how they are distributed. If they look Gaussian, the model might be okay, since it passes through the "middle" of the distribution of the data points (VERY loosely speaking). But in many cases you will see that the residuals have a distinct shape and trace a noticeable curve around the fitted model. Below is a graph I made for $y(x)=\sin(x)+x$.

[figure: $y(x)=\sin(x)+x$ with a linear fit (upper panel), and the residuals of that fit (lower panel)]

In the upper panel you can see the graph in blue and a linear model in red. I got a score of $R^2=0.9994$ (and adjusted $R^2$ the same), which is suspiciously high, and of course common sense tells you that a linear fit doesn't tell the whole story. This case is simple, but I wanted to show: a) how a high $R^2$ doesn't mean a good model, since it only calculates what it calculates (a silly but important statement); and b) how the residuals reveal the sinusoidal pattern much more clearly than the original graph, because the scale of the $y$-values is much larger than that of the sine part (which is bounded by $\pm1$). Also note that even though the residuals here showed that a feature of the data is not included in the model, that doesn't necessarily mean you need to throw the model away. Sometimes you just want the general shape, the "leading-order terms", and you simply don't care about local fluctuations. So again the lesson here is the necessity of common sense.
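The figure's setup can be reproduced along these lines (the exact $x$-range of the original plot is my assumption, so the numbers will vary with it):

```python
import numpy as np

x = np.linspace(0, 100, 1000)
y = np.sin(x) + x          # the function from the text

# Ordinary least-squares line through the data.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# Coefficient of determination.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot   # very close to 1 despite the missing sine term

# The residuals still carry the bounded sine structure the line ignored;
# plotting them makes the oscillation obvious at a glance.
```

The linear trend dominates the total variance, so $R^2$ is nearly 1 even though the sine component is entirely unmodeled, exactly the effect described above.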

As you can see, numbers can be blinding, so a deep understanding of how they are calculated and what they mean is crucial for drawing decent conclusions.

Some more common benchmark tests and scores are:

  1. root-mean-square error (RMSE),
  2. Pearson's $\chi^2$ test,
  3. $p$-values,

and the list goes on...
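As one concrete example from the list, RMSE is simple to compute by hand; the observed and predicted values below are hypothetical, chosen only to show the arithmetic:

```python
import numpy as np

observed  = np.array([2.1, 3.9, 6.2, 7.8])   # hypothetical measurements
predicted = np.array([2.0, 4.0, 6.0, 8.0])   # the model's predictions

# Root-mean-square error: the typical size of a residual,
# expressed in the same units as the data.
rmse = np.sqrt(np.mean((observed - predicted) ** 2))
```

Because RMSE is in the data's own units, it is often easier to judge against domain knowledge ("is an average error of 0.16 kg acceptable?") than a unitless score like $R^2$.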

Finally, we mentioned that residuals that are not normally distributed are a plausible indicator of a poor fit, and you asked for more ways things can go wrong. So here are some more common mistakes:

  1. overfitting the data,
  2. confusing correlation with causation,
  3. mistreating outliers,
  4. improper error analysis.
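The overfitting point can be made concrete with a polynomial-degree sweep on synthetic data (the degrees and data below are arbitrary choices of mine, not from the original answer):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: a quadratic truth plus noise.
x = np.linspace(-3, 3, 40)
y = x ** 2 + rng.normal(0, 1.0, x.size)

# Interleave the points into training and test halves.
x_train, x_test = x[::2], x[1::2]
y_train, y_test = y[::2], y[1::2]

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

scores = {}
for degree in (2, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    scores[degree] = (r2(y_train, np.polyval(coeffs, x_train)),   # train R^2
                      r2(y_test,  np.polyval(coeffs, x_test)))    # test  R^2

# The degree-9 fit always scores at least as well on the training data,
# since it nests the quadratic; it typically does worse on the test data,
# because the extra wiggles chase noise rather than signal.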

I hope I was able to convey the message that data analysis is not a matter of applying a ready-made set of tests as a technical procedure, but rather requires a careful understanding of what you're doing and why the things you do mean what you think they mean. Good luck!