Goodness of Fit (question about fitting procedure)


I have some vector with data $X$ that I suspect is exponentially distributed (after some visual evaluation). I have fitted the exponential distribution to the data with the maximum likelihood estimator: $\lambda=(\frac{1}{N}\sum_{i=1}^N x_i)^{-1}$ for $x_i\in X$. $X$ contains about 9000 observations.
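In code, the MLE fit is a one-liner. A minimal sketch (the real $X$ isn't shown, so simulated stand-in data is used here):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=9000)  # stand-in for the real data vector

# MLE for the exponential rate: lambda-hat = 1 / sample mean
lam_hat = 1.0 / X.mean()
```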

My problem is in how to assess the goodness of the fit. I have tried the Chi-square test and the Kolmogorov-Smirnov test to compute the likelihood of $X$ being a sample from the fitted distribution. When I do this the $p$-value is very, very small; the test absolutely rejects the hypothesis. But visually the fit looks really good, so I think I'm doing something wrong.
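For reference, this is how the KS test is typically run with SciPy (simulated stand-in data again). One caveat worth noting: `scipy.stats.kstest` computes its $p$-value assuming the distribution's parameters were fixed in advance, so the $p$-value is not exact when the parameters were estimated from the same data $X$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
X = rng.exponential(scale=2.0, size=9000)  # stand-in for the real data

# Fit by MLE, then test X against the *fitted* exponential.
# For scipy's 'expon', args = (loc, scale) with scale = 1/lambda.
lam_hat = 1.0 / X.mean()
d, p = stats.kstest(X, 'expon', args=(0, 1.0 / lam_hat))
```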

Checking some literature I have seen the following method:

  1. Fit the distribution with $X$, say we get a distribution $D_X$, and compute the Kolmogorov-Smirnov distance $d$ of the fit.
  2. Sample artificial data from $D_X$.
  3. Compute the KS distance now between the sampled artificial data and $D_X$.
  4. Repeat 2-3 many times to obtain distances $d_1, d_2, \dots$
  5. Compare $d_1, d_2, \dots$ with $d$.
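The five steps above can be sketched as follows (simulated stand-in data; this implements the procedure exactly as listed, with every distance in step 3 measured against the same $D_X$):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
X = rng.exponential(scale=2.0, size=9000)      # stand-in for the real data

# Step 1: fit D_X and compute the KS distance d of the original fit
scale_hat = X.mean()                           # MLE: scale = 1/lambda
d = stats.kstest(X, 'expon', args=(0, scale_hat)).statistic

# Steps 2-4: sample artificial data from D_X, compute KS distances, repeat
dists = []
for _ in range(500):
    Xstar = rng.exponential(scale=scale_hat, size=X.size)               # step 2
    dists.append(stats.kstest(Xstar, 'expon',
                              args=(0, scale_hat)).statistic)           # step 3

# Step 5: compare d with the d_i, e.g. as a bootstrap p-value
p_boot = np.mean(np.array(dists) >= d)
```

Note that step 5 is where $X$ re-enters: the original data's distance $d$ is compared against the reference distribution of distances $d_1, d_2, \dots$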

But I don't really understand this method because it looks like we assess the goodness of the fit without using the original data $X$.

I wonder if anyone has some advice on this. Thanks in advance.


BEST ANSWER

Keep in mind that with null hypothesis significance testing (NHST), what you are actually testing is whether a deviation from the null is statistically detectable, not whether its size is "significant" in the everyday sense of the word. Now also consider that NHST is often conducted with a null hypothesis of exactly zero effect, and it shouldn't be surprising that the null is rejected for any real-world phenomenon once you have enough data.

For example, let's say we have the null hypothesis that a given wheel is circular. If we measure a few dozen points with a crude ruler, we will likely not reject the null and happily keep rolling. But if we measure precisely with a microscope at thousands of points, we will eventually accumulate enough evidence to reject the null, provided our tools are sensitive enough. After all, no real wheel is exactly circular. Even if it could somehow have been machined perfectly, gravity and temperature will still have deformed it.

This is analogous to your situation, where in the comments you've indicated that you have 9000 data points. This gives the NHST a lot of power to detect even small deviations. It's like looking at that wheel under the microscope to see if it's perfectly round. What's the point? You should have known it wasn't perfectly round to begin with. The real question is whether the thing will roll well enough, and for go-kart tires that question has a completely different answer than for roller bearings used in an MRI. Likewise, if your data comes from the real world, then of course it does not have a perfect exponential distribution. Even if you have every reason to believe the process that generated it should be exponential, the real world has likely deformed it, or its measurement, somehow or other. The question is: is it good enough?

You need to specify what is good enough for your application and how to measure it. Perhaps the K-S statistic, the maximum distance between the ECDF and the hypothesized CDF, is the most appropriate measure; perhaps it's not. But if you do use it, you should specify a different null hypothesis than the default of a maximum distance of 0. Rather, specify that it is within your maximum tolerable distance $t$, whatever $t$ is for your application. So your null becomes $|d| < t$ rather than the completely unrealistic $d = 0$.
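In its simplest form this amounts to comparing the observed KS distance against your tolerance. A minimal sketch, where the tolerance `t` is a made-up placeholder that only your application can supply, and the data is again simulated:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
X = rng.exponential(scale=2.0, size=9000)  # stand-in for the real data

# Observed KS distance between the ECDF of X and the fitted exponential
scale_hat = X.mean()
d = stats.kstest(X, 'expon', args=(0, scale_hat)).statistic

t = 0.05                 # hypothetical application-specific tolerance
good_enough = d < t      # point check; a proper test would bound d with confidence
```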

The bootstrap method in the link you provided in your comments can help you develop this modification. Its intent is to produce bootstrap-corrected critical values for the problem of testing a distribution that was fitted from the same data you are testing against. As noted in my comments, the procedure in the linked notes is similar to the one in your original post, but your step 3 is off: you need to refit a new $D_X^*$ at each iteration and measure against that, rather than always against $D_X$. With minor modification you can also use this bootstrapping procedure to get estimates of the critical values, or better, confidence intervals, for a null specified within some tolerance range $|d| < t$ rather than precisely zero difference $d = 0$.
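A sketch of the corrected procedure (simulated stand-in data; the key difference is that each bootstrap distance is measured against a freshly refitted $D_X^*$, not the original $D_X$):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
X = rng.exponential(scale=2.0, size=9000)  # stand-in for the real data

# Fit D_X and compute the observed KS distance d
scale_hat = X.mean()
d = stats.kstest(X, 'expon', args=(0, scale_hat)).statistic

dists = []
for _ in range(500):
    Xstar = rng.exponential(scale=scale_hat, size=X.size)   # sample from D_X
    scale_star = Xstar.mean()                               # refit: D_X^* from the resample
    dists.append(stats.kstest(Xstar, 'expon',
                              args=(0, scale_star)).statistic)

crit = np.quantile(dists, 0.95)            # bootstrap-corrected 5% critical value
p_boot = np.mean(np.array(dists) >= d)     # bootstrap p-value for d
```

Refitting inside the loop matters because estimating the scale from each resample shrinks the KS distances, exactly the effect that makes the naive test's critical values wrong.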