Matching: Calculated data vs. sample from the real world


I have a small set of measured values from the real world. It is a rather small sample (10 out of 1000). I cannot increase the sample size: only 10 values can be used, even though 1000 values exist. It is a random sample.

I also have a theoretical model of the real world, and from it I can easily get 1000 values. However, I need to show whether the data from my model fits my rather small sample.

I have no idea where to start. I plotted both sets (theoretical and sample) and they look very similar, but some kind of proof is missing.

Some tips for further reading would be great.


There are 1 best solutions below


It is difficult to imagine why you can look at 10 observations but not at the 1000 that you say are 'available'. But I will try to answer your question.

There isn't much definitive you can tell from a sample as small as 10 as far as determining the shape of its distribution--specifically, whether it "fits" the population or the larger sample.

You could do a test or a confidence interval to see if the mean and variance of the small sample are consistent with the mean and variance of the large sample (or with the mean and variance of the population). With 10 observations you could tell if there is a big difference between the sample of 10 and the new sample of 1000 (or the population). You would not be able to detect small differences with only 10 observations.

For example, suppose I have a sample of 10 observations from $Norm(1, 1)$. In one such sample I got $\bar X = 0.713$ and $S = 1.169$, and so a 95% CI for the population mean $\mu$ is $(-0.123, 1.549)$ and a 95% CI for the population SD $\sigma$ is $(0.80, 2.13)$. Neither CI is very useful.
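The two intervals can be checked with a short Python sketch using only the standard library. It uses the ten values listed in the Note at the end of this answer. The critical values for 9 degrees of freedom (`T_975`, `CHI2_025`, `CHI2_975`) are copied from standard $t$ and chi-squared tables rather than computed, and the variable names are only illustrative.

```python
import math
import statistics

# Sample of 10 from the Note at the end of this answer (sorted, rounded).
sample = [-1.097, -0.379, -0.219, -0.074, 0.331,
          0.918, 1.370, 2.036, 2.121, 2.122]

n = len(sample)
xbar = statistics.mean(sample)
s = statistics.stdev(sample)  # sample SD, divisor n - 1

# Assumed table values for n - 1 = 9 degrees of freedom.
T_975 = 2.262      # 0.975 quantile of Student's t
CHI2_025 = 2.700   # 0.025 quantile of chi-squared
CHI2_975 = 19.023  # 0.975 quantile of chi-squared

# 95% t interval for the population mean.
half_width = T_975 * s / math.sqrt(n)
ci_mean = (xbar - half_width, xbar + half_width)

# 95% chi-squared interval for the population SD.
ss = (n - 1) * s ** 2
ci_sd = (math.sqrt(ss / CHI2_975), math.sqrt(ss / CHI2_025))

print(f"mean = {xbar:.3f}, SD = {s:.3f}")
print(f"95% CI for mu:    ({ci_mean[0]:.3f}, {ci_mean[1]:.3f})")
print(f"95% CI for sigma: ({ci_sd[0]:.2f}, {ci_sd[1]:.2f})")
```

Both intervals rest on the assumption that the data are normal. With the asker's data, one would replace `sample` with the 10 measured values and check whether the model's mean and SD fall inside the intervals.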

I also used a Kolmogorov-Smirnov goodness-of-fit (GOF) test to see if the sample is consistent with $Norm(1, 1)$, and the test does not reject that hypothesis at the 5% level. However, GOF tests also fail to reject $Exp(1)$ and $Unif(1 - \sqrt{3}, 1 + \sqrt{3}).$ All three distributions have mean 1 and SD 1, but they have very different shapes. This illustrates my statement that there is very little information in a sample of size 10 as to the shape of the population from which it was drawn.
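For readers who want to reproduce the GOF comparison, here is a minimal sketch that computes the one-sample Kolmogorov-Smirnov statistic $D$ by hand for each of the three candidate distributions, using the sample from the Note at the end of this answer. It is not necessarily the software procedure used above. The helper names (`ks_statistic`, `norm_cdf`, `exp_cdf`, `unif_cdf`) are hypothetical, and the 5% critical value for $n = 10$ (`CRIT_05`, about 0.409) is taken from standard KS tables as an assumption.

```python
import math

# Sample of 10 from the Note at the end of this answer (sorted, rounded).
sample = [-1.097, -0.379, -0.219, -0.074, 0.331,
          0.918, 1.370, 2.036, 2.121, 2.122]

def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF of data and a fully specified cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The ECDF jumps from (i - 1)/n to i/n at x; check both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def norm_cdf(x):
    # CDF of Norm(1, 1): mean 1, SD 1.
    return 0.5 * (1.0 + math.erf((x - 1.0) / math.sqrt(2.0)))

def exp_cdf(x):
    # CDF of Exp(1): rate 1, support on x >= 0.
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def unif_cdf(x):
    # CDF of Unif(1 - sqrt(3), 1 + sqrt(3)).
    a, b = 1.0 - math.sqrt(3.0), 1.0 + math.sqrt(3.0)
    return min(max((x - a) / (b - a), 0.0), 1.0)

CRIT_05 = 0.409  # assumed table value: two-sided 5% critical value for n = 10

results = {}
for name, cdf in [("Norm(1,1)", norm_cdf), ("Exp(1)", exp_cdf),
                  ("Unif(1-sqrt3, 1+sqrt3)", unif_cdf)]:
    d = ks_statistic(sample, cdf)
    results[name] = d
    print(f"{name}: D = {d:.3f}, reject at 5%: {d > CRIT_05}")
```

A hypothesis is rejected at the 5% level only when $D$ exceeds the critical value. With $n = 10$ that threshold is so large that quite different shapes can pass, which is the point made above.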

Note: In case you are interested, here is the sample I used above, sorted and rounded to three places.

  -1.097 -0.379 -0.219 -0.074  0.331  
   0.918  1.370  2.036  2.121  2.122