Statistics - suitable test for value to belong to a list of values


I have several lists of numerical values, each representing measurements of a distance. Within each list, the values are close to each other. Sometimes I have to pick one of two lists to add a new measurement to, and sometimes I have to determine whether a measurement statistically belongs to the list it has been assigned to:

List A: $\{1.0, 1.0, 1.0, 0.998, 1.0,...\}$ Test Value $X = 0.96.$ Question: Does $X$ belong to List A?

The test should take into account how many elements there are in List A: if there is only one element, {1.0}, and the added value is close (within a given threshold), say 0.98 with threshold 0.05, then the new value is accepted. But if List A has 100 elements and all of them are 1.0, then the new value is rejected.

I'm somewhat familiar with the Z-test, but I'm not sure it would work well, because the $\sqrt{N}$ in the formula would always be $\sqrt{1}$.

Best answer:

If you have $n$ observations from a normal population with sample mean $\bar X$ and sample standard deviation $S$ then a 95% prediction interval for an additional observation from the same population is $$\bar X \pm t^*S\sqrt{1 + \frac{1}{n}},$$ where $t^*$ cuts 2.5% of the probability from the upper tail of Student's t distribution with $n-1$ degrees of freedom. [If $n$ is as large as 100, then $t^* \approx 2.0.$] This interval is intended to predict how far from $\bar X$ a new observation from the same population might fall. That is not exactly what you have in mind.

I have to say that the kind of ad hoc tinkering with data you have described seems inappropriate. One might have hoped that the original list of $n$ is a random sample from a particular population. To append other values after the fact, possibly when conditions have changed, makes the new sample of size $n+1$ seem possibly inauthentic. But if you feel you must do this, then appending a value only if it lies within the prediction interval given above might do the least harm. Better, of course, would be to get the sample right on the first go.
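
As a rough illustration, the prediction-interval rule described above might be sketched in Python as follows. This is only a sketch under stated assumptions: the helper names (`t_quantile`, `prediction_interval`) are made up for this example, the $t^*$ value is obtained by numerically integrating the t density with the standard library rather than by calling a statistics package, and the data are just the five values of List A that are actually shown in the question (the rest of the list is not known).

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0, via Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile for p > 0.5, found by bisection on the CDF (hypothetical helper)."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def prediction_interval(values, level=0.95):
    """Xbar +/- t* S sqrt(1 + 1/n) for one additional observation."""
    n = len(values)
    if n < 2:
        # S is undefined for a single observation, so the formula cannot be used.
        raise ValueError("need at least two observations to estimate S")
    xbar, s = mean(values), stdev(values)
    t_star = t_quantile(1 - (1 - level) / 2, n - 1)
    half_width = t_star * s * math.sqrt(1 + 1 / n)
    return xbar - half_width, xbar + half_width


# Assumption: only the five values of List A shown in the question are used.
list_a = [1.0, 1.0, 1.0, 0.998, 1.0]
low, high = prediction_interval(list_a)
print(low <= 0.96 <= high)  # False: 0.96 lies outside the interval
```

Two edge cases are worth noting in this sketch. With a single-element list $S$ cannot be computed, so the function refuses to answer, and a fixed threshold like the one in the question would have to be used instead. If every value in the list is identical, then $S = 0$ and the interval collapses to a single point, so any different value is rejected.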


Example: Suppose your original sample of 100 observations had $\bar X = 9.999$ and $S = 0.093,$ with min = 9.77 and max = 10.22. With $t^* \approx 1.984$ for 99 degrees of freedom, the 95% PI is $9.999 \pm 1.984(0.093)\sqrt{1.01},$ which is about $(9.81, 10.18).$ A new observation of 10.53 falls outside this PI, so the rule above would not append it. If you append it to the original sample of 100 anyway, then the new sample mean is $\bar X_+ = 10.004$ and the new sample standard deviation is $S_+ = 0.107.$ The sample mean has hardly been changed by the tinkering, and the SD has grown only from 0.093 to 0.107. However, as a cautionary note, a boxplot of the 'enlarged sample' shows 10.53 as an outlier (see below), while the original sample showed no outliers.

[Figure: boxplot of the enlarged sample, in which 10.53 is flagged as an outlier.]
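
The updated mean and SD quoted in the example can be checked from the summary statistics alone, without the raw data. The following is a minimal sketch; the function name `append_summary` is invented for this illustration, and it relies on the standard identity that appending one value $x$ increases the sum of squared deviations by $\frac{n}{n+1}(x - \bar X)^2.$

```python
import math


def append_summary(n, xbar, s, x_new):
    """Return (mean, SD) of the sample of size n + 1 after appending x_new."""
    new_mean = (n * xbar + x_new) / (n + 1)
    # The sum of squared deviations grows by n/(n+1) * (x_new - xbar)^2.
    ss = (n - 1) * s ** 2 + n / (n + 1) * (x_new - xbar) ** 2
    new_sd = math.sqrt(ss / n)  # divide by (n + 1) - 1 = n
    return new_mean, new_sd


new_mean, new_sd = append_summary(100, 9.999, 0.093, 10.53)
print(round(new_mean, 3), round(new_sd, 3))  # 10.004 0.107
```

This reproduces $\bar X_+ = 10.004$ and $S_+ = 0.107$ from the example.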