Problem with calculating Z Score


I'm comparing several measurements to a standard sample, and I would like to calculate the Z score in order to quantify how severe the erroneous measurements are. For example, I've got this data:

a/b
2.20
2.20
2.20
2.21
2.21
2.20
2.20
2.20
2.20
2.20
2.20
2.20
2.20
2.21
2.20
2.20
2.20
2.20
2.20
2.20
2.19
2.21
2.21
2.21

I've calculated the mean (2.20), the standard deviation (0.005), and the standard sample is equal to 2.17. In conclusion, the Z Score for a single measurement would be (2.19 - 2.17)/0.005 = 4, and (2.20 - 2.17)/0.005 = 6 for the entire measured sample.
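For reference, these numbers can be reproduced with a short Python sketch using the 24 values listed above (the unrounded standard deviation, ≈ 0.00509, gives slightly different Z scores than the rounded 0.005):

```python
import statistics

# the 24 measurements from the table above
data = [2.20, 2.20, 2.20, 2.21, 2.21, 2.20, 2.20, 2.20, 2.20, 2.20,
        2.20, 2.20, 2.20, 2.21, 2.20, 2.20, 2.20, 2.20, 2.20, 2.20,
        2.19, 2.21, 2.21, 2.21]
standard = 2.17  # the standard sample

mean = statistics.mean(data)   # ≈ 2.2021
sd = statistics.stdev(data)    # ≈ 0.00509 (sample standard deviation)

# Z score of one low measurement, and of the sample mean, vs. the standard
z_single = (2.19 - standard) / sd
z_mean = (mean - standard) / sd
print(round(z_single, 2), round(z_mean, 2))  # ≈ 3.9 and ≈ 6.3
```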

Why do I get a Z score so high, when the values are not so different?

For example, I've found a glucose-levels example on the internet where the Z score is equal to 0.4 for a deviation of +0.1 from the standard sample.


There are 2 best solutions below


If you plot your data and look closely, you will understand what's going on: in the plot, the mean is marked by a red vertical line and z = 1 on either side by green vertical lines. [plot omitted]

Since your data has very low variance, even a small deviation from the mean results in a very high z-score.

You can also see this from the formula itself: the numerator is the distance from the mean, and the denominator is a scaling factor. So when the underlying data have small variance, any new point even slightly away from the mean looks highly surprising (i.e., it gets a high z-score).
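To make the scaling effect concrete, here is a small Python sketch (the second standard deviation, 0.05, is purely illustrative):

```python
deviation = 0.02  # e.g. a measurement of 2.19 against the standard 2.17

for sd in (0.005, 0.05):  # low-variance vs. ten-times-larger variance
    z = deviation / sd
    print(f"sd = {sd}: z = {z:.1f}")
# the same 0.02 deviation yields z = 4.0 at sd = 0.005, but only z = 0.4 at sd = 0.05
```

This is exactly why the glucose example mentioned in the question can report a z-score of only 0.4 for a comparable absolute deviation: its data simply vary more.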


I put your data into Minitab statistical software, with summary statistics as follows:

Variable   N    Mean  SE Mean    StDev  Minimum      Q1  Median      Q3  Maximum      IQR
x         24  2.2021  0.00104  0.00509   2.1900  2.2000  2.2000  2.2075   2.2100  0.00750

There are only three distinct values among the 24 observations, tallied as follows:

   x  Count
2.19      1
2.20     17
2.21      6
  N=     24

Looking just at the sample, I see no reason to suspect that the measurements are erratic or unusually variable. They differ from the sample mean of 2.2021 (or the median and mode of 2.20) by only about 0.01.

I am not sure what you are trying to compute, but a common statistic to look at is the (estimated) standard error of the mean, which is $S/\sqrt{n} = 0.00509/\sqrt{24} = 0.00104.$ This is called SE Mean in the Minitab printout above.
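As a quick check, here is a Python sketch using the rounded summary values above:

```python
import math

s = 0.00509  # sample standard deviation (StDev in the Minitab output)
n = 24       # number of observations

# (estimated) standard error of the mean
se_mean = s / math.sqrt(n)
print(round(se_mean, 5))  # ≈ 0.00104
```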

An approximate Z-score relative to the standard value $\mu_0 = 2.17$ would be

$$ Z = \frac{\bar X - \mu_0}{S/\sqrt{n}} = \frac{2.2021 - 2.17}{0.00104} = 30.86538.$$

Technically speaking, because the population standard deviation $\sigma$ is unknown and estimated as $S = 0.00509,$ this should be called a T-statistic, so it's $T = 30.86538.$

If you are testing the null hypothesis that your 24 observations come from a normal population with mean $\mu_0 = 2.17$ against the alternative that they do not, then this T-statistic provides very strong evidence against the null. Formally, this is testing $H_0: \mu = \mu_0$ vs. $H_a: \mu \ne \mu_0,$ where $\mu$ is the mean of the population that produced your data. You would reject $H_0$ in favor of $H_a$ at the 5% level of significance if $|T| > 2.069,$ and your value of $|T|$ is much larger than that.
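The T-statistic can be reproduced with nothing but the Python standard library (a sketch; small differences from the value above come from rounding the summary statistics):

```python
import math
import statistics

# the 24 measurements from the question
data = [2.20, 2.20, 2.20, 2.21, 2.21, 2.20, 2.20, 2.20, 2.20, 2.20,
        2.20, 2.20, 2.20, 2.21, 2.20, 2.20, 2.20, 2.20, 2.20, 2.20,
        2.19, 2.21, 2.21, 2.21]
mu0 = 2.17  # standard value under the null hypothesis

n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)

# one-sample T-statistic against mu0
t = (xbar - mu0) / (s / math.sqrt(n))
print(round(t, 2))  # ≈ 30.9, far beyond the 5% critical value
```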

The number 2.069 comes from a table of Student's t distribution with degrees of freedom $n-1 = 24-1 = 23,$ or (to needlessly many decimal places) from statistical software as below:

qt(.975, 23)
## 2.068658
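The same critical value can be obtained in Python (a sketch assuming SciPy is installed):

```python
from scipy import stats

# 97.5th percentile of Student's t with 23 degrees of freedom,
# i.e. the two-sided 5% critical value for n = 24 observations
t_crit = stats.t.ppf(0.975, df=23)
print(round(t_crit, 6))  # ≈ 2.068658
```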