Consistency of two measurements including means and standard deviations


This is a simplified version of a real-life experiment: we performed two experiments attempting to measure the same quantity and obtained the results $0.8 \pm 0.1$ and $1.2 \pm 0.2$. (That's all we know!)

How can we calculate the probability that these two measurements are consistent with each other (i.e. they are consistent with a single true value)?


2 Answers


If we assume that the $\pm$ denotes an $x$ confidence interval, that the confidence intervals are symmetrical, and that there is a single true value $y$, then you can say:

$$p=\begin{cases} \frac{(1-x)^2}{4},&y\lt0.7\\ \frac{x(1-x)}{2},&0.7\le y\le0.9\\ \frac{(1-x)^2}{4},&0.9\lt y\lt1.0\\ \frac{x(1-x)}{2},&1.0\le y\le1.4\\ \frac{(1-x)^2}{4},&1.4\lt y \end{cases}$$
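If you wanted to evaluate this piecewise expression numerically, a minimal sketch could look like the following (the function name `p_single_value` is my own, and $x=0.68$ for 1-sigma intervals is just an illustrative assumption):

```python
def p_single_value(y, x):
    """Piecewise expression above: probability that both confidence
    intervals [0.7, 0.9] and [1.0, 1.4] land as observed if the single
    true value is y and each interval has confidence level x."""
    if 0.7 <= y <= 0.9 or 1.0 <= y <= 1.4:
        # y inside one interval (prob x), missed on one side by the
        # other symmetrical interval (prob (1 - x)/2)
        return x * (1 - x) / 2
    # y outside both intervals: each misses on one specific side
    return (1 - x) ** 2 / 4

# e.g. with x = 0.68 (1-sigma):
print(p_single_value(0.8, 0.68))   # y inside the first interval
print(p_single_value(0.95, 0.68))  # y between the two intervals
```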

Now if you add all this up, you get a value which is (surprisingly) $\in[0,1]$. You might be tempted to say that this is the probability that there is a single true value but you would be wrong!

And this is why: consider candidate true values $y$ and $y+\Delta$. For $\Delta$ large enough that you are happy to call them distinct, the probabilities that each falls into its respective interval are exactly the same; that is the definition of a confidence interval. So the probability the formula assigns to $y$ equals the probability it assigns to $y+\Delta$, and to $y-\Delta$, and to $y+2\Delta$, and to infinitely many other variants. Add up an infinite number of numbers $\gt0$ and you will soon have a number $\gt 1$, so it cannot represent a probability.

Without more information on the methodology, you can only state that the probability of a single true value is $\in[0,1]$, but then, what isn't?

---

As clarified in the comments, the intervals in the question are confidence intervals. In that case, it makes no sense to ask for the probability that the two measurements are consistent. Confidence intervals are constructed in a frequentist framework. It’s a common misconception that the true parameter has a certain probability (in your case, $68\%$) to lie within the confidence interval. Rather, the probability that you obtain a confidence interval that contains the true parameter is (in this case) $68\%$.

In other words, confidence intervals are about the probability of the data given the parameter, not about the probability of the parameter given the data. There is no such thing as the probability of the parameter given the data in a frequentist framework.

To illustrate the difference, let’s say you’re a sceptic about telepathy and you consider it extremely unlikely that telepathy exists. You conduct an experiment where someone repeatedly tries to guess, say, the number you rolled on a standard die. Say you model this by assuming that these guesses are independently correct with probability $p$. Your null hypothesis would be $p=\frac16$ (no telepathy).

Say you do this in a frequentist framework and the $99\%$ confidence interval for $p$ that you obtain is $[0.18,0.2]$. Now, since you’re a sceptic, this would certainly not lead you to say that you’ve established with $99\%$ probability that telepathy exists. Rather, you’d think that this was most likely a fluke and would redo the experiment.

What this shows is that the probability that the true parameter lies in the confidence interval cannot be specified without taking into account your prior beliefs about that probability. That can be done in a Bayesian framework, leading to a credibility interval, but it has no place in the frequentist construction of confidence intervals. Even if we somehow knew for a fact that there is no telepathy, we would still get confidence intervals such as the one above that don’t contain $\frac16$ about $1\%$ of the time.
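That last claim can be checked with a quick Monte Carlo sketch (the sample sizes and the Wald-type normal-approximation interval are my own illustrative choices): simulate many experiments with the true $p=\frac16$ fixed, and count how often the $99\%$ confidence interval fails to contain it.

```python
import random
from math import sqrt

random.seed(0)
z99 = 2.5758            # two-sided 99% quantile of the standard normal
n_guesses = 2_000       # guesses per experiment (assumed large)
n_experiments = 2_000   # number of repeated experiments
p_true = 1 / 6          # no telepathy

misses = 0
for _ in range(n_experiments):
    hits = sum(random.random() < p_true for _ in range(n_guesses))
    phat = hits / n_guesses
    # Wald interval: phat +/- z * sqrt(phat (1 - phat) / n)
    half = z99 * sqrt(phat * (1 - phat) / n_guesses)
    if not (phat - half <= p_true <= phat + half):
        misses += 1

print(misses / n_experiments)  # should come out near 0.01
```

Even though telepathy is absent by construction, roughly $1\%$ of the simulated experiments produce a confidence interval that excludes $\frac16$.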

What we can ask about in a frequentist framework is the significance with which we can reject the hypothesis that the parameters measured by these two measurements are equal. This is usually answered using Student’s $t$-test or Welch’s $t$-test. They both assume that your data arose from sampling a normal distribution, and they both require the numbers of samples taken in each measurement. Since you don’t have those, let’s assume that the total number of samples was large. Then both tests reduce to applying a two-tailed $Z$-test for the standard normal distribution to the test statistic

$$ z=\frac{\mu_1-\mu_2}{\sqrt{\sigma_1^2+\sigma_2^2}}\;, $$

which in your case yields a $p$-value of

$$ 1+\operatorname{erf}\left(-\frac1{\sqrt2}\left|\frac{1.2-0.8}{\sqrt{0.1^2+0.2^2}}\right|\right)\approx7\% $$

(where $\operatorname{erf}$ is the error function). Thus, assuming a large number of samples from a normal distribution, you can reject the hypothesis that the means are equal at a significance level of $7\%$. (Again, this does not mean that the probability for the means to be equal is $7\%$; nothing can be said about that without knowing your prior beliefs.)
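The calculation above is easy to reproduce with the standard library's `erf` (the variable names are mine):

```python
from math import erf, sqrt

# Large-sample two-tailed Z-test for the two measurements
mu1, sigma1 = 0.8, 0.1
mu2, sigma2 = 1.2, 0.2

z = (mu1 - mu2) / sqrt(sigma1**2 + sigma2**2)
# two-tailed p-value: 1 + erf(-|z|/sqrt(2)) = erfc(|z|/sqrt(2))
p_value = 1 + erf(-abs(z) / sqrt(2))

print(round(abs(z), 3), round(p_value, 3))  # |z| ~ 1.789, p ~ 0.074
```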