Determining precise value using imprecise measurements.


Suppose an accurate scale reports weight rounded to the nearest unit. If it reports the weight of an object as "6", all I know is that it could actually weigh anywhere between 5.5 and 6.5. But I need a more precise value for the weight.

Instead I could weigh myself twice, once holding the object and once not holding it, and take the difference as the measurement. I then repeat the same procedure with nine more people. 4 of us report "5" and 6 of us report "6", for an average result of 5.6.

And with 100 people, I might get 42 "5"s and 58 "6"s, for an average of 5.58.

If 5.58 actually is the correct weight, consider these results:

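#!/usr/bin/env python3
import random

actual = 5.58  # true weight of the object
for digits in range(1, 10):
    n = 10 ** digits  # number of people weighed
    total = 0  # named total to avoid shadowing the built-in sum()
    for _ in range(n):
        person = random.random()  # fractional part of one person's weight
        # scale reading while holding the object minus reading without it
        total += round(person + actual) - round(person)
    print(n, total / n)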

./weigh.py 
10 5.2
100 5.55
1000 5.58
10000 5.5871
100000 5.58281
1000000 5.580323
10000000 5.5802115
100000000 5.57997507
1000000000 5.5799838

The more people, the greater the precision of the resulting average, but what is that precision?

What is the formula that gives the statistical precision based on the number of measurements?

f(number-of-samples, %-confidence) = decimal-places

Or gives the number of samples required to produce a specified precision:

f(decimal-places, %-confidence) = number-of-samples


There are 2 best solutions below


The variance of the uniform distribution from $a$ to $b$ is $\frac 1{12}(b-a)^2.$ Variances add if the variables are uncorrelated, so the variance of the sum of $n$ variables is $\frac n{12}(b-a)^2.$ The standard deviation of the sum therefore grows as $\sqrt n,$ and the standard deviation of the average shrinks as $\frac 1{\sqrt n}.$
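As a rough numerical illustration of that $\frac 1{\sqrt n}$ scaling, the sketch below evaluates $\frac{b-a}{\sqrt{12n}}$ for a few sample sizes. The function name `std_of_mean_uniform` is made up for this example, and the default interval $[0,1]$ is an assumption.

```python
import math

def std_of_mean_uniform(n, a=0.0, b=1.0):
    """Standard deviation of the average of n independent Uniform(a, b) variables."""
    variance_single = (b - a) ** 2 / 12    # variance of one uniform variable
    variance_sum = n * variance_single     # variances of uncorrelated variables add
    return math.sqrt(variance_sum) / n     # dividing the sum by n gives the average

for n in (10, 100, 1000, 10000):
    print(n, std_of_mean_uniform(n))
```

Each factor of 100 in $n$ shrinks the printed value by a factor of 10, which is roughly one extra decimal place.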


Your idea seems to be that the fractional part of the weight of a person (in pounds) is a random variable uniformly distributed on $[0,1].$ This is probably not exactly true but the actual distribution may be uniform enough for your purposes, and it seems like a reasonable thing to assume as an approximation of the actual distribution.

So let's say the weight of the object is $a + b$ where $a$ is an integer and $-\frac12 \leq b < \frac12.$ And suppose you do two weighings of a person of weight $w + x$ (where $w$ is an integer and $-\frac12 \leq x < \frac12$), with and without the object, so that the weight without the object is $w.$

Then the weighing with the object comes out to $w + a + 1$ if $b + x \geq \frac12,$ $w + a$ if $-\frac12 \leq b + x < \frac12,$ and $w + a - 1$ if $b + x < -\frac12.$ Subtracting $w,$ the "weights" of the object are respectively $a + 1,$ $a,$ and $a - 1.$

Note that if you have also weighed the object by itself to get the result $a,$ then if you ever get $a + 1$ by weighing it with a person you know that $b > 0,$ and if you ever get $a - 1$ you know that $b \leq 0.$

But even without knowing the result of weighing the object by itself, we know that the result of weighing someone with and without the object is a binary function of $x$:

  • If $b > 0,$ you will get the result $a + 1$ when $x \geq \frac12 - b$ and $a$ otherwise.

  • If $b \leq 0,$ you will get the result $a$ when $x \geq -\frac12 - b$ and $a - 1$ otherwise.
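The two cases above can be checked directly with a small sketch. The helper names `scale_reading` and `object_reading` are hypothetical, and the scale is assumed to round halves up so that every $x$ in $\left[-\frac12,\frac12\right)$ rounds to $0.$

```python
import math

def scale_reading(weight):
    """Round to the nearest unit, with halves rounding up (an assumed convention)."""
    return math.floor(weight + 0.5)

def object_reading(a, b, w, x):
    """Reading with the object minus reading without it.

    The object weighs a + b and the person weighs w + x, with -1/2 <= b, x < 1/2.
    """
    return scale_reading(w + x + a + b) - scale_reading(w + x)

a, b, w = 6, -0.42, 150      # object weighs 5.58; the person's integer part is arbitrary
threshold = -0.5 - b         # for b <= 0 the result is a when x >= -1/2 - b
for x in (-0.45, -0.2, 0.0, 0.3):
    predicted = "a" if x >= threshold else "a - 1"
    print(x, object_reading(a, b, w, x), predicted)
```

The printed difference switches from $a - 1$ to $a$ exactly where $x$ crosses the threshold, so the result depends on $x$ only through that single comparison.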

Now consider a random set of people engaged in these weighings, where the weight of the $i$th person is $W_i + X_i$ where $W_i$ is an integer and $-\frac12 \leq X_i < \frac12.$ Moreover, assume the $X_i$ are iid random variables uniformly distributed over $\left[-\frac12,\frac12\right).$

In the case $b >0,$ for the $i$th person we get the result $a+1$ with probability $b.$ In the case $b \leq 0,$ we get the result $a$ with probability $b + 1.$

In either case, we get the higher of two possible results with probability $p,$ where $p$ is the fractional part of the weight of the object.
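A quick simulation can make this claim concrete. The sketch below assumes, as above, that the fractional parts of people's weights are uniform on $\left[-\frac12,\frac12\right)$ and that the scale rounds halves up; the function name `fraction_of_higher_results` is invented for this example.

```python
import math
import random

def fraction_of_higher_results(weight, n, seed=0):
    """Estimate how often the with/without difference is the higher of its two values."""
    rng = random.Random(seed)              # fixed seed so runs are repeatable
    lower = math.floor(weight)             # the lower of the two possible results
    higher_count = 0
    for _ in range(n):
        x = rng.uniform(-0.5, 0.5)         # assumed uniform fractional part of a person
        with_object = math.floor(x + weight + 0.5)
        without_object = math.floor(x + 0.5)
        if with_object - without_object > lower:
            higher_count += 1
    return higher_count / n

estimate = fraction_of_higher_results(5.58, 100_000)
print(estimate)   # should land near 0.58, the fractional part of 5.58
```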

So the series of weighings is really just a series of independent Bernoulli trials, each with probability $p,$ and the number of "larger" weighings is a binomial variable with parameters $n,p$ where $n$ is the number of people weighed. The confidence interval of your measurement is the same as the confidence interval of a binomial variable with known parameter $n$ and unknown parameter $p$ for a given observed ratio of "successes."

Note that you get better precision if $p$ is close to $0$ or $1,$ and the least precision when $p$ is near $\frac12.$ But if you also take into account the result of weighing the object by itself, you actually get more precision for results that are very close to $\frac12$ than for results that are close but not so close, because you know whether $p < \frac12$ or $p \geq \frac12.$
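To connect this to the two formulas asked for in the question, here is a minimal sketch using the normal-approximation (Wald) interval for a binomial proportion. That choice is an assumption: it is only one of several binomial intervals, and it becomes unreliable for small $n$ or for $p$ near $0$ or $1.$ The function names `wald_interval` and `samples_needed` are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def wald_interval(successes, n, confidence=0.95):
    """Normal-approximation (Wald) interval for a binomial proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # two-sided critical value
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def samples_needed(half_width, confidence=0.95, p=0.5):
    """Weighings needed for a given half-width; p = 0.5 is the worst case."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil(z * z * p * (1 - p) / half_width ** 2)

# The question's example: 58 "6"s out of 100 weighings.
low, high = wald_interval(58, 100)
print(5 + low, 5 + high)        # approximate interval for the weight itself

# Worst-case number of weighings for +/-0.005 at 95% confidence.
print(samples_needed(0.005))
```

Because the half-width scales as $\frac 1{\sqrt n},$ each additional decimal place of precision costs about 100 times as many weighings under this approximation.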