How to describe a statistical dataset more precisely?


I am a newbie in stat and I need to be able to do this for a GIS application:

Say I have the following dataset of only two possible values $0$ and $1$:

$0,0,0,1,1,1,0,1$

The mean would equal $\frac{4}{8}=0.5$

Thus, to find out how many $1$'s were in the initial dataset, we would simply multiply the mean by the total number of values and get $4$.

Now consider a dataset with *three* distinct values, $0$, $0.5$ and $1$:

$0,0.5,0.5,1,1,1,0,0$

The mean would again equal $\frac{4}{8}=0.5$. In this case, would there be a way of determining how many values of $0.5$ and how many values of $1$ were in the dataset? Perhaps using the standard deviation, sum, median, or range?

Best answer

This is not a good way to store data, but if you are absolutely certain that the data can only take the values $0, \frac12, 1$ then it is usually possible to recover the number of each from the Mean, Standard Deviation and Sum, and you may be able to do something similar with any three possible values. Let's call $m=\text{Mean}$, $d=\text{Standard Deviation}$ and $s=\text{Sum}$.

First, deal with the case where $m=0$ (then $d$ and $s$ will be zero too). In that case all the individual values are $0$, though you cannot tell how many there are.

Otherwise you can find the number of values with $n=\frac{s}{m}$.

Next you need to know whether your standard deviation calculation involves division by $n$ or by $n-1$, so in your example of $0,0.5,0.5,1,1,1,0,0$ does it give about $0.4330127$ or about $0.462910$? If the latter, you may have an issue when $n=1$ but then the only value is $m$. In what follows, I will assume the latter, but if not then replace $(n-1)$ by $n$.
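To check which convention a given tool uses, it may help to compute both versions on the example data. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative only, and your GIS software may use either convention.

```python
import statistics

data = [0, 0.5, 0.5, 1, 1, 1, 0, 0]

# Population standard deviation: divides the sum of squared deviations by n.
population_sd = statistics.pstdev(data)

# Sample standard deviation: divides the sum of squared deviations by n - 1.
sample_sd = statistics.stdev(data)

print(round(population_sd, 7))  # 0.4330127
print(round(sample_sd, 6))      # 0.46291
```

Note that `statistics.stdev` raises an error for a single data point, which is the $n=1$ issue mentioned above.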

If the number of $1$s is $i$, the number of $\frac12$s is $h$ and the number of $0$s is $z$ then you have the number of terms, sums of terms, and sums of squares of terms giving
$$n = i + h + z$$ $$s = i + h/2$$ $$(n-1)d^2 + sm = i + h/4$$
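The third equation may look less obvious than the other two. As a short sketch of where it comes from, write $x_k$ for the individual values (a symbol introduced here only for illustration) and assume the $n-1$ version of the standard deviation. Expanding the squared deviations and using $nm = s$ gives

$$(n-1)d^2 = \sum_k (x_k - m)^2 = \sum_k x_k^2 - 2m\sum_k x_k + nm^2 = \sum_k x_k^2 - 2sm + sm = \sum_k x_k^2 - sm$$

so the sum of squares is $\sum_k x_k^2 = (n-1)d^2 + sm$. On the other hand, each $1$ contributes $1$, each $\frac12$ contributes $\frac14$ and each $0$ contributes nothing, so the same sum of squares equals $i + h/4$.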

and solving these three simultaneous equations gives

$$i = 2(n-1)d^2 + 2sm - s$$ $$h = 4s - 4(n-1)d^2 - 4sm$$ $$z = n + 2(n-1)d^2 + 2sm - 3s.$$

In your example of $0,0.5,0.5,1,1,1,0,0$ this will have $m=0.5$, $d=0.462910$, $s=4$ leading to $n=8$, $i=3$, $h=2$, $z=3$, perhaps with minor rounding such as reading $2.99999$ as $3$.
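The whole procedure can be sketched in a few lines of Python. This is only an illustration under the assumptions stated above (every value is exactly $0$, $\frac12$ or $1$); the function name `recover_counts` and the `sample_sd` flag are made up for this example, and `round` is used to absorb the minor rounding error in the reported statistics.

```python
def recover_counts(mean, sd, total, sample_sd=True):
    """Recover (ones, halves, zeros) from the mean, standard deviation and sum.

    Assumes every value is exactly 0, 0.5 or 1.
    Set sample_sd=False if the standard deviation divides by n instead of n - 1.
    """
    if mean == 0:
        # All values are 0, but the sum and mean cannot tell us how many.
        raise ValueError("mean is 0: all values are 0, but their count is unknown")

    n = round(total / mean)  # number of values, n = s / m
    if n == 1:
        # The only value is the mean itself (it cannot be 0 here).
        return (int(mean == 1), int(mean == 0.5), 0)

    divisor = n - 1 if sample_sd else n
    sum_sq = divisor * sd**2 + total * mean  # sum of squares of the values

    ones = round(2 * sum_sq - total)         # i = 2(n-1)d^2 + 2sm - s
    halves = round(4 * total - 4 * sum_sq)   # h = 4s - 4(n-1)d^2 - 4sm
    zeros = n - ones - halves                # z = n - i - h
    return ones, halves, zeros


print(recover_counts(0.5, 0.462910, 4))  # (3, 2, 3)
```

If your standard deviation divides by $n$, the call `recover_counts(0.5, 0.4330127, 4, sample_sd=False)` should give the same counts.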