Is it possible to have 2 different but equal size real number sets that have the same mean and standard deviation?


By inspection I notice that

  • Shifting does not change the standard deviation but changes the mean. For example, {1,3,4} has the same standard deviation as {11,13,14}.

  • Sets with the same (or reversed) sequence of adjacent differences have the same standard deviation. For example, {1,3,4}, {0,2,3}, {0,1,3} have the same standard deviation. But the means are different.

My conjecture: There are no two distinct sets with the same length, mean and standard deviation.

Question

Is it possible to have 2 different but equal size real number sets that have the same mean and standard deviation?


3
On BEST ANSWER

$-2,-1,3$ and $-3,1,2$ both have a mean of $0$ and a standard deviation of $\sqrt\frac{14}{3}$.
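
The two triples above can be checked numerically. This is a minimal sketch using Python's standard `statistics` module; it assumes the population standard deviation (divide by $n$), which is what gives $\sqrt{14/3}$.

```python
from statistics import fmean, pstdev

a = [-2, -1, 3]
b = [-3, 1, 2]

# pstdev is the population standard deviation (divides by n, not n - 1).
for s in (a, b):
    print(s, fmean(s), pstdev(s))
```

Using the sample standard deviation (`statistics.stdev`) would change the value but not the conclusion, since both triples have the same sum and the same sum of squares.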

1
On

Yes. Two sets of numbers of the same size have the same mean and the same SD iff their sums and their sums of squares match.

The set $\{1,2,3\}$ has sum $6$ and sum of squares $14$. A set $\{x,y,z\}$ has the same mean and SD iff $$\begin{cases}x+y+z=6\\x^2+y^2+z^2=14\end{cases}$$ This is the intersection of a sphere and a plane cutting through it, which is a circle and therefore contains infinitely many points.
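
One way to see the circle concretely is to parametrize it. The sketch below is an illustration, not part of the original answer: the function name `triple_on_circle` and the choice of orthonormal basis in the plane are assumptions. The circle has center $(2,2,2)$ and radius $\sqrt{14 - 12} = \sqrt{2}$.

```python
import math

def triple_on_circle(theta):
    """Return (x, y, z) with x + y + z = 6 and x^2 + y^2 + z^2 = 14."""
    r = math.sqrt(2.0)  # radius: sqrt(14 - 3 * 2**2)
    # Two orthonormal vectors spanning the plane x + y + z = 0 (one possible choice).
    u = (1 / math.sqrt(2), -1 / math.sqrt(2), 0.0)
    v = (1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6))
    # Center (2, 2, 2) plus a point on the circle of radius r in that plane.
    return tuple(
        2.0 + r * (math.cos(theta) * ui + math.sin(theta) * vi)
        for ui, vi in zip(u, v)
    )

for k in range(4):
    p = triple_on_circle(k * math.pi / 6)
    print(p, sum(p), sum(c * c for c in p))
```

Every value of `theta` gives a triple with the same mean and SD as $\{1,2,3\}$.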

12
On

The example by auscrypt settles the question, but maybe it's worth mentioning why this should be obvious by considering degrees of freedom.

Mean and standard deviation are two quantities. A collection of $m$ real numbers has $m$ degrees of freedom. Specifying the mean and standard deviation removes two degrees of freedom, leaving $m-2$. So as long as $m > 2$, there should still be lots of room to have different sets with the same mean and standard deviation.

EDIT: This was intended as a heuristic rather than a proof, but rigorous arguments can be made. For example, suppose $A$ and $B$ are two $m$-tuples with the same mean such that $\sigma(A) > \sigma(B)$ (where $\sigma$ denotes standard deviation), and $C$ and $D$ are two $m$-tuples with the same mean such that $\sigma(C) < \sigma(D)$. Then for any $t$, $t A + (1-t) C$ and $t B + (1-t) D$ have the same mean, and (by the Intermediate Value Theorem) there exists $t \in [0,1]$ such that they have the same standard deviation. If $A-B$ and $C-D$ are linearly independent, these two tuples will not be the same.
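
The Intermediate Value Theorem argument can be turned into a small bisection search. This is only a sketch under assumed inputs: the tuples `A`, `B`, `C`, `D` and the helper names `blend` and `gap` are illustrative choices, not taken from the answer.

```python
from statistics import fmean, pstdev

A, B = (-2.0, 0.0, 2.0), (-1.0, 0.0, 1.0)  # same mean, sd(A) > sd(B)
C, D = (1.0, 1.0, 1.0), (0.0, 0.0, 3.0)    # same mean, sd(C) < sd(D)

def blend(t, p, q):
    """Componentwise t * p + (1 - t) * q."""
    return [t * pi + (1 - t) * qi for pi, qi in zip(p, q)]

def gap(t):
    """Difference of the two standard deviations at parameter t."""
    return pstdev(blend(t, A, C)) - pstdev(blend(t, B, D))

lo, hi = 0.0, 1.0  # gap(0) < 0 < gap(1), so a root lies in between
for _ in range(60):
    mid = (lo + hi) / 2
    if gap(mid) < 0:
        lo = mid
    else:
        hi = mid

t = (lo + hi) / 2
X, Y = blend(t, A, C), blend(t, B, D)
print(t, X, Y)
print(fmean(X), fmean(Y), pstdev(X), pstdev(Y))
```

For these particular inputs the two blended tuples share their mean and standard deviation while containing different values.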

2
On

Let $A = \{x_1, x_2, \dots, x_n\}$ add up to $n\mu$.

Then $B = \{2\mu-x_1, 2\mu-x_2, \dots, 2\mu-x_n\}$ will also add up to $n\mu$.

We will also have \begin{align} \sum_{i=1}^n (2\mu - x_i)^2 &= 4n\mu^2 -4\mu\sum_{i=1}^n x_i + \sum_{i=1}^n x_i^2\\ &= 4n\mu^2 - 4n\mu^2 + \sum_{i=1}^n x_i^2\\ &= \sum_{i=1}^n x_i^2 \end{align}

Hence the sets $A$ and $B$ have the same mean and standard deviation, and $B$ differs from $A$ unless $A$ is symmetric about $\mu$.

As a side note, the $i^{th}$ and $(n+1-i)^{th}$ rows of many $n\times n$ magic squares, when $n$ is odd, have this property.
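
The reflection is easy to try out. In this sketch the function name `reflect_about_mean` is hypothetical; the second example uses the $3\times 3$ Lo Shu magic square to illustrate the side note.

```python
from statistics import fmean, pstdev

def reflect_about_mean(xs):
    """Map each x to 2*mu - x, where mu is the mean of xs."""
    mu = fmean(xs)
    return [2 * mu - x for x in xs]

a = [1, 3, 4]
b = reflect_about_mean(a)
print(b, fmean(a), fmean(b), pstdev(a), pstdev(b))

# Lo Shu square: reflecting the first row about its mean gives the
# values of the last row.
lo_shu = [[2, 7, 6], [9, 5, 1], [4, 3, 8]]
print(sorted(reflect_about_mean(lo_shu[0])), sorted(lo_shu[2]))
```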

0
On

You seem to like the set $\{1,3,4\}$. Here are six more three-element sets having the same mean and standard deviation as your given set.

\begin{align*}
&\frac{202}{171} & &\frac{1}{171} \left(583+\sqrt{19842}\right) & &\frac{1}{171} \left(583-\sqrt{19842}\right) \\
&\frac{688}{171} & &\frac{1}{171} \left(340+\sqrt{27861}\right) & &\frac{1}{171} \left(340-\sqrt{27861}\right) \\
&\frac{544}{171} & &\frac{1}{171} \left(412-\sqrt{62421}\right) & &\frac{1}{171} \left(412+\sqrt{62421}\right) \\
&\frac{32}{19} & &\frac{1}{19} \left(60-\sqrt{581}\right) & &\frac{1}{19} \left(60+\sqrt{581}\right) \\
&\frac{27}{19} & &\frac{1}{38} \left(125-\sqrt{1689}\right) & &\frac{1}{38} \left(125+\sqrt{1689}\right) \\
&-\frac{2}{3} \left(-4+\sqrt{7}\right) & &\frac{1}{6} \left(16+2 \sqrt{7}\right) & &\frac{1}{3} \left(8+\sqrt{7}\right)
\end{align*}

More generally, let $(x,y)$ be any point on the ellipse given by the equation $$ x^2 + xy + y^2 -8x -8y + 19 = 0 $$ and set $z = 8 - x - y$. This triple of values has the same mean and standard deviation as does $\{1,3,4\}$.

(This is found by eliminating $z$ from the system mean$(x,y,z) = {}$mean$(1,3,4)$ and stddev$(x,y,z) = {}$stddev$(1,3,4)$, i.e., $x+y+z = 8, x^2 + y^2 + z^2 - xy - xz - yz = 7$.)
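
Points on the ellipse can be generated by fixing $x$ and solving the resulting quadratic in $y$, namely $y^2 + (x-8)y + (x^2 - 8x + 19) = 0$, whose discriminant is $-3x^2 + 16x - 12$. The sketch below assumes this approach; the function name `triple_from_x` is hypothetical, and it returns only the larger root for $y$.

```python
import math

def triple_from_x(x):
    """Return (x, y, z) on the ellipse with z = 8 - x - y, or None if x is out of range."""
    disc = -3 * x * x + 16 * x - 12  # discriminant of the quadratic in y
    if disc < 0:
        return None
    y = ((8 - x) + math.sqrt(disc)) / 2
    return (x, y, 8 - x - y)

t = triple_from_x(202 / 171)
print(t, sum(t), sum(v * v for v in t))
```

With $x = 202/171$ this reproduces the first row listed above; any $x$ with a nonnegative discriminant works the same way.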

0
On

There are an infinite number of 3-element real number sets with any given real mean and any given positive real standard deviation.

Without loss of generality, let's assume that $\mu = 0$ and $\sigma = 1$. Once we have a solution set for these parameters, we can find one for any parameters by scaling it to get the desired standard deviation, then shifting it to get the desired mean.

$$x + y + z = 0$$

$$\frac{x^2 + y^2 + z^2}{3} - 0 = 1$$

$$x^2 + y^2 + z^2 = 3$$

We have two equations and 3 unknowns, so let's treat $x$ as a parameter and solve for $y$ and $z$.

$$y = -x -z$$

$$x^2 + (-x -z)^2 + z^2 = 3$$

$$x^2 + (x^2 + 2xz +z^2) + z^2 = 3$$

$$2z^2 + 2xz + (2x^2 -3) = 0$$

$$z = \frac{-2x\pm\sqrt{(2x)^2-8(2x^2-3)}}{4}$$

$$z = \frac{-2x\pm\sqrt{4x^2-16x^2+24}}{4}$$

$$z = \frac{-2x\pm\sqrt{-12x^2+24}}{4}$$

For this to have real solutions, we need the following inequality to hold.

$$-12x^2+24 \geqslant 0$$

$$12x^2 \leqslant 24 $$

$$x^2 \leqslant 2 $$

$$- \sqrt{2} \leqslant x \leqslant \sqrt{2} $$

There are infinitely many values of $x$ that satisfy this inequality, so there are infinitely many sets of 3 real numbers with $\mu = 0$ and $\sigma = 1$, and therefore infinitely many sets of three real numbers with any given real mean and any given positive* real standard deviation.

* Negative standard deviations don't make any sense, and a zero standard deviation means all numbers are equal.
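
The derivation above translates directly into code: pick $x$ with $x^2 \leqslant 2$, solve for $z$ and $y$, then scale and shift. This is a sketch under those assumptions; the function name `triple` is hypothetical, and it takes the `+` root of the quadratic.

```python
import math

def triple(x, mu=0.0, sigma=1.0):
    """Return three numbers with mean mu and population sd sigma; x is the free parameter."""
    disc = 24 - 12 * x * x
    if disc < 0:
        raise ValueError("need x**2 <= 2")
    z = (-2 * x + math.sqrt(disc)) / 4  # '+' root of 2z^2 + 2xz + (2x^2 - 3) = 0
    y = -x - z
    # Scale to the desired standard deviation, then shift to the desired mean.
    return tuple(mu + sigma * v for v in (x, y, z))

for x in (-1.4, -0.5, 0.0, 1.0):
    print(triple(x, mu=5.0, sigma=2.0))
```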

0
On

A simple way to find a counterexample to the conjecture is to focus on sets whose values are symmetrical about zero. This ensures that the two sets have the same mean, and also simplifies calculation of their standard deviations.

Let $a,b$ be any real numbers. Then the set $\{a,b,-a,-b\}$ has mean zero and SD $\sqrt{(2a^2+2b^2)/4}$. Now let $c$ be any real number not equal to any member of that set and such that $c^2 < a^2+b^2$. Let $d$ be given by:

$$d = \sqrt{a^2 + b^2 - c^2}$$

implying $a^2 + b^2 = c^2 + d^2$. Then the set $\{c,d,-c,-d\}$ has the same mean and SD.
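
A short sketch of this construction follows. The function name `symmetric_pair` and the sample values are illustrative assumptions.

```python
import math
from statistics import fmean, pstdev

def symmetric_pair(a, b, c):
    """Return {a, b, -a, -b} and {c, d, -c, -d} with a^2 + b^2 = c^2 + d^2."""
    if c * c >= a * a + b * b:
        raise ValueError("need c**2 < a**2 + b**2")
    d = math.sqrt(a * a + b * b - c * c)
    return [a, b, -a, -b], [c, d, -c, -d]

s, t = symmetric_pair(1.0, 2.0, 1.5)
print(s, fmean(s), pstdev(s))
print(t, fmean(t), pstdev(t))
```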

1
On

It is possible. A classical example that uses this fact to illustrate how simple descriptive statistics like the mean and standard deviation can mislead you is Anscombe's quartet. It comprises four sets of eleven (x,y) points that have nearly identical summary statistics: mean and standard deviation (in x and y), correlation, linear regression parameters, and R-squared, yet are qualitatively very different.

1
On
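    /*
     * Generates n samples (n >= 2) with population mean mu and population
     * standard deviation sigma, as a scaled and shifted Fibonacci-like sequence:
     * g[0] = 1 - F(n+1), g[1] = F(n), g[k+2] = g[k+1] + g[k]. These n terms sum to zero.
     * Needs `using System;` for Math.Sqrt.
     */
    static double[] fibonacci_samples(int n, double mu, double sigma)
    {
        double[] x = new double[n];
        double shift = mu;

        // Start values of the unscaled sequence.
        double min = 1 - fibonacci(n + 1);
        double max = fibonacci(n);

        // Closed form for the sum of squares of the n unscaled terms.
        double sumSquares =
            min * min * (1 + fibonacci(n - 1) * fibonacci(n - 2))
            + 2 * min * max * (fibonacci(n - 1) * fibonacci(n - 1) - (n + 1) % 2)
            + max * max * fibonacci(n) * fibonacci(n - 1);
        double scale = Math.Sqrt(n * sigma * sigma / sumSquares);

        for (int i = 0; i < n; i++)
        {
            x[i] = shift + min * scale;
            max = max + min;   // next term of the sequence
            min = max - min;   // the old max becomes the current term
        }
        return x;
    }

    // The original snippet does not define fibonacci; this helper is an assumed
    // definition with F(0) = 0, F(1) = 1, returning double to avoid integer overflow.
    static double fibonacci(int k)
    {
        double a = 0, b = 1;
        for (int i = 0; i < k; i++)
        {
            double next = a + b;
            a = b;
            b = next;
        }
        return a;
    }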