Calculating mean of sequence given means of subsets


Let $(s_n) = \{s_1,s_2,\dots,s_n\}$ be a sequence of real numbers. Assume that we do not know the value of any $s_i, \ i\in \{1,2,\dots,n\}$, but that the sequence $(s_n)$ has been divided in some unknown way into $k$ non-overlapping subsequences, and let the length of the $j$th such subsequence, $j\in \{1,2,\dots,k\}$, be $l_j$. Every element belongs to exactly one subsequence, so $l_1+l_2+\dots+l_k = n$, and we can assume that every length $l_j$ is greater than zero (and thus smaller than $n$ whenever $k>1$).

Assume we have been given only the set of means of these subsequences $(a_k) = \{a_1,a_2,\dots,a_k\}$, e.g. $a_1 = \frac{s_2+s_5+s_{n-2}}{3},$ thus $l_1 = 3.$

Question: is it possible to use only the values of $(a_k)$ and $n$ to find out the mean of $(s_n)$?

If $k = n$ or if $k=1$, the question is trivial. If $l_j$ are equal for all values of $j$, the mean of $(a_k)$ is equal to the mean of $(s_n)$.

Let us find all possible ways to represent $n$ as a sum of $k$ positive integers, and call such a representation a set of weights. For example, if $n=5$ and $k=3$, then $(1,2,2)$ is a set of weights. For each set of weights, multiply every value of $(a_k)$ by the corresponding weight, add the products together and divide the result by $n$. At least one of the resulting numbers is the true mean of $(s_n)$, namely the one obtained from the set of weights $(l_1,l_2,\dots,l_k)$. However, this method only tells us that the true mean is somewhere in the list of candidates, not which candidate it is.
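The enumeration described above can be sketched in Python. This is only an illustration under stated assumptions: the function names `weight_sets` and `candidate_means` are made up here, the weights are taken to be ordered tuples of positive integers, and exact fractions are used so that equal candidates are not duplicated by rounding.

```python
from fractions import Fraction
from itertools import combinations


def weight_sets(n, k):
    """Yield every ordered k-tuple of positive integers that sums to n."""
    # Choosing k-1 cut points among 1..n-1 splits n into k positive parts.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        yield tuple(hi - lo for lo, hi in zip(bounds, bounds[1:]))


def candidate_means(a, n):
    """Return the sorted distinct means consistent with subset means a and length n."""
    k = len(a)
    candidates = {
        sum(Fraction(x) * w for x, w in zip(a, weights)) / n
        for weights in weight_sets(n, k)
    }
    return sorted(candidates)


# Subset means (1, 2) with n = 4 leave three candidates for the overall mean.
print(candidate_means([1, 2], 4))
# prints [Fraction(5, 4), Fraction(3, 2), Fraction(7, 4)]
```

The number of weight sets is $\binom{n-1}{k-1}$, so the candidate list grows quickly with $n$; the sketch is meant for small cases only.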

Best answer

No, without the weights it is impossible. If you have the weights, then the answer is:

$$\frac{1}{n}\sum_{i=1}^ka_il_i$$
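A short justification, writing $S_i$ (notation introduced here, not in the question) for the set of indices that belong to the $i$th subsequence: since $a_i = \frac{1}{l_i}\sum_{j\in S_i}s_j$, each product $a_il_i$ is the sum of the $i$th subsequence, and the subsequences together cover every element exactly once, so

$$\frac{1}{n}\sum_{i=1}^ka_il_i = \frac{1}{n}\sum_{i=1}^k\sum_{j\in S_i}s_j = \frac{1}{n}\sum_{j=1}^ns_j.$$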

But if you don't have the weights, then the averages no longer contain enough information. This is alluded to in the final paragraph of the question. For example, consider the sequences $1, 1, 2, 2$ and $1, 1, 1, 2$. We could split the first into $1, 1$ and $2, 2$ and the second into $1, 1, 1$ and $2$. In both cases, we get $n=4$, $k=2$ and $(a_1, a_2) = (1, 2)$. But the averages of the two sequences differ ($\frac{3}{2}$ versus $\frac{5}{4}$), so $k$, $n$ and the vector of the $a_i$ are not enough information to determine the average of the initial sequence. Entropy has won.
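The counterexample can be checked numerically. The following is a minimal sketch; the helper names `mean` and `summary` are chosen here for illustration and each split is simply written out as a list of lists.

```python
from fractions import Fraction


def mean(xs):
    """Exact arithmetic mean of a non-empty list of integers."""
    return Fraction(sum(xs), len(xs))


def summary(parts):
    """Return (n, list of subset means, overall mean) for a split given as lists."""
    flat = [x for part in parts for x in part]
    return len(flat), [mean(part) for part in parts], mean(flat)


# The two splits from the answer: same n, same k, same subset means.
n_a, means_a, overall_a = summary([[1, 1], [2, 2]])
n_b, means_b, overall_b = summary([[1, 1, 1], [2]])

print(n_a == n_b, means_a == means_b)  # the given data agree
print(overall_a, overall_b)            # the overall means do not
```

Both splits present the same data $(n, k, a_1, a_2)$ to an observer, yet the overall means come out different, which is exactly why no function of those values alone can recover the mean.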