Estimating $\sum_{k=1}^N a_kb_k$ given the means $\bar a_k,\bar b_k$ and determining the error


I need to calculate the following expression:

$$\sum_{k=1}^N a_k b_k$$

I know the average value of $a_k$, defined as $\overline{a_k} = \frac{1}{N}\sum_{k=1}^N a_k$, and the average value of $b_k$, defined as $\overline{b_k} = \frac{1}{N}\sum_{k=1}^N b_k$.

I don't know the standard deviations, but I have one extra piece of information: to some accuracy, every member of the population $k=1,\ldots,N$ is in one of three states, and I know what fraction is in each state. In terms of numbers, this means that $a_k$ can take only three values. I don't know those values, but I know that, for instance, 80% of the $N$ members have the first value, $a_1$, 19% have the value $a_2$, and 1% have the value $a_3$. The same kind of information is available for $b_k$.

If I have to make an approximation knowing only these quantities, I would like to know how much error that approximation introduces. $N$ is relatively large.

Any help is appreciated. :)

Narj


There are 3 best solutions below

0
On BEST ANSWER

One possibility is the crude estimate $\sum a_k b_k \approx N \, \overline{a}\,\overline{b}$.

If $A_k, B_k$ denote the sequences $a_k, b_k$ arranged in ascending order, we have the bounds (by the rearrangement inequality) $$\sum A_k B_{N-k+1} \le \sum a_k b_k \le \sum A_k B_k$$

We also have the following bounds (by Chebyshev's sum inequality): $$\sum A_k B_{N-k+1} \le \frac1N \left(\sum a_k \right) \left(\sum b_k \right) = N \, \overline{a}\,\overline{b} \le \sum A_k B_k$$

So both numbers, the true sum and the crude estimate, lie in the same (albeit possibly large) interval. Unfortunately, they can sit at opposite ends of it unless you have some measure of how the values are spread and correlated. I'm not sure you can do any better with the information at hand.
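To make the interval concrete, here is a minimal Python sketch of the two bounds and the crude estimate. It assumes the individual values are available, which is more than the question provides; the function name `sum_product_bounds` and the three-state values below are invented purely for illustration, with only the 80% / 19% / 1% split for $a_k$ taken from the question.

```python
def sum_product_bounds(a, b):
    """Return (lower, crude, upper) for sum(a[k] * b[k]) over all pairings.

    lower: ascending a paired with descending b (rearrangement inequality)
    crude: N * mean(a) * mean(b)
    upper: ascending a paired with ascending b
    """
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and of equal length")
    n = len(a)
    a_sorted = sorted(a)
    b_sorted = sorted(b)
    lower = sum(x * y for x, y in zip(a_sorted, reversed(b_sorted)))
    upper = sum(x * y for x, y in zip(a_sorted, b_sorted))
    crude = sum(a) * sum(b) / n
    return lower, crude, upper


# Hypothetical three-state populations (values are assumptions, not from the question).
a = [1] * 80 + [2] * 19 + [5] * 1      # 80% / 19% / 1%
b = [3] * 50 + [4] * 30 + [10] * 20    # 50% / 30% / 20%

lower, crude, upper = sum_product_bounds(a, b)
print(lower, crude, upper)
```

The width `upper - lower` is the worst-case error of any estimate that ignores how the two sequences are paired; the crude estimate always falls somewhere inside it.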

0
On

Mathematically, the largest the sum can be (assuming all the numbers are nonnegative and of "reasonable size") is $N^2\,\overline{a_k}\,\overline{b_k}$, attained when all but one of the $a_k$ and all but one of the $b_k$ are zero and the two nonzero entries share the same index. The smallest it can be is zero, if $a_k$ can be zero. In an engineering sense, you probably have some information on how much variation is reasonable, which will allow a much better assessment.
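The two extremes can be checked numerically with a small sketch. The values of `N`, `mean_a`, and `mean_b` are arbitrary assumptions chosen for illustration.

```python
N = 4
mean_a, mean_b = 2, 3  # assumed averages, for illustration only

# All the "mass" of each sequence sits in a single entry.
a = [N * mean_a] + [0] * (N - 1)
b_aligned = [N * mean_b] + [0] * (N - 1)   # nonzero entry at the same index as a
b_shifted = [0] * (N - 1) + [N * mean_b]   # nonzero entry at a different index

largest = sum(x * y for x, y in zip(a, b_aligned))
smallest = sum(x * y for x, y in zip(a, b_shifted))

print(largest, N**2 * mean_a * mean_b)  # the two values agree
print(smallest)
```

Both choices of `b` have the same average, yet the sum moves between zero and $N^2\,\overline{a_k}\,\overline{b_k}$.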

0
On

I don't think you can say much if you don't know anything about the distribution of the values within the sequences.

For example, with $N$ even you could have $$ a=(1,0,1,0,\ldots), \ b=(0,1,0,1,\ldots) $$ and in this case the sum $\sum a_k b_k$ is zero, while the averages are both $1/2$. But if you switch $b$ to be equal to $a$, the sum becomes $N/2$ and the averages are still both $1/2$.
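This example is easy to reproduce. The sketch below uses an arbitrary even `N` (an assumption for illustration) and compares the two choices of $b$.

```python
N = 10  # any even N works; 10 is an arbitrary choice

a = [1, 0] * (N // 2)            # (1, 0, 1, 0, ...)
b_disjoint = [0, 1] * (N // 2)   # (0, 1, 0, 1, ...)
b_same = list(a)                 # b equal to a


def mean(xs):
    return sum(xs) / len(xs)


def dot(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))


# The averages are identical in both scenarios...
print(mean(a), mean(b_disjoint), mean(b_same))
# ...but the sums of products differ: 0 versus N/2.
print(dot(a, b_disjoint), dot(a, b_same))
```

Here the crude estimate $N\,\overline{a}\,\overline{b}$ equals $N/4$ in both cases, so it is off by $N/4$ either way.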