Estimating the value of a sum of products of two variables using the Central Limit Theorem


Consider the following sum:

$S=a_0\cdot b_0 + a_1\cdot b_1 + a_2\cdot b_2 + \cdots +a_{n-1}\cdot b_{n-1}$

I want to find the expected value of $S$ when the $a_i$'s are selected from $\{0,1\}$ with equal probability and the $b_i$'s are selected uniformly from $[-1/2,1/2]$. All values are i.i.d.

According to my calculation, $\mu_a=0$ and $\mu_b=0$, while $E[a^2]=1/2$ and $E[b^2]=(1/2-(-1/2))^2/12=1/12$.

As the mean of each product $a_i\cdot b_i$ is zero, its variance is $E[a^2\cdot b^2]=E[a^2]E[b^2]=1/24$. Hence the variance of $S$ is $n/24$ and $\sigma=\sqrt{n/24}$.

Now, in this paper (page 9), the authors say that

Each entry of the vector $Ws$ is a sum of $n$ (or around $n/2$ in the case where $s \in \{0, 1\}^n$) rational numbers in the interval $[−1/2, 1/2]$. Assuming the entries of $W$ are uniformly distributed then the central limit theorem suggests that each entry of $Ws$ has an absolute value roughly $1/4\sqrt{n/2}$.

I did not understand how the authors arrived at this value.

Note that the values of $W$ are sampled uniformly from $[-1/2,1/2]$, like the $b_i$'s, and the values of $s$ are sampled from $\{0,1\}$ with equal probability, like the $a_i$'s.

Also, in Section 2.1 the authors say that the expected value (Euclidean norm) of an $n$-dimensional vector sampled from a discrete Gaussian distribution with standard deviation $\sigma$ is $\sqrt{n}\sigma$. According to the authors, $S$ is a sum of about $n/2$ values $b_i$ sampled uniformly from $[-1/2, 1/2]$. So can we say that the central limit theorem guarantees that the $b_i$ follow a Gaussian distribution with $\sigma=1/4$? I appreciate your help.

EDIT: $\mu_a$ is $1/2$, not $0$. But this still does not resolve the issue, as $\mu_a \cdot \mu_b=0$.
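For reference, the per-term calculation with the corrected mean can be sketched as follows, assuming $a_i$ and $b_i$ are independent of each other and across $i$:

$$E[a_i b_i]=E[a_i]\,E[b_i]=\tfrac12\cdot 0=0,\qquad \operatorname{Var}(a_i b_i)=E[a_i^2]\,E[b_i^2]-\big(E[a_i b_i]\big)^2=\tfrac12\cdot\tfrac1{12}-0=\tfrac1{24},$$

$$\operatorname{Var}(S)=\sum_{i=0}^{n-1}\operatorname{Var}(a_i b_i)=\frac{n}{24}.$$

So the variance $n/24$ is unchanged by the correction, because only $\mu_a\cdot\mu_b=0$ enters the calculation.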

On BEST ANSWER

Comment continued: Assuming my guesses in (c) of my Comment are correct, here is a simulation of a million sums $S$ for $n = 20.$ By part (a) of my Comment, you should have $E(S) = 0.$ Also, you say you should have $\sigma = \sqrt{20/24} = 0.9128709,$ which is approximately confirmed by the simulation. For large $n,$ you will have $S$ nearly (but not exactly) normal. It seems that $n = 20$ is large enough to get a good approximation to normality.

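m = 10^6;  s = numeric(m);  n = 20    # m simulated sums, n terms each
for(i in 1:m) {
  a = rbinom(n, 1, .5)                # a_i in {0,1} with equal probability
  b = runif(n, -.5, .5)               # b_i uniform on [-1/2, 1/2]
  s[i] = sum(a*b) }                   # one realization of S
mean(s);  sd(s)
## -0.0004134797    # approx. E(S) = 0
## 0.9128541        # approx. SD(S) = 0.9128709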

The histogram below shows the approximate distribution of $S$ from simulation together with the well-fitting density function of $\mathsf{Norm}(0, .9129).$

[Figure: histogram of the simulated values of $S$ with the $\mathsf{Norm}(0, .9129)$ density curve overlaid]
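A plot along these lines can be produced with the sketch below. It is not necessarily the code used for the original figure: the seed, the smaller number of replications, the vectorized simulation, and the plot settings are all choices made here for illustration. The last three lines compare the simulated mean absolute value of $S$ with the value it would have if $S$ were exactly normal and with the rough figure $\frac14\sqrt{n/2}$ quoted in the question.

```r
set.seed(2024)                              # arbitrary seed, for reproducibility only
m = 10^5;  n = 20                           # fewer replications, chosen for speed
a = matrix(rbinom(m*n, 1, .5), nrow = m)    # each row holds a_0, ..., a_{n-1}
b = matrix(runif(m*n, -.5, .5), nrow = m)   # each row holds b_0, ..., b_{n-1}
s = rowSums(a*b)                            # m simulated values of S
sigma = sqrt(n/24)                          # theoretical SD(S)

# Density-scaled histogram with the approximating normal density on top
hist(s, freq = FALSE, breaks = 50, col = "skyblue2",
     main = "Simulated S, n = 20")
curve(dnorm(x, 0, sigma), add = TRUE, lwd = 2, col = "red")

mean(abs(s))          # simulated E|S|
sigma * sqrt(2/pi)    # E|S| if S were exactly normal
sqrt(n/2)/4           # rough figure quoted from the paper
```

The vectorized form avoids the explicit loop, at the cost of holding two $m \times n$ matrices in memory.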

Of course, the simulation doesn't 'prove' anything, but its results may encourage you that you are mainly on the right track. You should be able to clarify your statement of the problem and write a convincing solution for general $n$.
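For general $n$, the argument might be written out roughly as follows. This is only a sketch: it assumes all the $a_i$ and $b_i$ are mutually independent and uses the normal approximation for $S$, and the comparison with the paper's figure is a guess at how that figure may have been obtained.

$$E(S)=\sum_{i=0}^{n-1}E(a_i)\,E(b_i)=0,\qquad \operatorname{Var}(S)=\sum_{i=0}^{n-1}E(a_i^2)\,E(b_i^2)=\frac{n}{24},\qquad \sigma=\sqrt{\frac{n}{24}}\approx 0.204\sqrt{n}.$$

If $S$ is approximately $\mathsf{Norm}(0,\sigma)$, then its mean absolute value is

$$E|S|\approx\sigma\sqrt{\frac{2}{\pi}}=\sqrt{\frac{n}{12\pi}}\approx 0.163\sqrt{n},\qquad\text{whereas}\qquad \frac14\sqrt{\frac{n}{2}}\approx 0.177\sqrt{n}.$$

So the quoted value lies between $E|S|$ and $\sigma$, which seems consistent with the word "roughly". It may also be relevant that $E|b_i|=1/4$ for $b_i$ uniform on $[-1/2,1/2]$, which is perhaps where the factor $1/4$ comes from, though the paper would have to be checked to confirm that.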