A name for this statistical approximation?

Heuristic calculations suggest the following approximation (see the counter-example in the comments) for a random set $\{x_i\}_{i\in[N]}$ and random sets $\{c_{i,j}\}_{i\in[N]}$, one for each $j\in[C]$ (the larger $C$, the better the approximation): $$ \left(\sum_{i=1}^N 1\right)\sum_{j=1}^C \left(\sum_{i=1}^N c_{i,j}x_i\right)^2 \approx \left(\sum_{i=1}^N x_i^2\right) \sum_{j=1}^C \left(\sum_{i=1}^N c_{i,j}\right)^2 $$ Here the first factor on the left is simply $\sum_{i=1}^N 1 = N$. I realize the similarity to the Cauchy-Schwarz inequality, but I'm still stuck on why this seems to hold. Is there a name for this relationship? Are there any reasons it should or should not hold?
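For anyone who wants to experiment, here is a minimal numerical sketch of the relation above. It assumes things the question does not fix: the $c_{i,j}$ are drawn i.i.d. from a standard normal distribution, $x$ is centered to mean $0$, and the helper name `both_sides` and the values of `N`, `C`, and the seed are purely illustrative. It only uses the standard library.

```python
import random


def both_sides(x, c):
    """Return (lhs, rhs) of the proposed approximation.

    x is a list of N data values; c is a list of C coefficient lists,
    where c[j][i] plays the role of c_{i,j}.
    """
    n = len(x)
    sum_x_sq = sum(xi * xi for xi in x)
    # sum over j of (sum_i c_{i,j} x_i)^2
    dot_sq = sum(sum(cij * xi for cij, xi in zip(cj, x)) ** 2 for cj in c)
    # sum over j of (sum_i c_{i,j})^2
    col_sq = sum(sum(cj) ** 2 for cj in c)
    return n * dot_sq, sum_x_sq * col_sq


# Assumed setup (not from the question): standard normal c, centered x.
rng = random.Random(0)
N, C = 20, 20_000
x = [rng.gauss(0.0, 1.0) for _ in range(N)]
mean_x = sum(x) / N
x = [xi - mean_x for xi in x]
c = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(C)]

lhs, rhs = both_sides(x, c)
ratio = lhs / rhs
print(f"lhs/rhs = {ratio:.4f}")  # expected to be close to 1 under these assumptions

# A hand-picked deterministic case where the two sides differ
# (not necessarily the counter-example from the comments):
lhs0, rhs0 = both_sides([1.0, -1.0], [[1.0, 1.0]])
print(lhs0, rhs0)  # 0.0 8.0
```

The second call shows that the relation is not an identity: with constant coefficients and centered $x$ the left side vanishes while the right side does not, so whatever makes it work must come from the randomness of the $c_{i,j}$.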

A statistical interpretation is that $\{x_i\}_{i\in[N]}$ is a data set with mean $0$ and variance $\left(\sum_{i=1}^N x_i^2\right)/N$. We calculate the variance of the dot products of $\{x_i\}_{i\in[N]}$ with $C$ i.i.d. random sets $\{c_{i,j}\}_{i\in[N]}$, $j\in[C]$. We approximate this dot-product variance by the product of two quantities: the variance of $\{x_i\}_{i\in[N]}$, and the variance, taken across the $C$ sets, of the within-set sum $\sum_{i=1}^N c_{i,j}$. That is, dividing both sides of the relation above by $NC$, $$ \frac{\sum_{j=1}^C \left(\sum_{i=1}^N c_{i,j}x_i\right)^2}{C} \approx \left(\frac{\sum_{i=1}^N x_i^2}{N}\right)\left(\frac{\sum_{j=1}^C \left(\sum_{i=1}^N c_{i,j}\right)^2}{C}\right) $$ Note that both averages over $j$ are mean squares, so they equal variances only when the corresponding means over $j$ are $0$. Is there a name for this pattern in the statistical context? Are there any reasons it should or should not hold?
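A tentative expectation calculation, under assumptions that go beyond what is stated above: suppose that for each $j$ the $c_{i,j}$ are independent across $i$ with mean $0$ and common variance $\sigma^2$ (the symbol $\sigma^2$ is introduced here only for illustration). Then the cross terms vanish and $$ \mathbb{E}\left[\left(\sum_{i=1}^N c_{i,j}x_i\right)^2\right] = \sum_{i=1}^N\sum_{k=1}^N x_i x_k\,\mathbb{E}\left[c_{i,j}c_{k,j}\right] = \sigma^2\sum_{i=1}^N x_i^2, \qquad \mathbb{E}\left[\left(\sum_{i=1}^N c_{i,j}\right)^2\right] = N\sigma^2, $$ so that $$ \left(\frac{\sum_{i=1}^N x_i^2}{N}\right) N\sigma^2 = \sigma^2\sum_{i=1}^N x_i^2 . $$ Under these assumptions both sides have the same expectation, and averaging over $C$ i.i.d. sets would bring each sample average close to its expectation as $C$ grows, which might explain the observation. If instead the $c_{i,j}$ had a nonzero mean $\mu$, the second expectation would become $N\sigma^2 + N^2\mu^2$ while the first (for mean-zero $x$) would stay $\sigma^2\sum_{i=1}^N x_i^2$, so the approximation would not be expected to hold in that case.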