I'm looking at a random variable that takes vectors $\newcommand{\vv}{\mathbf{v}} \vv_1, \dotsc, \vv_n \in \mathbb{R}^d$ and calculates their average, after applying "blankout" noise to them. So we have independent Bernoulli random variables $\xi_1, \dotsc, \xi_n$ with parameters $p(\xi_i = 1) = p_i$. I multiply the $\newcommand{\vv}{\mathbf{v}} \vv_i$ by the $\xi_i$ and take the average of the ones that remain:
$$\newcommand{\vv}{\mathbf{v}} \frac{\xi_1 \vv_1 + \dotsb + \xi_n \vv_n}{\xi_1 + \dotsb + \xi_n}.$$
This might bother me because I'd be dividing by zero in the trivial case where I take the average of none of them. I suppose I could fix that by adding a dummy term to the denominator that will only show up if all $\xi_i$ are zero.
$$\newcommand{\vv}{\mathbf{v}} \frac{\xi_1 \vv_1 + \dotsb + \xi_n \vv_n}{\xi_1 + \dotsb + \xi_n + \prod_i(1 - \xi_i)}.$$
- Is there a clean way to calculate the mean and variance of such a thing*? (We could leave the $\prod$ term out if we want, and add some other fix.)
- Is there a better (perhaps standard) way to accomplish this? For instance, use constants $a_i>0$ such that $\mathbb{E}(a_i\xi_i) = \frac1n$, and write it as
$$\texttt{Avg} \approx \newcommand{\vv}{\mathbf{v}} a_1\xi_1 \vv_1 + \dotsb + a_n \xi_n \vv_n?$$
(This thing would have easy mean/variance.)
*Here I mean the mean/variance taken over the joint distribution of the $\xi_i$.)
The "beautiful" results are when we either consider $p_i=p$ for all $i \in \{1, ..., n\}$ (as in my comment above), or if we consider heterogeneous $p_i$ values but let $n\rightarrow \infty$. Here is an answer that considers $n\rightarrow \infty$.
Claim:
Suppose that $\{\xi_i\}_{i=1}^{\infty}$ is a sequence of independent Bernoulli random variables with $p_i = P[\xi_i=1]$ for $i \in \{1, 2, 3, ...\}$, and $$ \liminf_{n\rightarrow\infty} \frac{1}{n}\sum_{i=1}^n p_i > 0$$ Suppose $\{v_i\}_{i=1}^{\infty}$ is a deterministic sequence of vectors in $\mathbb{R}^n$. Suppose all vectors are bounded, so there is a $B\geq 0$ such that $||v_i||\leq B$ for all $i \in \{1, 2, 3, ...\}$. Then
$$ \lim_{n\rightarrow\infty}\left| \frac{\sum_{i=1}^n\xi_iv_i}{\sum_{i=1}^n \xi_i}-\frac{\sum_{i=1}^{n}p_iv_i}{\sum_{i=1}^np_i} \right| = 0 \quad \mbox{(with prob 1)} $$ In particular, $$ \lim_{n\rightarrow\infty} \frac{\sum_{i=1}^n\xi_iv_i}{\sum_{i=1}^n \xi_i} = \lim_{n\rightarrow\infty} \frac{\sum_{i=1}^{n}p_iv_i}{\sum_{i=1}^np_i} \quad \mbox{(with prob 1)}$$ whenever the right-hand-side limit exists.
This can be proven with the help of the following lemma.
Lemma:
Let $\{Y_i\}_{i=1}^{\infty}$ be independent random variables that are surely bounded by some constant $b\geq 0$, so that $|Y_i|\leq b$ for all $i \in \{1, 2, 3, ...\}$. Then $$ \lim_{n\rightarrow\infty} \left|\frac{1}{n}\sum_{i=1}^n (Y_i-E[Y_i]) \right| = 0 \quad \mbox{(with prob 1)} $$
Proof of claim:
Both $\{\xi_i\}_{i=1}^{\infty}$ and $\{\xi_i v_i\}_{i=1}^{\infty}$ are sequences of independent bounded variables and so by the lemma (applying the lemma component-wise for the case of the vector sequence $\{\xi_i v_i\}_{i=1}^{\infty}$): \begin{align} \epsilon_n &:= \frac{1}{n}\sum_{i=1}^n (\xi_iv_i - p_iv_i) \rightarrow 0 \quad \mbox{(with prob 1)} \\ \delta_n &:= \frac{1}{n}\sum_{i=1}^n (\xi_i-p_i) \rightarrow 0 \quad \mbox{(with prob 1)} \end{align} Hence \begin{align} \left| \frac{\sum_{i=1}^n\xi_iv_i}{\sum_{i=1}^n \xi_i}-\frac{\sum_{i=1}^{n}p_iv_i}{\sum_{i=1}^np_i} \right| &= \left|\frac{\frac{1}{n}\sum_{i=1}^n\xi_iv_i}{\frac{1}{n}\sum_{i=1}^n \xi_i}-\frac{\frac{1}{n}\sum_{i=1}^{n}p_iv_i}{\frac{1}{n}\sum_{i=1}^np_i} \right|\\ &=\left|\frac{\epsilon_n + \frac{1}{n}\sum_{i=1}^np_iv_i}{\delta_n + \frac{1}{n}\sum_{i=1}^n p_i}-\frac{\frac{1}{n}\sum_{i=1}^{n}p_iv_i}{\frac{1}{n}\sum_{i=1}^np_i} \right|\\ &=\frac{|\epsilon_n(\frac{1}{n}\sum_{i=1}^np_i) -\delta_n(\frac{1}{n}\sum_{i=1}^np_iv_i) |}{|(\delta_n + \frac{1}{n}\sum_{i=1}^n p_i)(\frac{1}{n}\sum_{i=1}^np_i)|} \rightarrow 0 \quad \mbox{(with prob 1)} \end{align} where we have used the fact that the denominator is positive and bounded away from zero (with prob 1), the terms $\frac{1}{n}\sum_{i=1}^np_i$ and $\frac{1}{n}\sum_{i=1}^np_iv_i$ are bounded, and $\epsilon_n\rightarrow 0$, $\delta_n\rightarrow 0$ with prob 1. $\Box$