CONTEXT:
I have a non-symmetric, k-nearest-neighbor adjacency matrix of a directed graph, $N\in\mathbb{R}^{n \times n}$, and data $D\in\mathbb{R}^{n\times m}$. Averaging each of the $n$ observations in $D$ with its $k$ nearest neighbors is simply: $$\bar{D}=\frac{1}{k} N D$$
My goal is to find the mean and variance of each of $m$ features in $\bar{D}$. However, let's say $D$ is a large, sparse matrix and I do not wish to break the sparsity structure via this averaging due to memory constraints.
Is it possible to find the mean and variance of $\bar{D}$ without explicitly computing $ND$?
The mean (a row vector of the $m$ feature means) is just a matter of order of operations: $$\bar{\mu} = \frac{1}{nk}1^T(ND) = \frac{1}{nk}(1^TN)D$$ where $1$ is the length-$n$ vector of ones. This way we do not need to explicitly calculate $ND$.
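For concreteness, here is a minimal NumPy/SciPy sketch of the reordered mean computation, checked against the explicit dense result on a small random example (the k-NN matrix is simulated as a random binary matrix with $k$ ones per row; all names are illustrative):

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n, m, k = 100, 5, 3

# Simulate a binary kNN adjacency N (exactly k ones per row) and sparse data D.
rows = np.repeat(np.arange(n), k)
cols = np.concatenate([rng.choice(n, size=k, replace=False) for _ in range(n)])
N = sp.csr_matrix((np.ones(n * k), (rows, cols)), shape=(n, n))
D = sp.random(n, m, density=0.1, random_state=0, format="csr")

# Feature means of D_bar = (1/k) N D, computed as (1^T N) D / (nk).
# 1^T N is just the vector of column sums of N, and (1^T N) D = D^T (N^T 1),
# so we never materialize N D.
col_weights = np.asarray(N.sum(axis=0)).ravel()   # 1^T N, shape (n,)
mu_bar = (D.T @ col_weights) / (n * k)            # shape (m,)

# Sanity check against the explicit (dense) computation.
D_bar = (N @ D).toarray() / k
assert np.allclose(mu_bar, D_bar.mean(axis=0))
```

Note that $1^T N$ is simply the column-sum vector of $N$, which stays sparse-friendly throughout.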
The variance is super tricky and I haven't figured out how to do it efficiently: $$\bar{\sigma}^2 =\frac{1}{n}\left(\sum_{i=1}^n\bar{D}_i\circ \bar{D}_i\right) - \bar{\mu}^2$$ where $\bar{D}_i$ is the $i$th row of $\bar{D}$, $\circ$ is the Hadamard (element-wise) product, and $\bar{\mu}^2$ is taken element-wise.
$$\bar{\sigma}^2 =\frac{1}{n}\left(\sum_{i=1}^n \left(\tfrac{1}{k}N_iD\right)\circ\left(\tfrac{1}{k}N_iD\right)\right) - \bar{\mu}^2$$ $$\bar{\sigma}^2 =\frac{1}{nk^2}\left(\sum_{i=1}^n (N_iD)\circ(N_iD)\right) - \bar{\mu}^2$$ where $N_i$ is the $i$th row of $N$. To illustrate, here is what the $i$th summand might look like (squares taken element-wise): $$(N_iD)\circ(N_iD) = (D_1+D_3+D_7+D_9+\ldots)^2$$ $$ = (D_1 \circ D_1+D_3 \circ D_3+D_7 \circ D_7+D_9 \circ D_9+\ldots) + (2D_1\circ D_3+2D_1\circ D_7+\ldots)$$ Since the entries of $N$ are 0/1 (so $N_{ij}^2 = N_{ij}$), the squared terms collapse to $N_i(D\circ D)$, and we can avoid $ND$ by writing the variance as $$\bar{\sigma}^2 =\frac{1}{nk^2}\left(\sum_{i=1}^n N_i(D\circ D)+2\sum_{i=1}^n\sum_{j=i+1}^n a_{ij}D_i\circ D_j\right) - \bar{\mu}^2$$ where $a_{ij}=(N^TN)_{ij}$ counts the number of neighborhoods that contain both observations $i$ and $j$. Distributing the $\frac{1}{nk^2}$: $$\bar{\sigma}^2 =\frac{1}{nk^2}(1^T N)(D\circ D)+\frac{2}{nk^2}\left(\sum_{i=1}^n\sum_{j=i+1}^n a_{ij}D_i\circ D_j\right) - \bar{\mu}^2$$
$$\boxed{\bar{\sigma}^2 =\frac{1}{nk^2}(1^T N)(D\circ D)+\frac{2}{nk^2}\left(\sum_{i=1}^n D_i \circ\left(\sum_{j=i+1}^n a_{ij} D_j \right)\right) - \bar{\mu}^2}$$
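To make sure the boxed identity is at least correct (leaving aside efficiency), here is a small numerical check under the same simulated setup as before: $a_{ij}$ is read off the strict upper triangle of the sparse product $N^TN$, the cross terms are accumulated naively pair by pair, and the result is compared against the explicit dense variance. The naive loop over pairs is only for verification; variable names are illustrative.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(1)
n, m, k = 80, 4, 3

# Simulate a binary kNN adjacency N (k ones per row) and sparse data D.
rows = np.repeat(np.arange(n), k)
cols = np.concatenate([rng.choice(n, size=k, replace=False) for _ in range(n)])
N = sp.csr_matrix((np.ones(n * k), (rows, cols)), shape=(n, n))
D = sp.random(n, m, density=0.2, random_state=1, format="csr")

# The coefficients a_ij as a sparse matrix: A = N^T N counts shared neighborhoods.
A = (N.T @ N).tocsr()

# Mean, via 1^T N first (column sums of N).
col_weights = np.asarray(N.sum(axis=0)).ravel()        # 1^T N, shape (n,)
mu_bar = (D.T @ col_weights) / (n * k)                 # shape (m,)

# Diagonal part: (1^T N)(D∘D). (For binary N, diag(A) equals 1^T N.)
diag_part = D.multiply(D).T @ col_weights              # shape (m,)

# Cross terms: sum_{i<j} a_ij D_i∘D_j, from the strict upper triangle of A.
A_upper = sp.triu(A, k=1).tocoo()
cross = np.zeros(m)
for i, j, a in zip(A_upper.row, A_upper.col, A_upper.data):
    cross += a * D[i].multiply(D[j]).toarray().ravel()

var_bar = (diag_part + 2.0 * cross) / (n * k**2) - mu_bar**2

# Check against the explicit (dense) computation.
D_bar = (N @ D).toarray() / k
assert np.allclose(var_bar, D_bar.var(axis=0))
```

Since $A = N^TN$ is itself a sparse matrix product, its sparsity pattern is exactly where the $a_{ij}$ live, which is what makes the pair loop above finite in the first place.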
QUESTION:
This is where I get stuck! The combinatorics here (the pesky $a_{ij}$) seem to be computationally prohibitive. Two questions:
1) Did I do anything wrong up to now?
2) Is there anything I missed or useful approximations / matrix calculus I can use to allow me to efficiently evaluate $\bar{\sigma}^2$ without explicitly computing $ND$?