Setup: We observe a noisy version $\breve{B}\in\mathbb{R}^{p\times L}$ of a true signal $B\in\mathbb{R}^{p\times L}$. The rows of $B$ are sampled independently from $\text{Categorical}(\pi)$, where $\pi\in\mathbb{R}^L$ satisfies $\pi_l\ge 0$ and $\sum_{l=1}^L\pi_l=1$ (i.e., $\pi$ is a probability vector). Hence the rows of $B$ are independent one-hot vectors, and their empirical distribution converges to the law of $\bar{B}\in\mathbb{R}^L$, where $\bar{B}\sim\text{Categorical}(\pi)$.
The goal is to estimate $B$ from $\breve{B}$. We also have statistical information about $\breve{B}$: the empirical distribution of its rows converges to the law of $M\bar{B}+G$, where:
- $M\in\mathbb{R}^{L\times L}$ is a non-random component;
- $G\sim \mathcal{N}(0,\Sigma)$, where $\Sigma\in\mathbb{R}^{L\times L}$, is the random noise component.
Note that we have access to $\pi$, $M$, and $\Sigma$. More precisely, the row-wise convergence means $$ \frac{1}{p}\sum_{j=1}^p\breve{B}_j \rightarrow \mathbb{E}\big[M\bar{B}+G\big], $$ where $\breve{B}_j$ denotes the $j$th row of $\breve{B}$.
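For concreteness, here is a small sketch of the generative model described above (all variable names and the specific values of $p$, $L$, $\pi$, $M$, $\Sigma$ are my own choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
p, L = 1000, 4

pi = np.array([0.4, 0.3, 0.2, 0.1])   # prior pi: a probability vector over L categories
M = rng.normal(size=(L, L))           # deterministic mixing matrix M
Sigma = 0.5 * np.eye(L)               # noise covariance Sigma (illustrative choice)

# Rows of B are i.i.d. one-hot draws from Categorical(pi).
labels = rng.choice(L, size=p, p=pi)
B = np.eye(L)[labels]                 # shape (p, L); each row is a one-hot vector

# Each observed row is M @ B_j plus Gaussian noise G_j ~ N(0, Sigma).
G = rng.multivariate_normal(np.zeros(L), Sigma, size=p)
B_noisy = B @ M.T + G                 # row j equals M @ B[j] + G[j]
```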
Question: What is the best estimator $\widehat{B}_j=f(\breve{B}_j)$? Here estimation is done row-wise, for $j\in\{1,\dots,p\}$. The best I can think of is the posterior mean $$ f(\breve{B}_j)=\mathbb{E}\Big[\bar{B}\,\Big|\,M\bar{B}+G=\breve{B}_j\Big]. $$ Can we do better than this, given that the rows of $B$ are one-hot vectors (and we also know the prior distribution of the signal $B$)? Thanks.
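Since $\bar{B}$ takes only the values $e_1,\dots,e_L$ (the standard basis vectors), the posterior mean above has a closed form: the posterior weight of category $l$ given an observed row $y$ is proportional to $\pi_l\exp\!\big(-\tfrac12 (y-Me_l)^\top\Sigma^{-1}(y-Me_l)\big)$, and the posterior mean is exactly the vector of these weights. A sketch of this computation, assuming $G$ is independent of $\bar{B}$ and $\Sigma$ is invertible (the function name is mine):

```python
import numpy as np

def mmse_denoise(Y, M, Sigma, pi):
    """Row-wise posterior mean E[Bbar | M Bbar + G = y] under the
    one-hot Categorical(pi) prior.  Y has shape (p, L)."""
    Sigma_inv = np.linalg.inv(Sigma)
    # Column l of M is the candidate mean M @ e_l;
    # residuals[j, l, :] = Y[j] - M[:, l].
    residuals = Y[:, None, :] - M.T[None, :, :]
    # Squared Mahalanobis distance of each observed row to each candidate mean.
    mahal = np.einsum('jlk,kd,jld->jl', residuals, Sigma_inv, residuals)
    log_w = np.log(pi)[None, :] - 0.5 * mahal
    log_w -= log_w.max(axis=1, keepdims=True)   # stabilize before exponentiating
    W = np.exp(log_w)
    W /= W.sum(axis=1, keepdims=True)
    # Because Bbar lies in {e_1, ..., e_L}, the posterior mean of Bbar
    # is the posterior probability vector itself.
    return W
```

Note that `W[j]` is a probability vector rather than a one-hot vector; if a one-hot estimate is required, taking the argmax row-wise gives the MAP estimate instead of the posterior mean.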