Basis Density Estimator and Information Channel Capacity


Suppose that I would like to estimate the distribution $P$ with pdf $p$ over $[0,1]$ from the data $\{X_1,\cdots, X_T\}$, $X_i \sim_{iid} P$, using the basis estimator: \begin{equation*} \hat{p}_M(x) := \sum_{i=0}^M\hat{\theta}_i\phi_i(x), \quad \text{where} \quad \hat{\theta}_i := \frac{1}{T}\sum_{j=1}^T\phi_i(X_j), \end{equation*} and $\{\phi_0,\phi_1,\phi_2, \cdots\}$ is a complete orthonormal basis of $L^2[0,1]$. For example, take the trigonometric basis for simplicity: $\phi_0(x) = 1$, $\phi_{2j-1}(x) = \sqrt{2}\cos(2\pi j x)$, $\phi_{2j}(x) = \sqrt{2}\sin(2\pi j x)$, for $j=1,2,3,\cdots$. I can adjust $M$ as a hyperparameter of this estimator. If I set $M = 0$, then $\hat{\theta}_0 = 1$ and $\hat{p}_0 \equiv 1$ (the uniform density), so the estimator uses none of the data $\{X_1,\cdots,X_T\}$; and if I let $M = M_T \rightarrow \infty$ at a suitable rate, then $\hat{p}_{M} \rightarrow_{L^2} p$ as $T\rightarrow \infty$.
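For concreteness, here is a minimal NumPy sketch of this estimator with the trigonometric basis (indexed so that $\phi_{2j-1}$ is the cosine and $\phi_{2j}$ the sine term); the Beta(2, 5) sampling distribution is just an illustrative choice of $P$, not part of the question:

```python
import numpy as np

def phi(k, x):
    """Trigonometric basis on [0,1]: phi_0 = 1,
    phi_{2j-1} = sqrt(2) cos(2 pi j x), phi_{2j} = sqrt(2) sin(2 pi j x)."""
    if k == 0:
        return np.ones_like(x, dtype=float)
    j = (k + 1) // 2
    if k % 2 == 1:
        return np.sqrt(2.0) * np.cos(2.0 * np.pi * j * x)
    return np.sqrt(2.0) * np.sin(2.0 * np.pi * j * x)

def fit_theta(data, M):
    """hat{theta}_k = (1/T) * sum_j phi_k(X_j), for k = 0, ..., M."""
    return np.array([phi(k, data).mean() for k in range(M + 1)])

def p_hat(x, theta):
    """Evaluate the truncated series estimate sum_k hat{theta}_k phi_k(x)."""
    return sum(t * phi(k, x) for k, t in enumerate(theta))

# usage: T = 2000 samples from an (arbitrary, illustrative) Beta(2, 5) on [0,1]
rng = np.random.default_rng(0)
data = rng.beta(2.0, 5.0, size=2000)
theta = fit_theta(data, M=6)
print(theta[0])  # prints 1.0: phi_0 = 1, so hat{theta}_0 is the mean of ones
```

Note that $\hat{\theta}_0 = 1$ always, so $\hat{p}_M$ integrates to one for every $M$ (the higher-order basis functions integrate to zero), though $\hat{p}_M$ need not be nonnegative.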

May I think of $\hat{p}_0(x)$ as my prior knowledge about the random variable $X$, and $\hat{p}_M(x)$ as my knowledge after observing the data $\{X_1,\cdots, X_T\}$, given my limited information-processing capacity, which is controlled by $M$?

If so, is there any relationship between $M$ and the channel capacity in information theory?

That is, the channel capacity $C$ would be an upper bound on the mutual information: \begin{equation*} I(X;\{X_i\}) := -\int_{[0,1]}\hat{p}_0(x)\log \hat{p}_0(x)dx + \mathbb{E}_{\text{data}}\left[\int_{[0,1]}\hat{p}_M(x)\log \hat{p}_M(x)dx\right] \end{equation*} where the expectation $\mathbb{E}_{\text{data}}$ is taken over the data $\{X_1,\cdots, X_T\}$ with joint pdf $\prod_{i=1}^Tp(X_i)$. This should measure how much information is expected to be processed.
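Since $\hat{p}_0 \equiv 1$, the first term vanishes, and the quantity above reduces to the expected value of $\int \hat{p}_M \log \hat{p}_M$. A Monte Carlo sketch of this "expected information gain" as a function of the truncation level, under two stated assumptions (a Beta(2, 5) example for the true $P$, and an ad hoc clip of the truncated series at a small $\epsilon$ before taking the log, since $\hat{p}_M$ can dip below zero); here $M = 2J$, with $J$ full cosine/sine frequency pairs:

```python
import numpy as np

rng = np.random.default_rng(1)

def p_hat(x, data, J):
    """Series estimate with J cosine/sine frequency pairs (M = 2J)."""
    val = np.ones_like(x)  # hat{theta}_0 * phi_0 = 1
    for j in range(1, J + 1):
        a = np.cos(2.0 * np.pi * j * data).mean()  # empirical coefficients
        b = np.sin(2.0 * np.pi * j * data).mean()
        # theta * phi contributes 2*a*cos + 2*b*sin (the sqrt(2)'s combine)
        val += 2.0 * (a * np.cos(2.0 * np.pi * j * x)
                      + b * np.sin(2.0 * np.pi * j * x))
    return val

def info_gain(J, T, n_rep=200, eps=1e-6):
    """Monte Carlo estimate of E_data[ int p_hat_M log p_hat_M ].
    (hat{p}_0 = 1, so its entropy term is zero.)  The truncated series
    can go negative, so it is clipped at eps -- an ad hoc fix."""
    xs = np.linspace(0.0, 1.0, 1000, endpoint=False)
    total = 0.0
    for _ in range(n_rep):
        data = rng.beta(2.0, 5.0, size=T)  # illustrative true P
        p = np.clip(p_hat(xs, data, J), eps, None)
        total += (p * np.log(p)).mean()    # Riemann sum of p log p on [0,1]
    return total / n_rep

# the gain grows with the "capacity" knob J and should saturate near the
# negentropy of the true density
print(info_gain(J=1, T=500), info_gain(J=5, T=500))
```

By Jensen's inequality this functional is nonnegative, and it increases with $J$ for two reasons: more basis terms capture more of the true density's structure, and the extra empirical coefficients add variance, which biases the (convex) functional upward. That second effect is exactly the estimation cost of a larger "capacity" $M$.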