Asymptotic posterior of learning


Suppose you have finitely many feature vectors $x_1,x_2,\ldots,x_N\in \mathbb{R}^d$. For each vector $x_n$, you can observe noisy labels \begin{equation*} y_{n,j} = f(x_n;W)+\epsilon_{n,j} \in \mathbb{R}, \end{equation*} where $f(x_n;W)$ is the ground-truth value of $f$ at $x_n$ with unknown parameters $W\in \mathbb{R}^K$, and $\epsilon_{n,j}\stackrel{\text{i.i.d.}}{\sim} \mathcal{N}\left(0,\sigma_n^2\right)$. Now impose a prior distribution on $W$, namely \begin{equation*} W\sim \mathcal{N}\left(\mathbf{0}_{K},\sigma^2_W\mathbf{I}_{K}\right). \end{equation*} The data set is denoted by $\mathcal{D}=\left \{ \left ( x_n,y_{n,j} \right ) \right \}_{j=1,\ldots,m_n;\; n=1,\ldots,N}$.
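For intuition, both claims can be checked explicitly in the conjugate special case $f(x;W)=x^{\top}W$ (which forces $d=K$); this linear assumption is illustrative and not part of the question. The posterior is then Gaussian with \begin{equation*} W \mid \mathcal{D} \sim \mathcal{N}\left(\mu,\Sigma\right), \qquad \Sigma^{-1} = \sigma_W^{-2}\mathbf{I}_{K} + \sum_{n=1}^{N}\frac{m_n}{\sigma_n^2}\,x_n x_n^{\top}, \end{equation*} so the posterior variance of the function value is $x_n^{\top}\Sigma\, x_n$. Since $\Sigma^{-1}\succeq \sigma_W^{-2}\mathbf{I}_{K} + (m_n/\sigma_n^2)\,x_n x_n^{\top}$, one gets $x_n^{\top}\Sigma\, x_n \le \sigma_n^2/m_n \rightarrow 0$ as $m_n\rightarrow\infty$; conversely, if $x_n$ has a nonzero component orthogonal to the span of the other $x_{n'}$, that component receives only the finitely many observations at $x_n$, so the variance stays bounded away from zero even as every $m_{n'}\rightarrow\infty$.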

Suppose you can generate independent samples $\widehat{W}_{s}$, $s=1,2,\ldots,S$, from the posterior $p(W\mid \mathcal{D})$ (writing $S$ for the number of samples to avoid clashing with the parameter dimension $K$). Given a fixed $x_n$, you can then infer $f(x_n)$ from the samples $f\left(x_n;\widehat{W}_{s}\right)$. Prove the following, adding regularity conditions if necessary:

1. If the number of observations $m_n$ at a fixed $x_n$ satisfies $m_n\rightarrow \infty$, then $\operatorname{Var}\left ( f\left ( x_n ;\widehat{W}_s \right ) \right ) \rightarrow 0$.
2. If the number of observations $m_n$ at a fixed $x_n$ stays fixed, then $\operatorname{Var}\left ( f\left ( x_n ;\widehat{W}_s \right ) \right ) > 0$ even when $m_{n'}\rightarrow\infty$ for all other $x_{n'}$.
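The two regimes can also be seen numerically. The sketch below assumes the conjugate linear case $f(x;W)=x^{\top}W$, with illustrative noise levels and two orthogonal design points (all choices mine, not from the question); the posterior variance of $f$ at $x_1$ is computed in closed form from the posterior covariance, which depends on the design and counts but not on the observed labels.

```python
import numpy as np

# Illustrative Bayesian linear regression: f(x; W) = x^T W.
sigma_W = 1.0   # prior std of W (assumed)
sigma_n = 0.5   # observation-noise std, same at both points (assumed)

# Two fixed feature vectors; x_2 is orthogonal to x_1, so labels at x_2
# carry no information about f(x_1).
X = np.array([[1.0, 0.0],
              [0.0, 1.0]])
K = X.shape[1]

def posterior_var_f(x_query, counts):
    """Posterior variance of f(x_query; W) given counts[n] labels at each x_n.

    The variance depends only on the posterior covariance of W, which does
    not involve the label values y_{n,j} themselves.
    """
    precision = np.eye(K) / sigma_W**2
    for x_n, m_n in zip(X, counts):
        precision += (m_n / sigma_n**2) * np.outer(x_n, x_n)
    Sigma = np.linalg.inv(precision)          # posterior covariance of W
    return float(x_query @ Sigma @ x_query)   # Var(f(x_query; W) | D)

x1 = X[0]
v_small  = posterior_var_f(x1, [10, 10])       # few labels at x_1
v_large  = posterior_var_f(x1, [10_000, 10])   # m_1 large: variance shrinks
v_others = posterior_var_f(x1, [10, 10_000])   # only m_2 grows: no effect here
print(v_small, v_large, v_others)
```

With these choices, `v_large` is bounded by $\sigma_n^2/m_1$ and tends to zero as $m_1$ grows, while `v_others` equals `v_small`: growing $m_2$ alone never reduces the variance at $x_1$ because $x_2\perp x_1$.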