I'm not good at English, so I apologize in advance.
I have a question about "Pattern Recognition and Machine Learning": equation (3.63) in Section 3.3.3.
I don't understand how the second expression below is transformed into the third — could anyone explain?
$$ \text{cov}[y(x), y(x')] = \text{cov}[\phi(x)^T w,\, w^T\phi(x')] = \phi(x)^T S_N \phi(x') = \beta^{-1} k(x,x') $$
Let the prior $Q_0$ over the weights be given by the density $$ P(w\mid\alpha)=\mathcal{N}(w\mid 0,\alpha^{-1}I), $$ so that the posterior over the weights $Q_w$ given the training targets $\vec{t}$ has density $$ P(w\mid\vec{t}) = \mathcal{N}(w\mid m_N,S_N),\qquad m_N = \beta S_N\Phi^T\vec{t},\qquad S_N^{-1}= \text{cov}(w)^{-1}=\alpha I + \beta\Phi^T\Phi. $$

The predictive distribution $Q_p(x)$ (for input $x$) then has density $$ P(t\mid x,\vec{t},\alpha,\beta)=\int P(t\mid x,w,\beta)\, P(w\mid\vec{t},\alpha,\beta)\, dw=\mathcal{N}\!\left(t\mid m_N^T\phi(x),\,\beta^{-1}+\phi(x)^TS_N\phi(x)\right). $$

Thus the (noise-free) model prediction is $$ y(x) = w^T\phi(x), \qquad w \sim Q_w, \tag{0} $$ so the predictive mean is simply $$ y(x,m_N) = \mathbb{E}[y(x)] = m_N^T\phi(x). \tag{1} $$

Notice that the covariance of the weights under the posterior is $$ \text{cov}(w) = S_N, \tag{2} $$ and that the following identity holds for covariances of vector-valued random variables in general: $$ \text{cov}(w) + \mathbb{E}[w]\, \mathbb{E}[w]^T = \mathbb{E}[ww^T]. \tag{3} $$

One more note (eq. 3.62 in the book): $$ y(x,m_N)=\phi(x)^T m_N=\beta\,\phi(x)^T S_N\Phi^T \vec{t}=\sum_i k(x,x_i)t_i, $$ $$\therefore\; \beta^{-1}k(x,x_i) = \phi(x)^TS_N\phi(x_i).\tag{4} $$

Ok, so what is the covariance between our scalar predictions for two inputs? Using (0), $$ \text{cov}[y(x_1), y(x_2)] = \text{cov}[w^T\phi(x_1),\, w^T\phi(x_2)]. $$
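The equivalent-kernel relation (4) is easy to check numerically. Below is a minimal sketch (not from the book) with made-up hyperparameters $\alpha,\beta$, hypothetical Gaussian basis functions, and toy data; it verifies that the predictive mean $m_N^T\phi(x)$ equals the kernel-weighted sum $\sum_i k(x,x_i)t_i$ of eq. 3.62:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 2.0, 25.0              # illustrative precision hyperparameters
X = rng.uniform(-1, 1, size=20)      # toy 1-D inputs
t = np.sin(2 * np.pi * X) + rng.normal(0, beta ** -0.5, size=X.shape)

# Hypothetical Gaussian basis phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2))
mus = np.linspace(-1, 1, 9)
s = 0.2

def phi(x):
    x = np.atleast_1d(x)
    return np.exp(-(x[:, None] - mus[None, :]) ** 2 / (2 * s ** 2))

Phi = phi(X)                                                  # design matrix, N x M
S_N = np.linalg.inv(alpha * np.eye(len(mus)) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ t                                  # posterior mean

# Equivalent kernel k(x, x') = beta * phi(x)^T S_N phi(x')    (eq. 3.62)
def k(x, xp):
    return beta * (phi(x) @ S_N @ phi(xp).T)

x_star = 0.3
mean_direct = (phi(x_star) @ m_N)[0]       # m_N^T phi(x*)
mean_kernel = (k(x_star, X) @ t)[0]        # sum_i k(x*, x_i) t_i
assert np.isclose(mean_direct, mean_kernel)
```

The two means agree exactly (up to floating-point rounding), since both are just $\beta\,\phi(x_*)^T S_N\Phi^T\vec{t}$ written in different orders.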
Then, using the identity for the covariance of scalar random variables $$\text{cov}[s_1,s_2]=\mathbb{E}[s_1s_2] - \mathbb{E}[s_1]\mathbb{E}[s_2],$$ we get \begin{align} \text{cov}[w^T\phi(x_1), w^T\phi(x_2)] &= \mathbb{E}[\phi(x_1)^T w w^T\phi(x_2)] - \phi(x_1)^Tm_Nm_N^T\phi(x_2) \\ &= \phi(x_1)^T\mathbb{E}[ w w^T]\phi(x_2) - \phi(x_1)^Tm_Nm_N^T\phi(x_2) \\ &= \phi(x_1)^T\left[ \text{cov}(w) + \mathbb{E}[w] \mathbb{E}[w]^T\right]\phi(x_2) - \phi(x_1)^Tm_Nm_N^T\phi(x_2) \\ &= \phi(x_1)^T S_N \phi(x_2) + \phi(x_1)^T m_Nm_N^T \phi(x_2) - \phi(x_1)^Tm_Nm_N^T\phi(x_2)\\ &= \phi(x_1)^T S_N \phi(x_2) \\ &= \beta^{-1}k(x_1,x_2), \end{align} using (1) in line 1, linearity of expectation in line 2, (3) in line 3, (2) in line 4, and (4) in the last line.
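The final identity can also be sanity-checked by Monte Carlo: sample $w$ from the posterior $\mathcal{N}(m_N, S_N)$ and compare the empirical covariance of $w^T\phi(x_1)$ and $w^T\phi(x_2)$ against the closed form $\phi(x_1)^T S_N \phi(x_2)$. A sketch under assumed toy settings (cubic polynomial basis, illustrative $\alpha,\beta$ and data):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 2.0, 25.0                       # illustrative hyperparameters
X = rng.uniform(-1, 1, 20)
t = np.sin(2 * np.pi * X) + rng.normal(0, beta ** -0.5, X.shape)

# Hypothetical polynomial basis phi(x) = (1, x, x^2, x^3)
def phi(x):
    return np.vander(np.atleast_1d(x), 4, increasing=True)

Phi = phi(X)
S_N = np.linalg.inv(alpha * np.eye(4) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ t

# Draw posterior weight samples and form the two scalar predictions
w = rng.multivariate_normal(m_N, S_N, size=200_000)
x1, x2 = -0.4, 0.7
y1 = w @ phi(x1).ravel()                      # samples of y(x1) = w^T phi(x1)
y2 = w @ phi(x2).ravel()                      # samples of y(x2) = w^T phi(x2)
mc_cov = np.cov(y1, y2)[0, 1]                 # empirical cov[y(x1), y(x2)]

closed_form = (phi(x1) @ S_N @ phi(x2).T)[0, 0]   # phi(x1)^T S_N phi(x2)
assert abs(mc_cov - closed_form) < 1e-3
```

Note that the covariance does not involve the additive observation noise $\beta^{-1}$, which only enters the predictive variance at a single point.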