I am trying to derive the covariance matrix when I apply the kernel trick to Bayesian linear regression. We have a data matrix $X$ of shape $(N\times p)$ and a target vector $y$ of shape $(N\times 1)$, and the model is $y = w^Tx$. In Bayesian linear regression we already know that:
- the conditional distribution of $y$ for any single sample $(x,y)$ is $y\mid w,x \sim \mathcal{N}(w^Tx,\sigma^2)$.
- the prior distribution is $w \sim \mathcal{N}(0,\Sigma_p)$; the prior mean is taken to be 0 to simplify the computation.
- the conditional distribution of a new sample is $y^*\mid X,y,x^* \sim \mathcal{N}({x^*}^T\mu_w, {x^*}^T\Sigma_w x^*)$, where we assume the prediction is noise-free. Here $\Sigma_w = (\sigma^{-2}X^TX+\Sigma_p^{-1})^{-1}$ and $\mu_w=\sigma^{-2}\Sigma_wX^Ty$ are the posterior covariance and mean of $w$ obtained in the inference step of Bayesian linear regression.
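As a concrete reference point, here is a minimal NumPy sketch of the weight-space formulas above (posterior $\Sigma_w$, $\mu_w$ and the noise-free predictive mean and variance). The synthetic data, the noise level `sigma = 0.1`, the identity prior `Sigma_p` and the helper name `blr_predict` are assumptions made only for illustration.

```python
import numpy as np

def blr_predict(X, y, x_star, sigma, Sigma_p):
    """Weight-space Bayesian linear regression, noise-free prediction at x_star.

    X: (N, p) data matrix, y: (N,) targets, x_star: (p,) new input,
    sigma: noise standard deviation, Sigma_p: (p, p) prior covariance of w (prior mean 0).
    """
    # Posterior covariance: Sigma_w = (sigma^-2 X^T X + Sigma_p^-1)^-1
    Sigma_w = np.linalg.inv(X.T @ X / sigma**2 + np.linalg.inv(Sigma_p))
    # Posterior mean: mu_w = sigma^-2 Sigma_w X^T y
    mu_w = Sigma_w @ X.T @ y / sigma**2
    # Predictive mean and variance of x*^T w (no observation noise added)
    mean = x_star @ mu_w
    var = x_star @ Sigma_w @ x_star
    return mean, var, mu_w, Sigma_w

# Hypothetical synthetic data for illustration
rng = np.random.default_rng(0)
N, p = 20, 3
X = rng.normal(size=(N, p))
w_true = np.array([1.0, -2.0, 0.5])
sigma = 0.1
y = X @ w_true + sigma * rng.normal(size=N)
Sigma_p = np.eye(p)
x_star = np.array([0.3, -0.1, 1.2])

mean, var, mu_w, Sigma_w = blr_predict(X, y, x_star, sigma, Sigma_p)
print(mean, var)
```

With this little noise and 20 samples, `mu_w` should land close to `w_true`, and `var` is the (small, positive) uncertainty in ${x^*}^Tw$.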
Now we apply the kernel trick to Bayesian linear regression: the conditional distribution of a new sample $y^*\mid X,y,x^*$ becomes $\mathcal{N}(\phi(x^*)^{T}\sigma^{-2}A^{-1}\Phi^Ty,\ \phi(x^*)^{T}A^{-1}\phi(x^*))$, where $\Phi=(\phi(x_1), \phi(x_2), \cdots, \phi(x_N))^T$, $A=\sigma^{-2}\Phi^T\Phi+\Sigma_p^{-1}$ and $k=\Phi\Sigma_p\Phi^T$ is the $N\times N$ kernel matrix. To compute the mean of this distribution, I did the following:
\begin{align}
A&=\sigma^{-2}\Phi^T\Phi+\Sigma_p^{-1}\nonumber\\
\Rightarrow A\Sigma_p&=\sigma^{-2}\Phi^T\Phi\Sigma_p+\mathbb{I}\nonumber\\
\Rightarrow A\Sigma_p\Phi^T&=\sigma^{-2}\Phi^T\Phi\Sigma_p\Phi^T+\Phi^T=\sigma^{-2}\Phi^T(k+\sigma^2\mathbb{I})\nonumber\\
\Rightarrow \Sigma_p\Phi^T&=\sigma^{-2}A^{-1}\Phi^T(k+\sigma^2\mathbb{I})\nonumber\\
\Rightarrow \sigma^{-2}A^{-1}\Phi^T&=\Sigma_p\Phi^T(k+\sigma^2\mathbb{I})^{-1}\nonumber\\
\Rightarrow \phi(x^*)^T\sigma^{-2}A^{-1}\Phi^T&=\phi(x^*)^T\Sigma_p\Phi^T(k+\sigma^2\mathbb{I})^{-1}
\end{align}
so the mean is $\phi(x^*)^T\Sigma_p\Phi^T(k+\sigma^2\mathbb{I})^{-1}y$.

However, I am having trouble computing the covariance $\phi(x^*)^{T}A^{-1}\phi(x^*)$. The answer in my document is
$$\phi(x^*)^T\Sigma_p\phi(x^*)-\phi(x^*)^T\Sigma_p\Phi^T(\sigma^2\mathbb{I}+k)^{-1}\Phi\Sigma_p\phi(x^*).$$
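A quick numerical sanity check, assuming a hypothetical cubic feature map $\phi(x)=(1,x,x^2,x^3)^T$, random inputs and an arbitrary positive-definite $\Sigma_p$ (none of which come from the document): it compares the weight-space mean and variance with the kernelized mean derived above and with the covariance expression given in the document. It only shows that both sides agree on one random instance; it is not a derivation.

```python
import numpy as np

def phi(x):
    # Hypothetical feature map for illustration: cubic polynomial features of a scalar x.
    return np.array([1.0, x, x**2, x**3])

rng = np.random.default_rng(1)
N = 6
xs = rng.uniform(-1, 1, size=N)
y = np.sin(3 * xs)                    # made-up targets
sigma = 0.3

Phi = np.stack([phi(x) for x in xs])  # (N, M): row i is phi(x_i)^T
M = Phi.shape[1]
B = rng.normal(size=(M, M))
Sigma_p = B @ B.T + np.eye(M)         # arbitrary symmetric positive-definite prior covariance
phi_star = phi(0.25)

# Weight-space quantities
A = Phi.T @ Phi / sigma**2 + np.linalg.inv(Sigma_p)   # (M, M)
A_inv = np.linalg.inv(A)
k = Phi @ Sigma_p @ Phi.T                              # (N, N) kernel matrix
K_noisy_inv = np.linalg.inv(k + sigma**2 * np.eye(N))  # (k + sigma^2 I)^-1

# Mean: sigma^-2 phi*^T A^-1 Phi^T y  vs  phi*^T Sigma_p Phi^T (k + sigma^2 I)^-1 y
mean_weight = phi_star @ A_inv @ Phi.T @ y / sigma**2
mean_kernel = phi_star @ Sigma_p @ Phi.T @ K_noisy_inv @ y

# Variance: phi*^T A^-1 phi*  vs  the expression given in the document
var_weight = phi_star @ A_inv @ phi_star
var_kernel = (phi_star @ Sigma_p @ phi_star
              - phi_star @ Sigma_p @ Phi.T @ K_noisy_inv @ Phi @ Sigma_p @ phi_star)

print(mean_weight, mean_kernel)
print(var_weight, var_kernel)
```

Both printed pairs should match up to floating-point error, which suggests the target expression is indeed equal to $\phi(x^*)^TA^{-1}\phi(x^*)$.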
But I have no idea how to simplify it to get this result. What I have after simplifying is:
\begin{align}
\phi(x^*)^T\sigma^{-2}A^{-1}\Phi^T&=\phi(x^*)^T\Sigma_p\Phi^T(k+\sigma^2\mathbb{I})^{-1}\\
\Rightarrow \phi(x^*)^{T}A^{-1}\phi(x^*)&=\sigma^2\phi(x^*)^T\Sigma_p\Phi^T(\sigma^2\mathbb{I}+k)^{-1}(\Phi^T)^{-1}\phi(x^*)
\end{align}
where the second line comes from multiplying the first on the right by $\sigma^2(\Phi^T)^{-1}\phi(x^*)$.
How can I continue to reach the answer shown in my document? The key seems to be getting rid of $(\Phi^T)^{-1}$, because there is no such term in the real answer, but I have no idea how to do that.
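Regarding $(\Phi^T)^{-1}$: if the feature map has $M$ components, $\Phi$ has one row per training point and one column per feature, so $\Phi^T$ is $M\times N$, and unless $M=N$ (and $\Phi$ is nonsingular) it has no inverse. The sketch below, with a hypothetical cubic feature map ($M=4$) and $N=6$ made-up inputs, shows that NumPy's `np.linalg.inv` rejects such a matrix.

```python
import numpy as np

def phi(x):
    # Hypothetical cubic feature map, for illustration only (M = 4 features).
    return np.array([1.0, x, x**2, x**3])

xs = np.linspace(-1.0, 1.0, 6)        # N = 6 made-up training inputs
Phi = np.stack([phi(x) for x in xs])  # shape (N, M) = (6, 4)
PhiT = Phi.T                          # shape (M, N) = (4, 6)

try:
    np.linalg.inv(PhiT)
    invertible = True
except np.linalg.LinAlgError:
    # inv() only accepts square matrices; a (4, 6) matrix has no inverse.
    invertible = False

print(PhiT.shape, invertible)  # (4, 6) False
```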