In standard ridge regression, we find the $\beta$ that minimises $$\|Y-X\beta\|_2^2 + \lambda\|\beta\|_2^2,$$ namely $\beta = X^T(XX^T+\lambda I)^{-1}Y$, so that the fitted values are $\hat{Y} = X\beta = XX^T(XX^T+\lambda I)^{-1}Y$. Swapping $XX^T$ for a kernel matrix $K$ of our choice, we obtain new fitted values $\hat{Y}_K:=K(K+\lambda I)^{-1}Y$, which should be the solution of some other minimisation problem. I am told that this other minimisation problem is $$\min_{f\in H}\ \sum_{i=1}^n (Y_i - f(x_i))^2 + \lambda\|f\|_{H}^2\tag{$\dagger$}$$ over the corresponding RKHS $H$. Thus we should have that the prediction map $$x\longmapsto \sum_{i=1}^n k(x, x_i)\,\big[(K+\lambda I)^{-1}Y\big]_i$$ is in $H$. However, I don't see why this is true from the construction of $H$, given in our notes as the (completion of the) inner product space $$H = \left\{\sum_{i=1}^m\alpha_ik(\cdot, x_i)\mid \alpha_i\in \mathbb{R},\ m\in \mathbb{N},\ x_i\in \mathcal{X}\right\}$$ where $\mathcal{X}$ is the feature space and $k$ the kernel. I feel like this should be an obvious point, so does anyone see what I'm missing here?
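For concreteness, here is a small NumPy sanity check (with made-up toy data) of the identities above: the primal ridge solution $(X^TX+\lambda I)^{-1}X^TY$ agrees with the dual form $X^T(XX^T+\lambda I)^{-1}Y$ by the push-through identity, and the fitted values equal $K(K+\lambda I)^{-1}Y$ when $K=XX^T$ is the linear kernel.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 8, 3, 0.5
X = rng.standard_normal((n, p))
Y = rng.standard_normal(n)

# Primal ridge solution: beta = (X^T X + lam I)^{-1} X^T Y
beta_primal = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

# Dual form via the push-through identity: beta = X^T (X X^T + lam I)^{-1} Y
beta_dual = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), Y)
assert np.allclose(beta_primal, beta_dual)

# Fitted values: X beta = K (K + lam I)^{-1} Y, with K = X X^T the linear kernel
K = X @ X.T
Y_fit = K @ np.linalg.solve(K + lam * np.eye(n), Y)
assert np.allclose(X @ beta_primal, Y_fit)
```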
EDIT: From the representer theorem (or directly, by taking $\alpha := (K+\lambda I)^{-1}Y$), I can see that $K(K+\lambda I)^{-1}Y=K\alpha$ for some $\alpha$, which shows that the predicted values $Y_i^{\text{pred}}$ can be written as $(K\alpha)_i$, and hence $Y_i^{\text{pred}}=f(x_i)$, where $f(\cdot) = \sum_{i}\alpha_i k(\cdot,x_i)\in H$. However, I feel like the whole premise of the representer theorem is that $(\dagger)$ does indeed represent kernel ridge regression - or am I wrong, and should the representer theorem be thought of as proving this fact?
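A quick numerical illustration of the point in the edit (a sketch with toy data and a Gaussian kernel, both made up here): taking $\alpha=(K+\lambda I)^{-1}Y$, the function $f(\cdot)=\sum_i\alpha_i k(\cdot,x_i)$ lies in $H$ by construction, satisfies $f(x_j)=(K\alpha)_j$, and makes the gradient of the objective $\|Y-K\alpha\|^2+\lambda\,\alpha^TK\alpha$ (which is $(\dagger)$ restricted to this span) vanish.

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam = 6, 0.3
x = rng.standard_normal((n, 2))
Y = rng.standard_normal(n)

def k(a, b):  # Gaussian (RBF) kernel, chosen purely for illustration
    return np.exp(-np.sum((a - b) ** 2))

K = np.array([[k(xi, xj) for xj in x] for xi in x])
alpha = np.linalg.solve(K + lam * np.eye(n), Y)

# f(.) = sum_i alpha_i k(., x_i) is in H by construction, and f(x_j) = (K alpha)_j
f = lambda z: sum(a_i * k(z, x_i) for a_i, x_i in zip(alpha, x))
assert np.allclose([f(xj) for xj in x], K @ alpha)

# The gradient of ||Y - K a||^2 + lam a^T K a is 2 K ((K + lam I) a - Y),
# which vanishes at a = alpha, so alpha minimises (dagger) over the span.
grad = 2 * K @ ((K + lam * np.eye(n)) @ alpha - Y)
assert np.allclose(grad, 0)
```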