Let $\mathcal{H}$ be a Hilbert space and $\Phi\colon\mathbb{R}^d\to\mathcal{H}$ a feature map, with kernel $K(x,y)=\langle\Phi(x),\Phi(y)\rangle$, so that $K$ is the reproducing kernel of the associated RKHS. Let $B_i>0$ for $i=1,2,\ldots,n$. We are interested in solving the following $2$-norm soft-margin optimization problem (the labels are $y_i\in\{-1,1\}$) $$\min_{w\in\mathcal{H},\ a\in\mathbb{R}}\left\{\frac12\langle w,w\rangle+\sum_{i=1}^n\frac{B_i}2\left(1-y_i(\langle w,\Phi(x_i)\rangle+a)\right)^2\right\}.$$ This is a standard $2$-norm soft-margin problem in statistical learning theory. My question concerns the way the Lagrangian is used.
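For reference, the objective is quadratic, and any component of $w$ orthogonal to $\operatorname{span}\{\Phi(x_1),\ldots,\Phi(x_n)\}$ only increases $\langle w,w\rangle$, so one may take $w=\sum_j c_j\Phi(x_j)$ and the problem reduces to a linear system in $(c,a)$. A minimal numerical sketch (the helper names `rbf_kernel` and `fit_ls_svm`, the toy data, and the choice $B_i=10$ are my own illustration, not part of the problem):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian RBF kernel matrix K[i, j] = exp(-gamma * ||X_i - Y_j||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def fit_ls_svm(K, y, B):
    """Minimise (1/2) c^T K c + sum_i (B_i/2) (1 - y_i((K c)_i + a))^2 over (c, a).

    The objective is a quadratic in (c, a), so stationarity is a linear system:
    (G + M^T diag(B) M) theta = M^T (B * y), with M = [K | 1] and theta = [c; a].
    """
    n = len(y)
    M = np.hstack([K, np.ones((n, 1))])   # M @ [c; a] = K c + a 1
    G = np.zeros((n + 1, n + 1))
    G[:n, :n] = K                          # the regulariser <w, w> = c^T K c acts on c only
    A = G + M.T @ (B[:, None] * M)         # normal-equations matrix
    b = M.T @ (B * y)                      # right-hand side (uses y_i^2 = 1)
    theta = np.linalg.solve(A, b)
    return theta[:n], theta[n]             # coefficients c and offset a

# Toy XOR-like data; B_i = 10 is an arbitrary illustrative choice.
X = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
y = np.array([1., 1., -1., -1.])
B = np.full(4, 10.0)
K = rbf_kernel(X, X)
c, a = fit_ls_svm(K, y, B)
print(np.sign(K @ c + a))  # decision values on the training points
```

Substituting $w=\sum_j c_j\Phi(x_j)$ turns $\langle w,w\rangle$ into $c^\top Kc$ and the loss into a quadratic in $(c,a)$, so no differentiation on $\mathcal{H}$ is needed at all, which is exactly the point of my question below.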
I introduce slack variables $\xi_i$ and transform the initial problem into $$\min_{w\in\mathcal{H},\ a\in\mathbb{R},\ \xi\in\mathbb{R}^n}\left\{\frac12\langle w,w\rangle+\sum_{i=1}^n\frac{B_i}2\xi_i^2\right\},$$ subject to $1-\xi_i-y_i(\langle w,\Phi(x_i)\rangle+a)= 0, \ \ \ i=1,2,\ldots ,n$ (equality constraints, since the quadratic penalty above is two-sided). The Lagrangian is $$L(w,a,\xi,\alpha)=\frac12\langle w,w\rangle +\sum_{i=1}^n\frac{B_i}2\xi_i^2+\sum_{j=1}^n\alpha_j\left(1-\xi_j-y_j(\langle w,\Phi(x_j)\rangle+a)\right).$$ To find the extrema, I want to differentiate with respect to $w$. However, in that case we are treating the Lagrangian as a functional on the Hilbert space $\mathcal{H}$, and questions arise about differentiability, and even about the definition of the derivative. This problem is avoided by using the representer theorem, which states that the minimizer has the form $$w(\cdot)=\sum_{j=1}^nc_jK(x_j,\cdot)$$ for some $c_j\in\mathbb{R}$; that way we can vary the coefficients $c_j$ and differentiate with respect to the $c_j$'s. Can we not simply skip the representer theorem and differentiate with respect to $w$ directly? Does the method of Lagrange multipliers extend to Hilbert spaces?
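To make the intended step concrete: what I would like to write, purely formally, is the Gâteaux derivative of $L$ in a direction $h\in\mathcal{H}$ (a sketch of the computation I have in mind, not a justification), $$\left.\frac{d}{dt}L(w+th,a,\xi,\alpha)\right|_{t=0}=\langle w,h\rangle-\sum_{j=1}^n\alpha_jy_j\langle \Phi(x_j),h\rangle,$$ and requiring this to vanish for every $h$ would give $$w=\sum_{j=1}^n\alpha_jy_j\Phi(x_j),$$ which agrees with the representer form above with $c_j=\alpha_jy_j$. My question is whether this computation can be made rigorous without invoking the representer theorem first.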