I'm trying to show, for the general regularized empirical risk minimization problem, that the minimizer $w$ of $$ \frac{1}{n}\sum_{i=1}^n \operatorname{loss}(w^T y_i, x_i) + \mu \lVert w \rVert^2, $$ where the loss function is convex in its first argument and $w$ and each $y_i$ are vectors in $\mathbb{R}^d$,
has the form $w_* = \sum_{i=1}^n \alpha_i y_i$ for some $n$-dimensional vector $\alpha$, i.e. $w_*$ lies in the span of the data points. I've been able to show this is true in the least-squares case, but I don't know how to generalize the argument. Also, will $w_*$ still have this form if the loss function is not convex?
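For reference, here is a sketch of the least-squares argument I have in mind (my attempt, using the orthogonal-decomposition trick; I'm unsure how much of it carries over):

```latex
% Decompose w into a component in the span of the data and an orthogonal remainder:
\[
  w = w_{\parallel} + w_{\perp}, \qquad
  w_{\parallel} \in \operatorname{span}\{y_1,\dots,y_n\}, \qquad
  w_{\perp}^T y_i = 0 \ \text{ for all } i .
\]
% The data-fit term only sees w through the inner products w^T y_i, so it is
% unchanged if we drop the orthogonal part:
\[
  \frac{1}{n}\sum_{i=1}^n \bigl(w^T y_i - x_i\bigr)^2
  = \frac{1}{n}\sum_{i=1}^n \bigl(w_{\parallel}^T y_i - x_i\bigr)^2 .
\]
% The regularizer, by Pythagoras, can only decrease:
\[
  \mu \lVert w \rVert^2
  = \mu \lVert w_{\parallel} \rVert^2 + \mu \lVert w_{\perp} \rVert^2
  \;\ge\; \mu \lVert w_{\parallel} \rVert^2 ,
\]
% with equality iff w_perp = 0, forcing the minimizer into the span:
\[
  w_* = \sum_{i=1}^n \alpha_i y_i .
\]
```

What I can't see is which step, if any, actually relies on the specific squared-error form of the loss.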