I'm working on a classification problem with $N$ data points $y_i$ and associated labels $l_i$. Besides the classification, I have to minimize a function $f(Y,Q)$, so I would like to construct an optimization scheme: $$\min_{W,Q} f(Y,Q)+\lambda g(L,Y,W)$$ in which $g$ is a discriminant function based on $L$ and $Y$, and possibly a weighting parameter $W$.
I know one easy candidate for $g$ is a linear discriminant objective such as $\|L-WX\|_2^2$, in which $X$ is a projection of $Y$. But I need something better than the linear solution.
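For concreteness, here is a minimal sketch of that linear baseline in NumPy. Everything beyond the formula is an assumption made for illustration: $L$ is taken to be a one-hot label matrix of shape (classes, $N$), $X$ holds the projected points as columns, the norm is read as the Frobenius norm, and the function names are hypothetical.

```python
import numpy as np

def linear_discriminant_loss(L, W, X):
    """Squared error ||L - W X||^2 (Frobenius norm assumed)."""
    R = L - W @ X
    return float(np.sum(R ** 2))

def linear_discriminant_grad_W(L, W, X):
    """Gradient of the loss above with respect to W."""
    return -2.0 * (L - W @ X) @ X.T

# Toy data (assumed shapes): C classes, d projected features, N points.
rng = np.random.default_rng(0)
C, d, N = 3, 5, 20
X = rng.standard_normal((d, N))
labels = rng.integers(0, C, size=N)
L = np.eye(C)[:, labels]          # one-hot labels, shape (C, N)

# With X fixed, the term is convex in W and has a least-squares minimizer:
# solve X^T W^T = L^T for W.
W_ls = np.linalg.lstsq(X.T, L.T, rcond=None)[0].T

W0 = np.zeros((C, d))
loss_at_zero = linear_discriminant_loss(L, W0, X)
loss_at_ls = linear_discriminant_loss(L, W_ls, X)
grad_at_ls = linear_discriminant_grad_W(L, W_ls, X)
```

With $X$ held fixed this term is differentiable and convex in $W$, which is the property I would like to keep in a non-linear replacement.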
So I'm looking for non-linear options to add to the optimization objective. Ideally $g$ should be differentiable and convex; however, I'm open to any suggestion that can work in this case!