Gradient of constrained sum


I want to minimise a real-valued cost function for a single-layer perceptron with parameter vector $w \in \mathbb{R}^N$.

The cost function is $$ c(w) = \sum_{\{p: \, y_p \cdot \langle w, x_p \rangle \leq 0\}} y_p \cdot \langle w, x_p \rangle $$ My goal is to find $w \neq 0$ that minimises $c(w)$ for a fixed set of input tuples $\{(x_p,y_p)\}_{p=1}^M$, where $x_p \in \mathbb{R}^N$ and $y_p \in \mathbb{R}$.
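For concreteness, here is a minimal NumPy sketch of how I evaluate this cost (the names `X`, `y`, `w` and the array shapes are my own illustrative choices, not part of the problem statement):

```python
import numpy as np

def cost(w, X, y):
    """c(w) from above: sum of y_p * <w, x_p> over the points with
    y_p * <w, x_p> <= 0.  X has shape (M, N), y has shape (M,)."""
    margins = y * (X @ w)              # y_p * <w, x_p> for every p
    return margins[margins <= 0].sum()
```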

To find its minima I have to compute its gradient with respect to $w$, but how can I do that when $w$ appears in the constraint that defines the sum?

However, since for any given $w$ the index set of the sum is locally constant (it only changes when some $y_p \cdot \langle w, x_p \rangle$ crosses zero), I think I can treat it as fixed when computing the gradient and just write: $$\nabla_w c(w) = \sum_{\{p: \, y_p \cdot \langle w, x_p \rangle \leq 0\}} y_p \cdot x_p $$ and find the minimum using gradient descent.
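A minimal sketch of this plan (plain gradient descent with the formula above; the step size, iteration count, and array names are placeholder choices of mine):

```python
import numpy as np

def grad(w, X, y):
    """Proposed gradient: sum of y_p * x_p over the active set
    {p : y_p * <w, x_p> <= 0}, treating that set as fixed."""
    active = (y * (X @ w)) <= 0
    return X[active].T @ y[active]    # sum of y_p * x_p over active p

def minimise(w0, X, y, lr=0.1, steps=100):
    """Plain gradient descent on c(w); hyperparameters are arbitrary."""
    w = w0.copy()
    for _ in range(steps):
        w = w - lr * grad(w, X, y)
    return w
```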

I'd like some feedback on whether this idea works, in my problem and/or in general, and pointers to materials I can read about it (books or links are appreciated!).

This problem is related to backpropagation and Rosenblatt's Delta Rule, which I should be able to derive from the solution.

Best answer:

(Not a full answer, just a long comment.) I think your argument is not entirely correct. Namely, you can write your function as $$ c(w) = \sum_{p=1}^M \psi_p(w), \qquad \psi_p(w) := \min\{0, y_p \langle w, x_p\rangle\}. $$ At points where $y_p \langle w, x_p\rangle \neq 0$ for every $p$, each $\psi_p$ is differentiable and the gradient of $c$ is indeed given by your formula.

On the other hand, if $y_p \langle w, x_p\rangle = 0$ for some $p$, then the corresponding function $\psi_p$ is not differentiable at $w$. Still, you can compute the generalized gradient $\partial \psi_p(w)$ (in the sense of Clarke) and try to work with it, e.g. via subgradient methods.
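For example, since each $\psi_p$ is the minimum of $0$ and a linear function, its generalized gradient at a kink is the convex hull of the two one-sided gradients, $\{\theta \, y_p x_p : \theta \in [0,1]\}$. A sketch of picking one valid element (NumPy, with illustrative names; `theta` parametrises the choice inside that convex hull):

```python
import numpy as np

def subgrad_psi(w, x_p, y_p, theta=0.0):
    """One element of the generalized gradient of
    psi_p(w) = min(0, y_p * <w, x_p>)."""
    m = y_p * np.dot(w, x_p)
    if m < 0:
        return y_p * x_p             # psi_p = y_p * <w, x_p> here
    if m > 0:
        return np.zeros_like(w)      # psi_p is identically 0 here
    # kink: the generalized gradient is {theta * y_p * x_p : theta in [0, 1]}
    return theta * y_p * x_p
```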