I have the following loss I want to minimize:
$L=(y-(\max(0,w^Tx_1) + \max(0,w^Tx_2) + \max(0,w^Tx_3)))^2$
Now I want the gradient w.r.t. my weight vector $w$:
if $w^Tx_1>0$ & $w^Tx_2>0$ & $w^Tx_3>0$ $\rightarrow$ $2(y-(w^Tx_1 + w^Tx_2 + w^Tx_3))(-(x_1 + x_2 + x_3))$
if $w^Tx_1<0$ & $w^Tx_2>0$ & $w^Tx_3>0$ $\rightarrow$ $2(y-(w^Tx_2 + w^Tx_3))(-(x_2 + x_3))$
So the terms are removed if $w^Tx_i < 0$. This means that if all terms are $<0$:
$w^Tx_1<0$ & $w^Tx_2<0$ & $w^Tx_3<0$ $\rightarrow$ $2y$
So in that case I get a scalar as gradient, while in the other cases I get a vector. Are my derivatives correct? If so, does it make sense to use a vector of length len(x) filled with $y$'s as the gradient?
Edit: After checking again, I think the derivative w.r.t. $w$ in the case that all $w^Tx_i<0$ is $0$? So the minimum would be to always predict negative?
Note that the gradient w.r.t. $w$ is actually a vector consisting of $\frac{\partial L}{\partial w_i}$, where $w_i$ is the $i$th element of $w$. If all $w^Tx_i<0$, then $\frac{\partial L}{\partial w_i} = 0$ for all $i$, i.e. the gradient is a vector of length $\text{len}(w)$ with each entry equal to $0$.
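You can verify both the piecewise gradient and the all-negative case with a finite-difference check. Here is a small sketch (the values of $y$, the $x_i$, and $w$ are made up for illustration, not taken from the question):

```python
import numpy as np

# Toy example: three 2-dimensional inputs and a target.
x = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
y = 1.5

def loss(w):
    return (y - sum(max(0.0, w @ xi) for xi in x)) ** 2

def analytic_grad(w):
    # Only the "active" terms with w^T x_i > 0 contribute.
    active = [xi for xi in x if w @ xi > 0]
    pred = sum(w @ xi for xi in active)
    return 2.0 * (y - pred) * (-sum(active, np.zeros(2)))

def numeric_grad(w, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (loss(w + e) - loss(w - e)) / (2 * eps)
    return g

w = np.array([0.5, -0.3])       # two terms active, one dropped
print(analytic_grad(w))         # agrees with numeric_grad(w)

w_neg = np.array([-1.0, -1.0])  # all w^T x_i < 0
print(analytic_grad(w_neg))     # the zero vector
```

Away from the kinks $w^Tx_i = 0$ (where the loss is not differentiable), the analytic and numeric gradients agree, and in the all-negative region the gradient is identically zero.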
But this does not mean that the minimum is achieved when all $w^Tx_i<0$; the function is non-convex. As an example, consider scalars $y = 2, x = 1$. Then $L = (2-\max(0, w))^2$. If you plot this function, you'll notice that there is a flat region with gradient $0$, but the minimum occurs at $w=2$.