$ \newcommand{\wv}{\mathbf{w}} \newcommand{\xv}{\mathbf{x}} \newcommand{\loss}{L(\wv;\xv)} $
Suppose we are using a neural network to predict 4 points $P_i(x_i,y_i)$, where $i \in \{1, 2, 3, 4\}$. We want the predicted points to form a rectangle, i.e. with a right angle between adjacent sides, and to penalize the network's prediction through the loss function when they do not. To make this clearer, here are the details:
Consider the angle made by $P_1$, $P_2$ and $P_3$. The side lengths are $P_{12}$, connecting $P_1$ to $P_2$, and similarly $P_{13}$ and $P_{23}$. Using the Law of Cosines, the angle is:
$ \theta_1 = \arccos\left(\frac{P_{12}^2 + P_{13}^2 - P_{23}^2}{2 \, P_{12} P_{13}}\right) $
where $P_{12}$ is the length of the segment from $P_1$ to $P_2$, calculated by
$P_{12} = \sqrt{(P_{2x} - P_{1x})^2 + (P_{2y} - P_{1y})^2}$
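To make this concrete, here is how I compute one angle numerically (a NumPy sketch; clipping the cosine to $[-1, 1]$ is just a guard against floating-point round-off):

```python
import numpy as np

def side_length(p, q):
    """Euclidean distance between two 2-D points."""
    return np.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

def angle_at(p1, p2, p3):
    """Angle at vertex p1, between segments p1-p2 and p1-p3, via the Law of Cosines."""
    d12 = side_length(p1, p2)
    d13 = side_length(p1, p3)
    d23 = side_length(p2, p3)
    cos_theta = (d12 ** 2 + d13 ** 2 - d23 ** 2) / (2 * d12 * d13)
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))
```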
To train the network, we want to minimize a loss function $\loss$ that depends on the prediction $pred(\xv, \wv)=\wv^T\xv$, and hence on $\xv$ and $\wv$. Since a quadrilateral with three right angles is necessarily a rectangle, we only need to consider three angles.
$ \loss = \sum_{i=1}^{3} \left|\cos(\theta_i)\right| $
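Putting this together, the full loss over the three angles looks like this (again a sketch; here I take the interior angle at each of three consecutive vertices, with the points assumed to be ordered around the polygon):

```python
import numpy as np

def rectangle_loss(points):
    """Sum of |cos(theta_i)| over three interior angles; zero iff three right angles."""
    # points: array of shape (4, 2), ordered P1..P4 around the polygon
    loss = 0.0
    for i in range(3):
        # interior angle at vertex i+1, between its two neighbours
        p_prev = points[i]
        p_vert = points[i + 1]
        p_next = points[(i + 2) % 4]
        a = np.linalg.norm(p_vert - p_prev)
        b = np.linalg.norm(p_vert - p_next)
        c = np.linalg.norm(p_next - p_prev)
        cos_t = (a ** 2 + b ** 2 - c ** 2) / (2 * a * b)
        loss += abs(cos_t)
    return loss
```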
I'm trying to work out the gradient of this loss function numerically rather than analytically, and to find out whether it is differentiable at all.
Now, in order to minimize the loss using, for example, a first-order method such as stochastic gradient descent, we need the gradient of the loss function with respect to $\wv$. So we have:
$$ \loss = f(pred(\xv, \wv)) $$
$$ \nabla_w \loss = \nabla_w f\,(pred(\xv, \wv)) \\= \frac{\partial f\,(pred(\xv, \wv))}{\partial w} \cdot \nabla_w pred(\xv, \wv) \tag{lossGradient}\label{lossGradient} $$
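To sanity-check the analytic expression above, I plan to compare it against a central-difference estimate (a generic sketch; `loss_fn` stands for the whole pipeline from weights to loss value):

```python
import numpy as np

def numerical_gradient(loss_fn, w, eps=1e-6):
    """Central-difference estimate of the gradient of loss_fn at w."""
    w = np.asarray(w, dtype=float)
    grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus = w.copy()
        w_minus = w.copy()
        w_plus.flat[i] += eps   # perturb one coordinate up...
        w_minus.flat[i] -= eps  # ...and down
        grad.flat[i] = (loss_fn(w_plus) - loss_fn(w_minus)) / (2 * eps)
    return grad
```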
This is where I get a bit lost with the chain rule, so correct me if I'm wrong: I assume the gradient of the loss is the sum of the three gradients, one per angle. For simplicity, I consider only one angle.
$$ \nabla_w \loss_1 = \frac{(2P_{12}\,\partial P_{12} + 2P_{13}\,\partial P_{13} - 2P_{23}\,\partial P_{23})(2 P_{12} P_{13}) - 2(\partial P_{12}\, P_{13} + \partial P_{13}\, P_{12})(P_{12}^2 + P_{13}^2 - P_{23}^2)}{(2 P_{12} P_{13})^2} $$ Now if we consider e.g. $\partial P_{12}$, we should have: $$ \partial P_{12} = 0.5\left((P_{2x} - P_{1x})^2 + (P_{2y} - P_{1y})^2\right)^{-0.5}\left(2\left(\frac{\partial P_{2x}}{\partial w} - \frac{\partial P_{1x}}{\partial w}\right)(P_{2x} - P_{1x}) + 2\left(\frac{\partial P_{2y}}{\partial w} - \frac{\partial P_{1y}}{\partial w}\right)(P_{2y} - P_{1y})\right) $$ Is this correct, and is this loss function differentiable without the loss value exploding? What about when an angle is 270 degrees? How can I penalize that value as well, since we only want 90-degree angles?
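As a sanity check on my $\partial P_{12}$ expression, I compared the analytic derivative of $P_{12}$ with respect to a single coordinate ($P_{2x}$ here) against a finite difference; for the length the expression reduces to $(P_{2x} - P_{1x})/P_{12}$:

```python
import numpy as np

def p12(p1, p2):
    """Length of the segment from p1 to p2."""
    return np.sqrt((p2[0] - p1[0]) ** 2 + (p2[1] - p1[1]) ** 2)

p1, p2 = np.array([0.0, 0.0]), np.array([3.0, 4.0])

# analytic: dP12/dP2x = (P2x - P1x) / P12
analytic = (p2[0] - p1[0]) / p12(p1, p2)

# central-difference check on the same coordinate
eps = 1e-6
p2_plus, p2_minus = p2.copy(), p2.copy()
p2_plus[0] += eps
p2_minus[0] -= eps
numeric = (p12(p1, p2_plus) - p12(p1, p2_minus)) / (2 * eps)

print(analytic, numeric)  # both ≈ 0.6
```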