$ \newcommand{\wv}{\mathbf{w}} \newcommand{\xv}{\mathbf{x}} \newcommand{\loss}{L(\wv;\xv)} $
Suppose we are using a neural network to predict 4 points $P_i(x_i,y_i)$, where $i \in \{1, 2, 3, 4\}$. We want the predicted points to form a rectangle, i.e. with a right angle between adjacent sides, and to penalize the network's prediction through the loss function when they do not. To make this clearer, here are the details:
Consider the angle made by $P_1$, $P_2$ and $P_3$. The side lengths are $P_{12}$, connecting $P_1$ to $P_2$, and similarly $P_{13}$ and $P_{23}$. Using the Law of Cosines, the angle is:
$ \theta_1 = \arccos\left(\frac{P_{12}^2 + P_{13}^2 - P_{23}^2}{2 \, P_{12} P_{13}}\right) $
where $P_{12}$ is the length of the segment from $P_1$ to $P_2$, calculated by
$P_{12} = \sqrt{(P_{2x} - P_{1x})^2 + (P_{2y} - P_{1y})^2}$
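To make this concrete, here is how I compute one angle numerically (a NumPy sketch; clipping the cosine to $[-1, 1]$ is just a guard against floating-point round-off):

```python
import numpy as np

def side_length(p, q):
    """Euclidean distance between two 2-D points."""
    return np.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

def angle_at(p1, p2, p3):
    """Angle at vertex p1, between segments p1-p2 and p1-p3, via the Law of Cosines."""
    d12 = side_length(p1, p2)
    d13 = side_length(p1, p3)
    d23 = side_length(p2, p3)
    cos_theta = (d12 ** 2 + d13 ** 2 - d23 ** 2) / (2 * d12 * d13)
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))
```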
To train the network, we want to minimize a loss function $\loss$ that depends on the prediction $pred(\xv, \wv)=\wv^T\xv$, and hence on $\xv$ and $\wv$. Since a quadrilateral with three right angles is necessarily a rectangle, we only need to consider three angles.
$ \loss = \sum_{i=1}^{3} \left|\cos(\theta_i)\right| $
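Putting this together, the full loss over the three angles looks like this (again a sketch; here I take the interior angle at each of three consecutive vertices, with the points assumed to be ordered around the polygon):

```python
import numpy as np

def rectangle_loss(points):
    """Sum of |cos(theta_i)| over three interior angles; zero iff three right angles."""
    # points: array of shape (4, 2), ordered P1..P4 around the polygon
    loss = 0.0
    for i in range(3):
        # interior angle at vertex i+1, between its two neighbours
        p_prev = points[i]
        p_vert = points[i + 1]
        p_next = points[(i + 2) % 4]
        a = np.linalg.norm(p_vert - p_prev)
        b = np.linalg.norm(p_vert - p_next)
        c = np.linalg.norm(p_next - p_prev)
        cos_t = (a ** 2 + b ** 2 - c ** 2) / (2 * a * b)
        loss += abs(cos_t)
    return loss
```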
I'm trying to work out the gradient of this loss function numerically rather than analytically, and to find out whether it is differentiable at all.
Now, in order to minimize the loss using, for example, a first-order method such as stochastic gradient descent, we need the gradient of the loss function with respect to $\wv$. So we have:
$$ \loss = f(pred(\xv, \wv)) $$
$$ \nabla_w \loss = \nabla_w f\,(pred(\xv, \wv)) \\= \frac{\partial f\,(pred(\xv, \wv))}{\partial w} \cdot \nabla_w pred(\xv, \wv) \tag{lossGradient}\label{lossGradient} $$
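To sanity-check the analytic expression above, I plan to compare it against a central-difference estimate (a generic sketch; `loss_fn` stands for the whole pipeline from weights to loss value):

```python
import numpy as np

def numerical_gradient(loss_fn, w, eps=1e-6):
    """Central-difference estimate of the gradient of loss_fn at w."""
    w = np.asarray(w, dtype=float)
    grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus = w.copy()
        w_minus = w.copy()
        w_plus.flat[i] += eps   # perturb one coordinate up...
        w_minus.flat[i] -= eps  # ...and down
        grad.flat[i] = (loss_fn(w_plus) - loss_fn(w_minus)) / (2 * eps)
    return grad
```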
This is where I get a bit lost with the chain rule, so correct me if I'm wrong: I assume the gradient of the loss is the sum of the three gradients, one per angle. For simplicity, I consider only one angle.
$$ \nabla_w \loss_1 = \frac{(2P_{12}\,\partial P_{12} + 2P_{13}\,\partial P_{13} - 2P_{23}\,\partial P_{23})(2 P_{12} P_{13}) - 2(\partial P_{12}\, P_{13} + \partial P_{13}\, P_{12})(P_{12}^2 + P_{13}^2 - P_{23}^2)}{(2 P_{12} P_{13})^2} $$ Now if we consider e.g. $\partial P_{12}$, we should have: $$ \partial P_{12} = 0.5\left((P_{2x} - P_{1x})^2 + (P_{2y} - P_{1y})^2\right)^{-0.5}\left(2\left(\frac{\partial P_{2x}}{\partial w} - \frac{\partial P_{1x}}{\partial w}\right)(P_{2x} - P_{1x}) + 2\left(\frac{\partial P_{2y}}{\partial w} - \frac{\partial P_{1y}}{\partial w}\right)(P_{2y} - P_{1y})\right) $$ Is this correct, and is this loss function differentiable without the loss value exploding? What about when an angle is 270 degrees? How can I penalize that value as well, since we only want 90-degree angles?
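As a sanity check on my $\partial P_{12}$ expression, I compared the analytic derivative of $P_{12}$ with respect to a single coordinate ($P_{2x}$ here) against a finite difference; for the length the expression reduces to $(P_{2x} - P_{1x})/P_{12}$:

```python
import numpy as np

def p12(p1, p2):
    """Length of the segment from p1 to p2."""
    return np.sqrt((p2[0] - p1[0]) ** 2 + (p2[1] - p1[1]) ** 2)

p1, p2 = np.array([0.0, 0.0]), np.array([3.0, 4.0])

# analytic: dP12/dP2x = (P2x - P1x) / P12
analytic = (p2[0] - p1[0]) / p12(p1, p2)

# central-difference check on the same coordinate
eps = 1e-6
p2_plus, p2_minus = p2.copy(), p2.copy()
p2_plus[0] += eps
p2_minus[0] -= eps
numeric = (p12(p1, p2_plus) - p12(p1, p2_minus)) / (2 * eps)

print(analytic, numeric)  # both ≈ 0.6
```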