Suppose we have vectors $a = \begin{pmatrix} a_1 \\ a_2 \\ a_3\end{pmatrix}$ and $b = \begin{pmatrix} b_1 \\ b_2 \\ b_3\end{pmatrix}$. I would like to find the gradient of $f = \lVert a \times b \rVert_2$ with respect to $b$, that is, $\nabla f$ with respect to $b$.
I know I can find the gradient by taking partial derivatives with respect to the inputs, but it's not clear what the inputs are in this case. Is it just $b$, since I'm taking the gradient with respect to $b$? Or is it both $a$ and $b$?
The gradient of $f = \lVert a \times b \rVert_2$ with respect to $b$ is apparently equivalent to $$\frac{(a \times b) \times a}{\lVert a \times b \rVert_2}$$
But why?
If this were an ordinary derivative of a square root (i.e. of a length), the denominator would make some sense to me, but I'm a bit lost, mostly because of the doubts described above.
I also know that the cross product is not associative in general.
I don't have much experience with multivariable calculus, so maybe this is easy, but terminology and notation are usually what confuse me.
You want the gradient with respect to $b$, which means $a$ is treated as a constant. You can write the function as:
$$ f(b) = \lVert a \times b \rVert $$ meaning the gradient will be $$ \nabla_b f = \left ( \frac{\partial f}{\partial b_1} , \frac{\partial f}{\partial b_2} , \frac{\partial f}{\partial b_3} \right ) $$
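One way to see where the closed form comes from is to differentiate $f^2$ instead of $f$. Since $f^2 = (a \times b) \cdot (a \times b)$, the chain rule gives, for each component $b_i$,
$$ 2 f \, \frac{\partial f}{\partial b_i} = 2 \, (a \times b) \cdot (a \times e_i), $$
where $e_i$ is the $i$-th standard basis vector (so $\partial b / \partial b_i = e_i$). The scalar triple product identity $x \cdot (a \times e_i) = e_i \cdot (x \times a)$, applied with $x = a \times b$, turns this into
$$ \frac{\partial f}{\partial b_i} = \frac{e_i \cdot \big( (a \times b) \times a \big)}{\lVert a \times b \rVert}, $$
and collecting the three components yields exactly
$$ \nabla_b f = \frac{(a \times b) \times a}{\lVert a \times b \rVert}. $$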
The approach in @LutzL's comment is the more elegant way to compute the derivative, instead of writing out the full expression for $f$ and differentiating term by term.
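If it helps to convince yourself, here is a quick numerical sanity check of the closed form against central finite differences, using NumPy (the particular vectors $a$, $b$ are arbitrary; any pair with $a \times b \ne 0$ works):

```python
import numpy as np

# Arbitrary example vectors with a x b != 0
a = np.array([1.0, 2.0, 3.0])
b = np.array([-2.0, 0.5, 1.0])

def f(b):
    """f(b) = ||a x b||_2, with a held constant."""
    return np.linalg.norm(np.cross(a, b))

# Closed-form gradient: (a x b) x a / ||a x b||
c = np.cross(a, b)
grad_closed = np.cross(c, a) / np.linalg.norm(c)

# Central finite differences, one component of b at a time
eps = 1e-6
grad_fd = np.array([
    (f(b + eps * e) - f(b - eps * e)) / (2 * eps)
    for e in np.eye(3)
])

assert np.allclose(grad_closed, grad_fd, atol=1e-6)
```

Note that the closed-form gradient is orthogonal to $a$, as it must be: $(a \times b) \times a$ is a cross product with $a$ as one factor.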