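Let $A \in \mathbb{R}^{n \times n}$, $\mathbf{b} \in \mathbb{R}^{n}$ and $f: \mathbb{R}^{n} \to \mathbb{R}$, and let $\vec{g}(\vec{x}) = f(\mathbf{A}\vec{\mathbf{x}} + \mathbf{b})\,\vec{x}$.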
I can compute Jacobians of simple functions, but this question obliterated me, and I have spent days trying to understand it. Within the solution they derive that $[D(\vec{g}(\vec{x}))]_{jk} = f(\mathbf{A}\vec{\mathbf{x}} + \mathbf{b})\frac{\partial \vec{x}_j}{\partial x_k} + \vec{x}_j \frac{\partial f(\mathbf{A}\vec{\mathbf{x}} + \mathbf{b})}{\partial x_k}$
This is fine, as it is just the product rule, but where they lose me is when they switch to a summation:
$f(\mathbf{A}\vec{\mathbf{x}} + \mathbf{b})\frac{\partial \vec{x}_j}{\partial x_k} + \vec{x}_j \sum_{\ell=1}^{n} \frac{\partial f(\mathbf{A}\vec{\mathbf{x}} + \mathbf{b})}{\partial (\mathbf{A}\vec{\mathbf{x}} + \mathbf{b})_{\ell}} \cdot \frac{\partial (\mathbf{A}\vec{\mathbf{x}} + \mathbf{b})_{\ell}}{\partial x_k}$
I've tried coming up with a simple example using the 1-norm and a matrix $A \in \mathbb{R}^{2 \times 2}$, with accompanying $\vec{x}$ and $\mathbf{b}$ vectors, but it doesn't help because it is too specific compared to how general this solution is.
If anyone can explain the change to summation, I'd be greatly appreciative.
Let the vector $\mathbf{y} = A \mathbf{x} + \mathbf{b}$, then
$ g(\mathbf{x} ) = f(\mathbf{y}) \mathbf{x} $
The Jacobian $J$ is defined entry by entry, with its $ij$-th entry given by
$ J_{ij} = \dfrac{\partial g_i}{\partial x_j} $
Now $g_i = f(\mathbf{y} ) x_i $. Therefore,
$ J_{ij} = \dfrac{\partial [f(\mathbf{y}) x_i]} {\partial x_j} = x_i \dfrac{ \partial f(\mathbf{y})}{\partial x_j} + f(\mathbf{y}) \delta_{ij} $
Now using the chain rule,
$\dfrac{\partial f(\mathbf{y})} {\partial x_j} = \displaystyle \sum_{k=1}^n \dfrac{\partial f(\mathbf{y})} {\partial y_k} \left( \dfrac{\partial y_k}{\partial x_j} \right)$
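Since $\mathbf{y} = A \mathbf{x} + \mathbf{b}$, i.e. $y_k = \sum_{m=1}^n A_{km} x_m + b_k$, we have $\dfrac{\partial y_k}{\partial x_j} = A_{kj}$. The sum over $k$ appears because $f$ depends on $x_j$ only through the $n$ components $y_1, \dots, y_n$, and each of them can change when $x_j$ changes.
And $\dfrac{\partial f(\mathbf{y})}{\partial y_k}$ is the $k$-th element of the gradient of $f$, evaluated at $\mathbf{y}$. Therefore,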
$ J_{ij} = x_i \left( (\nabla f)^T A_{j} \right) + f(\mathbf{y} ) \delta_{ij} $
where $\delta_{ij} = 1$ if $i = j$ and $0$ otherwise, and $A_{j}$ is the $j$-th column of $A$.
Therefore,
$ J = f(\mathbf{y}) I_n + \mathbf{x} (\nabla f)^T A $
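If it helps to see the formula hold for something other than a hand-picked polynomial, here is a minimal sketch in plain Python that compares $f(\mathbf{y}) I_n + \mathbf{x} (\nabla f)^T A$ with a central finite-difference Jacobian of $g$. The test function `f`, its gradient `grad_f`, and the numbers in `A`, `b`, `x` are arbitrary choices of mine for illustration, not part of the original problem.

```python
import math

def affine(A, b, x):
    # y = A x + b, computed row by row
    return [sum(a * xi for a, xi in zip(row, x)) + bk for row, bk in zip(A, b)]

def f(y):
    # hypothetical smooth scalar function of y (illustration only)
    return math.sin(y[0]) + y[1] * y[2] + y[0] ** 2

def grad_f(y):
    # gradient of the f above with respect to y
    return [math.cos(y[0]) + 2 * y[0], y[2], y[1]]

def g(A, b, x):
    # g(x) = f(Ax + b) x
    fy = f(affine(A, b, x))
    return [fy * xi for xi in x]

def jacobian_formula(A, b, x):
    n = len(x)
    y = affine(A, b, x)
    fy, gf = f(y), grad_f(y)
    # chain-rule sum: d f(y) / d x_j = sum_k (d f / d y_k) * A[k][j]
    row = [sum(gf[k] * A[k][j] for k in range(n)) for j in range(n)]
    # J_ij = x_i * (d f / d x_j) + f(y) * delta_ij
    return [[x[i] * row[j] + (fy if i == j else 0.0) for j in range(n)]
            for i in range(n)]

def jacobian_fd(A, b, x, h=1e-6):
    # central differences: column j holds (g(x + h e_j) - g(x - h e_j)) / (2h)
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        gp, gm = g(A, b, xp), g(A, b, xm)
        for i in range(n):
            J[i][j] = (gp[i] - gm[i]) / (2 * h)
    return J

A = [[1.0, -1.0, 0.5], [2.0, 3.0, -1.0], [0.0, 1.0, 2.0]]
b = [1.0, 5.0, -2.0]
x = [0.3, -0.7, 1.1]

J1 = jacobian_formula(A, b, x)
J2 = jacobian_fd(A, b, x)
max_err = max(abs(J1[i][j] - J2[i][j]) for i in range(3) for j in range(3))
print(max_err)  # should be tiny (finite-difference noise only)
```

The line computing `row` is exactly the summation from the question: one term per component of $\mathbf{y}$.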
As an explicit example, suppose $ \mathbf{x} \in \mathbb{R}^2 $, and
$ A = \begin{bmatrix} 1 & -1 \\ 2 & 3 \end{bmatrix} $ and $\mathbf{b} = \begin{bmatrix} 1 \\ 5 \end{bmatrix} $
Let $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} $, then
$ \mathbf{y} = \begin{bmatrix} x_1 - x_2 + 1 \\ 2 x_1 + 3 x_2 + 5 \end{bmatrix} $
Suppose $f(\mathbf{y}) = y_1^2 + 2 y_1 y_2 + y_2 $
Then
$ f = (x_1 - x_2 + 1)^2 + 2 (x_1 - x_2 + 1)(2 x_1 + 3 x_2 + 5) + 2 x_1 + 3 x_2 + 5 $
And this is equal to
$ f = 5 x_1^2 -5 x_2^2 + 18 x_1 - 3 x_2 +16 $
So that
$ g = \begin{bmatrix} 5 x_1^3 - 5 x_1 x_2^2 + 18 x_1^2 - 3 x_1 x_2 + 16 x_1 \\ 5 x_2 x_1^2 - 5 x_2^3 + 18 x_1 x_2 - 3 x_2^2 + 16 x_2 \end{bmatrix}$
Therefore, by direct differentiation,
$ J(g) = \begin{bmatrix} 15 x_1^2 - 5 x_2^2 + 36 x_1 - 3 x_2 + 16 & - 10 x_1 x_2 - 3 x_1 \\ 10 x_1 x_2 + 18 x_2 & 5 x_1^2 - 15 x_2^2 + 18 x_1 - 6 x_2 + 16 \end{bmatrix} $
From the formula we have
$ J = f(\mathbf{y}) I_n + \mathbf{x} (\nabla f)^T A $
Now,
$ \nabla f = \begin{bmatrix} 2 y_1 + 2 y_2 \\ 2 y_1 + 1\end{bmatrix} = \begin{bmatrix} 6 x_1 + 4 x_2 + 12 \\ 2 x_1 - 2 x_2 + 3 \end{bmatrix} $
so that,
$ (\nabla f)^T A = \begin{bmatrix} 10 x_1 + 18 & - 10 x_2 - 3 \end{bmatrix} $
And now,
$ \mathbf{x} (\nabla f)^T A = \begin{bmatrix} 10 x_1^2 + 18 x_1 & - 10 x_1 x_2 - 3 x_1 \\ 10 x_1 x_2 + 18 x_2 & -10 x_2^2 - 3 x_2 \end{bmatrix} $
Add $f=5 x_1^2 -5 x_2^2 + 18 x_1 - 3 x_2 +16$ on the diagonal,
$ J = \begin{bmatrix} 15 x_1^2 - 5 x_2^2 + 36 x_1 - 3 x_2 + 16 & - 10 x_1 x_2 - 3 x_1 \\ 10 x_1 x_2 + 18 x_2 & 5 x_1^2 - 15 x_2^2 + 18 x_1 - 6 x_2 + 16 \end{bmatrix} $
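which matches the Jacobian obtained by direct differentiation. As a quick sanity check of this worked example, the two expressions can also be compared at concrete points. The sketch below uses plain Python with integer arithmetic; the function names and the choice of sample point $(1, 2)$ are mine, everything else is copied from the example above.

```python
A = [[1, -1], [2, 3]]
b = [1, 5]

def jacobian_direct(x1, x2):
    # entries obtained by differentiating g componentwise
    return [
        [15 * x1**2 - 5 * x2**2 + 36 * x1 - 3 * x2 + 16, -10 * x1 * x2 - 3 * x1],
        [10 * x1 * x2 + 18 * x2, 5 * x1**2 - 15 * x2**2 + 18 * x1 - 6 * x2 + 16],
    ]

def jacobian_formula(x1, x2):
    # J = f(y) I + x (grad f)^T A, with f(y) = y1^2 + 2 y1 y2 + y2
    x = [x1, x2]
    y = [A[k][0] * x1 + A[k][1] * x2 + b[k] for k in range(2)]
    fy = y[0] ** 2 + 2 * y[0] * y[1] + y[1]
    grad = [2 * y[0] + 2 * y[1], 2 * y[0] + 1]
    # row vector (grad f)^T A: entry j is sum_k grad[k] * A[k][j]
    row = [grad[0] * A[0][j] + grad[1] * A[1][j] for j in range(2)]
    return [[x[i] * row[j] + (fy if i == j else 0) for j in range(2)]
            for i in range(2)]

print(jacobian_formula(1, 2))  # [[41, -23], [56, -33]]
print(jacobian_direct(1, 2))   # [[41, -23], [56, -33]]
```

At $(x_1, x_2) = (1, 2)$ we get $\mathbf{y} = (0, 13)$, $f = 13$, $\nabla f = (26, 1)$ and $(\nabla f)^T A = (28, -23)$, so both routes give the same matrix.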