The Jacobian of $g(\vec{x}) = f(A\vec{x} + \vec{b})\vec{x}$.


Let $A \in \mathbb{R}^{n \times n}$, $\vec{b} \in \mathbb{R}^{n}$, and $f: \mathbb{R}^{n} \to \mathbb{R}$.

I can compute Jacobians of simple functions, but this question obliterated me, and I have spent days trying to understand it. Within the solution they derive that $[D\vec{g}(\vec{x})]_{jk} = f(\mathbf{A}\vec{x} + \mathbf{b})\frac{\partial x_j}{\partial x_k} + x_j \frac{\partial f(\mathbf{A}\vec{x} + \mathbf{b})}{\partial x_k}$

This is fine as it is just chain rule, but where they lose me is when they change to summation:

$f(\mathbf{A}\vec{x} + \mathbf{b})\frac{\partial x_j}{\partial x_k} + x_j \sum_{\ell=1}^{n} \frac{\partial f(\mathbf{A}\vec{x} + \mathbf{b})}{\partial (\mathbf{A}\vec{x} + \mathbf{b})_{\ell}} \cdot \frac{\partial (\mathbf{A}\vec{x} + \mathbf{b})_{\ell}}{\partial x_k}$

I've tried coming up with a simple example using the 1-norm with an $A \in \mathbb{R}^{2 \times 2}$ and the accompanying $\vec{x}$ and $\vec{b}$ vectors, but it doesn't help because it is too specific compared to how general this solution is.

If anyone can explain the change to summation, I'd be greatly appreciative.


Let the vector $\mathbf{y} = A \mathbf{x} + \mathbf{b}$, then

$ g(\mathbf{x} ) = f(\mathbf{y}) \mathbf{x} $

The Jacobian $J$ is defined elementwise; its $ij$-th entry is

$ J_{ij} = \dfrac{\partial g_i}{\partial x_j} $

Now $g_i = f(\mathbf{y} ) x_i $. Therefore,

$ J_{ij} = \dfrac{\partial [f(\mathbf{y}) x_i]} {\partial x_j} = x_i \dfrac{ \partial f(\mathbf{y})}{\partial x_j} + f(\mathbf{y}) \delta_{ij} $

Now using the chain rule,

$\dfrac{\partial f(\mathbf{y})} {\partial x_j} = \displaystyle \sum_{k=1}^n \dfrac{\partial f(\mathbf{y})} {\partial y_k} \left( \dfrac{\partial y_k}{\partial x_j} \right)$

Since $\mathbf{y} = A \mathbf{x} + \mathbf{b}$, we have $\dfrac{\partial y_k}{\partial x_j} = A_{kj}$.

And $\dfrac{\partial f(\mathbf{y})}{\partial y_k} $ is the $k$-th component of the gradient of $f$. Therefore,

$ J_{ij} = x_i \left( (\nabla f)^T A_{j} \right) + f(\mathbf{y} ) \delta_{ij} $

where $\delta_{ij} = 1$ if $i = j$ and $0$ otherwise, and $A_{j}$ is the $j$-th column of $A$.

Therefore,

$ J = f(\mathbf{y}) I_n + \mathbf{x} (\nabla f)^T A $
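This closed form is easy to sanity-check numerically. The sketch below compares the formula against a central finite-difference Jacobian of $g$; the particular $f$, $A$, $\mathbf{b}$, and $\mathbf{x}$ are arbitrary illustrative choices, not part of the question.

```python
import numpy as np

# Numerical sanity check of  J = f(y) I + x (grad f)^T A.
# The specific f, A, b, x below are arbitrary illustrative choices.
rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
x = rng.standard_normal(n)

f = lambda y: np.sin(y).sum()        # any smooth f : R^n -> R
grad_f = lambda y: np.cos(y)         # its gradient

def g(x):
    return f(A @ x + b) * x

# Jacobian from the closed-form expression
y = A @ x + b
J_formula = f(y) * np.eye(n) + np.outer(x, grad_f(y)) @ A

# Central finite differences, one column per coordinate
eps = 1e-6
J_fd = np.zeros((n, n))
for j in range(n):
    e = np.zeros(n)
    e[j] = eps
    J_fd[:, j] = (g(x + e) - g(x - e)) / (2 * eps)

print(np.max(np.abs(J_formula - J_fd)))  # tiny residual: formula agrees
```

Any smooth $f$ with a known gradient can be swapped in; the outer product `np.outer(x, grad_f(y))` is exactly the rank-one term $\mathbf{x} (\nabla f)^T$.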


As an explicit example, suppose $ \mathbf{x} \in \mathbb{R}^2 $, and

$ A = \begin{bmatrix} 1 & -1 \\ 2 & 3 \end{bmatrix} $ and $\mathbf{b} = \begin{bmatrix} 1 \\ 5 \end{bmatrix} $

Let $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} $, then

$ \mathbf{y} = \begin{bmatrix} x_1 - x_2 + 1 \\ 2 x_1 + 3 x_2 + 5 \end{bmatrix} $

Suppose $f(\mathbf{y}) = y_1^2 + 2 y_1 y_2 + y_2 $

Then

$ f = (x_1 - x_2 + 1)^2 + 2 (x_1 - x_2 + 1)(2 x_1 + 3 x_2 + 5) + 2 x_1 + 3 x_2 + 5 $

And this is equal to

$ f = 5 x_1^2 -5 x_2^2 + 18 x_1 - 3 x_2 +16 $

So that

$ g = \begin{bmatrix} 5 x_1^3 - 5 x_1 x_2^2 + 18 x_1^2 - 3 x_1 x_2 + 16 x_1 \\ 5 x_2 x_1^2 - 5 x_2^3 + 18 x_1 x_2 - 3 x_2^2 + 16 x_2 \end{bmatrix}$

Therefore, by direct evaluation,

$ J(g) = \begin{bmatrix} 15 x_1^2 - 5 x_2^2 + 36 x_1 - 3 x_2 + 16 & - 10 x_1 x_2 - 3 x_1 \\ 10 x_1 x_2 + 18 x_2 & 5 x_1^2 - 15 x_2^2 + 18 x_1 - 6 x_2 + 16 \end{bmatrix} $

From the formula we have

$ J = f(\mathbf{y}) I_n + \mathbf{x} (\nabla f)^T A $

Now,

$ \nabla f = \begin{bmatrix} 2 y_1 + 2 y_2 \\ 2 y_1 + 1\end{bmatrix} = \begin{bmatrix} 6 x_1 + 4 x_2 + 12 \\ 2 x_1 - 2 x_2 + 3 \end{bmatrix} $

so that,

$ (\nabla f)^T A = \begin{bmatrix} 10 x_1 + 18 & - 10 x_2 - 3 \end{bmatrix} $

And now,

$ \mathbf{x} (\nabla f)^T A = \begin{bmatrix} 10 x_1^2 + 18 x_1 & - 10 x_1 x_2 - 3 x_1 \\ 10 x_1 x_2 + 18 x_2 & -10 x_2^2 - 3 x_2 \end{bmatrix} $

Adding $f = 5 x_1^2 - 5 x_2^2 + 18 x_1 - 3 x_2 + 16$ to the diagonal gives

$ J = \begin{bmatrix} 15 x_1^2 - 5 x_2^2 + 36 x_1 - 3 x_2 + 16 & - 10 x_1 x_2 - 3 x_1 \\ 10 x_1 x_2 + 18 x_2 & 5 x_1^2 - 15 x_2^2 + 18 x_1 - 6 x_2 + 16 \end{bmatrix} $
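For readers who want to check the algebra above without expanding polynomials by hand, here is a sketch that reproduces the worked example symbolically (assuming SymPy is available):

```python
import sympy as sp

# Symbolic check of the worked 2x2 example above (a sketch using SymPy).
x1, x2 = sp.symbols('x1 x2')
x = sp.Matrix([x1, x2])
A = sp.Matrix([[1, -1], [2, 3]])
b = sp.Matrix([1, 5])

y = A * x + b
f = y[0]**2 + 2*y[0]*y[1] + y[1]    # f(y) = y1^2 + 2 y1 y2 + y2
g = f * x                           # g(x) = f(Ax + b) x

J_direct = g.jacobian(x)            # direct evaluation

# Formula: J = f I + x (grad f)^T A, gradient taken in the y variables
grad_f = sp.Matrix([2*y[0] + 2*y[1], 2*y[0] + 1])
J_formula = f * sp.eye(2) + x * grad_f.T * A

# The two Jacobians coincide entry by entry
print((J_direct - J_formula).applyfunc(sp.simplify))  # zero matrix
```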


$ \def\LR#1{\left(#1\right)} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} $First, define a few variables
$$\eqalign{ w &= Ax+b &\qiq dw &= A\,dx \\ f &= f(w) &\qiq \;h &= \grad fw \qiq df = h^Tdw = h^TA\:dx \\ }$$
Then calculate the Jacobian of the composite function
$$\eqalign{ g &= x f \\ dg &= f\:dx + x\,df \\ &= \LR{fI + xh^TA} dx \\ \grad{g}{x} &= \LR{fI + xh^TA} \;\:\equiv\; J \\ }$$
where $J$ is the Jacobian and $I$ is the identity matrix.
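The differential identity can also be sanity-checked numerically: for a small step $t\,dx$, the change $g(x + t\,dx) - g(x)$ should agree with $t\,J\,dx$ to first order. A sketch with arbitrary illustrative choices of $f$, $A$, $b$, $x$, and $dx$:

```python
import numpy as np

# Check of the differential identity dg = (f I + x h^T A) dx:
# g(x + t*dx) - g(x) should match t * J @ dx up to O(t^2) terms.
# The f, A, b, x, dx below are arbitrary illustrative choices.
rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
x = rng.standard_normal(n)
dx = rng.standard_normal(n)

f = lambda y: 1.0 / (1.0 + y @ y)             # smooth f : R^n -> R
h = lambda y: -2.0 * y / (1.0 + y @ y) ** 2   # h = grad f

def g(x):
    return f(A @ x + b) * x

y = A @ x + b
J = f(y) * np.eye(n) + np.outer(x, h(y)) @ A  # J = f I + x h^T A

t = 1e-6
lhs = g(x + t * dx) - g(x)
rhs = t * (J @ dx)
print(np.max(np.abs(lhs - rhs)))  # O(t^2): agreement to first order
```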