Gradient and Hessian of $g(x) = f(Ax + b)$


Given scalar field $f : \Bbb R^m \to \Bbb R$, matrix $A \in \Bbb R^{m \times n}$ and vector $b \in \Bbb R^m$, find the gradient and the Hessian of the scalar field $g : \Bbb R^n \to \Bbb R$ defined by $g(x) := f(Ax + b)$.


I cannot find the right expression for the derivative; my attempt was $g'(x) = f'(Ax + b)\cdot(Ax + b)'$.

I believe the derivative $f'(Ax + b)$ somehow involves $A$ times the partial derivatives of $f$, but I do not know how to proceed with the other terms. I know the componentwise expressions for the gradient and the Hessian, but I have never seen them in matrix form.


BEST ANSWER

First, observe that if we can write $g(x+\Delta x)=g(x)+[h(x)]^T(\Delta x)+o(\Delta x)$, where $o(\Delta x)$ satisfies $\lim_{\Delta x\to 0}\frac{o(\Delta x)}{\|\Delta x\|}=0$, then $\nabla g(x)=h(x)$. Now, using the differentiability of $f$, \begin{align*} g(x+\Delta x) &= f(Ax+b+A\Delta x) \\ &= f(Ax+b) + [\nabla f(Ax+b)]^T(A\Delta x)+o(A\Delta x) \\ &= g(x)+[A^T\nabla f(Ax+b)]^T (\Delta x)+o(A\Delta x), \end{align*} where $o(A\Delta x)$ satisfies $\lim_{A\Delta x\to 0}\frac{o(A\Delta x)}{\|A\Delta x\|}=0.$ Since $\|A\Delta x\|\le\|A\|\,\|\Delta x\|$, this gives $\lim_{\Delta x\to 0}\frac{o(A\Delta x)}{\|\Delta x\|}=0$. Hence $\nabla g(x)=A^T\nabla f(Ax+b)$.
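The identity $\nabla g(x)=A^T\nabla f(Ax+b)$ is easy to sanity-check numerically against finite differences. A minimal sketch, where the particular $f$, $A$, $b$, and test point below are arbitrary choices for illustration:

```python
# Numerical sanity check of grad g(x) = A^T grad f(Ax+b),
# with an arbitrarily chosen smooth f (here m = 2, n = 3).

def f(y):
    return y[0]**2 + y[0]*y[1]

def grad_f(y):                   # analytic gradient of this particular f
    return [2*y[0] + y[1], y[0]]

A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, -1.0]]           # m x n matrix
b = [1.0, 2.0]
m, n = 2, 3

def affine(x):                   # y = Ax + b
    return [sum(A[i][j]*x[j] for j in range(n)) + b[i] for i in range(m)]

def g(x):
    return f(affine(x))

def grad_g_formula(x):           # A^T grad f(Ax+b)
    gf = grad_f(affine(x))
    return [sum(A[i][j]*gf[i] for i in range(m)) for j in range(n)]

def grad_g_numeric(x, h=1e-6):   # central finite differences
    out = []
    for j in range(n):
        xp, xm = x[:], x[:]
        xp[j] += h
        xm[j] -= h
        out.append((g(xp) - g(xm)) / (2*h))
    return out

x0 = [0.3, -0.7, 1.1]
print(grad_g_formula(x0))
print(grad_g_numeric(x0))
```

The two printed vectors agree up to floating-point error, as the derivation predicts.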

For the second derivative, use the fact that $f$ satisfies $$f(x+\Delta x)=f(x)+\nabla f(x)^T(\Delta x) + \frac{1}{2}(\Delta x)^T\nabla^2 f(x)(\Delta x) + o(\|\Delta x\|^2),$$ where $o(\|\Delta x\|^2)$ means $\lim_{\Delta x\to 0} \frac{o(\|\Delta x\|^2)}{\|\Delta x\|^2}=0$. Applying this at the point $Ax+b$, \begin{align*} g(x+\Delta x) &= f(Ax+b+A\Delta x) \\ &= f(Ax+b)+[\nabla f(Ax+b)]^T (A\Delta x) \\ &\quad\quad+ \frac{1}{2}(A\Delta x)^T\nabla^2 f(Ax+b)(A\Delta x)+o(\|A\Delta x\|^2) \\ &= g(x)+[A^T\nabla f(Ax+b)]^T(\Delta x)\\ &\quad\quad+\frac{1}{2}(\Delta x)^T\left[A^T\nabla^2 f(Ax+b)A\right](\Delta x) + o(\|A\Delta x\|^2) \\ &= g(x)+ [\nabla g(x)]^T(\Delta x)+ \frac{1}{2}(\Delta x)^T\left[A^T\nabla^2 f(Ax+b)A\right](\Delta x) + o(\|A\Delta x\|^2). \end{align*} Now, assuming $A\ne 0$ (the case $A=0$ is trivial, since then $g$ is constant), $$\lim_{\Delta x\to 0}\frac{o(\|A\Delta x\|^2)}{\|\Delta x\|^2}=\lim_{\Delta x\to 0}\frac{o(\|A\Delta x\|^2)}{\|A\Delta x\|^2}\cdot\frac{\|A\Delta x\|^2}{\|\Delta x\|^2}=0,$$ since the second factor is bounded by $\|A\|^2$. By the uniqueness of Taylor expansions, $\nabla^2 g(x) = A^T\nabla^2 f(Ax+b)A$.
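The Hessian identity $\nabla^2 g(x) = A^T\nabla^2 f(Ax+b)A$ can be checked numerically as well. A minimal sketch, with an arbitrarily chosen quadratic $f$ (so its Hessian is a constant matrix, and central second differences are exact up to rounding):

```python
# Numerical sanity check of hess g(x) = A^T hess f(Ax+b) A,
# with an arbitrarily chosen quadratic f(y) = y1^2 + y1*y2 (m = 2, n = 3).

def f(y):
    return y[0]**2 + y[0]*y[1]

HESS_F = [[2.0, 1.0],
          [1.0, 0.0]]            # constant Hessian of this particular f

A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, -1.0]]           # m x n matrix
b = [1.0, 2.0]
m, n = 2, 3

def g(x):
    y = [sum(A[i][j]*x[j] for j in range(n)) + b[i] for i in range(m)]
    return f(y)

def hess_g_formula():
    # (A^T H A)_{pq} = sum over i, k of A_{ip} * H_{ik} * A_{kq}
    return [[sum(A[i][p]*HESS_F[i][k]*A[k][q]
                 for i in range(m) for k in range(m))
             for q in range(n)] for p in range(n)]

def hess_g_numeric(x, h=1e-3):
    # central second differences (exact here, since g is quadratic in x)
    H = [[0.0]*n for _ in range(n)]
    for p in range(n):
        for q in range(n):
            vals = []
            for sp, sq in [(h, h), (h, -h), (-h, h), (-h, -h)]:
                z = x[:]
                z[p] += sp
                z[q] += sq
                vals.append(g(z))
            H[p][q] = (vals[0] - vals[1] - vals[2] + vals[3]) / (4*h*h)
    return H

x0 = [0.3, -0.7, 1.1]
print(hess_g_formula())
print(hess_g_numeric(x0))
```

Both printed matrices coincide (and are symmetric), as the result requires.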


Gradient

Since $g$ takes an input $\mathbf{x} = (x_1,\dots,x_n) \in \Bbb{R}^n$, $$g: \Bbb{R}^n \rightarrow \Bbb{R}, \\ g(\mathbf{x}) = g(x_1,\dots,x_n).$$ The derivative of $g$ in this case is usually called $grad(g)$, and can be calculated through partial derivatives: $$grad(g): \Bbb{R}^n \rightarrow \Bbb{R}^n, \\ grad(g(\mathbf{x})) = \left({\frac {\partial g(\mathbf{x})}{\partial x_{1}}},\dots ,{\frac {\partial g(\mathbf{x})}{\partial x_{n}}}\right)$$ So $$grad(g(\mathbf{x})) = grad(f(A\mathbf{x}+b)) = \\ = \left({\frac {\partial f(A\mathbf{x}+b)}{\partial x_{1}}},\dots ,{\frac {\partial f(A\mathbf{x}+b)}{\partial x_{n}}}\right) = \bigstar$$ I will write up one of these terms: $${\frac {\partial f(A\mathbf{x}+b)}{\partial x_{i}}} \stackrel{(*)}{=} \nabla f(A \mathbf{x} + b) \cdot \frac{\partial (A \mathbf{x} + b)}{\partial x_i} \stackrel{(**)}{=} \\ \stackrel{(**)}{=} \nabla f(A \mathbf{x} + b) \cdot \begin{bmatrix} A_{1i} \\ A_{2i} \\ \vdots \\ A_{mi} \\ \end{bmatrix}$$

(Here the dot icon ($\cdot$) denotes the dot product: multiply componentwise, then add up.)

(*) This is the chain rule: $A\mathbf{x} + b$ is a formula in $x_1, \dots, x_n$, and $\nabla f$ (the vector of the $m$ partial derivatives of $f$) is evaluated at the point $A\mathbf{x} + b$.

(**) You can check that this is true, just take a simple matrix, like $A = \begin{bmatrix} 2 & 1 \\ 1 & 3 \\ \end{bmatrix}$, and any $b$ vector, like $b= \begin{bmatrix} 1 \\ 2 \\ \end{bmatrix}$, and see that $f(A\mathbf{x} + b) = f(2x_1+x_2+1,x_1+3x_2+2)$, and similarly for example $\frac{\partial f}{\partial x_1}(A\mathbf{x} + b) = \frac{\partial f}{\partial x_1}(2x_1+x_2+1,x_1+3x_2+2)$.
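The substitution in (**) can also be checked numerically on this concrete example. A minimal sketch, where the particular $f(u,v) = uv$ and the test point are arbitrary choices for illustration:

```python
# Check of step (**) on the concrete example A = [[2,1],[1,3]], b = (1,2),
# with an arbitrarily chosen f(u, v) = u*v, so df/du = v and df/dv = u.

def f(u, v):
    return u * v

def g(x1, x2):
    # g(x) = f(Ax + b) = f(2*x1 + x2 + 1, x1 + 3*x2 + 2)
    return f(2*x1 + x2 + 1, x1 + 3*x2 + 2)

def dg_dx1_chain_rule(x1, x2):
    u, v = 2*x1 + x2 + 1, x1 + 3*x2 + 2
    # partials of f at (u, v), dotted with the first column of A, i.e. (2, 1)
    return v * 2 + u * 1

def dg_dx1_numeric(x1, x2, h=1e-6):
    # central finite difference in x1
    return (g(x1 + h, x2) - g(x1 - h, x2)) / (2 * h)

print(dg_dx1_chain_rule(0.5, -1.0))
print(dg_dx1_numeric(0.5, -1.0))
```

The two printed values agree, confirming that the $i$-th partial of $g$ is $\nabla f(A\mathbf{x}+b)$ dotted with the $i$-th column of $A$.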

$$\bigstar = \\ = \left(\nabla f(A \mathbf{x} + b) \cdot \begin{bmatrix} A_{11} \\ A_{21} \\ \vdots \\ A_{m1} \\ \end{bmatrix}, \dots, \nabla f(A \mathbf{x} + b) \cdot \begin{bmatrix} A_{1n} \\ A_{2n} \\ \vdots \\ A_{mn} \\ \end{bmatrix} \right) = \\ = \left( \frac{\partial f}{\partial u_1}(A \mathbf{x} + b),\dots,\frac{\partial f}{\partial u_m}(A \mathbf{x} + b)\right) \cdot A = [\nabla f(A \mathbf{x} + b)]^T A$$

Hessian matrix

The Hessian matrix is the matrix of second partial derivatives: in general, if $f : \Bbb{R}^n \rightarrow \Bbb{R}$, then

$$\nabla^2 f(\mathbf{x}) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$

You need to differentiate each component of the ($\bigstar$) vector again, with respect to each of the $n$ variables. With what I've shown you, this shouldn't be too difficult.
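Carrying that differentiation out with one more application of the chain rule (writing $u = A\mathbf{x}+b$ for the argument of $f$) gives each entry of the Hessian:

$$\frac{\partial^2 g}{\partial x_j \, \partial x_i} = \sum_{k=1}^{m}\sum_{l=1}^{m} A_{ki}\,\frac{\partial^2 f}{\partial u_l \, \partial u_k}(A\mathbf{x}+b)\,A_{lj},$$

which in matrix form is exactly $\nabla^2 g(\mathbf{x}) = A^T\,\nabla^2 f(A\mathbf{x}+b)\,A$, the same result obtained in the first answer.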