How to find the gradient vector and Hessian of a summation?


I am trying to find the gradient vector and Hessian of the following summation:

$$ f(x) = \sum_{i=1}^n [(x_i +b_i)^4 + e^{2x_i}] $$

Trying to find the gradient vector:

$\nabla f = (\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n})$, so we need to work out $\frac{\partial f}{\partial x_i}$

$\frac{\partial f}{\partial x_i} = \sum_{i=1}^n [4(x_i +b_i)^3 + 2e^{2x_i}]$

$= 4\sum_{i=1}^n (x_i +b_i)^3 + 2\sum_{i=1}^n e^{2x_i}$

From here, I don't really know how to continue finding the gradient vector.

Thanks a lot.


There are 3 best solutions below

0
On BEST ANSWER

It might be clearer to calculate the gradient vector if you write out the sum explicitly: $$ f(x) = \sum_{i=1}^{n}[(x_i+b_i)^4+e^{2x_i}] $$ $$= (x_1+b_1)^4+e^{2x_1}+(x_2+b_2)^4+e^{2x_2}+\ldots+(x_j+b_j)^4+e^{2x_j}+\ldots+(x_n+b_n)^4+e^{2x_n} $$ for some $1\leq j\leq n$. Only the two terms containing $x_j$ depend on $x_j$, so computing $\frac{\partial f}{\partial x_j}$, we arrive at: $$\frac{\partial f}{\partial x_j} = 4(x_j+b_j)^3+2e^{2x_j}$$

Since the gradient is the vector of partial derivatives, we have: $$\nabla f = \left(4(x_1+b_1)^3+2e^{2x_1},\ldots, 4(x_j+b_j)^3+2e^{2x_j},\ldots, 4(x_n+b_n)^3+2e^{2x_n} \right)$$

The Hessian is an $n\times n$ matrix whose $ij$th entry is $H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial}{\partial x_i}\left(\frac{\partial f}{\partial x_j}\right)$. Notice that for $i\neq j $, we have $\frac{\partial}{\partial x_i}\left(\frac{\partial f}{\partial x_j}\right) = 0$, since the $j$th component of the gradient has no dependency on $x_{i}$. For $i=j $, we have that $$ \frac{\partial^2 f}{\partial x_j^2} = \frac{\partial}{\partial x_j}\left(4(x_j+b_j)^3+2e^{2x_j}\right) = 12(x_j+b_j)^2+4e^{2x_j}$$

Then you end up with: $$\begin{pmatrix} 12(x_1+b_1)^2+4e^{2x_1} & 0& 0 & \cdots & 0\\ 0 & 12(x_2+b_2)^2+4e^{2x_2} & 0 & \cdots & 0 \\ \vdots & \vdots& \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 12(x_n+b_n)^2+4e^{2x_n} \end{pmatrix}$$
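If you want to sanity-check these formulas numerically, here is a minimal sketch in plain Python. The function names (`f`, `grad_f`, `hess_diag`, `numeric_grad`) and the test point are my own choices for illustration, not part of the question. It compares the closed-form gradient against a central-difference approximation.

```python
import math

def f(x, b):
    """f(x) = sum_i [(x_i + b_i)^4 + exp(2 x_i)]."""
    return sum((xi + bi) ** 4 + math.exp(2 * xi) for xi, bi in zip(x, b))

def grad_f(x, b):
    """Closed-form gradient: component j is 4(x_j + b_j)^3 + 2 exp(2 x_j)."""
    return [4 * (xi + bi) ** 3 + 2 * math.exp(2 * xi) for xi, bi in zip(x, b)]

def hess_diag(x, b):
    """Diagonal of the Hessian: 12(x_j + b_j)^2 + 4 exp(2 x_j).
    All off-diagonal entries are zero, so only the diagonal is stored."""
    return [12 * (xi + bi) ** 2 + 4 * math.exp(2 * xi) for xi, bi in zip(x, b)]

def numeric_grad(x, b, h=1e-6):
    """Central-difference approximation of the gradient (for checking only)."""
    out = []
    for j in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        out.append((f(xp, b) - f(xm, b)) / (2 * h))
    return out

# Arbitrary test point and offsets (assumed values, chosen only for the check).
x = [0.3, -1.2, 0.5]
b = [1.0, 0.5, -2.0]

max_err = max(abs(a - n) for a, n in zip(grad_f(x, b), numeric_grad(x, b)))
print("largest gradient discrepancy:", max_err)
```

The discrepancy should be tiny compared with the gradient entries themselves. Storing only the diagonal of the Hessian is a design choice that works here because every off-diagonal entry vanishes.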


0
On

One has $$ \frac{\partial f}{\partial x_i} = \sum_{j=1}^n \frac{\partial}{\partial x_i}\left[(x_j+b_j)^4 + e^{2x_j}\right]. $$ Note that the summation index must be different from the index of the variable we differentiate with respect to. Now, if $i\neq j$ then $$ \frac{\partial}{\partial x_i}\left[(x_j+b_j)^4 + e^{2x_j}\right] = 0. $$ If $i=j$, then we simply compute the derivative: $$ \frac{\partial}{\partial x_i}\left[(x_i+b_i)^4 + e^{2x_i}\right] = 4(x_i+b_i)^3 + 2e^{2x_i}. $$ Only the $j=i$ term of the sum survives, so we conclude that $$ \frac{\partial f}{\partial x_i} = 4(x_i+b_i)^3 + 2e^{2x_i}. $$ So the gradient vector is $$ \nabla f = \left(4(x_1+b_1)^3 + 2e^{2x_1}, \dots, 4(x_n+b_n)^3 + 2e^{2x_n}\right). $$
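As a quick worked check of this formula, take the hypothetical values $n=2$ and $b=(0,1)$ (chosen only for illustration) and evaluate at $x=(0,0)$: $$ f(0,0) = (0+0)^4 + e^{0} + (0+1)^4 + e^{0} = 3, $$ $$ \nabla f(0,0) = \left(4(0+0)^3 + 2e^{0},\; 4(0+1)^3 + 2e^{0}\right) = (2,\,6). $$ Each component involves only its own $x_i$ and $b_i$, with no sum left over.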

0
On

For typing convenience, define the vectors $$\eqalign{ y &= x+b \quad&\implies\quad &dy = dx \\ w &= y\odot y \quad&\implies\quad &dw = 2y\odot dy \;\implies\; y\odot dw = 2 w\odot dy\\ e &= \exp(x) \;&\implies\quad &de = e\odot dx \\ }$$ and the diagonal matrices $$\eqalign{ W &= \operatorname{Diag}(w) \quad&\implies\quad Wz &= w\odot z \quad\big({\rm for\,an\,arbitrary\,vector\,}z\big) \\ E &= \operatorname{Diag}(e) \quad&\implies\quad Ez &= e\odot z \\ Y &= \operatorname{Diag}(y) \quad&\implies\quad W &= Y^2 = (X+B)^2 \\ }$$ where $X = \operatorname{Diag}(x)$ and $B = \operatorname{Diag}(b)$, and $\exp$ is applied elementwise.

Write the function in terms of these new variables, then calculate the gradient $(g)$. $$\eqalign{ f &= w:w + e:e \\ df &= 2w:dw + 2e:de \\ &= 2w:(2y\odot dy) + 2e:(e\odot dx) \\ &= (4y\odot w):dy + (2e\odot e):dx \\ &= (4y\odot w + 2e\odot e):dx \\ \frac{\partial f}{\partial x} &= 4y\odot w + 2e\odot e \;\doteq\; g \\ }$$

Now calculate the gradient of the gradient, i.e. the Hessian $(H)$. $$\eqalign{ dg &= 4\,(dy\odot w + y\odot dw) + 2\,(de\odot e + e\odot de) \\ &= 4\left(w\odot dy + y\odot dw + e\odot de\right) \\ &= 4\left(w\odot dy + 2w\odot dy + e\odot e\odot dx\right) \\ &= 4\left(3w + e\odot e\right)\odot dx \\ &= \left(12W + 4E^2\right)dx \\ \frac{\partial g}{\partial x} &= \left(12Y^2 + 4E^2\right) \;\doteq\; H \\ }$$


In the above, $(\odot)$ is used to denote the elementwise/Hadamard product, while a colon is used to denote the trace/Frobenius product, i.e. $$A:B = {\rm Tr}(A^TB)$$ Interestingly, the Hessian is a diagonal matrix.
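To see the diagonal structure concretely, here is a rough sketch in plain Python that differentiates the closed-form gradient $g = 4y\odot w + 2e\odot e$ by central differences and compares the result with $12Y^2 + 4E^2$. The helper names (`grad_f`, `numeric_hessian`) and the test values are assumptions made for this illustration only.

```python
import math

def grad_f(x, b):
    """g = 4 y*w + 2 e*e elementwise, i.e. 4(x_j + b_j)^3 + 2 exp(2 x_j)."""
    return [4 * (xi + bi) ** 3 + 2 * math.exp(2 * xi) for xi, bi in zip(x, b)]

def numeric_hessian(x, b, h=1e-6):
    """Approximate H[i][j] = d g_j / d x_i by central differences of grad_f."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        gp, gm = grad_f(xp, b), grad_f(xm, b)
        for j in range(n):
            H[i][j] = (gp[j] - gm[j]) / (2 * h)
    return H

# Arbitrary test point and offsets (assumed values, chosen only for the check).
x = [0.3, -1.2, 0.5]
b = [1.0, 0.5, -2.0]

H_num = numeric_hessian(x, b)
# Diagonal of 12 Y^2 + 4 E^2, with Y = Diag(x + b) and E = Diag(exp(x)).
H_diag = [12 * (xi + bi) ** 2 + 4 * math.exp(2 * xi) for xi, bi in zip(x, b)]

for row in H_num:
    print(row)
```

Perturbing $x_i$ leaves every other component of $g$ untouched, so the off-diagonal entries of `H_num` come out as zero, while the diagonal entries should agree with `H_diag` up to finite-difference error.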