Standard result for the gradient of a multidimensional Gaussian


Let $\vec x$ be a vector of dimension $n$ and $A$ a symmetric matrix of dimension $n\times n$. What is the standard result for computing the following expression?

$$\frac{\partial}{\partial {\vec x}}\exp(-{\vec x}^T\cdot A \cdot {\vec x})$$

So far, I have:

$$\frac{\partial}{\partial {\vec x}}\exp(-{\vec x}^T\cdot A \cdot {\vec x}) = -(A\cdot {\vec x} +{\vec x}^T\cdot A)\exp(-{\vec x}^T\cdot A \cdot {\vec x})$$

Am I missing something to get the exact result?
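For reference, a candidate expression can be tested against a finite-difference gradient. A minimal NumPy sketch (the symmetric matrix $A$ and the test point below are arbitrary, chosen only for illustration):

```python
import numpy as np

def f(x, A):
    # f(x) = exp(-x^T A x)
    return np.exp(-x @ A @ x)

def numerical_gradient(x, A, h=1e-6):
    # Central finite differences, one coordinate at a time.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e, A) - f(x - e, A)) / (2 * h)
    return g

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
A = (M + M.T) / 2           # arbitrary symmetric matrix
x = rng.standard_normal(n)  # arbitrary test point

print(numerical_gradient(x, A))
```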


There are 2 answers below.

Best answer:

Let $f : \mathbb R^n \to \mathbb R$ be defined by

$$f (\mathrm x) := \exp \left( - \rm x^\top A \, x \right)$$

Hence,

$$\begin{array}{rl} f (\mathrm x + h \mathrm v) &= \exp \left( - (\mathrm x + h \mathrm v)^\top \mathrm A \, (\mathrm x + h \mathrm v) \right)\\ &= \exp \left( - \mathrm x^\top \mathrm A \, \mathrm x - h \, \mathrm v^\top \mathrm A \,\mathrm x - h \, \mathrm x^\top \mathrm A \,\mathrm v - h^2 \, \mathrm v^\top \mathrm A \,\mathrm v \right)\\ &= \exp \left( - \rm x^\top A \, x \right) \exp \left( - h \left(\mathrm v^\top \mathrm A \,\mathrm x + \mathrm x^\top \mathrm A \,\mathrm v \right) \right) \exp \left( - h^2 \, \mathrm v^\top \mathrm A \,\mathrm v \right)\\ &= f (\mathrm x) \left( 1 - h \left(\mathrm v^\top \mathrm A \,\mathrm x + \mathrm x^\top \mathrm A \,\mathrm v \right) + O \left(h^2\right) \right)\\ &= f (\mathrm x) - h \, f (\mathrm x) \left(\mathrm v^\top \mathrm A \,\mathrm x + \mathrm x^\top \mathrm A \,\mathrm v \right) + O \left(h^2\right)\end{array}$$

Thus, the directional derivative of $f$ in the direction of $\rm v$ at $\rm x$ is

$$\begin{array}{rl} \displaystyle\lim_{h \to 0} \dfrac{f (\mathrm x + h \mathrm v) - f (\mathrm x)}{h} &= - f (\mathrm x) \left(\mathrm v^\top \mathrm A \,\mathrm x + \mathrm x^\top \mathrm A \,\mathrm v \right)\\ &= - f (\mathrm x) \left( \langle \mathrm v , \mathrm A \,\mathrm x \rangle + \langle \mathrm A^\top \mathrm x , \mathrm v \rangle \right)\\ &= \langle \mathrm v , \color{blue}{- f (\mathrm x)\left(\mathrm A + \mathrm A^\top\right) \,\mathrm x} \rangle\end{array}$$

Lastly, the gradient of $f$ with respect to $\rm x$ is

$$\nabla_{\mathrm x} \, f (\mathrm x) = \color{blue}{- f (\mathrm x)\left(\mathrm A + \mathrm A^\top\right) \,\mathrm x}$$
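(As a sanity check of this general formula, here is a NumPy sketch that uses an arbitrary, not necessarily symmetric, matrix and compares the closed form against central finite differences; the matrix and point are made up for illustration.)

```python
import numpy as np

def f(x, A):
    # f(x) = exp(-x^T A x)
    return np.exp(-x @ A @ x)

def grad_closed_form(x, A):
    # Claimed gradient: -f(x) (A + A^T) x
    return -f(x, A) * (A + A.T) @ x

def grad_finite_diff(x, A, h=1e-6):
    # Central finite differences, one coordinate at a time.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e, A) - f(x - e, A)) / (2 * h)
    return g

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))   # arbitrary, not symmetric
x = 0.3 * rng.standard_normal(n)

# Should be ~0, up to finite-difference and floating-point error.
print(np.max(np.abs(grad_closed_form(x, A) - grad_finite_diff(x, A))))
```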

If $\rm A$ is symmetric, then

$$\nabla_{\mathrm x} \, f (\mathrm x) = \color{blue}{- 2 \, f (\mathrm x) \, \mathrm A \mathrm x}$$
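As a quick consistency check, in the one-dimensional case $n = 1$ with $\mathrm A = (a)$ this reduces to the familiar scalar derivative

$$\frac{\mathrm d}{\mathrm d x}\, e^{-a x^2} = -2\,a\,x\,e^{-a x^2}.$$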


Another answer:

You have a composition of:

the diagonal (linear) map $$\Delta:\Bbb R^n\longrightarrow\Bbb R^n\times\Bbb R^n,\qquad x\longmapsto(x,x),$$ a symmetric bilinear form $$B: \Bbb R^n\times\Bbb R^n\longrightarrow\Bbb R,\qquad (x,y)\longmapsto B(x,y) = -x^TAy,$$ and the exponential $$\exp: \Bbb R\longrightarrow\Bbb R.$$

By the chain rule, $$D(\exp\circ B\circ\Delta)(x_0) = \exp'(B(\Delta(x_0)))\,DB(\Delta(x_0))\,D\Delta(x_0).$$ Obviously, $$\exp'(B(\Delta(x_0))) = \exp(B(x_0,x_0)),\qquad D\Delta(x_0) = \Delta,$$ while the differential of the bilinear form at $(x_0,y_0)$ is $$(x,y)\longmapsto B(x,y_0) + B(x_0,y).$$

Written as a row vector, $DB(\Delta(x_0))\,\Delta$ is $$ -(x_0^TA +x_0^TA).$$ So the required differential at $x_0$, written as a row vector, is:

$$ -2\exp(-x_0^TAx_0)x_0^TA.$$
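(A brief numerical sketch of this composition, with an arbitrary symmetric $A$ and point $x_0$; the helper names below are only for illustration. It assembles the row vector by applying the chain-rule differential to the standard basis vectors and compares it with the closed form above.)

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                 # symmetric, as in the question
x0 = 0.5 * rng.standard_normal(n)

B = lambda x, y: -x @ A @ y       # bilinear form B(x, y) = -x^T A y

# Differential of exp o B o Delta at x0, applied to a direction v:
# exp'(B(x0, x0)) * (B(v, x0) + B(x0, v))
Df = lambda v: np.exp(B(x0, x0)) * (B(v, x0) + B(x0, v))

# Assemble the row vector by applying Df to the standard basis vectors.
row = np.array([Df(e) for e in np.eye(n)])

# Closed form from the answer: -2 exp(-x0^T A x0) x0^T A
closed = -2 * np.exp(-x0 @ A @ x0) * (x0 @ A)

print(np.max(np.abs(row - closed)))   # ~0 (floating-point error only)
```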