What is the general expression for the gradient of $\dfrac{1}{2} x^TA(x)x$?


Let $A$ be some matrix function of $x$, $A:\mathbb{R}^n \to \mathbb{R}^{n\times n}$.

My question is, is there some general formula for the gradient of

$$f(x) =\dfrac{1}{2} x^TA(x)x$$

I only know some special cases:

  • $A$ is a constant, symmetric matrix: then $\nabla f(x) = Ax$.

  • $A$ is a constant, possibly asymmetric matrix: then $\nabla f(x) = \dfrac{1}{2} (A^T+A)x$.

  • $A$ is $\text{diag}(x)$: then $\nabla f(x) = \dfrac{3}{2} Ax$.

I'm stuck because I don't understand how to find a simpler expression for the derivative $D[A(x)x]$ that appears when I apply the product rule with $g(x) = A(x)x$:

\begin{align*} \nabla f(x) &= \nabla \dfrac{1}{2} x^TA(x)x\\ &= \nabla \dfrac{1}{2} x^Tg(x) \\ &= \dfrac{1}{2}A(x)x + \dfrac{1}{2}x^TD[A(x)x] \end{align*}

There are 3 answers below.

BEST ANSWER

I assume you mean $A: \mathbb R^n \to \mathbb R^{n\times n}$ so that $A$ takes in a vector and gives a matrix. I'll operate under this assumption.

We see $$f(x) = \frac 1 2\sum_{i,j = 1}^n x_ix_jA_{ij}(x).$$ Now fix $k \in \{ 1,\ldots, n\}$. Then by the product rule $$\frac{\partial f }{\partial x_k}(x) = \frac{1}{2} \sum^n_{i,j=1} \left( x_ix_j\frac{\partial A_{ij}}{\partial x_k}(x) + \delta_{ik} x_j A_{ij}(x) + \delta_{jk} x_i A_{ij}(x)\right),$$ where $\delta_{ab} = \left\{\begin{smallmatrix} 1, & a = b, \\ 0, & a \neq b.\end{smallmatrix} \right.$

Resolving these $\delta$'s, we see $$\frac{\partial f }{\partial x_k}(x) = \frac{1}{2} \left(\sum^n_{i,j=1} x_ix_j\frac{\partial A_{ij}}{\partial x_k}(x) \right) + \frac 1 2 \left( \sum_{j=1}^n x_j A_{kj}(x)\right) + \frac 1 2\left( \sum^n_{i=1} x_i A_{ik}(x)\right).$$ The latter two terms combine into $\frac 1 2(A(x) + A^T(x))x$ when you put everything together.

For the first term, you sort of need to invent notation (actually we'll use tensor notation), and what you have is $$\frac 1 2\, x^T(\nabla \circ A)(x)\,x,$$ where $\nabla \circ A: \mathbb R^n \to \mathbb R^{n\times n \times n}$ is given by $(\nabla \circ A)_{ijk}(x) = \frac{\partial A_{ij}}{\partial x_k}(x)$. Note that $x^T (\nabla\circ A)(x)\, x \in \mathbb R^n$ for any $x \in \mathbb R^n$, and by convention the inner products operate on the first two dimensions of $\nabla \circ A$. Then you can write $$\nabla f(x) = \frac 1 2 \left(x^T(\nabla\circ A)(x)\, x + A(x)x + A^T(x) x\right).$$
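This closed form is easy to check numerically. Below is a quick sanity check (my own illustrative sketch, not part of the answer), taking the question's third special case $A(x) = \operatorname{diag}(x)$, for which the gradient should equal $\frac{3}{2}A(x)x$, and comparing the tensor formula against central finite differences:

```python
import numpy as np

# Check  ∇f(x) = 1/2 ( x^T (∇∘A)(x) x + A(x) x + A(x)^T x )
# for the illustrative choice A(x) = diag(x), where ∇f(x) = (3/2) A(x) x.

def A(x):
    return np.diag(x)

def f(x):
    return 0.5 * x @ A(x) @ x

def grad_formula(x):
    n = len(x)
    # tensor T[i, j, k] = ∂A_ij/∂x_k; for A(x) = diag(x), ∂A_ii/∂x_i = 1
    T = np.zeros((n, n, n))
    for i in range(n):
        T[i, i, i] = 1.0
    # contract the first two indices with x: (x^T (∇∘A)(x) x)_k
    first = np.einsum('i,ijk,j->k', x, T, x)
    return 0.5 * (first + A(x) @ x + A(x).T @ x)

def grad_fd(x, h=1e-6):
    # central finite differences, one coordinate at a time
    g = np.zeros_like(x)
    for k in range(len(x)):
        e = np.zeros_like(x)
        e[k] = h
        g[k] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = np.array([1.0, 2.0, 3.0])
print(np.allclose(grad_formula(x), grad_fd(x)))   # True
print(np.allclose(grad_formula(x), 1.5 * x**2))   # True: (3/2) diag(x) x
```

The `einsum` string `'i,ijk,j->k'` is exactly the convention in the answer: the inner products consume the first two indices of $\nabla \circ A$, leaving a vector indexed by $k$.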

ANSWER

Consider that $$x^T A \,x = \sum_{i=1}^{n}\sum_{j=1}^n x_i a_{ij} \, x_j$$

where $x_i$ are the entries of $x$ and $a_{ij}$ are the entries of $A$.

Then all you need is to differentiate this expression with respect to any component $x_i$ to get the entries of the gradient (careful with the $i=j$ cases in the sum, and also with the fact that the entries of $A$ may depend on $x$.)
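This component-wise recipe can also be carried out symbolically. The snippet below (an illustrative sketch; the $2\times 2$ matrix $A$ is an arbitrary assumed example) differentiates $f$ entry by entry with SymPy and confirms the result matches the closed form $\frac12\left(x^T(\nabla\circ A)x + Ax + A^Tx\right)$ derived in the accepted answer:

```python
import sympy as sp

# Component-wise differentiation of f = 1/2 x^T A(x) x in n = 2,
# for an arbitrary (assumed) x-dependent matrix A.
x1, x2 = sp.symbols('x1 x2')
x = sp.Matrix([x1, x2])
A = sp.Matrix([[x1 + x2, x1 * x2],
               [x2**2,   x1]])
f = sp.Rational(1, 2) * (x.T * A * x)[0, 0]

# gradient by direct differentiation with respect to each component
grad_direct = sp.Matrix([sp.diff(f, v) for v in (x1, x2)])

# gradient from the closed form: 1/2 (x^T (∇∘A) x + A x + A^T x),
# using (x^T (∇∘A) x)_k = x^T (∂A/∂x_k) x
dA = [sp.diff(A, v) for v in (x1, x2)]
first = sp.Matrix([(x.T * dA[k] * x)[0, 0] for k in range(2)])
grad_closed = sp.Rational(1, 2) * (first + A * x + A.T * x)

print(sp.simplify(grad_direct - grad_closed))   # zero vector
```

The "careful with the $i=j$ cases" warning is handled automatically here: SymPy's product rule produces both the $\delta_{ik}$ and $\delta_{jk}$ contributions that the accepted answer tracks by hand.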

ANSWER

I am just going to write some stuff. Let $A \in \mathbb{R}^{n \times n}$ and $x \in \mathbb{R}^{n}$. Then

$$ r(x) = \frac{x^{T}Ax}{x^{T}x} \in \mathbb{R} \tag{1}$$

is called the Rayleigh quotient. If $x$ is an eigenvector of $A$ with $Ax=\lambda x$, then

$$ r(x) = \frac{\lambda x^{T}x}{x^{T}x} = \lambda \tag{2}$$

For symmetric $A$, one can show that in general $\lambda_{\min} \leq r(x) \leq \lambda_{\max}$.

Now then,

$$\nabla r(x) = \frac{2}{x^{T}x}\left(Ax - r(x)\,x\right) \tag{3}$$

Finally, note that $r(x)$ is simply $\frac{2}{x^{T}x}$ times $f(x)$.
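Formula (3) can be verified numerically as well. The sketch below (my own check, assuming a random symmetric $A$) compares the gradient against finite differences and confirms that the gradient vanishes at an eigenvector, where $r$ attains an eigenvalue:

```python
import numpy as np

# Check  ∇r(x) = (2 / x^T x) (A x − r(x) x)  for a symmetric A.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                      # symmetrize

def r(x):
    return (x @ A @ x) / (x @ x)

def grad_r(x):
    return 2.0 / (x @ x) * (A @ x - r(x) * x)

def grad_fd(x, h=1e-6):
    # central finite differences
    g = np.zeros_like(x)
    for k in range(len(x)):
        e = np.zeros_like(x)
        e[k] = h
        g[k] = (r(x + e) - r(x - e)) / (2 * h)
    return g

x = rng.standard_normal(4)
print(np.allclose(grad_r(x), grad_fd(x), atol=1e-6))   # True

# at an eigenvector the gradient vanishes and r equals the eigenvalue
w, V = np.linalg.eigh(A)
v = V[:, 0]
print(np.allclose(grad_r(v), 0), np.isclose(r(v), w[0]))   # True True
```

Setting $\nabla r(x) = 0$ in (3) recovers exactly the eigenvalue equation $Ax = r(x)\,x$, which is why stationary points of the Rayleigh quotient are eigenvectors.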