Differentiate $f(x)=x^TAx$


Calculate the differential of the function $f: \Bbb R^n \to \Bbb R$ given by $$f(x) = x^T A x$$ with $A$ symmetric. Also, differentiate this function with respect to $x^T$.


How exactly does this work in the case of vectors and matrices? Could anyone please help me out?

5 Answers

Best answer (score 4)

As a start, things work "as usual": You calculate the difference between $f(x+h)$ and $f(x)$ and check how it depends on $h$, looking for a dominant linear part as $h\to 0$. Here, $f(x+h)=(x+h)^TA(x+h)=x^TAx+ h^TAx+x^TAh+h^TAh=f(x)+2x^TAh+h^TAh$, so $f(x+h)-f(x)=2x^TA\cdot h + h^TAh$. The first summand is linear in $h$ with a factor $2x^TA$, the second summand is quadratic in $h$, i.e. goes to $0$ faster than the first / is negligible against the first for small $h$. So the row vector $2x^TA$ is our derivative (or transposed: $2Ax$ is the derivative with respect to $x^T$).
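The expansion above can be checked numerically. Here is a minimal Python sketch (the matrix and vectors are arbitrary illustrative choices, not from the answer): the difference $f(x+h)-f(x)$ should equal the linear part $2x^TAh$ plus the quadratic remainder $h^TAh$, with the remainder negligible for small $h$.

```python
# Sanity check of f(x+h) - f(x) = 2 x^T A h + h^T A h for symmetric A,
# using plain Python lists. All numbers are arbitrary example data.

def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def f(A, x):
    """f(x) = x^T A x."""
    return dot(x, matvec(A, x))

A = [[2.0, 1.0],
     [1.0, 3.0]]          # symmetric
x = [1.0, -2.0]
h = [1e-5, 2e-5]          # small increment

lhs = f(A, [xi + hi for xi, hi in zip(x, h)]) - f(A, x)
linear_part = 2 * dot(matvec(A, x), h)     # 2 x^T A h (uses symmetry of A)
quadratic_part = dot(h, matvec(A, h))      # h^T A h, O(|h|^2)

# The identity is exact in real arithmetic, so only rounding error remains.
assert abs(lhs - (linear_part + quadratic_part)) < 1e-12
```

The quadratic part here is about $10^{-9}$ while the linear part is about $10^{-4}$, which illustrates why the linear term dominates as $h\to 0$.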

Answer (score 3)

@Hagen von Eitzen's answer is certainly the fastest route here, but since you asked, here is an approach via the chain rule.

Here are two useful facts about bounded linear and bilinear maps between normed vector spaces.

If $f$ is linear and bounded, then trivially: $$ df_x(h)=f(h). $$

And if $g$ is bilinear and bounded ($\|g(h,k)\|\leq C\|h\|\|k\|$), we have $$ dg_{(x,y)}(h,k)=g(x,k)+g(h,y). $$

Now take $f(x)=(x,x)$ and $g(x,y)=x^tAy$. The former is linear and bounded, the latter is bilinear and bounded.

So, by the chain rule, $g\circ f(x)=x^tAx$ is differentiable and $$ d(g\circ f)_x(h)=dg_{f(x)}\circ df_x(h)=dg_{(x,x)} (h,h)=x^tAh+h^tAx. $$

This is true for any matrix $A$. Now if $A$ is symmetric, this can be simplified since $$ x^tAh+h^tAx=x^tAh+h^tA^tx=x^tAh+(Ah)^tx=2x^tAh. $$

Removing $h$, this gives $$ d(g\circ f)_x=2x^tA. $$
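The bilinear-map formula $df_x(h)=x^tAh+h^tAx$ holds for any $A$, symmetric or not, and can be verified by a finite-difference check. A small Python sketch with a deliberately non-symmetric $A$ (all values are arbitrary examples):

```python
# For general A the differential of x^T A x at x is h -> x^T A h + h^T A x.
# Finite-difference check with an asymmetric A; numbers are arbitrary.

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A = [[1.0, 4.0],
     [0.0, 2.0]]          # deliberately NOT symmetric
x = [0.5, 1.5]
h = [1e-6, -1e-6]

f = lambda v: dot(v, matvec(A, v))

numeric = f([a + b for a, b in zip(x, h)]) - f(x)
exact = dot(x, matvec(A, h)) + dot(h, matvec(A, x))   # x^T A h + h^T A x

# The discrepancy is exactly h^T A h, which is O(|h|^2).
assert abs(numeric - exact) < 1e-9
```

When $A$ is symmetric the two terms coincide and collapse to $2x^tAh$, as the answer shows.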

Answer (score 9)

There is another way to solve the problem:

Let $\mathbf{x}=(x_1,\dots ,x_n)'$ be an $n\times 1$ vector. The derivative of $\mathbf y=f(\mathbf x)$ with respect to the vector $\mathbf{x}$ is defined by $$\frac{\partial f}{\partial \mathbf x}=\begin{pmatrix} \frac{\partial f}{\partial x_1} \\ \vdots\\ \frac{\partial f}{\partial x_n} \end{pmatrix}$$ Let \begin{align} \mathbf y&=f(\mathbf x)\\&=\mathbf x'A\mathbf x \\&=\sum_{i=1}^n\sum_{j=1}^n a_{ij}x_ix_j\\&=\sum_{i=1}^na_{i1}x_ix_1+\sum_{j=1}^na_{1j}x_1x_j+\sum_{i=2}^n\sum_{j=2}^n a_{ij}x_ix_j \\\frac{\partial f}{\partial x_1} &=\sum_{i=1}^na_{i1}x_i+\sum_{j=1}^na_{1j}x_j\\&=\sum_{i=1}^na_{1i}x_i+\sum_{i=1}^na_{1i}x_i \,[\text{since}\,\, a_{i1}=a_{1i}]\\ &=2 \sum_{i=1}^na_{1i}x_i \\ \frac{\partial f}{\partial \mathbf x}&=\begin{pmatrix} 2 \sum_{i=1}^na_{1i}x_i \\ \vdots\\ 2 \sum_{i=1}^na_{ni}x_i \end{pmatrix} \\&=2\begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n}\\ \vdots & \vdots &\ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{pmatrix}\begin{pmatrix}x_1 \\ \vdots \\ x_n \end{pmatrix}\\ &= 2A\mathbf x \end{align}
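The component-wise result $\partial f/\partial x_k = 2\sum_i a_{ki}x_i$, i.e. $\nabla f = 2A\mathbf x$, can be confirmed by central finite differences on each coordinate. A short Python sketch with an arbitrary symmetric $3\times 3$ example:

```python
# Check that each partial derivative of f(x) = sum_ij a_ij x_i x_j
# matches (2Ax)_k, using central differences (exact for quadratics
# up to rounding). Example data is arbitrary.

def f(A, x):
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 2.0],
     [0.0, 2.0, 5.0]]     # symmetric
x = [1.0, 2.0, -1.0]
eps = 1e-6

for k in range(3):
    xp = x[:]; xp[k] += eps
    xm = x[:]; xm[k] -= eps
    numeric = (f(A, xp) - f(A, xm)) / (2 * eps)
    exact = 2 * sum(A[k][i] * x[i] for i in range(3))   # (2Ax)_k
    assert abs(numeric - exact) < 1e-6
```

Central differences are exact for a quadratic function (the third derivative vanishes), so only floating-point rounding contributes to the error here.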

Answer (score 0)

Here is the relationship between the differential and the directional derivative when $f$ is differentiable.

$f'(p; v)$ denotes the derivative of $f$ at $p$ in the direction of $v$.

Let $f:U \subset \mathbb{R}^n \rightarrow \mathbb{R}$ and $p\in U$, $v \in \mathbb{R}^n$. Suppose that $f$ is differentiable at $p$. Then we have \begin{equation*} df_p(v)=f'(p;v)= \lim_{t \rightarrow 0}\frac{f(\sigma(t))-f(p)}{t} \end{equation*} for any differentiable curve $\sigma:(-\epsilon, \epsilon)\rightarrow U$ such that $\sigma(0)=p$ and $\sigma '(0)=v$.

In our case $f(x)=x^TAx$ and $\sigma (t) = x+th$,

$$f'(x; h) = \lim_{t\rightarrow 0} \frac{(x+th)^TA(x+th)-x^TAx}{t} = \lim_{t\rightarrow 0}\left(x^TAh+h^TAx+t\,h^TAh\right)$$ $$f'(x; h) = x^TAh+h^TAx$$

Since $A$ is symmetric, we have the following: $$f'(x; h) = x^TAh+h^TAx=x^TAh+x^TA^Th$$ $$f'(x; h) = x^T(A+A^T)h$$

So the differential/gradient is simply $2x^TA$. $$f'(x; h) = 2x^TAh $$
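The limit definition used here can also be checked numerically: the difference quotient along $\sigma(t)=x+th$ should approach $2x^TAh$ as $t\to 0$. A small Python sketch with arbitrary example data:

```python
# Directional-derivative check: (f(x + t h) - f(x)) / t -> 2 x^T A h
# for symmetric A as t shrinks. Example data is arbitrary.

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A = [[3.0, 1.0],
     [1.0, 2.0]]          # symmetric
x = [2.0, -1.0]
h = [1.0, 1.0]

f = lambda v: dot(v, matvec(A, v))
exact = 2 * dot(x, matvec(A, h))          # 2 x^T A h

errors = []
for t in (1e-2, 1e-4, 1e-6):
    quotient = (f([xi + t * hi for xi, hi in zip(x, h)]) - f(x)) / t
    errors.append(abs(quotient - exact))

# The error is |h^T A h| * t, so it decreases linearly with t.
assert errors[0] > errors[1] > errors[2]
```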

Answer (score 0)

There are two small errors in the highest-voted answer above:

  1. The split of the sum should subtract $a_{11}x_{1}^{2}$, since that term is counted twice when the sum is rearranged, as @keineahnung2345 commented;
  2. When carrying out the derivative, the second-order terms and the first-order terms should be differentiated separately.

The rest of the logic is fine.

$$\begin{aligned}y&=\sum ^{n}_{j=1}a_{1j}x_{1}x_{j}+\sum ^{n}_{i=1}a_{i1}x_{i}x_{1}-a_{11}x_{1}^{2}+\sum ^{n}_{i=2}\sum ^{n}_{j=2}a_{ij}x_{i}x_{j}\\ &=a_{11}x_{1}^{2}+\sum ^{n}_{j=2}a_{1j}x_{1}x_{j}+\sum ^{n}_{i=2}a_{i1}x_{i}x_{1}+\sum ^{n}_{i=2}\sum ^{n}_{j=2}a_{ij}x_{i}x_{j}\end{aligned}$$

$$\begin{aligned}\dfrac{\partial y}{\partial x_{1}}&=2a_{11}x_{1}+\sum ^{n}_{j=2}a_{1j}x_{j}+\sum ^{n}_{i=2}a_{i1}x_{i}\\ &=\sum ^{n}_{j=1}a_{1j}x_{j}+\sum ^{n}_{i=1}a_{i1}x_{i}\\ &=2\sum ^{n}_{i=1}a_{1i}x_{i} \end{aligned}$$
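The corrected split, with $a_{11}x_1^2$ subtracted once, can be verified to reproduce the full double sum. A short Python sketch with an arbitrary symmetric $3\times 3$ example:

```python
# Verify that the full double sum equals the corrected decomposition
#   sum_j a_1j x_1 x_j + sum_i a_i1 x_i x_1 - a_11 x_1^2
#   + sum_{i>=2} sum_{j>=2} a_ij x_i x_j
# (0-based indices in code). Example data is arbitrary.

A = [[2.0, 1.0, 3.0],
     [1.0, 4.0, 0.0],
     [3.0, 0.0, 5.0]]     # symmetric
x = [1.0, -1.0, 2.0]
n = 3

full = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

split = (sum(A[0][j] * x[0] * x[j] for j in range(n))    # first row
         + sum(A[i][0] * x[i] * x[0] for i in range(n))  # first column
         - A[0][0] * x[0] ** 2                           # counted twice above
         + sum(A[i][j] * x[i] * x[j]                     # remaining block
               for i in range(1, n) for j in range(1, n)))

assert abs(full - split) < 1e-12
```

The row and column sums each include the $(1,1)$ entry, which is why it must be subtracted exactly once.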