Matrix differentiation proof of quadratic product $x^TAx$


I would appreciate any hints on the proof of the derivative of $x^TAx$ using index notation:

Suppose $x$ is an $n \times 1$ vector and $A$ is an $n \times n$ matrix. $A$ does not depend on $x$, and $\alpha = x^TAx$.

Let $\alpha = \sum_{j=1}^{n}\sum_{i=1}^{n} x_i a_{ij} x_j$

Differentiating with respect to the $k^{th}$ element of $x$:

$\frac{\partial\alpha}{\partial x_k} = \sum_{i=1}^{n} x_ia_{ik} + \sum_{j=1}^{n} a_{kj}x_j$ for all $k = 1, \ldots, n$

$\frac{\partial\alpha}{\partial \boldsymbol{x}} = x^TA + x^TA^T = x^T(A+A^T)$

Now I understand that the $\sum_{i=1}^{n} x_ia_{ik}$ component gives us $x^TA$ when we take $\frac{\partial\alpha}{\partial \boldsymbol{x}}$. This is because for each $k$ (which indexes the columns of $A$) we calculate a sum-product of the vector $x$ and the $k^{th}$ column of $A$. But how does the $\sum_{j=1}^{n} a_{kj}x_j$ component result in $x^TA^T$ and not $Ax$ in $\frac{\partial\alpha}{\partial \boldsymbol{x}}$, since we are effectively calculating the sum-product of $x$ and the $k^{th}$ row of $A$?

Thank you for your time and help.


There are 2 best solutions below

On BEST ANSWER

The explanation is the following: the numbers $$ \sum_{j=1}^{n} a_{kj}x_j $$ are the entries of $Ax$, which is a column. On the other hand, $$ \frac{d\alpha}{dx}=\left(\frac{\partial\alpha}{\partial x_1},\ldots,\frac{\partial\alpha}{\partial x_n}\right) $$ is a row. That is why you need to take the transpose $(Ax)^T=x^TA^T$.
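As a sanity check on this point, here is a minimal pure-Python sketch that compares the index formula with finite differences. The matrix `A`, the vector `x`, and the helper names are illustrative assumptions, not part of the question. Note that a plain list has no row or column orientation: the numbers $\sum_j a_{kj}x_j$ are the same whether you read them as the column $Ax$ or the row $x^TA^T$.

```python
def quad_form(A, x):
    """Return alpha = x^T A x written as the double sum over i and j."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def gradient_formula(A, x):
    """k-th entry: sum_i x_i a_ik + sum_j a_kj x_j, i.e. the entries of x^T (A + A^T)."""
    n = len(x)
    return [
        sum(x[i] * A[i][k] for i in range(n)) + sum(A[k][j] * x[j] for j in range(n))
        for k in range(n)
    ]


def gradient_numeric(A, x, eps=1e-6):
    """Central finite differences of alpha with respect to each x_k."""
    grad = []
    for k in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[k] += eps
        x_minus[k] -= eps
        grad.append((quad_form(A, x_plus) - quad_form(A, x_minus)) / (2 * eps))
    return grad


# Hypothetical, deliberately non-symmetric example data.
A = [[1.0, 2.0, 0.0], [-1.0, 3.0, 4.0], [5.0, 0.5, -2.0]]
x = [0.5, -1.0, 2.0]

exact = gradient_formula(A, x)
approx = gradient_numeric(A, x)
print(exact)
print(approx)
```

The two printed lists should agree up to rounding error, since a central difference is exact for a quadratic apart from floating-point effects.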

Added: alternatively, note that $$ \begin{split} (x+h)^TA(x+h) &=x^TAx+x^TAh+h^TAx+h^TAh\\ &=x^TAx+(x^TA+x^TA^T)h+h^TAh, \end{split} $$ and so $$ \frac{d}{dh}(x+h)^TA(x+h)|_{h=0}=x^TA+x^TA^T. $$
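The expansion above can also be checked numerically. The following is a small sketch under assumed example values for `A`, `x`, and `h` (all hypothetical); it verifies that $(x+h)^TA(x+h)$ equals $x^TAx + (x^TA + x^TA^T)h + h^TAh$.

```python
def quad_form(A, v):
    """Return v^T A v as a double sum."""
    n = len(v)
    return sum(v[i] * A[i][j] * v[j] for i in range(n) for j in range(n))


def row_times_matrix(v, A):
    """Entries of the row vector v^T A: the k-th entry is sum_i v_i a_ik."""
    n = len(v)
    return [sum(v[i] * A[i][k] for i in range(n)) for k in range(n)]


def transpose(A):
    """Return A^T as a list of lists."""
    return [list(col) for col in zip(*A)]


# Hypothetical example data.
A = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, -2.0]
h = [0.5, 0.25]

x_plus_h = [xi + hi for xi, hi in zip(x, h)]
lhs = quad_form(A, x_plus_h)

# Row vector x^T A + x^T A^T, which is the derivative claimed above.
derivative_row = [
    a + b for a, b in zip(row_times_matrix(x, A), row_times_matrix(x, transpose(A)))
]
linear_term = sum(d * hk for d, hk in zip(derivative_row, h))

rhs = quad_form(A, x) + linear_term + quad_form(A, h)
print(lhs, rhs)
```

The remainder `quad_form(A, h)` is quadratic in `h`, which is why only the linear term survives when differentiating at $h=0$.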

On

We see that $$\sum_{j=1}^n a_{kj} x_j = [x_1,x_2,\ldots,x_n]\left[\begin{array}{c} a_{k1} \\ a_{k2} \\ \vdots \\ a_{kn}\end{array}\right]$$ $$=x^T\left[\begin{array}{c} a_{1k}^T \\ a_{2k}^T \\ \vdots \\ a_{nk}^T \end{array}\right]$$ $$=x^T a_{,k}^T,$$ where $a_{jk}^T = a_{kj}$ denotes the $(j,k)$ entry of $A^T$ and $a_{,k}^T$ is the $k$th column of $A^T$. Repeating this for each column $k = 1, \ldots, n$ of $A^T$ gives the entries of the row vector $x^TA^T$.
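A short sketch of the same identity in code, using a hypothetical $3 \times 3$ matrix and vector: the $k$th row of $A$ is the $k$th column of $A^T$, so the sums $\sum_j a_{kj}x_j$ are simultaneously the entries of $Ax$ and of $x^TA^T$.

```python
# Hypothetical example data.
A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
x = [1.0, 0.0, -1.0]
n = len(x)

# Transpose of A as a list of rows.
A_T = [list(col) for col in zip(*A)]

# k-th entry of the column A x: row k of A times x.
Ax = [sum(A[k][j] * x[j] for j in range(n)) for k in range(n)]

# k-th entry of the row x^T A^T: x times column k of A^T.
xT_AT = [sum(x[j] * A_T[j][k] for j in range(n)) for k in range(n)]

# Column k of A^T is row k of A, entry by entry.
columns_match_rows = all(
    [A_T[j][k] for j in range(n)] == A[k] for k in range(n)
)

print(Ax, xT_AT, columns_match_rows)
```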