Use of the chain rule in the derivative of $x^\top Ax$

I'm trying to understand how the derivative of $x^\top Ax$ is obtained in this step-by-step explanation (from this previous question), which I copy here for the sake of clarity:

The only thing you need to remember/know is that

$$\dfrac{\partial (x^Ty)}{\partial x} = y$$

and the chain rule, which goes as

$$\dfrac{d(f(x,y))}{d x} = \dfrac{\partial (f(x,y))}{\partial x} + \dfrac{d( y^T(x))}{d x} \dfrac{\partial (f(x,y))}{\partial y}\quad \text{(1)}$$

Hence,

$$\dfrac{d(b^Tx)}{d x} = \dfrac{d (x^Tb)}{d x} = b$$

$$\dfrac{d (x^TAx)}{d x} = \dfrac{\partial (x^Ty)}{\partial x} + \dfrac{d (y(x)^T)}{d x} \dfrac{\partial (x^Ty)}{\partial y}$$

where $y = Ax$. That is,

$$\dfrac{d (x^TAx)}{d x} = \dfrac{\partial (x^Ty)}{\partial x} + \dfrac{d( y(x)^T)}{d x} \dfrac{\partial (x^Ty)}{\partial y} = y + \dfrac{d (x^TA^T)}{d x} x = y + A^Tx = (A+A^T)x$$

The multivariate chain rule applied to multiplication says that for

$$ f(u,v) = uv $$

the partials are $D_1f = v$ and $D_2f = u$. Thus,

$$ \frac{d}{dx}(g(x)h(x)) = h(x)\frac{d}{dx}g(x) + g(x)\frac{d}{dx}h(x) $$

Here is my question: how can we make the connection between the two?

Best answer

All matrix differentiation questions can be answered by expanding out the indices. In this case, $$ x^TAx=\sum_{ij}x_iA_{ij}x_j, $$ from which it follows that $$ \frac{\partial(x^TAx)}{\partial x_k}=\sum_{ij}\frac{\partial x_i}{\partial x_k}A_{ij}x_j+\sum_{ij}x_iA_{ij}\frac{\partial x_j}{\partial x_k}, $$ where all we have done is apply the usual product rule to each term.
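To see the expansion concretely, here is the $2\times 2$ case written out as an illustrative sketch (the choice of $n=2$ and of differentiating with respect to $x_1$ is only for illustration): $$ x^TAx = A_{11}x_1^2 + A_{12}x_1x_2 + A_{21}x_2x_1 + A_{22}x_2^2. $$ Differentiating with respect to $x_1$ gives $$ \frac{\partial(x^TAx)}{\partial x_1} = \underbrace{A_{11}x_1 + A_{12}x_2}_{\text{from the left factor } x_i} + \underbrace{A_{11}x_1 + A_{21}x_2}_{\text{from the right factor } x_j}, $$ so each of the two sums in the product rule picks up the terms in which $x_1$ appears as the left or the right factor, respectively.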

Now $\partial x_i/\partial x_j$ equals $1$ if and only if $i=j$, and is zero otherwise, so $$ \frac{\partial(x^TAx)}{\partial x_k}=\sum_{j}A_{kj}x_j+\sum_{i}x_iA_{ik}=(Ax)_k+(A^Tx)_k, $$ which means that $$ \frac{\partial(x^TAx)}{\partial x}=Ax+A^Tx. $$
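The result can also be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names (`quadratic_form`, `analytic_gradient`, `numerical_gradient`), the random test matrix, and the step size `eps` are all choices made here for illustration. It compares $(A+A^T)x$ against a central finite-difference approximation of the gradient for a non-symmetric $A$.

```python
import numpy as np


def quadratic_form(A, x):
    """Evaluate the scalar x^T A x."""
    return x @ A @ x


def analytic_gradient(A, x):
    """Gradient from the index expansion: Ax + A^T x."""
    return (A + A.T) @ x


def numerical_gradient(f, x, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    grad = np.zeros_like(x)
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = eps  # perturb only the k-th coordinate
        grad[k] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad


# Illustrative data: a random, generally non-symmetric 4x4 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

analytic = analytic_gradient(A, x)
numeric = numerical_gradient(lambda v: quadratic_form(A, v), x)

# The two gradients should agree up to floating-point error.
print(np.allclose(analytic, numeric, atol=1e-6))
```

Using a non-symmetric $A$ matters here: for symmetric $A$ the formula collapses to $2Ax$, and the check would not distinguish $Ax + A^Tx$ from $2Ax$.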