Differentiation of a matrix with respect to a vector


Could someone show and explain the differentiation of the following

$$\frac{\partial(x^T Ax)}{\partial x} $$

where $x$ is a column vector and $A$ is a symmetric square matrix. I'm in high school, so I have no experience with vector calculus; I only need this derivation for a partial derivative in a Lagrange function. It should come out as $2Ax$.

Thanks a lot, Tom


There are 2 best solutions below

2
On BEST ANSWER

If you're not comfortable with vector calculus, try out the assertion on a small case, say $n=2$. Then you are trying to differentiate the scalar (one-dimensional) quantity $$\begin{align} Q(x_1,x_2):=x^TAx&= \begin{matrix}(x_1 &x_2)\end{matrix} \left(\begin{matrix}A_{1,1} &A_{1,2}\\ A_{2,1} &A_{2,2}\end{matrix}\right) \left(\begin{matrix}x_1 \\x_2\end{matrix}\right)\\ &=x_1A_{1,1}x_1 + x_1A_{1,2}x_2 + x_2 A_{2,1}x_1 + x_2A_{2,2}x_2. \end{align} $$ Taking the partial derivative of $Q$ with respect to $x_1$ gives $$ \frac{\partial Q}{\partial x_1}=2A_{1,1}x_1+A_{1,2}x_2 + x_2A_{2,1}=2(A_{1,1}x_1 + A_{1,2}x_2)\tag1$$ since $A_{1,2}=A_{2,1}$. We recognize the RHS of (1) as the first element in the column vector $2Ax$, which is the product of the scalar $2$ with the matrix $A$ and the column vector $x$.

A similar calculation shows that $\frac{\partial Q}{\partial x_2}$ is the second element in $2Ax$.
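The $n=2$ case can also be sanity-checked numerically. The following is a minimal Python sketch, not part of the original derivation: the matrix entries and the point $x$ are arbitrary illustrative values, and the helper names `quadratic_form` and `numerical_gradient` are made up for this example. It compares a central-difference approximation of the two partial derivatives with the claimed answer $2Ax$.

```python
def quadratic_form(A, x):
    """Return x^T A x for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at the point x."""
    grad = []
    for k in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[k] += h
        backward[k] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


# Illustrative symmetric matrix (A[0][1] == A[1][0]) and test point.
A = [[2.0, 1.0],
     [1.0, 3.0]]
x = [1.0, 2.0]

# Claimed gradient: the column vector 2Ax.
two_Ax = [2 * sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]

# Numerical partial derivatives of Q with respect to x_1 and x_2.
approx = numerical_gradient(lambda v: quadratic_form(A, v), x)

print(two_Ax)
print(approx)
```

The two printed vectors should agree up to floating-point rounding, since a central difference is exact for a quadratic apart from rounding error.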

EDIT: Now that you see how the $n=2$ case works, the general case is similar. Write out the matrix product: $$ Q(x_1,x_2,\ldots,x_n):=x^TAx=\sum_{i=1}^n\sum_{j=1}^n x_iA_{i,j}x_j. $$ For a fixed $k$, compute the partial derivative of $Q$ with respect to $x_k$ by considering which of the indices $i,j$ are equal to $k$ (terms in which neither index equals $k$ do not depend on $x_k$ and contribute nothing): $$\begin{align} \frac{\partial Q}{\partial x_k}&= \frac{\partial}{\partial x_k}(x_kA_{k,k}x_k)+ \frac{\partial}{\partial x_k}\Bigl(\sum_{j\ne k}x_kA_{k,j}x_j\Bigr)+ \frac{\partial}{\partial x_k}\Bigl(\sum_{i\ne k}x_iA_{i,k}x_k\Bigr)\\ &=2A_{k,k}x_k +\sum_{j\ne k}A_{k,j}x_j + \sum_{i\ne k} x_iA_{i,k}\\ &=2\sum_{j=1}^n A_{k,j}x_j, \end{align} $$ the last equality arising after relabeling index $i$ as $j$ and using the fact that $A_{j,k}=A_{k,j}$. We recognize the final quantity as the $k$th element in the column vector $2Ax$.
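The index bookkeeping in the general case can be mirrored directly in code. This is a small Python sketch under stated assumptions: the $3\times 3$ symmetric matrix and the vector are arbitrary example values, and the function names `partial_by_cases` and `two_Ax_entry` are hypothetical. The first function adds up the three groups of terms from the middle line of the derivation; the second evaluates the $k$th entry of $2Ax$.

```python
def partial_by_cases(A, x, k):
    """k-th partial derivative of x^T A x, split into the three groups of terms."""
    n = len(x)
    diagonal = 2 * A[k][k] * x[k]                                # i = k and j = k
    row_part = sum(A[k][j] * x[j] for j in range(n) if j != k)   # i = k, j != k
    col_part = sum(x[i] * A[i][k] for i in range(n) if i != k)   # j = k, i != k
    return diagonal + row_part + col_part


def two_Ax_entry(A, x, k):
    """k-th entry of the column vector 2Ax."""
    return 2 * sum(A[k][j] * x[j] for j in range(len(x)))


# Illustrative symmetric matrix and vector (integers, so the comparison is exact).
A = [[4, 1, 0],
     [1, 2, 5],
     [0, 5, 3]]
x = [1, -2, 3]

by_cases = [partial_by_cases(A, x, k) for k in range(3)]
closed_form = [two_Ax_entry(A, x, k) for k in range(3)]

print(by_cases)     # [4, 24, -2]
print(closed_form)  # [4, 24, -2]
```

The two lists coincide only because $A$ is symmetric; with a non-symmetric matrix the row and column sums differ and the final simplification no longer applies.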

Writing out the matrix product is typically the way to prove these vector calculus identities. You should be aware that different authors use different conventions for notation in these identities, depending on whether the derivative of a scalar with respect to a vector is seen as a column vector or as a row vector. See https://en.wikipedia.org/wiki/Matrix_calculus for a very detailed discussion.

5
On

First of all, the convention in linear algebra is to express a vector $x$ as an $n\times 1$ matrix. Conforming to this reduces confusion when doing complicated matrix algebra.

I convinced myself with the following two steps while struggling through the first year of grad school, and I hope it helps you.

Step 1

Before getting to the expression itself, I guess you have already met $$ \dfrac {\partial Ax}{\partial x^T}=A $$ Let me explain this one first.

Define $g: \mathbb R^n \rightarrow \mathbb R^m$, where the $i^{th}$ element of $g(x)$ is $g_i(x_1,x_2,...,x_n)$. The $m\times n$ matrix of first partial derivatives is called the "Jacobian": $$ \dfrac {\partial g(x)}{\partial x^T}=\dfrac {\partial (g_1,g_2,...,g_m)}{\partial (x_1,x_2,...,x_n)} $$ where the element in row $i$ and column $j$ is $g_{ij}=\dfrac {\partial g_i(x_1,x_2,...,x_n)}{\partial x_j}$.

Now let $g(x)=Ax$, where $A$ is an $m\times n$ matrix with entries $a_{ij}$. Then $$ g_i(x_1,x_2,...,x_n)=a_{i1}x_1+a_{i2}x_2+...+a_{in}x_n $$ and we get $\dfrac {\partial g_i(x_1,x_2,...,x_n)}{\partial x_j}=a_{ij}$, which means $\dfrac {\partial Ax}{\partial x^T}=A$.
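To see the Jacobian statement concretely, here is a rough Python sketch that builds the matrix of partial derivatives of $g(x)=Ax$ by central differences and compares it with $A$. The $2\times 3$ matrix, the evaluation point, and the helper names `matvec` and `numerical_jacobian` are assumptions chosen for illustration only.

```python
def matvec(A, x):
    """Matrix-vector product Ax, with A given as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def numerical_jacobian(g, x, h=1e-6):
    """Central-difference Jacobian: entry (i, j) approximates d g_i / d x_j."""
    m = len(g(x))
    n = len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        forward = list(x)
        backward = list(x)
        forward[j] += h
        backward[j] -= h
        g_forward = g(forward)
        g_backward = g(backward)
        for i in range(m):
            J[i][j] = (g_forward[i] - g_backward[i]) / (2 * h)
    return J


# Illustrative 2x3 matrix (m = 2, n = 3) and an arbitrary evaluation point.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [0.5, -1.0, 2.0]

J = numerical_jacobian(lambda v: matvec(A, v), x)

# Largest entrywise deviation between the numerical Jacobian and A.
max_error = max(abs(J[i][j] - A[i][j]) for i in range(2) for j in range(3))
print(max_error)
```

The Jacobian has the same $m\times n$ shape as $A$, and because $g$ is linear the result does not depend on the point $x$ at which it is evaluated.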

Step 2

Note that $x^TAx=x^T(Ax)$. Using the product rule $$ \dfrac {\partial g^T h}{\partial x^T}= h^T \dfrac {\partial g}{\partial x^T}+g^T \dfrac {\partial h}{\partial x^T} $$ with $g=x$ and $h=Ax$, we get $$ \begin{align} \dfrac {\partial x^TAx}{\partial x^T}&=(Ax)^T\dfrac {\partial x}{\partial x^T}+x^T\dfrac {\partial (Ax)}{\partial x^T}\\ &=x^TA^T+x^TA\\ &=x^T(A+A^T)\\ &=2x^TA \end{align} $$ The last line comes from the symmetry of $A$. This is a row vector because we differentiated with respect to $x^T$; transposing it gives the column vector $2Ax$ from the question.
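The intermediate line $x^T(A+A^T)$ holds for any square $A$, and symmetry is only needed for the final simplification. The Python sketch below illustrates this with a deliberately non-symmetric matrix; the numbers and the helper names (`quadratic_form`, `numerical_gradient`, `gradient_formula`) are illustrative assumptions, not anything from the original answer.

```python
def quadratic_form(A, x):
    """Return x^T A x for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at the point x."""
    grad = []
    for k in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[k] += h
        backward[k] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def gradient_formula(A, x):
    """Entries of (A + A^T) x, the transpose of the row vector x^T (A + A^T)."""
    n = len(x)
    return [sum((A[k][j] + A[j][k]) * x[j] for j in range(n)) for k in range(n)]


# Deliberately NOT symmetric: A[0][1] != A[1][0].
A = [[1.0, 4.0],
     [0.0, 2.0]]
x = [3.0, -1.0]

general = gradient_formula(A, x)
approx = numerical_gradient(lambda v: quadratic_form(A, v), x)
two_Ax = [2 * sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]

print(general)  # [2.0, 8.0]
print(approx)   # close to [2.0, 8.0]
print(two_Ax)   # [-2.0, -4.0], which differs because A is not symmetric
```

For this matrix the numerical gradient matches $(A+A^T)x$ but not $2Ax$, which is why the symmetry assumption matters in the last step.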

ps. many Ph.D. students in econ or some other fields still don't know why it is true, based on my observation...