I am watching the following video lecture:
https://www.youtube.com/watch?v=G_p4QJrjdOw
In it, the lecturer calculates the gradient of $ x^{T}Ax $ using the concept of the exterior derivative. The proof goes as follows:
- $ y = x^{T}Ax$
- $ dy = dx^{T}Ax + x^{T}Adx = x^{T}(A+A^{T})dx$ (using the trace property of matrices: the scalar $dx^{T}Ax$ equals its own transpose $x^{T}A^{T}dx$)
- $ dy = (\nabla y)^{T} dx $ and because the rule is true for all $dx$
- $ \nabla y = (A+A^{T})x$, that is, $(\nabla y)^{T} = x^{T}(A+A^{T})$
It seems that in step 2 some form of the product rule for differentials is applied. I am familiar with the product rule from single-variable calculus, but I do not understand how it is applied to a multivariate function expressed in matrix form.
It would be great if somebody could point me to a mathematical theorem that justifies step 2 in the above proof.
Thanks! Ajay
\begin{align*}
dy & = d(x^{T}Ax) = d(Ax\cdot x) = d\left(\sum_{i=1}^{n}(Ax)_{i}x_{i}\right) \\
& = d \left(\sum_{i=1}^{n}\sum_{j=1}^{n}a_{i,j}x_{j}x_{i}\right) =\sum_{i=1}^{n}\sum_{j=1}^{n}a_{i,j}x_{i}\,dx_{j}+\sum_{i=1}^{n}\sum_{j=1}^{n}a_{i,j}x_{j}\,dx_{i} \\
& =\sum_{i=1}^{n}(Ax)_{i}\,dx_{i}+\sum_{i=1}^{n}(A\,dx)_{i}\,x_{i} =(dx)^{T}Ax+x^{T}A\,dx \\
& =(dx)^{T}Ax+(dx)^{T}A^{T}x =(dx)^{T}(A+A^{T})x.
\end{align*}
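
If you want to convince yourself numerically, here is a minimal sketch in plain Python that compares $(A+A^{T})x$, the gradient read off from the last line above, with a central-difference approximation of the partial derivatives of $x^{T}Ax$. The function names, the random test matrix, and the step size `h` are my own choices for illustration and are not taken from the lecture.

```python
import random

def quad_form(A, x):
    """Return y = x^T A x for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def claimed_gradient(A, x):
    """Return (A + A^T) x, the gradient suggested by dy = (dx)^T (A + A^T) x."""
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) for i in range(n)]

def finite_difference_gradient(A, x, h=1e-6):
    """Approximate each partial derivative of x^T A x with a central difference."""
    grad = []
    for k in range(len(x)):
        x_plus = list(x)
        x_plus[k] += h
        x_minus = list(x)
        x_minus[k] -= h
        grad.append((quad_form(A, x_plus) - quad_form(A, x_minus)) / (2 * h))
    return grad

# Arbitrary (generally non-symmetric) test data; the seed only makes runs repeatable.
random.seed(0)
n = 4
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
x = [random.uniform(-1, 1) for _ in range(n)]

exact = claimed_gradient(A, x)
approx = finite_difference_gradient(A, x)
max_error = max(abs(e - a) for e, a in zip(exact, approx))
print("largest componentwise difference:", max_error)
```

The two gradients should agree up to floating-point rounding, since a central difference has no truncation error for a quadratic function. This is only a sanity check of the formula, not a substitute for the derivation.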