matrix differentiation rule


If $X \in \mathbb{R}^{n}, A \in M_{n \times n}, Y = X^{T}A,$ show that $\frac{d(Y)}{dX} = A^{T}.$

Here is my attempt:

Let $X^{T} = [x_{1}, \dots,x_{n}]$ and $A = [\vec{a_{1}},\dots,\vec{a_{n}}]$, so that $X^{T}A = [X^{T}\vec{a_{1}},\dots,X^{T}\vec{a_{n}}]$. Writing $\vec{a_{i}} = [a_{1i},\dots,a_{ni}]^{T}$, we get $X^{T}\vec{a_{i}} = \sum_{k=1}^{n}a_{ki}x_{k}$, thus $\frac{d(X^{T}\vec{a_{i}})}{dX} = [a_{1i},\dots,a_{ni}]$, which is a row vector. But now I get confused, since $\frac{d(X^{T}A)}{dX} = [\frac{d(X^{T}\vec{a_{1}})}{dX},\dots,\frac{d(X^{T}\vec{a_{n}})}{dX}]$, where each element is a row vector... it does not make sense.

Is there anything wrong in my procedure?

Best answer:

One simple approach is just to work one element at a time. Since $X$ and $Y$ are vectors, $dY/dX$ is a matrix whose $(i,j)^{th}$ entry is $\frac{\partial Y_i}{\partial X_j}$. We can compute this directly from the definition of matrix multiplication: $$ \frac{\partial Y_i}{\partial X_j}=\frac{\partial \left[\sum_{k=1}^n X_{k}A_{ki}\right]}{\partial X_j}=\frac{\partial X_jA_{ji}}{\partial X_j}=A_{ji}. $$ Thus, $\frac{dY}{dX}=A^T$, since the $(i,j)^{th}$ entry of $\frac{dY}{dX}$ is $A_{ji}=(A^T)_{ij}$ for each $i$ and $j$.
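The entrywise identity $\partial Y_i/\partial X_j = A_{ji}$ can be sanity-checked numerically. A minimal sketch with NumPy (random $X$ and $A$, central finite differences; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
X = rng.standard_normal(n)

def f(x):
    # Y = X^T A, returned as a length-n vector
    return x @ A

# Finite-difference Jacobian: J[i, j] approximates dY_i/dX_j
eps = 1e-6
J = np.empty((n, n))
for j in range(n):
    e = np.zeros(n)
    e[j] = eps
    J[:, j] = (f(X + e) - f(X - e)) / (2 * eps)

# The Jacobian matches A^T entry by entry: dY_i/dX_j = A_{ji}
assert np.allclose(J, A.T)
```

Because $f$ is linear, the finite-difference quotient is exact up to floating-point error, so the check passes at tight tolerance.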

Another answer:

I find it instructive to use the definition: the derivative of $f$ at $x$, denoted $Df_x$, is the unique linear function such that for all directions $h$, $$ f(x+h) = f(x) + Df_x(h) + o(|h|). $$ The derivative $Df_x$ can thus be found by perturbing $x$ in an arbitrary direction $h$, evaluating $f$ at the perturbed point, and throwing away any $o(|h|)$ terms. Here $$ f(x + h) = (x + h)^\top A = x^\top A + h^\top A = x^\top A + (A^\top h)^\top, $$ from which we identify the linear term in $h$ as the derivative: $Df_x(h) = h^\top A = (A^\top h)^\top$. Identifying the row vector $(A^\top h)^\top$ with the column vector $A^\top h$, the matrix representing $Df_x$ is $A^\top$.
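Since $f$ is linear, the perturbation identity holds exactly, with no $o(|h|)$ remainder at all. This is easy to confirm numerically (a sketch with random $x$, $h$, and $A$; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
h = rng.standard_normal(n)

def f(v):
    # f(v) = v^T A, returned as a length-n vector
    return v @ A

# The increment f(x + h) - f(x) equals the linear term h^T A = (A^T h)^T exactly
increment = f(x + h) - f(x)
assert np.allclose(increment, h @ A)
assert np.allclose(increment, A.T @ h)
```

The last two assertions check both ways of writing the linear term: as the row vector $h^\top A$ and as the transposed column vector $A^\top h$, confirming that the derivative matrix is $A^\top$.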