Gradient calculated with Directional Derivative not similar to Gradient calculated with Product Rule for $X^{T}X$


Calculate the gradient of $f(X) = X^{T}X$

Using Directional Derivative
$ Df(X) \cdot H = \lim_{t\to0} \frac{(X+tH)^{T}(X+tH) - X^TX}{t} = X^{T}H + H^{T}X$

Using Product Rule
$Df(X) = X^{T} \cdot D(X) + D(X^{T}) \cdot X = X^{T} + X $
$Df(X)\cdot H = (X^{T} + X) \cdot H = X^{T} \cdot H + X \cdot H$

What am I getting wrong? They should be the same, but $X^{T}H + H^{T}X$ is symmetric while $X^{T}H + XH$ in general is not, so the two results cannot be equal.
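A quick numerical sanity check makes the discrepancy concrete. The sketch below (NumPy, with random test matrices chosen purely for illustration) compares a finite-difference approximation of the directional derivative against both candidate formulas:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3))
H = rng.standard_normal((3, 3))

# Finite-difference approximation of the directional derivative of f(X) = X^T X
t = 1e-6
fd = ((X + t * H).T @ (X + t * H) - X.T @ X) / t

candidate_limit = X.T @ H + H.T @ X    # from the limit definition
candidate_product = X.T @ H + X @ H    # from the (misapplied) product rule

print(np.max(np.abs(fd - candidate_limit)))    # small: agrees to first order in t
print(np.max(np.abs(fd - candidate_product)))  # order one: does not agree
```

The finite difference matches $X^{T}H + H^{T}X$ up to the $O(t)$ truncation error, and disagrees with $X^{T}H + XH$ by an order-one amount, confirming that the limit calculation is the correct one.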


$ \require{enclose} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\l{\ell} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\F{F_{ij}} \def\d{\delta} \def\Ek{E_{k\l}} \def\El{E_{\l k}} \def\X{X_{k\l}} $There is a very general product rule for the differential of two arbitrary tensors $\{A,B\}$ and any product $\star$ with which they are dimensionally compatible:
$$\eqalign{ d(A\star B) &= (A+dA)\star(B+dB) \;-\; A\star B \\ &= \big(A\star B +A\star dB +dA\star B +dA\star dB\big) \;-\; A\star B \\ &= A\star dB + dA\star B + (\enclose{horizontalstrike}{dA\star dB}) \\ &= A\star dB + dA\star B \\ }$$
This rule is valid only for differentials; it cannot be applied directly to gradients, directional derivatives, or other types of derivatives. That is the source of your discrepancy: you applied a product rule of the form $X^T\cdot D(X) + D(X^T)\cdot X$, which silently replaces the increment $dX^T$ by $dX$.
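The product rule above, including the dropped second-order term, can be verified numerically. This is a minimal sketch (NumPy, random matrices as an illustration) with $\star$ taken to be the ordinary matrix product:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
dA = 1e-6 * rng.standard_normal((3, 3))  # small increments
dB = 1e-6 * rng.standard_normal((3, 3))

exact_change = (A + dA) @ (B + dB) - A @ B
first_order = A @ dB + dA @ B   # the product rule for differentials
second_order = dA @ dB          # the struck-out (negligible) term

# The identity exact_change = first_order + second_order holds exactly,
# and second_order is smaller than first_order by a factor of ~|dA|.
print(np.max(np.abs(exact_change - first_order - second_order)))
```

The dropped term $dA\star dB$ is quadratic in the increments, which is why it vanishes in the differential limit.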

In the current problem, $\star$ is the ordinary matrix product and $\{A,B\}=\{X^T,X\}$. Therefore, for the matrix-valued function $F=f(X)$ we have
$$\eqalign{ F = X^TX \qiq dF &= X^TdX + dX^TX \\ }$$
The directional derivative is recovered by setting $dX=H$, which reproduces your limit calculation $X^TH + H^TX$.

Unfortunately, the gradient is a fourth-order tensor which is impossible to render using standard matrix notation. However, the component-wise gradient is merely matrix-valued and is obtained by setting $\,dX=\Ek\;$ (which is a matrix whose elements are all zero except for the $(k,\l)$ element which is equal to one) $$\eqalign{ \grad X\X &= \Ek \qiq \grad F\X &= X^T\Ek + \Ek^TX \;=\; X^T\Ek+ \El X \\ }$$ Taking the $(i,j)$ component of each term in this expression (and introducing Kronecker delta symbols) yields the fully indexed form of the tensor gradient $$\eqalign{ \grad \F\X &= X_{ik}^T\d_{\l j} + \d_{i\l}X_{kj} \;=\; \d_{j\l}X_{ki} + \d_{i\l}X_{kj} \\ }$$ With this tensor expression (and some familiarity with index notation) you can calculate anything you need regarding the behavior of the $F$ matrix.