How to prove the following $\nabla X^T Y = \nabla X Y + \nabla Y X$


This should be a very easy question, but my brain just can't work with matrices that well. Can anyone show me how to prove the following?

$$\nabla( X^T Y ) = \nabla X\, Y + \nabla Y\, X$$ where $\nabla$ denotes the Jacobian. Thanks!


Edited:

Just to clarify the question: both $X$ and $Y$ are vector-valued functions.

$$X(\theta): \mathbb R^n \rightarrow \mathbb R^m$$ $$Y(\theta): \mathbb R^n \rightarrow \mathbb R^m$$

Furthermore, $X$ and $Y$ can be written as the following two column vectors:

\begin{equation} X=\begin{bmatrix}X_1(\theta) \\ \vdots\\ X_m(\theta)\end{bmatrix}, ~~ Y=\begin{bmatrix}Y_1(\theta) \\ \vdots\\ Y_m(\theta)\end{bmatrix} \end{equation}

Computing a Jacobian matrix (i.e., the derivative of a vector-valued function) is well defined; see Wikipedia.


Follow up:

A colleague told me about this property $\nabla( X^T Y ) = \nabla X\, Y + \nabla Y\, X$ today, but I didn't quite understand how to prove it, and I don't want to wait until tomorrow to arrange another meeting with him. Thanks to the comments of @TZakrevskiy and @Will Jagy, the property becomes obvious if we write it out for $m=2$. Here it goes:

\begin{eqnarray} \nabla (X^TY) &=& \nabla \left( \begin{bmatrix}X_1 \\ X_2\end{bmatrix}^T \begin{bmatrix}Y_1 \\ Y_2\end{bmatrix} \right) \\\\ &=& \nabla \left( X_1 Y_1 + X_2 Y_2 \right) \\\\ &=& \nabla X_1\, Y_1 + X_1 \nabla Y_1 + \nabla X_2\, Y_2 + X_2 \nabla Y_2 \\\\ &=& \left( \nabla X_1\, Y_1 + \nabla X_2\, Y_2 \right) + \left( \nabla Y_1\, X_1 + \nabla Y_2\, X_2 \right) \\\\ &=& \nabla X\, Y + \nabla Y\, X \end{eqnarray}

It is not hard to see that it works for any $m$.
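As a sanity check, the identity can also be verified numerically with a finite-difference Jacobian. This is a small sketch: the concrete functions $X$, $Y$ and the point $\theta$ below are arbitrary illustrative choices, and it uses the convention (made explicit in the answer below) that $\nabla f = (Jac\, f)^T$:

```python
import numpy as np

# Illustrative X, Y : R^3 -> R^2 (arbitrary choices, just for the check).
def X(t):
    return np.array([t[0] * t[1], np.sin(t[2])])

def Y(t):
    return np.array([t[2] ** 2, t[0] + t[1]])

def jacobian(f, t, h=1e-6):
    """Central-difference Jacobian of f at t; rows index outputs, columns inputs."""
    t = np.asarray(t, dtype=float)
    m = f(t).shape[0]
    J = np.zeros((m, t.size))
    for i in range(t.size):
        e = np.zeros_like(t)
        e[i] = h
        J[:, i] = (f(t + e) - f(t - e)) / (2 * h)
    return J

theta = np.array([0.7, -1.3, 0.4])

# Left-hand side: gradient of the scalar function theta -> X(theta)^T Y(theta).
lhs = jacobian(lambda s: np.array([X(s) @ Y(s)]), theta)[0]

# Right-hand side, with nabla X = (Jac X)^T so the shapes are (n x m)(m x 1).
rhs = jacobian(X, theta).T @ Y(theta) + jacobian(Y, theta).T @ X(theta)

print(np.allclose(lhs, rhs, atol=1e-6))
```

If the convention is instead that $\nabla X$ is the plain $m \times n$ Jacobian, the right-hand side must be written $(\nabla X)^T Y + (\nabla Y)^T X$ for the shapes to match.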

Best Answer:

Yuchen, do you understand what you write? $X^TY$ is the standard scalar product $(X(\theta),Y(\theta)) = \sum_j X_j(\theta)Y_j(\theta)$, a scalar-valued function $\mathbb{R}^n\rightarrow\mathbb{R}$, and $\nabla_{\theta}((X,Y))=\left(\dfrac{\partial(X,Y)}{\partial\theta_i}\right)_i$ is its gradient, a vector of $\mathbb{R}^n$.

EDIT: there was a mistake above about the dimension of a Jacobian matrix. The Jacobian matrices $Jac_{\theta}(X), Jac_{\theta}(Y)$ are $m\times n$ matrices. One has $\dfrac{\partial(X,Y)}{\partial\theta_i}=\sum_j\left(\dfrac{\partial X_j}{\partial\theta_i}Y_j+\dfrac{\partial Y_j}{\partial\theta_i}X_j\right)$, which is equivalent to $\nabla_{\theta}((X,Y))=(Jac_{\theta}(X))^TY+(Jac_{\theta}(Y))^TX$. Usually the gradient $\nabla_{\theta}(f)$ is not defined when $\mathrm{im}(f)\subset \mathbb{R}^m$ with $m>1$; yet sometimes one sets $\nabla_{\theta}(f)=(Jac_{\theta}(f))^T$ in such a case, and then the formula becomes $\nabla_{\theta}(X^TY)=\nabla_{\theta}(X)\,Y+\nabla_{\theta}(Y)\,X$.
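To see why the component formula assembles into the matrix expression, one can expand the $i$-th component of $(Jac_{\theta}(X))^T Y$ directly (a short expansion of the step above):

```latex
\left( (Jac_{\theta}(X))^T Y \right)_i
  = \sum_j \left( (Jac_{\theta}(X))^T \right)_{ij} Y_j
  = \sum_j (Jac_{\theta}(X))_{ji}\, Y_j
  = \sum_j \frac{\partial X_j}{\partial \theta_i}\, Y_j ,
```

and similarly $\left( (Jac_{\theta}(Y))^T X \right)_i = \sum_j \dfrac{\partial Y_j}{\partial \theta_i} X_j$; summing the two recovers $\dfrac{\partial(X,Y)}{\partial\theta_i}$ exactly as stated.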