Prove various $\mathbb{R}^n$ differentiation identities


Let $g: \mathbb{R}^n \rightarrow \mathbb{R}^m$ and $x \in \mathbb{R}^n$. Let $\frac{\partial g}{\partial x}$ denote the Jacobian matrix
$$\frac{\partial g}{\partial x} = \begin{bmatrix} \frac{\partial g_1}{\partial x_1} & \frac{\partial g_1}{\partial x_2} & \dots & \frac{\partial g_1}{\partial x_n} \\[1ex] \frac{\partial g_2}{\partial x_1} & \frac{\partial g_2}{\partial x_2} & \dots & \frac{\partial g_2}{\partial x_n} \\[1ex] \vdots & \vdots & \ddots & \vdots \\[1ex] \frac{\partial g_m}{\partial x_1} & \frac{\partial g_m}{\partial x_2} & \dots & \frac{\partial g_m}{\partial x_n} \end{bmatrix} \in \mathbb{R}^{m \times n}.$$

If $m = 1$, then $\frac{\partial g}{\partial x}$ is a $1 \times n$ row, i.e. the gradient written as a row. In my notes, however, the gradient is written as a column, so I have gotten a little confused about the dimensions.
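To make the shapes concrete, here is a small example chosen for illustration (it is not taken from the notes). Take $g: \mathbb{R}^2 \to \mathbb{R}^3$ with $g(x) = (x_1x_2,\ \sin x_1,\ x_2^2)$. Each row of the Jacobian collects the partial derivatives of one component $g_i$, so the matrix is $3 \times 2$, i.e. $m \times n$:
$$\frac{\partial g}{\partial x} = \begin{bmatrix} x_2 & x_1 \\ \cos x_1 & 0 \\ 0 & 2x_2 \end{bmatrix}.$$
With $m = 1$, e.g. $f(x) = x_1^2 + 3x_2$, the same definition gives the $1 \times 2$ row $\frac{\partial f}{\partial x} = \begin{bmatrix} 2x_1 & 3 \end{bmatrix}$, whose transpose is the column gradient $\nabla f(x) = \begin{bmatrix} 2x_1 \\ 3 \end{bmatrix}$.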

Prove that

  1. If $a \in \mathbb{R}^n$, $x \in \mathbb{R}^n$, then $\frac{\partial(a^{\intercal}x)}{\partial x}= a.$
  2. If $\mathbf{A} \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^n$, then $\frac{\partial(\mathbf{A}x)}{\partial x}= \mathbf{A}$.
  3. If $\mathbf{A} \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^n$, then $\frac{\partial(x^\intercal\mathbf{A}x)}{\partial x} = (\mathbf{A} + \mathbf{A^\intercal})x$; in particular, if $\mathbf{A}^\intercal = \mathbf{A}$, then $\frac{\partial(x^\intercal\mathbf{A}x)}{\partial x} = 2\mathbf{A}x$.
  4. If $x \in \mathbb{R}^n$, then $\frac{\partial \|x\|^2}{\partial x} = 2x$.
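Before proving the identities, one can sanity-check them numerically. The following is a minimal sketch in plain Python (no third-party libraries assumed); the helper names `num_grad` and `num_jac`, the step size `h`, the tolerance and the random test data are illustrative choices, not from the notes. It approximates derivatives by central differences, which are exact up to rounding for the linear and quadratic functions involved. Gradients are returned as plain lists (columns), and the Jacobian has its rows indexed by the components $g_i$, matching the $m \times n$ definition. For identity 3 it uses a square, generally non-symmetric matrix, since $x^\intercal\mathbf{A}x$ only makes sense when $m = n$.

```python
import random


def dot(u, v):
    """Inner product u^T v of two equal-length lists."""
    return sum(p * q for p, q in zip(u, v))


def matvec(A, x):
    """Matrix-vector product: (A x)_i = sum_k A[i][k] * x[k]."""
    return [dot(row, x) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def num_grad(f, x, h=1e-6):
    """Central-difference gradient of scalar f at x, as a column (plain list)."""
    grad = []
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


def num_jac(g, x, h=1e-6):
    """Central-difference Jacobian of vector g at x: J[i][k] ~ dg_i/dx_k (m x n)."""
    cols = []
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        cols.append([(p - q) / (2 * h) for p, q in zip(g(xp), g(xm))])
    return transpose(cols)  # cols[k][i] holds dg_i/dx_k, so transpose to index rows by i


def close(u, v, tol=1e-6):
    return len(u) == len(v) and all(abs(p - q) < tol for p, q in zip(u, v))


def rand():
    return random.uniform(-1.0, 1.0)


random.seed(0)
n, m = 3, 2
x = [rand() for _ in range(n)]
a = [rand() for _ in range(n)]
A = [[rand() for _ in range(n)] for _ in range(m)]  # m x n, for identity 2
S = [[rand() for _ in range(n)] for _ in range(n)]  # n x n, generally non-symmetric, for identity 3

J = num_jac(lambda y: matvec(A, y), x)
checks = {
    "1: d(a^T x)/dx = a": close(num_grad(lambda y: dot(a, y), x), a),
    "2: d(Ax)/dx = A": len(J) == m and all(close(jr, ar) for jr, ar in zip(J, A)),
    "3: d(x^T S x)/dx = (S + S^T)x": close(
        num_grad(lambda y: dot(y, matvec(S, y)), x),
        [p + q for p, q in zip(matvec(S, x), matvec(transpose(S), x))],
    ),
    "4: d||x||^2/dx = 2x": close(num_grad(lambda y: dot(y, y), x), [2 * xi for xi in x]),
}
for name, ok in checks.items():
    print(name, ok)
```

Each check should report `True`; a numerical check like this only supports the identities, it does not prove them.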

I believe it should not be too hard.

  1. Writing out the inner product gives $a^\intercal x = a_1x_1 + \dots + a_nx_n$. Therefore, $\frac{\partial(a^{\intercal}x)}{\partial x}= \left[\frac{\partial(a^{\intercal}x)}{\partial x_1}, \dots, \frac{\partial(a^{\intercal}x)}{\partial x_n}\right] = [a_1, \dots, a_n]$, which is $a$ written as a row (that is, $a^\intercal$), or $a$ itself if gradients are written as columns.
  2. Similarly, writing $a_i$ for the $i$-th row of $\mathbf{A}$, so that $(\mathbf{A}x)_i = a_i x$, we get $\frac{\partial(\mathbf{A}x)}{\partial x} = \begin{bmatrix} \frac{\partial(a_1x)}{\partial x} \\ \vdots \\ \frac{\partial(a_mx)}{\partial x} \end{bmatrix} = \begin{bmatrix} a_1 \\ \vdots \\ a_m \end{bmatrix} = \mathbf{A}$.
  3. For symmetric $\mathbf{A}$, we could write out $x^\intercal\mathbf{A}x = \sum_{i = 1}^{n} \sum_{j = 1}^{n} x_i a_{ij} x_j$, differentiate term by term, and use $a_{ij} = a_{ji}$. How do I proceed when $\mathbf{A}$ is a non-symmetric $m \times n$ matrix?
  4. For each $k$, $\frac{\partial}{\partial x_k}\|x\|^2 = \frac{\partial}{\partial x_k}\sum_i x_i^2 = 2x_k$, so $\frac{\partial\|x\|^2}{\partial x} = 2x$.
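For comparison, the index computation started in point 3 can also be finished directly, without assuming symmetry, for any square $\mathbf{A} \in \mathbb{R}^{n \times n}$. Here $\delta_{ik}$ denotes the Kronecker delta, and the product rule gives $\frac{\partial (x_i x_j)}{\partial x_k} = \delta_{ik} x_j + x_i \delta_{jk}$:
$$\frac{\partial}{\partial x_k} \sum_{i=1}^{n}\sum_{j=1}^{n} x_i a_{ij} x_j = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}\left(\delta_{ik} x_j + x_i \delta_{jk}\right) = \sum_{j=1}^{n} a_{kj} x_j + \sum_{i=1}^{n} a_{ik} x_i = (\mathbf{A}x)_k + (\mathbf{A}^\intercal x)_k.$$
Stacking the components $k = 1, \dots, n$ gives $(\mathbf{A} + \mathbf{A}^\intercal)x$. Taking $\mathbf{A} = I$, so that $x^\intercal I x = \|x\|^2$, recovers point 4.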

Could you please check this, point out any mistakes, and perhaps make it more rigorous? Thanks.

Best answer

Everything you wrote is fine. Regarding point 3, first note that it makes sense only if $m=n$. Then decompose $A$ into its symmetric and antisymmetric parts: $$ A=\frac{A+A^T}{2}+\frac{A-A^T}{2}. $$ Only the symmetric part of $A$ contributes to the expression $x^T A x$. Indeed, if $B$ is antisymmetric, i.e. $B^T=-B$, then $$ x^T B x=Bx\cdot x=x\cdot B^T x=- x\cdot B x=-x^T B x, $$ so $2 x^T B x=0$ and hence $x^T B x=0$.

Therefore, $x^T A x=x^T \frac{A+A^T}{2}x$, and since $\frac{A+A^T}{2}$ is symmetric, you can apply the result you computed for symmetric matrices: $$ \partial_x (x^T A x)=\partial_x\left(x^T \frac{A+A^T}{2}x\right)=2\left(\frac{A+A^T}{2}\right)x=(A+A^T)x. $$ In summary, it suffices to prove the formula for symmetric $A$.
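The decomposition argument can be checked exactly on a small integer example. This is a sketch in plain Python; the helper names `quad_form` and `sym_antisym` and the random test matrix are illustrative, and `fractions.Fraction` is used so that the halves $(A \pm A^T)/2$ are represented without rounding.

```python
import random
from fractions import Fraction


def quad_form(M, x):
    """x^T M x = sum over i, j of x_i * M[i][j] * x_j."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))


def sym_antisym(A):
    """Split a square matrix A into S = (A + A^T)/2 and B = (A - A^T)/2, exactly."""
    n = len(A)
    S = [[Fraction(A[i][j] + A[j][i], 2) for j in range(n)] for i in range(n)]
    B = [[Fraction(A[i][j] - A[j][i], 2) for j in range(n)] for i in range(n)]
    return S, B


random.seed(1)
n = 4
A = [[random.randint(-5, 5) for _ in range(n)] for _ in range(n)]  # generally non-symmetric
x = [random.randint(-5, 5) for _ in range(n)]
S, B = sym_antisym(A)

print(quad_form(B, x))  # the antisymmetric part contributes nothing to x^T A x
print(quad_form(A, x) == quad_form(S, x))  # hence x^T A x = x^T ((A + A^T)/2) x
```

The first line prints `0` and the second `True`; by the argument above this holds for every square $A$ and every $x$, not just the sampled ones.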

PS: Your notes are "right": the gradient should be a column vector. When $m=1$, it is better to think of the Jacobian matrix as the transposed gradient. You will see the reason for this in later classes.
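One way to see the relationship, via the standard first-order expansion (added here as an illustration): for differentiable $f: \mathbb{R}^n \to \mathbb{R}$, the $1 \times n$ Jacobian acts on an increment $h \in \mathbb{R}^n$ by matrix multiplication, and its transpose is the column gradient that pairs with $h$ through the inner product: $$ f(x+h) = f(x) + \frac{\partial f}{\partial x}\,h + o(\|h\|) = f(x) + \nabla f(x)^T h + o(\|h\|), \qquad \nabla f(x) = \left(\frac{\partial f}{\partial x}\right)^T. $$ With this convention, identity 1 reads $\frac{\partial(a^T x)}{\partial x} = a^T$ and $\nabla(a^T x) = a$.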