Rank-1 matrix derivative


I am trying to prove that

$$\displaystyle{\frac{\partial}{\partial x} \left( x\cdot x^\top \right) = x\otimes \mathbb{I}+\mathbb{I}\otimes x}$$

The product $x\cdot x^\top$ is a rank-$1$ matrix. Thus, we actually have a derivative of a matrix by a vector. How do we handle such case? Can you please help prove the equality?


There are 2 answers below.

BEST ANSWER

I prefer to use Fréchet derivatives to think about derivatives involving matrices. If you prefer another notion of derivative, let me know.

Computation

Let $F: \mathbb R^n \to \mathrm{Mat}(n\times n): x \mapsto x \cdot x^T$. We compute $$F(x + h) - F(x) = (x+h) \cdot (x+h)^T - x \cdot x^T\\ = h \cdot x ^T + x \cdot h^T + h \cdot h^T,$$ and since $\|h \cdot h^T\| = \|h\|^2$, we get $$F(x+h)-F(x) = (\mathbb I \otimes x + x \otimes \mathbb I)[h] + o(\|h\|).$$

This implies $DF(x) = \mathbb I \otimes x + x \otimes \mathbb I$.

($\mathbb I \otimes x: \mathbb R^n \to \mathrm{Mat}(n\times n): h \mapsto h \cdot x^T$ and $x \otimes \mathbb I: \mathbb R^n \to \mathrm{Mat}(n\times n): h \mapsto x \cdot h^T$.)
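The computation above can be checked numerically: the difference $F(x+h) - F(x)$ minus the proposed linear part $h \cdot x^T + x \cdot h^T$ should be exactly the second-order remainder $h \cdot h^T$. A quick NumPy sketch (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
x = rng.standard_normal(n)
h = 1e-6 * rng.standard_normal(n)  # a small perturbation

F = lambda x: np.outer(x, x)  # F(x) = x x^T

diff = F(x + h) - F(x)
linear_part = np.outer(h, x) + np.outer(x, h)  # (I⊗x + x⊗I)[h]

# The remainder is exactly h h^T, which has norm ||h||^2, i.e. o(||h||):
remainder = diff - linear_part
print(np.allclose(remainder, np.outer(h, h)))  # → True
```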

Interpretation

The Fréchet derivative of $F$ at a point $x$ is a linear map $DF(x): \mathbb R^n \to \mathrm{Mat}(n \times n)$, which is the best linear approximation of $F$ near $x$.

We have already computed how $\mathrm D F(x)$ acts on a vector $h$: $\mathrm D F(x): h \mapsto h \cdot x^T + x \cdot h^T$.

Sometimes a better notation or a matrix-like representation is needed. One possibility is to identify $\mathrm{Mat}(n \times n)$ with $\mathbb R^{n^2}$ and to represent the linear map as a matrix (of dimension $n^2 \times n$). But this is not really instructive.

Instead, we use the tensor notation as above and obtain a short, convenient representation of this linear map.

To practice this concept, you might try to compute the Fréchet derivative of the map $\mathrm{Mat}(n \times n) \to \mathrm{Mat}(n \times n): X \mapsto X^T \cdot X$.
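If you want to check your answer to that exercise numerically, the same expansion trick works. The candidate derivative below, $DG(X): H \mapsto X^T H + H^T X$, is my own claim (it is not stated in the answer), obtained exactly as for $F$ above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
X = rng.standard_normal((n, n))
H = 1e-6 * rng.standard_normal((n, n))  # a small matrix perturbation

G = lambda X: X.T @ X  # the exercise map G(X) = X^T X

diff = G(X + H) - G(X)
candidate = X.T @ H + H.T @ X  # claimed DG(X)[H]

# The remainder is exactly H^T H, second order in ||H||:
print(np.allclose(diff - candidate, H.T @ H))  # → True
```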

Clarification

As pointed out by RodrigodeAzevedo: the term $\mathbb I \otimes x$ is perhaps more commonly interpreted as a Kronecker product, which leads to a vectorized expression.

Hence $$\mathbb I \otimes x = \begin{pmatrix} 1 \cdot x& \dots &0 \cdot x \\ \vdots & \ddots & \vdots \\ 0 \cdot x & \dots &1 \cdot x \end{pmatrix} \in \mathbb R^{n^2 \times n}$$ and $$x \otimes \mathbb I = \begin{pmatrix} x_1 \cdot \mathbb I \\ \vdots \\ x_n \cdot \mathbb I \end{pmatrix} \in \mathbb R^{n^2 \times n}.$$

If we use this definition for $\otimes$, we have to vectorize the matrices in order to prove that our candidate is the derivative in the usual sense, i.e. $$ \lim_{h\to 0} \frac{\left\| \mathrm{vec}\big( F(x+h) - F(x) \big) - (\mathbb I \otimes x + x \otimes \mathbb I)h \right\| }{\|h\|} = 0.$$
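This limit can also be sketched numerically with NumPy's Kronecker product, using column-major vectorization to match the usual `vec` convention (names illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
x = rng.standard_normal(n)
I = np.eye(n)
xc = x.reshape(-1, 1)  # x as a column vector, so np.kron gives an n^2 x n block matrix

vec = lambda M: M.reshape(-1, order="F")  # column-major vectorization
F = lambda x: np.outer(x, x)

J = np.kron(I, xc) + np.kron(xc, I)  # I⊗x + x⊗I, shape (n^2, n)

# Sanity check: J acts on v as vec(v x^T + x v^T).
v = rng.standard_normal(n)
assert np.allclose(J @ v, vec(np.outer(v, x) + np.outer(x, v)))

# The remainder in the limit above vanishes faster than ||h||:
h = 1e-7 * rng.standard_normal(n)
r = vec(F(x + h) - F(x)) - J @ h
print(np.linalg.norm(r) / np.linalg.norm(h))  # of order ||h||, i.e. tiny
```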

SECOND ANSWER

Let matrix-valued function $\mathrm F : \mathbb R^n \to \mathbb R^{n \times n}$ be defined by

$$\mathrm F (\mathrm x) := \mathrm x \mathrm x^\top$$

Its directional derivative in the direction of $\rm v$ at $\rm x$ is

$$\lim_{h \to 0} \frac 1h \big( \mathrm F (\mathrm x + h \mathrm v) - \mathrm F (\mathrm x) \big) = \mathrm v \mathrm x^\top + \mathrm x \mathrm v^\top$$

Vectorizing, we obtain

$$\mbox{vec} \big( \mathrm v \mathrm x^\top + \mathrm x \mathrm v^\top \big) = \mbox{vec} \big( \mathrm I_n \mathrm v \mathrm x^\top + \mathrm x \mathrm v^\top \mathrm I_n \big) = \big( \color{blue}{\mathrm x \otimes \mathrm I_n + \mathrm I_n \otimes \mathrm x} \big) \mathrm v$$
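The final identity follows from the rule $\mathrm{vec}(ABC) = (C^\top \otimes A)\,\mathrm{vec}(B)$, and can be verified directly (a sketch; the column-major `vec` matches the convention used here):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
x = rng.standard_normal(n)
v = rng.standard_normal(n)
I = np.eye(n)
xc = x.reshape(-1, 1)  # column vector so np.kron produces n^2 x n blocks

vec = lambda M: M.reshape(-1, order="F")  # column-major vec

# vec(v x^T + x v^T) = (x⊗I + I⊗x) v
lhs = vec(np.outer(v, x) + np.outer(x, v))
rhs = (np.kron(xc, I) + np.kron(I, xc)) @ v
print(np.allclose(lhs, rhs))  # → True
```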