Derivative of Kronecker product of vector with itself


I'm struggling with the following problem. Suppose $\pmb{x}$ and $\pmb{y}$ are vectors of the same length and $\pmb{y}$ is not a function of $\pmb{x}$. What is the following derivative?

$$ \frac{\partial}{\partial \pmb{x}} (\pmb{y} - \pmb{x}) \otimes (\pmb{y} - \pmb{x}) $$

My thought was to write $\pmb{z} = \pmb{y} - \pmb{x}$ and $\pmb{f} = \pmb{z} \otimes \pmb{z}$, and first derive:

\begin{align} d\pmb{f} &= ((d\pmb{z}) \otimes \pmb{z}) + (\pmb{z} \otimes (d\pmb{z})) \\ &= (\pmb{I} \otimes \pmb{z})d\pmb{z} + (\pmb{z} \otimes \pmb{I})d\pmb{z} \\ &= ((\pmb{I} \otimes \pmb{z}) + (\pmb{z} \otimes \pmb{I}))d\pmb{z} \\ \frac{\partial \pmb{f}}{\partial \pmb{z}} &= (\pmb{I} \otimes \pmb{z}) + (\pmb{z} \otimes \pmb{I}) \end{align}

and then, since $d\pmb{z} = -d\pmb{x}$, obtain by the chain rule:

$$ \frac{\partial}{\partial \pmb{x}} (\pmb{y} - \pmb{x}) \otimes (\pmb{y} - \pmb{x}) = -\left( (\pmb{I} \otimes (\pmb{y} - \pmb{x})) + ((\pmb{y} - \pmb{x}) \otimes \pmb{I}) \right) $$

This seems sensible. However, it is part of a Hessian I am deriving, and I derived its corresponding transpose element to be:

$$ -2\left(\pmb{I} \otimes (\pmb{y} - \pmb{x})\right) $$

Which is very similar but not the same. Am I missing something obvious?


There are 2 best solutions below

On BEST ANSWER

Let's clear out some definitions first.

If $f: z \mapsto f(z)$ is a matrix-valued function and there is a function $D_f$, linear in $h$, such that $$ f(z+h) = f(z) + D_f(z,h) + o(\|h\|) $$ then $D_f$ is the differential of $f$. If there exists a matrix-valued function $A(z)$ such that $$ D_f(z,h) = A(z) h$$ then $A(z)$ is the derivative of $f$.

(This is sometimes called the first identification theorem; see for instance Magnus and Neudecker, 1999).

In the case at hand, we have $$f(z+h) = (z+h)\otimes (z+h) = \underbrace{z\otimes z}_{f(z)} + \underbrace{(h\otimes z) + (z\otimes h)}_{D_f(z,h)} + \underbrace{h\otimes h}_{o(\|h\|)}$$

So, by the definition $D_f(z,h) = (h\otimes z) + (z\otimes h)$ is the differential of $z\otimes z$. Now we can use the identification theorem to say that, since $$ D_f(z,h) = \big[(I\otimes z) + (z \otimes I)\big]h = A(z) h$$ the matrix $$ A(z) = (I\otimes z) + (z \otimes I)$$ is the derivative of $z \otimes z$.
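As a small worked illustration (my own, with $n = 2$ and $z = (z_1, z_2)^T$ chosen only for concreteness), writing out the two terms of $A(z)$ shows that they are different matrices, so in general they cannot be merged into $2(I \otimes z)$:

$$ I\otimes z = \begin{pmatrix} z_1 & 0 \\ z_2 & 0 \\ 0 & z_1 \\ 0 & z_2 \end{pmatrix}, \qquad z\otimes I = \begin{pmatrix} z_1 & 0 \\ 0 & z_1 \\ z_2 & 0 \\ 0 & z_2 \end{pmatrix}, \qquad A(z) = \begin{pmatrix} 2z_1 & 0 \\ z_2 & z_1 \\ z_2 & z_1 \\ 0 & 2z_2 \end{pmatrix} $$

The rows of $A(z)$ are the gradients of the entries $z_1^2,\ z_1 z_2,\ z_2 z_1,\ z_2^2$ of $z\otimes z$, which is an easy way to sanity-check the result by hand.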

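So in the case at hand, with $z = y - x$ and therefore $dz = -dx$, the same reasoning brings you to the correct derivative $$ \frac{\partial}{\partial x}(y-x)\otimes (y-x) = -\big[\big(I\otimes(y-x)\big) + \big((y-x) \otimes I\big)\big] = \big(I\otimes(x-y)\big) + \big((x-y) \otimes I\big)$$ which is what you found.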
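If you want to double-check this numerically, here is a minimal NumPy sketch that compares the closed form against central finite differences. The function names (`kron_self`, `analytic_jacobian`, `numeric_jacobian`), the dimension 3 and the random test vectors are my own choices for illustration, not anything from the question.

```python
import numpy as np

def kron_self(x, y):
    """f(x) = (y - x) kron (y - x), as a flat vector of length n*n."""
    z = y - x
    return np.kron(z, z)

def analytic_jacobian(x, y):
    """Closed form I kron (x - y) + (x - y) kron I, shape (n*n, n)."""
    n = x.size
    I = np.eye(n)
    d = (x - y).reshape(-1, 1)  # column vector, so np.kron gives an (n*n, n) matrix
    return np.kron(I, d) + np.kron(d, I)

def numeric_jacobian(x, y, eps=1e-6):
    """Central finite differences, one column per component of x."""
    n = x.size
    J = np.zeros((n * n, n))
    for k in range(n):
        step = np.zeros(n)
        step[k] = eps
        J[:, k] = (kron_self(x + step, y) - kron_self(x - step, y)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
x = rng.standard_normal(3)
y = rng.standard_normal(3)

J_num = numeric_jacobian(x, y)
max_err = np.max(np.abs(analytic_jacobian(x, y) - J_num))

# The expression -2 (I kron (y - x)) from the question, for comparison.
candidate = -2 * np.kron(np.eye(3), (y - x).reshape(-1, 1))
candidate_matches = np.allclose(candidate, J_num, atol=1e-6)

print("max abs error of closed form:", max_err)
print("does -2 (I kron (y - x)) match?", candidate_matches)
```

Because $f$ is quadratic in $x$, the central difference is exact up to rounding, so the error should be tiny, while the $-2(I\otimes(y-x))$ candidate should not match for generic $x$ and $y$.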

Check the other half of the Hessian: there has to be something wrong there!

On

The gradient that you found is correct. I re-derive it here to make this answer self-contained...

First note that the Kronecker product of two vectors can be expanded in two ways: $$a\otimes b = (I_a\otimes b)\,a = (a\otimes I_b)\,b$$ where $I_a$ is the identity matrix whose dimensions are compatible with the $a$ vector, while $I_b$ is compatible with the $b$ vector.
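A quick numerical check of these two expansions, as a sketch: the lengths 3 and 4 and the random vectors are arbitrary choices of mine, and the vectors are reshaped to columns so that `np.kron` produces the rectangular matrices $I_a\otimes b$ and $a\otimes I_b$.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal(3)
b = rng.standard_normal(4)

lhs = np.kron(a, b)  # a kron b, a vector of length 12

# (I_a kron b) a : I_a is 3x3, b is a 4x1 column, product matrix is 12x3
via_a = np.kron(np.eye(a.size), b.reshape(-1, 1)) @ a

# (a kron I_b) b : a is a 3x1 column, I_b is 4x4, product matrix is 12x4
via_b = np.kron(a.reshape(-1, 1), np.eye(b.size)) @ b

print(np.allclose(lhs, via_a), np.allclose(lhs, via_b))
```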

Define two new vectors $$\eqalign{ z &= x-y \quad\implies dz = dx\cr f &= z\otimes z \cr }$$ (the sign of $z$ does not matter here, since $(x-y)\otimes(x-y) = (y-x)\otimes(y-x)$). Then use the Kronecker expansion to calculate the differential and gradient of $f$. $$\eqalign{ df &= z\otimes dz + dz\otimes z = (z\otimes I + I\otimes z)\,dx \cr G=\frac{\partial f}{\partial x} &= (z\otimes I + I\otimes z) = (x-y)\otimes I + I\otimes(x-y) \cr }$$

Let $e_k$ denote the $k^{th}$ column of the $I$ matrix and $w={\rm vec}(I)$. Use these to vectorize the $G$ matrix. $$\eqalign{ G &= (z\otimes I + I\otimes z), \quad M = \pmatrix{I\otimes e_1\cr I\otimes e_2\cr\vdots\cr I\otimes e_n} \cr g &= {\rm vec}(G) = \Big(M + w\otimes I\Big)\,z \cr }$$

Now find the differential and gradient of the $g$ vector. $$\eqalign{ dg &= \Big(M + w\otimes I\Big)\,dx \cr H = \frac{\partial g}{\partial x} &= \Big(M + w\otimes I\Big) \cr }$$ So that's the Hessian in matrix form (an $n^3\times n$ matrix for vectors of length $n$). The true Hessian is a 3rd-order tensor.
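The vectorization step is the easiest place to slip, so here is a minimal NumPy sketch that checks ${\rm vec}(G) = (M + w\otimes I)\,z$ for a random $z$. The dimension `n = 3` and the variable names are my own, and I am assuming the usual column-stacking convention for ${\rm vec}$ (hence `order="F"`).

```python
import numpy as np

n = 3
I = np.eye(n)
rng = np.random.default_rng(2)
z = rng.standard_normal(n)
zc = z.reshape(-1, 1)  # z as a column vector

# G = z kron I + I kron z, shape (n*n, n)
G = np.kron(zc, I) + np.kron(I, zc)

# vec(G): stack the columns of G (column-major order), length n**3
g = G.reshape(-1, order="F")

# M stacks the blocks I kron e_k for k = 1..n, shape (n**3, n)
M = np.vstack([np.kron(I, I[:, [k]]) for k in range(n)])

# w = vec(I) as a column, so w kron I has shape (n**3, n)
w = I.reshape((-1, 1), order="F")
H = M + np.kron(w, I)

print(np.allclose(g, H @ z))
```

Since $g$ is linear in $z$, the matrix `H` is also its Jacobian, and it does not depend on $z$ at all.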