Derivative of a Euclidean distance matrix


Let's say I have $X \in \mathbb{R}^{n \times d}$, a collection of $n$ row vectors of size $d$. We can calculate an $n \times n$ distance matrix, $D$, from $X$, where the $(i,j)$ entry is the squared L2 distance between the rows $X_i$ and $X_j$. That is,

$$ D_{ij} = \|X_{i} - X_{j}\|_2^2 = \sum_{d}(X_{id} - X_{jd})^2 $$
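For concreteness, this definition can be computed in vectorized form; here is a minimal NumPy sketch (the helper name `sq_dist_matrix` is mine, not from any library):

```python
import numpy as np

def sq_dist_matrix(X):
    """Squared Euclidean distance matrix: D[i, j] = ||X[i] - X[j]||^2."""
    # Broadcasting (n, 1, d) - (1, n, d) yields all pairwise differences.
    diff = X[:, None, :] - X[None, :, :]
    return (diff ** 2).sum(axis=-1)

X = np.array([[0.0, 0.0], [3.0, 4.0]])
D = sq_dist_matrix(X)
# D[0, 1] = 3^2 + 4^2 = 25
```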

I wish to find the derivative of $D$ with respect to $X$. Since this is a matrix-by-matrix derivative, I'm aware that the result is a fourth-order tensor. This is what I have tried so far ($\delta_{ij}$ is the Kronecker delta):

$$ \frac{\partial D_{ij}}{\partial X_{kl}} = \frac{\partial}{\partial X_{kl}}\sum_{d}(X_{id} - X_{jd})^2 = 2\sum_{d}(X_{id} - X_{jd}) \left ( \frac{\partial X_{id}}{\partial X_{kl}} - \frac{\partial X_{jd}}{\partial X_{kl}} \right ) $$

$$ = 2\sum_{d}(X_{id} - X_{jd})(\delta_{ik}\delta_{dl} - \delta_{jk}\delta_{dl}) \\ = 2\sum_{d}(X_{id} - X_{jd})(\delta_{ik} - \delta_{jk}) \delta_{dl} \\ = 2\sum_{d}(X_{id}\delta_{ik} - X_{jd}\delta_{ik} - X_{id}\delta_{jk} + X_{jd}\delta_{jk})\delta_{dl}\\ = 2\sum_{d}(X_{kd} - X_{jd}\delta_{ik} - X_{id}\delta_{jk} + X_{kd})\delta_{dl}\\ = 2(2X_{kd}\delta_{dl} - X_{jd}\delta_{ik}\delta_{dl} - X_{id}\delta_{jk}\delta_{dl})\\ = 4X_{kl} - 2(X_{jl}\delta_{ik} + X_{il}\delta_{jk}) $$

I'm not entirely sure that what I've derived is correct. The sum of $X_{kl}$ and the fourth-order terms seems strange, and the fourth-order terms are also hard to interpret. I think they can be read like this: if $Y_{ijkl} = X_{jl}\delta_{ik}$, then $$ Y_{ijkl} = \begin{cases} X_{jl}, & \text{if}\ i=k \\ 0, & \text{otherwise} \end{cases} $$

It would be great if someone could point me to a way to get this derivative (or perhaps to a vectorized solution).
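One way to sanity-check any candidate closed form is a finite-difference probe of $\partial D_{ij}/\partial X_{kl}$. The sketch below (my own helpers, not from any library) compares numerical derivatives against $2(X_{il}-X_{jl})(\delta_{ik}-\delta_{jk})$, which is what the chain rule above yields when the Kronecker deltas are retained rather than summed away:

```python
import numpy as np

def sq_dist_matrix(X):
    # D[i, j] = ||X[i] - X[j]||^2, rows of X are the data vectors.
    diff = X[:, None, :] - X[None, :, :]
    return (diff ** 2).sum(axis=-1)

def numeric_grad(X, i, j, k, l, h=1e-6):
    """Central-difference estimate of dD[i,j]/dX[k,l]."""
    Xp, Xm = X.copy(), X.copy()
    Xp[k, l] += h
    Xm[k, l] -= h
    return (sq_dist_matrix(Xp)[i, j] - sq_dist_matrix(Xm)[i, j]) / (2 * h)

def analytic_grad(X, i, j, k, l):
    # 2 (X[i,l] - X[j,l]) (delta_ik - delta_jk), deltas kept explicit
    return 2 * (X[i, l] - X[j, l]) * ((i == k) - (j == k))

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
for idx in np.ndindex(4, 4, 4, 3):
    assert abs(numeric_grad(X, *idx) - analytic_grad(X, *idx)) < 1e-4
```

Since $D$ is quadratic in $X$, the central difference is exact up to floating-point rounding, which makes this a tight check.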

Best answer

$ \def\d{\delta} \def\e{\bar{e}} \def\h{\odot} \def\w{\widehat} \def\o{{\tt1}} \def\E{\w{E}} \def\Eij{\E_{ij}} \def\Ekl{\E_{kl}} \def\Elk{\E_{lk}} \def\BR#1{\Big[#1\Big]} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} $Note that in what follows the data vectors are taken as the columns of $X$, i.e. $X\in\mathbb{R}^{d\times n}$ (the transpose of the question's layout), so that $D$ comes out $n\times n$. The distance matrix can be written using the Hadamard product $(\h)$ and the all-ones matrix $J$ (with the same dimensions as $X$) $$\eqalign{ D = (X\h X)^TJ + J^T(X\h X) - 2X^TX \\ }$$ The component-wise self-gradient of a matrix is $$\eqalign{ \grad{X}{X_{kl}} &= \e_k\e_l^T \;=\; \Ekl \\ \grad{X^T}{X_{kl}} &= \gradLR{X}{X_{kl}}^T \,=\; \Elk \\ }$$ where $\e_i$ denotes the $i^{th}$ Cartesian basis vector and $\E_{kl}$ is the matrix whose components are all zero, except for the $(k,l)$ component which equals one.

The gradients wrt the components of $X$ can then be calculated as $$\eqalign{ \grad{D}{X_{kl}} &= 2(X\h\Ekl)^TJ + 2J^T(X\h\Ekl) - 2\Elk X - 2X^T\Ekl \\ }$$
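This matrix expression can be checked numerically. The sketch below (my own helpers; the column convention $X\in\mathbb{R}^{d\times n}$ is assumed throughout) builds $D$ from the Hadamard formula and compares the gradient expression against a finite difference:

```python
import numpy as np

def dist_matrix(X):
    """X is (d, n); D[i, j] = ||X[:, i] - X[:, j]||^2 via the Hadamard formula."""
    J = np.ones_like(X)
    return (X * X).T @ J + J.T @ (X * X) - 2 * X.T @ X

def grad_D(X, k, l):
    """The matrix gradient dD/dX[k,l]: 2(X∘E)^T J + 2 J^T (X∘E) - 2 E^T X - 2 X^T E."""
    E = np.zeros_like(X)
    E[k, l] = 1.0
    J = np.ones_like(X)
    return 2 * (X * E).T @ J + 2 * J.T @ (X * E) - 2 * E.T @ X - 2 * X.T @ E

# Finite-difference check at one (k, l); exact for a quadratic up to rounding.
rng = np.random.default_rng(1)
X = rng.standard_normal((3, 5))
k, l, h = 1, 2, 1e-6
Xp, Xm = X.copy(), X.copy()
Xp[k, l] += h
Xm[k, l] -= h
num = (dist_matrix(Xp) - dist_matrix(Xm)) / (2 * h)
assert np.allclose(num, grad_D(X, k, l), atol=1e-4)
```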


Extracting the individual components of this expression gets uglier.

First note the following equalities $({\rm where}\ \o$ is the all-ones vector) $$\eqalign{ \Eij\,\e_k &= \e_i\e_j^T\e_k \:=\: \e_i\,\d_{jk} \\ \e_k^T\Eij &= \e_k^T\e_i\e_j^T \:=\: \d_{ki}\,\e_j^T \\ J\e_k &= \o, \qquad \e_k^TJ = \o^T \\ (A\h B)\,\e_k &= A\e_k\h B \e_k \\ \e_k^T(A\h B) &= \e_k^TA\h \e_k^TB \\ }$$

Now extract the components of the previously derived expression $$\eqalign{ \grad{D_{ij}}{X_{kl}} &= \e_i^T\gradLR{D}{X_{kl}}\e_j \\ &= 2(X\e_i\h\Ekl\e_i)^T\o + 2\o^T(X\e_j\h\Ekl\e_j) - 2\d_{il}(\e_k^TX\e_j) - 2(\e_i^TX^T\e_k)\d_{lj} \\ &= 2\d_{il}(X\e_i\h\e_k)^T\o + 2\d_{lj}\o^T(X\e_j\h\e_k) - 2\d_{il}X_{kj} - 2\d_{lj}X_{ki} \\ &= 2\d_{il}X_{ki} + 2\d_{lj}X_{kj} - 2\d_{il}X_{kj} - 2\d_{lj}X_{ki} \\ }$$ Depending on your mathematical background, you may be tempted to apply the Einstein summation convention and sum over the repeated indices in the first two terms.

To avoid this temptation, let's introduce the third-order tensor $$ \def\H{{\cal H}} \H_{ijk} = \begin{cases} \o\qquad{\rm if}\;\;i=j=k \\ 0\qquad{\rm otherwise} \\ \end{cases} $$ and rewrite the result in a form consistent with the summation convention and also balance the indices on the RHS by including explicit factors of unity (i.e. components of the $\o$ vector) $$\eqalign{ \grad{D_{ij}}{X_{kl}} &= 2\d_{lp}\H_{piq}X_{kq}\o_{j} + 2\d_{lp}\H_{pjq}X_{kq}\o_{i} - 2\d_{il}X_{kj} - 2\d_{lj}X_{ki} \\ &= 2\,\H_{ilq}X_{kq}\o_{j} + 2X_{kq}\H_{qlj}\o_{i} - 2\d_{il}X_{kj} - 2\d_{lj}X_{ki} \\ }$$
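As a final sanity check, the component result can be verified by finite differences. The sketch below (my own helpers; the column convention $X\in\mathbb{R}^{d\times n}$ is assumed) uses the factored equivalent $2(X_{ki}-X_{kj})(\delta_{il}-\delta_{jl})$ of the four delta terms:

```python
import numpy as np

def dist_matrix(X):
    # X is (d, n): columns are the data vectors.
    diff = X[:, :, None] - X[:, None, :]
    return (diff ** 2).sum(axis=0)

def component_grad(X, i, j, k, l):
    # 2*d_il*X[k,i] + 2*d_lj*X[k,j] - 2*d_il*X[k,j] - 2*d_lj*X[k,i]
    # = 2*(X[k,i] - X[k,j])*(d_il - d_jl)
    return 2 * (X[k, i] - X[k, j]) * ((i == l) - (j == l))

rng = np.random.default_rng(2)
X = rng.standard_normal((3, 4))
h = 1e-6
for i, j, k, l in np.ndindex(4, 4, 3, 4):
    Xp, Xm = X.copy(), X.copy()
    Xp[k, l] += h
    Xm[k, l] -= h
    num = (dist_matrix(Xp)[i, j] - dist_matrix(Xm)[i, j]) / (2 * h)
    assert abs(num - component_grad(X, i, j, k, l)) < 1e-4
```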