Optimization problem with Mahalanobis distance


This question is a follow-up to one of my previous questions: Optimizing a vector equation

Let $x$ and $b$ be two real vectors in $k$-dimensional space.

Let $W$ be a $k \times k$ real matrix representing a transformation.

Let $\alpha$ be a scalar value.

We are looking for the optimal $\alpha$, i.e. the one that minimizes the squared Mahalanobis distance between $x$ and the multivariate normal distribution $N(b, \Sigma)$ scaled by $\alpha$:

$$ D = (x - \alpha b)^T (\alpha W)^{-1} (x - \alpha b) $$
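For concreteness, here is a minimal numerical sketch of this objective (NumPy is assumed; the dimension and the values of $x$, $b$, and $W$ below are arbitrary test data, not part of the problem):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4
x = rng.standard_normal(k)
b = rng.standard_normal(k)
A = rng.standard_normal((k, k))
W = A @ A.T + k * np.eye(k)  # arbitrary symmetric positive-definite test matrix

def D(alpha):
    """Squared Mahalanobis distance with mu = alpha*b, Sigma = alpha*W."""
    r = x - alpha * b
    return r @ np.linalg.solve(alpha * W, r)
```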

The Mahalanobis distance:

$$ D = (x - \mu)^T \Sigma^{-1} (x - \mu) $$

As can be seen, in our case $\mu = \alpha b$ and $\Sigma = \alpha W$.

I took the derivative of the Mahalanobis distance, following Differentiating mahalanobis distance, and applied the chain rule with respect to $\alpha$:

$$ \eqalign { \frac {dD} {d \alpha} &= \frac {dD} {d \mu} \frac {d \mu} {d \alpha} + \frac {dD} {d \Sigma} \frac {d \Sigma} {d \alpha} \cr &= -2 \Sigma^{-1}(x - \mu) b - \Sigma^{-1} (x - \mu) (x - \mu) \Sigma^{-1} W \cr &= 0 \cr 2 \Sigma^{-1}(x - \mu) b &= -\Sigma^{-1} (x - \mu) (x - \mu) \Sigma^{-1} W}$$
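To make sure any symbolic derivative is at least numerically consistent, here is a small finite-difference check, continuing the NumPy sketch above (a closed-form $dD/d\alpha$ should agree with this):

```python
def dD_numeric(alpha, h=1e-6):
    # Central finite difference of D with respect to alpha.
    return (D(alpha + h) - D(alpha - h)) / (2 * h)

print(dD_numeric(1.0))  # compare against the symbolic expression at alpha = 1
```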

This expression is reduced by multiplying by $\Sigma$ from the left and expanding the $\Sigma^{-1}$ term on the rightmost side, getting

$$ 2 (x - \mu) b = - (x - \mu) (x - \mu) \alpha^{-1} $$

Substituting $\mu = \alpha b$ on both sides and expanding the parentheses yields

$$ \eqalign { 2xb - 2 \alpha bb &= -(xx - 2 \alpha xb + \alpha^2 bb) \alpha^{-1} \cr 2xb - 2 \alpha bb &= -\alpha^{-1}xx + 2xb - \alpha bb}$$

This can be simplified by subtracting $2xb$ and adding $\alpha bb$ on both sides, then multiplying by $-1$, getting

$$ \eqalign { \alpha bb &= \alpha^{-1} xx \cr \alpha &= \sqrt { \frac {||x||^2} {||b||^2} } \cr \alpha &= \frac {||x||} {||b||}}$$
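One way to test this candidate numerically is to compare $D$ at $\alpha = \|x\| / \|b\|$ against a brute-force scan over nearby values (continuing the sketch above):

```python
alpha_cand = np.linalg.norm(x) / np.linalg.norm(b)
grid = np.linspace(0.25 * alpha_cand, 4 * alpha_cand, 4001)
alpha_grid = grid[np.argmin([D(a) for a in grid])]
print(alpha_cand, alpha_grid)  # if the algebra is right, these should agree
```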

I have a strong intuition that I went wrong somewhere. Could someone please check my calculations?

Best Answer

Let's use the product notation $(:)$ for the trace, i.e.
$$A:B={\rm tr}(A^TB)$$

For convenience, let's define new scalar, vector, and matrix variables
$$\eqalign{
\beta &= \alpha^{-1} &\implies d\beta=-\beta^2\,d\alpha \cr
z &= \alpha b-x &\implies dz=b\,d\alpha \cr
M &= W^{-1} &\implies \beta M=(\alpha W)^{-1} \cr
}$$

Now we can find the differential and gradient of the distance directly
$$\eqalign{
D &= \beta M:zz^T \cr
dD &= \beta M:d(zz^T) + d\beta\,M:zz^T \cr
&= 2\beta M:(dz\,z^T) - \beta^2M:zz^T\,d\alpha \cr
&= (2\beta Mz:b - \beta^2M:zz^T)\,d\alpha \cr
\frac{\partial D}{\partial\alpha} &= 2\beta Mz:b - \beta^2M:zz^T \cr
&= 2\beta b^TMz - \beta^2z^TMz \cr
}$$

Set the gradient to zero and multiply by $\alpha^2$:
$$\eqalign{
2\alpha b^TMz &= z^TMz \cr
2\alpha b^TM(\alpha b-x) &= (\alpha b-x)^TM(\alpha b-x) \cr
2\alpha^2 b^TMb - 2\alpha b^TMx &= \alpha^2b^TMb - 2\alpha b^TMx + x^TMx \cr
\alpha^2 b^TMb &= x^TMx \cr
}$$

This yields the following expression for the optimal parameter value:
$$\alpha^2 = \frac{x^TW^{-1}x}{b^TW^{-1}b}$$

Although it was not stated, I assumed that $W$ is symmetric.
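For a numerical spot-check, here is a sketch reusing `D`, `x`, `b`, and `W` from the test setup in the question:

```python
M = np.linalg.inv(W)
alpha_opt = np.sqrt((x @ M @ x) / (b @ M @ b))

# Brute-force scan of D over positive alpha for comparison.
grid = np.linspace(0.1 * alpha_opt, 3 * alpha_opt, 4001)
alpha_grid = grid[np.argmin([D(a) for a in grid])]
print(alpha_opt, alpha_grid)  # should agree to grid resolution
```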