Derivative with respect to a vector variable: what happens to the transpose of the vector?

Given that $x,y$ are vectors and that $\mu,\Sigma$ denote a mean (vector) and a covariance (matrix), how do I solve:

$(1): \displaystyle \frac{\partial }{\partial x}{[(y-x-\mu_N)^T\Sigma_N^{-1}(y-x-\mu_N) + (x-\mu_X)^T \Sigma_X^{-1}(x-\mu_X)]}=0$

I have the final answer which first simplifies as:

$(2):-\Sigma_N^{-1}(y-x-\mu_N)+\Sigma_X^{-1}(x-\mu_X)=0$

And then:

$(3): x=(\Sigma_N^{-1}+\Sigma_X^{-1})^{-1}(\Sigma_N^{-1}(y-\mu_N)+\Sigma_X^{-1}\mu_X) $

Going from $(2)$ to $(3)$ is straightforward. But when I expand $(1)$, I can't get to $(2)$: I have extra terms with $x^T$ in my equation before taking the derivative. As far as I know (reference1, reference2):

$\displaystyle \frac{\partial }{\partial x}x^T={[\displaystyle \frac{\partial }{\partial x}x]}^T$

so those extra terms cannot be eliminated. If I drop the terms that contain only $x^T$ after taking the derivative, I arrive exactly at $(2)$ and then $(3)$.

Any idea what's wrong?

Best answer

If $(\mu_X, \mu_N)$ are vectors, and you are adding them to $(X,Y)$ -- then the latter must also be vectors (not matrices). So I'll denote them by $(x,y)$.

Define a few variables for convenience and ease of typing:
$$\eqalign{
A &= \Sigma_N^{-1} \cr
B &= \Sigma_X^{-1} \cr
a &= y-x-\mu_N \cr
b &= x-\mu_X \cr
}$$
In terms of these variables, the function can be written as
$$\eqalign{
f &= a^TAa + b^TBb \cr
&= A:aa^T + B:bb^T \cr
}$$
where the colons denote the Frobenius product.
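
As a reminder for readers who have not met this notation, the Frobenius product can be spelled out as below. These are the standard identities, stated here with generic matrices $P,Q$ and vectors $u,v$ (symbols introduced only for this note, not part of the original answer):
$$\eqalign{
P:Q &= \sum_{i,j} P_{ij}\,Q_{ij} = \operatorname{tr}(P^TQ) \cr
a^TAa &= \sum_{i,j} A_{ij}\,a_i\,a_j = A:aa^T \cr
u:v &= u^Tv = v^Tu \cr
}$$
The last line is the reason a term containing a transposed vector is not lost when differentiating: a scalar equals its own transpose, so for example $x^TB\mu_X=\mu_X^TBx$ when $B$ is symmetric, and both terms contribute equally to the gradient.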

Taking the differential
$$\eqalign{
df &= A:d(aa^T) + B:d(bb^T) \cr
&= A:(da\,a^T+a\,da^T) + B:(db\,b^T+b\,db^T) \cr
&= (A+A^T)\,a:da + (B+B^T)\,b:db \cr
&= 2A\,a:da + 2B\,b:db \cr
&= 2\,A\,a:(-dx) + 2\,B\,b:dx \cr
}$$
The fourth line uses the symmetry of $A$ and $B$ (they are inverses of covariance matrices), and the last line uses $da=-dx$ and $db=dx$. Since $df=\big(\frac{\partial f}{\partial x}:dx\big),\,$ the gradient is
$$\frac{\partial f}{\partial x} = 2\,(B\,b-A\,a)$$
Setting this to zero yields equation (2).
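
If it helps to convince yourself numerically, here is a minimal sketch in Python with NumPy. It compares the gradient $2\,(B\,b-A\,a)$ with a central finite-difference estimate, then checks that $x=(A+B)^{-1}\big(A(y-\mu_N)+B\mu_X\big)$ makes the gradient vanish. The dimension, the random test data, and the helper names (`random_spd`, `numeric_grad`) are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # arbitrary dimension chosen for the test


def random_spd(n):
    """Random symmetric positive definite matrix, a stand-in for a covariance."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)


Sigma_N, Sigma_X = random_spd(n), random_spd(n)
A, B = np.linalg.inv(Sigma_N), np.linalg.inv(Sigma_X)
y, mu_N, mu_X = rng.standard_normal((3, n))


def f(x):
    """Objective f = a^T A a + b^T B b with a = y - x - mu_N, b = x - mu_X."""
    a = y - x - mu_N
    b = x - mu_X
    return a @ A @ a + b @ B @ b


def grad(x):
    """Gradient from the derivation: 2 (B b - A a)."""
    return 2 * (B @ (x - mu_X) - A @ (y - x - mu_N))


def numeric_grad(x, h=1e-6):
    """Central finite-difference estimate of the gradient of f."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g


# 1) Analytic gradient versus finite differences at a random point.
x0 = rng.standard_normal(n)
grad_error = np.max(np.abs(grad(x0) - numeric_grad(x0)))

# 2) The closed-form stationary point should zero the gradient.
x_star = np.linalg.solve(A + B, A @ (y - mu_N) + B @ mu_X)
residual = np.max(np.abs(grad(x_star)))

print("max gradient mismatch:", grad_error)
print("max |gradient| at x_star:", residual)
```

Both printed numbers should be close to zero, up to floating-point and finite-difference error. The linear system is solved with `np.linalg.solve` rather than by forming $(A+B)^{-1}$ explicitly, which is the usual numerically safer choice.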