Given that $x,y$ are vectors and $\mu,\Sigma$ denote a mean (vector) and a covariance (matrix), how do I solve:
$(1): \displaystyle \frac{\partial }{\partial X}\Big[(y-x-\mu_N)^T\Sigma_N^{-1}(y-x-\mu_N) + (x-\mu_X)^T \Sigma_X^{-1}(x-\mu_X)\Big]=0$
I have the final answer, which first simplifies to:
$(2):-\Sigma_N^{-1}(y-x-\mu_N)+\Sigma_X^{-1}(x-\mu_X)=0$
And then:
$(3): x=(\Sigma_N^{-1}+\Sigma_X^{-1})^{-1}(\Sigma_N^{-1}(y-\mu_N)+\Sigma_X^{-1}\mu_X)$
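A quick numerical sanity check that the closed form $(3)$ really satisfies $(2)$. This is a sketch assuming NumPy and made-up random inputs; `random_spd`, `A` and `B` are illustrative shorthands (not from the question) for generating covariances and for $\Sigma_N^{-1}$, $\Sigma_X^{-1}$.

```python
import numpy as np

# Made-up test data (assumption): any symmetric positive-definite covariances work.
rng = np.random.default_rng(0)
n = 3

def random_spd(n):
    """Hypothetical helper: a random symmetric positive-definite n x n matrix."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

Sigma_N, Sigma_X = random_spd(n), random_spd(n)
mu_N, mu_X, y = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(n)

A = np.linalg.inv(Sigma_N)  # Sigma_N^{-1}
B = np.linalg.inv(Sigma_X)  # Sigma_X^{-1}

# Equation (3), solved as a linear system instead of forming (A + B)^{-1} explicitly.
x = np.linalg.solve(A + B, A @ (y - mu_N) + B @ mu_X)

# Left-hand side of equation (2); zero up to rounding error.
residual = -A @ (y - x - mu_N) + B @ (x - mu_X)
print(np.allclose(residual, 0))  # True
```

Using `np.linalg.solve` rather than an explicit inverse is the usual, numerically safer way to evaluate an expression of the form $(A+B)^{-1}c$.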
Going from $(2)$ to $(3)$ is straightforward. But when I expand $(1)$, I can't get to $(2)$: before taking the derivative I have extra terms containing $x^T$. As far as I know (reference1, reference2):
$\displaystyle \frac{\partial }{\partial x}x^T={[\displaystyle \frac{\partial }{\partial x}x]}^T$
so those extra terms cannot be eliminated. If I drop the terms containing only $x^T$ after taking the derivative, I get exactly $(2)$ and then $(3)$.
Any idea what's wrong?
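The role of the $x^T$ terms can be seen on a single quadratic form $x^TMx$. Its differential is $dx^TMx + x^TM\,dx$, and since both terms are scalars, $dx^TMx = x^TM^T dx$, so the gradient is $(M+M^T)x$. The sketch below (assuming NumPy; the small non-symmetric `M`, the point `x` and the helper `numerical_grad` are made up for illustration) compares a finite-difference gradient with both candidates.

```python
import numpy as np

# Made-up, deliberately non-symmetric matrix and evaluation point.
M = np.array([[1.0, 2.0],
              [0.0, 3.0]])
x = np.array([1.0, 1.0])

def f(v):
    """The scalar quadratic form v^T M v."""
    return v @ M @ v

def numerical_grad(func, v, h=1e-6):
    """Central finite-difference gradient of a scalar function (hypothetical helper)."""
    g = np.zeros_like(v)
    for i in range(v.size):
        e = np.zeros_like(v)
        e[i] = h
        g[i] = (func(v + e) - func(v - e)) / (2 * h)
    return g

g_num = numerical_grad(f, x)
g_both = (M + M.T) @ x  # keeps the (transposed) x^T term alongside the x term
g_drop = M @ x          # what discarding the x^T term would give

print(np.allclose(g_num, g_both, atol=1e-5))  # True
print(np.allclose(g_num, g_drop, atol=1e-5))  # False
```

The $x^T$ term is not eliminated; it is transposed into a second copy of the $x$ term. For a symmetric $M$ (such as an inverse covariance) the two copies are equal, which is where the factor $2$ comes from.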
If $\mu_X$ and $\mu_N$ are vectors and you are adding them to $X$ and $Y$, then the latter must also be vectors (not matrices), so I'll denote them by $x$ and $y$.
Define a few variables for convenience and ease of typing: $$\eqalign{ A &= \Sigma_N^{-1} \cr B &= \Sigma_X^{-1} \cr a &= y-x-\mu_N \cr b &= x-\mu_X \cr }$$ In terms of these variables, the function can be written as $$\eqalign{ f &= a^TAa + b^TBb \cr &= A:aa^T + B:bb^T }$$ where the colon denotes the Frobenius product, $P:Q = \operatorname{tr}(P^TQ) = \sum_{ij}P_{ij}Q_{ij}$.
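The rewrite $a^TAa = A:aa^T$ is just the Frobenius product applied to an outer product. A minimal numerical check, assuming NumPy, with a made-up random `A` and `a` and a hypothetical helper `frob`:

```python
import numpy as np

def frob(P, Q):
    """Frobenius product P:Q = sum_ij P_ij Q_ij = trace(P^T Q) (hypothetical helper)."""
    return np.sum(P * Q)

# Made-up data; A need not be symmetric for this identity.
rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
a = rng.standard_normal(n)

lhs = a @ A @ a                 # a^T A a
rhs = frob(A, np.outer(a, a))   # A : a a^T
print(np.isclose(lhs, rhs))                              # True
print(np.isclose(rhs, np.trace(A.T @ np.outer(a, a))))   # True
```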
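Taking the differential $$\eqalign{ df &= A:d(aa^T) + B:d(bb^T) \cr &= A:(da\,a^T+a\,da^T) + B:(db\,b^T+b\,db^T) \cr &= (A+A^T)\,a:da + (B+B^T)\,b:db \cr &= 2A\,a:da + 2B\,b:db \cr &= 2\,A\,a:(-dx) + 2\,B\,b:dx \cr }$$ The third line uses $P:(u\,v^T) = Pv:u$ and $P:(v\,u^T) = P^Tv:u$, so the terms containing $da^T$ (your "$x^T$ terms") are not eliminated: they are transposes of the $da$ terms and add to them. The fourth line uses the symmetry of covariance matrices (and hence of their inverses), $A^T=A$ and $B^T=B$, and the last line uses $da=-dx$ and $db=dx$. Since $df=\big(\frac{\partial f}{\partial x}:dx\big),\,$ the gradient is $$\eqalign{ \frac{\partial f}{\partial x} &= 2\,(B\,b-A\,a) }$$ Setting this to zero yields equation (2).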
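The gradient $2(Bb-Aa)$ can be checked against finite differences of the full objective, and its zero can be compared with equation (3). This is a sketch assuming NumPy and made-up random symmetric positive-definite covariances; `random_spd`, `grad` and `numerical_grad` are hypothetical helpers written for this illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3

def random_spd(n):
    """Hypothetical helper: a random symmetric positive-definite n x n matrix."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

A = np.linalg.inv(random_spd(n))  # A = Sigma_N^{-1}
B = np.linalg.inv(random_spd(n))  # B = Sigma_X^{-1}
mu_N, mu_X, y = (rng.standard_normal(n) for _ in range(3))

def f(x):
    """f = a^T A a + b^T B b with a = y - x - mu_N, b = x - mu_X."""
    a, b = y - x - mu_N, x - mu_X
    return a @ A @ a + b @ B @ b

def grad(x):
    """Analytic gradient 2 (B b - A a), valid because A and B are symmetric."""
    a, b = y - x - mu_N, x - mu_X
    return 2 * (B @ b - A @ a)

def numerical_grad(func, v, h=1e-6):
    """Central finite-difference gradient of a scalar function."""
    g = np.zeros_like(v)
    for i in range(v.size):
        e = np.zeros_like(v)
        e[i] = h
        g[i] = (func(v + e) - func(v - e)) / (2 * h)
    return g

x0 = rng.standard_normal(n)
print(np.allclose(grad(x0), numerical_grad(f, x0), atol=1e-6))  # True

# Equation (3): the point where the gradient vanishes.
x_star = np.linalg.solve(A + B, A @ (y - mu_N) + B @ mu_X)
print(np.allclose(grad(x_star), 0))  # True
```

Because $f$ is quadratic, central differences are exact apart from rounding, so a tight tolerance is reasonable here.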