What is $\nabla_A \epsilon^TA^T(AA^T)^{-1}A\epsilon$?

Question

125 Views Asked by Bumbble Comm At 11 Apr 2026 - 6:34

Let $q$ be the multivariate normal distribution $\mathcal{N}(\mu, \Sigma)$ and let $x$ be a sample from $q$. Then $x$ can be written as $$x = \mu + A\epsilon, \quad \Sigma = AA^T, \quad \epsilon \sim \mathcal{N}(0, I),$$ where $I$ is the identity matrix. I am trying to compute $\nabla_{A}\log{q(x)}$.

Now, $$ \nabla_A \log{q(x)} = -\frac{1}{2}\nabla_A \log\det(AA^T) - \frac{1}{2}\nabla_A \epsilon^TA^T(AA^T)^{-1}A\epsilon$$

The first gradient evaluates to $-A^T(AA^T)^{-1}$ (with help from Stack Exchange answers). However, since I don't have formal training in graduate-level calculus (I am a CS student), I don't know how to evaluate the gradient of the second term. Can anybody help?
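As a sanity check on the first term, here is a minimal NumPy sketch that compares the closed form against central finite differences. The helper names (`logdet_term`, `numerical_gradient`), the matrix shape, and the random seed are illustrative assumptions, not part of the question. The gradient is stored in the same shape as $A$, so the closed form appears as $-(AA^T)^{-1}A$, which is the transpose of the $-A^T(AA^T)^{-1}$ quoted above.

```python
import numpy as np

def logdet_term(A):
    """Return log det(A A^T) for a matrix A with full row rank."""
    _, logdet = np.linalg.slogdet(A @ A.T)
    return logdet

def numerical_gradient(f, A, h=1e-6):
    """Central finite-difference gradient of a scalar f, entry by entry of A."""
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = h
            G[i, j] = (f(A + E) - f(A - E)) / (2 * h)
    return G

# Assumed test matrix: 3 x 5, so A A^T is 3 x 3 and (almost surely) invertible.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))

# Closed form of grad_A [ -1/2 log det(A A^T) ] in the "same shape as A" layout.
closed_form = -np.linalg.solve(A @ A.T, A)
numeric = numerical_gradient(lambda M: -0.5 * logdet_term(M), A)

max_err_logdet = np.max(np.abs(closed_form - numeric))
print("max abs difference:", max_err_logdet)
```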

After reading up a bit about matrix calculus, this is my effort.

Let $B = A^T(AA^T)^{-1}A$ and $(B + \delta B) = (A+\delta A)^T((A+\delta A)(A+\delta A)^T)^{-1}(A+\delta A)$. This implies $$AB = A$$ and $$(A+\delta A)(B + \delta B) = A+\delta A.$$ Expanding the last equation and dropping the second-order term $\delta A\,\delta B$, we get $$A\,\delta B = \delta A\,(I-A^T(AA^T)^{-1}A).$$
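The first-order relation above can be checked numerically. The following is a minimal sketch, assuming a random $3 \times 5$ matrix and a small random perturbation (the `projector` helper, the shapes, and the seed are illustrative choices). The two sides should agree up to terms of second order in $\delta A$.

```python
import numpy as np

def projector(A):
    """Return B = A^T (A A^T)^{-1} A, the orthogonal projector onto the row space of A."""
    return A.T @ np.linalg.solve(A @ A.T, A)

# Assumed test data: a full-row-rank A and a small perturbation dA.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))
dA = 1e-5 * rng.standard_normal((3, 5))

B = projector(A)
dB = projector(A + dA) - B          # exact change in B, not linearised

lhs = A @ dB                        # A dB
rhs = dA @ (np.eye(5) - B)          # dA (I - B)

ab_gap = np.max(np.abs(A @ B - A))           # checks the identity A B = A
first_order_gap = np.max(np.abs(lhs - rhs))  # should be of order |dA|^2

print("A B = A residual:", ab_gap)
print("first-order residual:", first_order_gap)
```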

I am not sure how to proceed after this. Thanks

**Bumbble Comm** · Accepted Answer

Note that, when $AA^T$ is invertible, $$A^T(AA^T)^{-1}=A^+$$ is the pseudo-inverse of $A$. Writing $e$ for $\epsilon$, the function can be expressed in terms of the pseudo-inverse and the Frobenius (:) inner product, $X:Y={\rm tr}(X^TY)$: $$\eqalign{ f &= ee^T:A^+A \cr }$$

Now we can borrow a result from Harville's "Matrix Algebra from a Statistician's Perspective", where ${\rm sym}(X)=\tfrac{1}{2}(X+X^T)$: $$\eqalign{d(A^+A)=2\,{\rm sym}\big(A^+\,dA\,(I-A^+A)\big)}$$

This gives the differential of the function. Because $ee^T$ is symmetric, the ${\rm sym}$ operator can be dropped in the second line: $$\eqalign{ df &= 2ee^T:{\rm sym}\big(A^+\,dA\,(I-A^+A)\big) \cr &= 2ee^T:\big(A^+\,dA\,(I-A^+A)\big) \cr &= 2\,(A^+)^Tee^T(I-A^+A):dA \cr }$$

Since $df=\big(\frac{\partial f}{\partial A}:dA\big),\,$ the gradient is $$\eqalign{ \frac{\partial f}{\partial A} &= 2\,(A^+)^Tee^T(I-A^+A) \cr &= 2\,(AA^T)^{-1}A\,\,ee^T\Big(I-A^T(AA^T)^{-1}A\Big) \cr }$$
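A finite-difference check of this gradient, as a minimal NumPy sketch. The function names, matrix shapes, and seed are assumptions made for illustration; $e$ is held fixed, and the gradient is stored in the same shape as $A$.

```python
import numpy as np

def f(A, e):
    """Quadratic form e^T A^T (A A^T)^{-1} A e, written as e^T A^+ A e."""
    return e @ np.linalg.pinv(A) @ A @ e

def closed_form_gradient(A, e):
    """Gradient 2 (A^+)^T e e^T (I - A^+ A), same shape as A."""
    A_pinv = np.linalg.pinv(A)
    P = A_pinv @ A                      # projector onto the row space of A
    return 2 * A_pinv.T @ np.outer(e, e) @ (np.eye(A.shape[1]) - P)

def numerical_gradient(func, A, h=1e-6):
    """Central finite-difference gradient of a scalar func, entry by entry of A."""
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = h
            G[i, j] = (func(A + E) - func(A - E)) / (2 * h)
    return G

# Assumed test data: a wide, full-row-rank A and a fixed vector e.
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))
e = rng.standard_normal(5)

analytic = closed_form_gradient(A, e)
numeric = numerical_gradient(lambda M: f(M, e), A)
max_err_quadratic = np.max(np.abs(analytic - numeric))
print("max abs difference:", max_err_quadratic)

# For a square invertible A, A^+ A = I, so the gradient should vanish.
A_square = rng.standard_normal((4, 4))
e_square = rng.standard_normal(4)
square_grad_norm = np.max(np.abs(closed_form_gradient(A_square, e_square)))
print("square-case gradient magnitude:", square_grad_norm)
```

The square case is worth noting: if $A$ is square and invertible, then $A^+A = I$, the quadratic form reduces to $e^Te$, and the gradient with respect to $A$ is zero.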
Update
I just noticed that you wrote the gradient of the first term as $$A^T(AA^T)^{-1}$$ whereas I would write it as $$(AA^T)^{-1}A$$ so you are using a convention which is the transpose of my usual convention.

Which is fine, but you will need to use the transpose of the result above to be consistent with your previous derivation.
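Spelled out in the question's layout, and assuming $\epsilon$ is held fixed, the transpose uses the symmetry of $(AA^T)^{-1}$, $\epsilon\epsilon^T$, and $I-A^T(AA^T)^{-1}A$:

$$\eqalign{ \nabla_A\,\epsilon^TA^T(AA^T)^{-1}A\epsilon &= 2\,\Big(I-A^T(AA^T)^{-1}A\Big)\,\epsilon\epsilon^T A^T(AA^T)^{-1} \cr }$$

Substituting this, together with the first term from the question, into the expression for $\nabla_A \log{q(x)}$ would give

$$\eqalign{ \nabla_A \log{q(x)} &= -A^T(AA^T)^{-1} - \Big(I-A^T(AA^T)^{-1}A\Big)\,\epsilon\epsilon^T A^T(AA^T)^{-1} \cr }$$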
