Derivative of projection with respect to a parameter: $D_{a}: X(a)[X(a)^TX(a)]^{-1}X(a)^Ty$


Suppose we have a matrix $X(a)\in\mathbb{R}^{N\times K}$ where $N>K$ and $X^TX$ is non-singular, so that $\left[X^TX \right]^{-1}$ exists. The parameter $a$ is a scalar, $a\in\mathbb{R}$. The vector $y\in\mathbb{R}^N$ is not a function of $a$. What is:

$$ \frac{d}{da}X(a)\left[X(a)^TX(a)\right]^{-1}X(a)^Ty $$

Other info:

  • It would be most useful for me if the answer is expressed in terms of $X(a)'$, the element-wise derivative of $X$ (along with $X(a)$ and $y$)
  • Highly related to derivative of a projection matrix. Could be I'm just not experienced at mixing calculus and matrices, but I'm not seeing the solution here as directly applicable to my problem.
  • Not particularly interested in edge cases. You may assume everything is well-conditioned, the derivative exists, etc.

There is 1 best solution below


Write $a$-derivatives with primes. Because $a$ is a scalar, the ordinary product rule applies to matrix products, provided the order of the factors is preserved. First note how to differentiate a matrix inverse. With $N=M^{-1}$,
$$MN=I\implies O=(MN)^\prime=MN^\prime+M^\prime N\implies N^\prime=-M^{-1}M^\prime N,$$
i.e. $(M^{-1})^\prime=-M^{-1}M^\prime M^{-1}$. Applying this with $M=X^TX$ gives
$$\left[(X^TX)^{-1}\right]^\prime=-(X^TX)^{-1}(X^TX)^\prime(X^TX)^{-1}=-(X^TX)^{-1}(X^{\prime T}X+X^TX^\prime)(X^TX)^{-1}.$$
Hence, by the product rule over the four factors $X$, $(X^TX)^{-1}$, $X^T$ and $y$,
$$\begin{aligned}\left(X(X^TX)^{-1}X^Ty\right)^\prime&=X^\prime(X^TX)^{-1}X^Ty\\&\quad-X(X^TX)^{-1}(X^{\prime T}X+X^TX^\prime)(X^TX)^{-1}X^Ty\\&\quad+X(X^TX)^{-1}X^{\prime T}y\\&\quad+X(X^TX)^{-1}X^Ty^\prime.\end{aligned}$$
Edit: as @Matterhorn notes, the problem as stated satisfies $y^\prime=0$, allowing us to delete the last term, so only the first three terms remain.
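If you want to convince yourself numerically, here is a minimal sketch that compares the three remaining terms (with $y^\prime=0$) against a central finite difference. The particular family $X(a)=X_0+aX_1+a^2X_2$, the random test data, and the function names are all assumptions made up for this check; nothing in the question specifies them.

```python
import numpy as np

# Hypothetical test problem: X(a) = X0 + a*X1 + a^2*X2 with random entries.
rng = np.random.default_rng(0)
N, K = 8, 3
X0, X1, X2 = rng.standard_normal((3, N, K))
y = rng.standard_normal(N)  # y does not depend on a


def X(a):
    return X0 + a * X1 + a**2 * X2


def dX(a):
    # Element-wise derivative X'(a) of the assumed family above.
    return X1 + 2 * a * X2


def projection(a):
    # X (X^T X)^{-1} X^T y, using a linear solve instead of an explicit inverse.
    Xa = X(a)
    return Xa @ np.linalg.solve(Xa.T @ Xa, Xa.T @ y)


def projection_derivative(a):
    # The three terms of the answer that survive when y' = 0.
    Xa, dXa = X(a), dX(a)
    G = Xa.T @ Xa                          # X^T X
    dG = dXa.T @ Xa + Xa.T @ dXa           # (X^T X)'
    beta = np.linalg.solve(G, Xa.T @ y)    # (X^T X)^{-1} X^T y
    return (dXa @ beta
            - Xa @ np.linalg.solve(G, dG @ beta)
            + Xa @ np.linalg.solve(G, dXa.T @ y))


def finite_difference(a, h=1e-6):
    # Central difference approximation of d/da of the projection.
    return (projection(a + h) - projection(a - h)) / (2 * h)


a = 0.7
analytic = projection_derivative(a)
numeric = finite_difference(a)
print("max abs difference:", np.max(np.abs(analytic - numeric)))
```

The printed difference should be small compared with the size of the derivative itself, since a central difference is only accurate up to truncation and round-off error. Swapping in your own `X` and `dX` is enough to test the formula on your actual problem.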