Background: Let $M^{n\times k}(\mathbb{R})$ denote the $n\times k$ matrices with real entries. For any smooth function $f: M^{n\times k}(\mathbb{R}) \to \mathbb{R}$, define the derivative $\frac{\partial f}{\partial A}$ by $$\left(\frac{\partial f}{\partial A}\right)_{i,j}:= \frac{\partial f}{\partial a_{ij}},$$
where $a_{i,j}$ is the entry of $A$ at the $i$-th row and $j$-th column. For any smooth curve $c:(-\epsilon,\epsilon)\to M^{n\times k}(\mathbb{R})$ with $c(0) = A, \dot c(0) = v$, $\frac{\partial f}{\partial A}$ satisfies: $$\frac{d}{dt}|_{t=0}f(c(t)) = \text{trace}\left(v^T \frac{\partial f}{\partial A}\right).$$
Question: Let $f: M^{n\times k}(\mathbb{R}) \to \mathbb{R}$ be defined by $$f(A):= \|(I - A(A^TA)^{-1}A^T) z\|^2,$$ where $z \in \mathbb{R}^n$ is a fixed vector and $\|\cdot\|$ denotes the Euclidean norm. I would like to compute $\frac{\partial f}{\partial A}$, at least at any $A$ which is full rank. Are there any tricks that can simplify this process, or is the easiest thing simply to compute each $\frac{\partial f}{\partial a_{i,j}}$ by "brute force"?
My attempt at a simpler solution: Let $c:(-\epsilon,\epsilon)\to M^{n\times k}(\mathbb{R})$ be a smooth curve with $c(0) = A, \dot c(0) = v$. I will assume that $A$ is full rank.
By the chain and product rules, we have $$\frac{d}{dt}|_{t=0}f(c(t)) = 2 z^T(I-A(A^TA)^{-1}A)\left[-v (A^TA)^{-1}A^T + A(A^TA)^{-1}(v^TA + A^Tv)(A^TA)^{-1}A^T - A(A^TA)^{-1}v^T\right]z, $$ where I have used the fact that for any smooth curve $b:(-\epsilon,\epsilon)\to GL(n,\mathbb{R})$, $\frac{d}{dt}|_{t=0}b^{-1}(t) = -b^{-1}(0)\dot b(0) b^{-1}(0)$.
To simplify notation, let $A^\dagger:= (A^T A)^{-1} A^T$ denote the Moore-Penrose pseudoinverse of $A$ and let $\Pi_{A_\perp}:= I - AA^\dagger$ be the orthogonal projection onto the orthogonal complement of the range of $A$. Substitution of this new notation yields: $$\frac{d}{dt}|_{t=0}f(c(t)) = 2 z^T\Pi_{A_\perp}\left[-v A^\dagger + (A^\dagger)^T(v^TA + A^Tv)A^\dagger - (A^\dagger)^T v^T\right]z. $$ Now, the kernel of $A^\dagger$ is equal to the orthogonal complement to the range of $A$, so the range of $(A^\dagger)^T$ is equal to the range of $A$. It follows that $\Pi_{A_\perp} (A^\dagger)^T = 0$. Thus we now have a simpler expression:
$$\frac{d}{dt}|_{t=0}f(c(t)) = -2 z^T\Pi_{A_\perp}v A^\dagger z = -2\text{trace}\left(z^T(A^\dagger)^Tv^T\Pi_{A_\perp}z\right) = -2\text{trace}\left(v^T \Pi_{A_{\perp}}zz^T(A^\dagger)^T\right).$$
Since $v \in M^{n\times k}(\mathbb{R})$ was arbitrary, it follows that
$$\frac{\partial f}{\partial A}(A) = -2 \Pi_{A_\perp} zz^T(A^\dagger)^T.$$
For convenience, define a new variable and its differential (borrowing a result from Harville's "Matrix Algebra From a Statistician's Perspective") $$\eqalign{ M &= I-AA^\dagger \cr dM &= -d(AA^\dagger) \cr &= -M\,dA\,A^\dagger - (A^\dagger)^T\,dA^T\,M^T \cr &= -2\,{\rm sym}(M\,dA\,A^\dagger) \cr }$$ Then write the function in terms of the Frobenius (:) Inner Product and find its differential $$\eqalign{ f &= Mz:Mz \cr df &= 2\,Mz:dM\,z \cr &= 2\,Mzz^T:dM \cr &= -2\,Mzz^T:2\,{\rm sym}(M\,dA\,A^\dagger) \cr &= -2\,(2\,{\rm sym}(Mzz^T)):M\,dA\,A^\dagger \cr &= -2\,(Mzz^T+zz^TM^T):M\,dA\,A^\dagger \cr &= -2\,M^T(Mzz^T+zz^TM^T)(A^\dagger)^T:dA \cr }$$ Since $df=\big(\frac{\partial f}{\partial A}:dA\big),\,$ the gradient must be $$\eqalign{ \frac{\partial f}{\partial A} &= -2\,M^T(Mzz^T+zz^TM^T)(A^\dagger)^T \cr }$$ Since $M$ is an orthoprojector, we know several additional facts $$\eqalign{ M^2 &= M \cr M^T &= M \cr M(A^\dagger)^T &= 0 \cr MA &= 0 \cr }$$ which can be used to simplify the result to $$\eqalign{ \frac{\partial f}{\partial A} &= -2\,Mzz^T(A^\dagger)^T \cr }$$