What is $\frac{\partial{w}}{\partial{\alpha}}$ for $w=(X^\top X + \alpha \boldsymbol{I})^{-1} X^\top y$, where $X$ is an $N \times D$ matrix, $y$ is an $N$-dimensional vector, $\boldsymbol{I}$ is the $D \times D$ identity matrix, and $\alpha$ is a scalar?
Edit: I was actually trying to differentiate $\mathcal{F} (\hat{w}(\alpha)) = (\boldsymbol{y}-X\hat{\boldsymbol{w}})^\top (\boldsymbol{y}-X\hat{\boldsymbol{w}})+ \alpha (||{\hat{\boldsymbol{w}}||}^2 - c^2)$ with respect to $\alpha$. Can I write $\dfrac{\partial{\mathcal{F} (\hat{w}(\alpha))}}{\partial{\alpha}}= \dfrac{\partial{\mathcal{F} (\hat{w}(\alpha))}}{\partial{\hat{w}}}\times \dfrac{\partial{\hat{w}}}{\partial{\alpha}}$?
If I do so, then $\dfrac{\partial{\mathcal{F} (\hat{w}(\alpha))}}{\partial{\alpha}}= (-2X^\top y + 2X^\top X \hat{w} )\dfrac{\partial{\hat{w}}}{\partial{\alpha}} + (||{\hat{\boldsymbol{w}}||}^2 - c^2)+ (2\alpha\hat{w}) \dfrac{\partial{\hat{w}}}{\partial{\alpha}} $.
But the dimension of $\dfrac{\partial{\hat{w}}}{\partial{\alpha}}$ is $D \times 1$ since its expression is $-(X^\top X +\alpha \boldsymbol{I})^{-1}\boldsymbol{\hat{w}}$ and that of $\boldsymbol{\hat{w}}$ is also $D \times 1$ so they can't be multiplied in the order seen in the third term.
Is there a problem with the chain rule?
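Define the matrix variable
$$\eqalign{A &= X^TX+\alpha I \cr dA &= I\,d\alpha}$$
Write the function in terms of this new variable, then find its differential and gradient.
$$\eqalign{ \def\c#1{\color{red}{#1}} \def\a{\alpha} \def\A{A^{-1}} \def\F{\cal F} w &= \A X^Ty \\ dw &= d\A X^Ty \\ &= -\big(\A\,dA\,\A\big)X^Ty \\ &= -\A\,dA\,w \\ &= -\A w\,\,d\a \\ \frac{\partial w}{\partial\a} &= -\A w \\ }$$
For typing convenience, define the vector variable
$$\eqalign{ z = (Xw-y) \quad\implies\quad dz = X\:dw }$$
Now we are ready to differentiate the main function. Note that $\F$ depends on $\a$ both through $w$ and explicitly through the factor multiplying $(w^Tw - c^2)$, so the differential carries a $d\a$ term of its own.
$$\eqalign{ \F &= z^Tz + \a w^Tw - \a c^2 \\ d\F &= 2z^T\c{dz} + 2\a w^Tdw + (w^Tw - c^2)\,d\a \\ &= 2\,z^T\c{X\:dw} + 2\,\a w^Tdw + (w^Tw - c^2)\,d\a \\ &= 2\,(z^TX + \a w^T)\:\c{dw} + (w^Tw - c^2)\,d\a \\ &= 2\,(z^TX + \a w^T)\:(\c{-\A w\:d\a}) + (w^Tw - c^2)\,d\a \\ &= \big[(w^Tw - c^2) - 2\,(X^Tz + \a w)^T\A w\big]\,d\a \\ \frac{d\F}{d\a} &= (w^Tw - c^2) - 2\,(X^Tz + \a w)^T\A w \\ }$$
The chain rule itself is fine: the gradient with respect to $w$ enters as the row vector $2\,(z^TX + \a w^T)$, which multiplies the $D \times 1$ column $\frac{\partial w}{\partial\a}$ to give a scalar. Moreover, since $w$ is the ridge solution, $X^Tz + \a w = (X^TX + \a I)\,w - X^Ty = 0$, so the second term vanishes and $\frac{d\F}{d\a} = w^Tw - c^2$.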
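
As a sanity check, the derivatives can be compared against central finite differences. The following is only a minimal NumPy sketch under assumed test data: the helper names (`ridge_w`, `dw_dalpha`, `objective`, `dF_dalpha`), the random $X$ and $y$, and the values of $\alpha$, $c$, and the step `h` are all illustrative choices, not part of the question. The total derivative here includes the explicit term $w^Tw - c^2$, because $\mathcal{F}$ depends on $\alpha$ directly as well as through $w$.

```python
import numpy as np

def ridge_w(X, y, alpha):
    """Ridge solution w = (X^T X + alpha I)^{-1} X^T y."""
    A = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

def dw_dalpha(X, y, alpha):
    """Analytic derivative dw/dalpha = -A^{-1} w, a vector of length D."""
    A = X.T @ X + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(A, X.T @ y)
    return -np.linalg.solve(A, w)

def objective(X, y, alpha, c):
    """F(alpha) = ||y - X w||^2 + alpha (||w||^2 - c^2), with w = w(alpha)."""
    w = ridge_w(X, y, alpha)
    z = X @ w - y
    return z @ z + alpha * (w @ w - c**2)

def dF_dalpha(X, y, alpha, c):
    """Total derivative: explicit part plus chain-rule part through w."""
    A = X.T @ X + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(A, X.T @ y)
    z = X @ w - y
    explicit = w @ w - c**2
    # (row vector) times (column vector) gives a scalar
    implicit = -2.0 * (X.T @ z + alpha * w) @ np.linalg.solve(A, w)
    return explicit + implicit

# Illustrative random problem (assumed sizes and values)
rng = np.random.default_rng(0)
N, D = 20, 5
X = rng.normal(size=(N, D))
y = rng.normal(size=N)
alpha, c, h = 0.7, 1.3, 1e-5

# Central finite differences in alpha
fd_w = (ridge_w(X, y, alpha + h) - ridge_w(X, y, alpha - h)) / (2 * h)
fd_F = (objective(X, y, alpha + h, c) - objective(X, y, alpha - h, c)) / (2 * h)

print("dw/dalpha matches:", np.allclose(fd_w, dw_dalpha(X, y, alpha), atol=1e-6))
print("dF/dalpha matches:", np.isclose(fd_F, dF_dalpha(X, y, alpha, c), atol=1e-6))
```

Using `np.linalg.solve` rather than forming $A^{-1}$ explicitly is a numerical convenience; it computes the same quantities as the formulas above.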