Consider the problem
$$ \underset{x \in \Bbb R^n}{\operatorname{argmin}} \|Ax-y\|_2^2 + \lambda\|Wx\|_2^2 $$
where $A \in \Bbb R^{m \times n}$ with $m<n$, $W \in \Bbb R^{n \times n}$ is a diagonal matrix of weights, $\lambda > 0$ is a scalar, and $y \in \Bbb R^m$ is a vector. In the study material the solution was given as
$$ \hat{x} = W^2 A^T \left( A W^2 A^T + \lambda I \right)^{-1}y $$
By taking the gradient of the cost in the problem and setting it to $0$, however, the solution I get is
$$ \hat{x} = \left( A^T A + \lambda W^2 \right)^{-1}A^Ty $$
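As a quick numerical sanity check (a sketch with assumed random data; the sizes, seed, and $\lambda$ are arbitrary), the gradient of the cost indeed vanishes at this candidate solution:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, lam = 3, 5, 0.7                     # assumed small sizes, m < n
A = rng.standard_normal((m, n))
W = np.diag(rng.uniform(0.5, 2.0, n))     # diagonal weight matrix
y = rng.standard_normal(m)

# candidate solution from setting the gradient to zero
x_hat = np.linalg.solve(A.T @ A + lam * W @ W, A.T @ y)

# gradient of ||Ax - y||^2 + lam ||Wx||^2 at x_hat (up to a factor of 2)
grad = A.T @ (A @ x_hat - y) + lam * (W @ W) @ x_hat
print(np.allclose(grad, 0))
```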
My guess is that the former can be achieved by rewriting the latter form using the matrix inversion lemma and the fact that $W$ is diagonal, such that an $n \times n$ inversion is replaced by an $m \times m$ inversion. However, I haven't been able to figure out how.
Can anybody point me in the right direction?
Edit: there was a typo in the given problem, where $W$ was used as a weight matrix instead of $W^{-1}$. By rewriting the correct problem as $$ \underset{x \in \Bbb R^n}{\operatorname{argmin}} \|Ax-y\|_2^2 + \lambda\|W^{-1}x\|_2^2 $$
and computing the new optimal $x$ by equating the gradient to 0 we get
$$ \hat{x} = \left( A^T A + \lambda W^{-2} \right)^{-1}A^Ty $$
This can then be rewritten using the matrix inversion lemma as $$ \hat{x} = \left(\frac{1}{\lambda}W^2 - \frac{1}{\lambda}W^2 A^T \left( \frac{1}{\lambda}A W^2 A^T + I \right)^{-1}A\frac{1}{\lambda}W^2\right)A^Ty. $$
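The inversion-lemma step can be verified numerically (a sketch with assumed random data; sizes, seed, and $\lambda$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, lam = 3, 5, 0.7
A = rng.standard_normal((m, n))
W = np.diag(rng.uniform(0.5, 2.0, n))   # diagonal weight matrix
W2 = W @ W                              # W^2

# left-hand side: (A^T A + lam * W^{-2})^{-1}
lhs = np.linalg.inv(A.T @ A + lam * np.linalg.inv(W2))

# right-hand side from the matrix inversion lemma
rhs = (W2 / lam
       - (W2 / lam) @ A.T
         @ np.linalg.inv(A @ W2 @ A.T / lam + np.eye(m))
         @ A @ (W2 / lam))
print(np.allclose(lhs, rhs))
```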
By making the multiplication by $A^T$ explicit, grouping the $W^2A^T$ factor, and bringing one factor of $\frac{1}{\lambda}$ inside the inverse, we get
$$ \hat{x} = \frac{1}{\lambda}W^2A^T\left( I - \left( A W^2 A^T + \lambda I \right)^{-1}AW^2 A^T\right)y $$
which can be further simplified by substituting the $A W^2 A^T$ factor with its equivalent $A W^2 A^T + \lambda I - \lambda I$, so that the optimal solution becomes
$$ \hat{x} = \frac{1}{\lambda}W^2A^T\left( I - \left( A W^2 A^T + \lambda I \right)^{-1}\left(AW^2 A^T + \lambda I - \lambda I\right)\right)y $$
Distributing the inverse over the sum, we then get
$$ \hat{x} = \frac{1}{\lambda}W^2A^T\left( I - I + \lambda \left( A W^2 A^T + \lambda I \right)^{-1}\right)y $$
which, after cancelling the $\frac{1}{\lambda}$ against the remaining $\lambda$, yields the desired answer
$$ \hat{x} = W^2 A^T \left( A W^2 A^T + \lambda I \right)^{-1}y $$
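Finally, the two closed forms can be checked against each other numerically (a sketch with assumed random data; sizes, seed, and $\lambda$ are arbitrary). Note the second form only requires an $m \times m$ solve:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, lam = 3, 5, 0.7
A = rng.standard_normal((m, n))
w = rng.uniform(0.5, 2.0, n)
W2 = np.diag(w**2)                      # W^2, diagonal
y = rng.standard_normal(m)

# primal form: (A^T A + lam * W^{-2})^{-1} A^T y   (n x n solve)
x_primal = np.linalg.solve(A.T @ A + lam * np.diag(1.0 / w**2), A.T @ y)

# dual form: W^2 A^T (A W^2 A^T + lam * I)^{-1} y  (m x m solve)
x_dual = W2 @ A.T @ np.linalg.solve(A @ W2 @ A.T + lam * np.eye(m), y)

print(np.allclose(x_primal, x_dual))
```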