Rewriting the solution of $\ell_2$-reweighted least-squares


Consider the problem

$$ \underset{x \in \Bbb R^n}{\operatorname{argmin}} \|Ax-y\|_2^2 + \lambda\|Wx\|_2^2 $$

where $A \in \Bbb R^{m \times n}$ with $m<n$, $W \in \Bbb R^{n \times n}$ is a diagonal matrix of weights, $\lambda > 0$ is a scalar, and $y \in \Bbb R^m$ is a vector. In the course material the solution was given as

$$ \hat{x} = W^2 A^T \left( A W^2 A^T + \lambda I \right)^{-1}y $$

By taking the gradient of the cost in the problem and setting it to $0$, however, the solution I get is

$$ \hat{x} = \left( A^T A + \lambda W^2 \right)^{-1}A^Ty $$

My guess is that the former can be achieved by rewriting the latter form using the matrix inversion lemma and the fact that $W$ is diagonal, such that an $n \times n$ inversion is replaced by an $m \times m$ inversion. However, I haven't been able to figure out how.

Can anybody point me in the right direction?

Edit: there was a typo in the given problem, where $W$ was used as a weight matrix instead of $W^{-1}$. By rewriting the correct problem as $$ \underset{x \in \Bbb R^n}{\operatorname{argmin}} \|Ax-y\|_2^2 + \lambda\|W^{-1}x\|_2^2 $$

and computing the new optimal $x$ by setting the gradient to $0$, we get

$$ \hat{x} = \left( A^T A + \lambda W^{-2} \right)^{-1}A^Ty $$
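For completeness, the gradient step can be spelled out (a standard least-squares calculation, written out here for clarity):

$$ \nabla_x \left( \|Ax-y\|_2^2 + \lambda\|W^{-1}x\|_2^2 \right) = 2A^T(Ax-y) + 2\lambda W^{-2}x = 0 \quad\Longrightarrow\quad \left( A^T A + \lambda W^{-2} \right)x = A^Ty $$

using $(W^{-1})^T W^{-1} = W^{-2}$, which holds because $W$ is diagonal and hence symmetric.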

This can then be rewritten using the matrix inversion lemma as $$ \hat{x} = \left(\frac{1}{\lambda}W^2 - \frac{1}{\lambda}W^2 A^T \left( \frac{1}{\lambda}A W^2 A^T + I \right)^{-1}A\frac{1}{\lambda}W^2\right)A^Ty. $$
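For reference, this is the matrix inversion lemma (Woodbury identity) $$ (B + UCV)^{-1} = B^{-1} - B^{-1}U\left(C^{-1} + VB^{-1}U\right)^{-1}VB^{-1} $$ applied with the substitutions (my labeling, not from the original post) $$ B = \lambda W^{-2}, \qquad U = A^T, \qquad C = I, \qquad V = A, $$ so that $B^{-1} = \frac{1}{\lambda}W^2$ and the inner inverse $\left(I + \frac{1}{\lambda}A W^2 A^T\right)^{-1}$ is only $m \times m$.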

By making the multiplication by $A^T$ explicit, grouping the $W^2A^T$ factor, and bringing one factor of $\frac{1}{\lambda}$ inside the inverse, we get

$$ \hat{x} = \frac{1}{\lambda}W^2A^T\left( I - \left( A W^2 A^T + \lambda I \right)^{-1}AW^2 A^T\right)y $$

which can be further simplified by substituting the $ A W^2 A^T$ factor with its equivalent $ A W^2 A^T + \lambda I - \lambda I $, so that the optimal solution becomes

$$ \hat{x} = \frac{1}{\lambda}W^2A^T\left( I - \left( A W^2 A^T + \lambda I \right)^{-1}\left(AW^2 A^T + \lambda I - \lambda I\right)\right)y $$

by distributing the multiplication by the inverse term we then get

$$ \hat{x} = \frac{1}{\lambda}W^2A^T\left( I - I + \lambda \left( A W^2 A^T + \lambda I \right)^{-1}\right)y $$

which, after cancelling $I - I$ and the $\frac{1}{\lambda}$ against $\lambda$, gives the desired answer

$$ \hat{x} = W^2 A^T \left( A W^2 A^T + \lambda I \right)^{-1}y $$
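As a sanity check, the equivalence of the two closed forms can be verified numerically; the sizes, seed, and weights below are arbitrary choices for illustration:

```python
import numpy as np

# Verify that the gradient-based solution of the corrected problem
#   argmin ||Ax - y||^2 + lam * ||W^{-1} x||^2
# matches the Woodbury-rearranged form (m x m inverse instead of n x n).
rng = np.random.default_rng(0)
m, n, lam = 4, 7, 0.5

A = rng.standard_normal((m, n))
y = rng.standard_normal(m)
w = rng.uniform(0.5, 2.0, size=n)   # positive diagonal weights
W2 = np.diag(w**2)                  # W^2
W2inv = np.diag(w**-2)              # W^{-2}

# Normal-equations form: (A^T A + lam * W^{-2}) x = A^T y  (n x n solve)
x1 = np.linalg.solve(A.T @ A + lam * W2inv, A.T @ y)

# Rearranged form: x = W^2 A^T (A W^2 A^T + lam * I)^{-1} y  (m x m solve)
x2 = W2 @ A.T @ np.linalg.solve(A @ W2 @ A.T + lam * np.eye(m), y)

assert np.allclose(x1, x2)
```

Note that the second form only requires inverting an $m \times m$ matrix, which is exactly the computational advantage when $m < n$.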