Why does this optimization problem have a closed form solution that resembles least squares so much?


Consider the following optimization problem

$$\min_{\mathbf{Q}} \sum\limits_{i=1}^{n}{\|{{\mathbf{b}}_{i}}-{{\mathbf{Q}}^{T}}{{\mathbf{x}}_{i}}\|^2}+\lambda \|\mathbf{Q}\|^{2}$$

where $\mathbf{b_i} $ is an $r$-dimensional vector, $\mathbf{Q}$ is an $n \times r$ matrix and $\mathbf{x_i} $ is an $n$-dimensional vector. The closed form solution is

$$\mathbf{Q}={{(\mathbf{S}{{\mathbf{S}}^{T}}+\lambda \mathbf{I})}^{-1}}\mathbf{S}{{\mathbf{B}}^{T}}$$

where $\mathbf{S}$ is the matrix with the $\mathbf{x}_i$ as its columns and $\mathbf{B}$ the matrix with the $\mathbf{b}_i$ as its columns.

Why does it resemble the least-squares solution so closely, and how can it be derived?

BEST ANSWER

Given $\mathrm A \in \mathbb R^{m \times n}$ and $\mathrm B \in \mathbb R^{m \times p}$, we form a matrix equation in $\mathrm X \in \mathbb R^{n \times p}$

$$\mathrm A \mathrm X = \mathrm B$$

If we want to find the least-squares solution, then we minimize $\| \mathrm A \mathrm X - \mathrm B \|_F$. However, if we do not want the norm of $\mathrm X$ to become too large, then we minimize the following objective function

$$\begin{array}{rl} \| \mathrm A \mathrm X - \mathrm B \|_F^2 + \lambda \|\mathrm X\|_F^2 &= \mbox{tr} ((\mathrm A \mathrm X - \mathrm B)^T (\mathrm A \mathrm X - \mathrm B)) + \lambda \,\mbox{tr} (\mathrm X^T \mathrm X)\\ &= \mbox{tr} (\mathrm X^T (\mathrm A^T \mathrm A + \lambda \mathrm I_n) \mathrm X - \mathrm B^T \mathrm A \mathrm X - \mathrm X^T \mathrm A^T \mathrm B + \mathrm B^T \mathrm B)\end{array}$$

where $\lambda > 0$. Differentiating with respect to $\mathrm X$, we obtain

$$2 (\mathrm A^T \mathrm A + \lambda \mathrm I_n) \mathrm X - 2 \mathrm A^T \mathrm B$$

Finding where the derivative vanishes, we obtain the matrix equation

$$(\mathrm A^T \mathrm A + \lambda \mathrm I_n) \mathrm X = \mathrm A^T \mathrm B$$

Since $\lambda > 0$, the matrix $\mathrm A^T \mathrm A + \lambda \mathrm I_n$ is positive definite and hence invertible. Therefore, the unique minimizer is

$$\hat{\mathrm X} := (\mathrm A^T \mathrm A + \lambda \mathrm I_n)^{-1} \mathrm A^T \mathrm B$$
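As a quick numerical sanity check (a NumPy sketch; the sizes and $\lambda$ are arbitrary choices), the closed form above agrees with solving an equivalent augmented ordinary least-squares problem:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p, lam = 8, 5, 3, 0.1
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))

# Closed form: X = (A^T A + lambda I)^{-1} A^T B
X_closed = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ B)

# Minimizing ||AX - B||_F^2 + lam ||X||_F^2 is the same as the plain
# least-squares problem with A stacked on sqrt(lam) I and B on zeros.
A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n)])
B_aug = np.vstack([B, np.zeros((n, p))])
X_lstsq, *_ = np.linalg.lstsq(A_aug, B_aug, rcond=None)

print(np.allclose(X_closed, X_lstsq))  # True
```

The augmented formulation also explains the resemblance to ordinary least squares: ridge regression *is* least squares on an augmented system.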

SECOND ANSWER

First, for convenience, stack the ${\bf x}_i$ as the columns of a matrix $\bf X$ and the ${\bf b}_i$ as the columns of a matrix $\bf B$; the objective function then becomes

$$\min_{\bf Q}\|{\bf B} - {\bf Q}^T{\bf X}\|_F^2 + \lambda\|{\bf Q}\|_F^2$$

where the norm is the Frobenius norm, the matrix analogue of the least-squares (Euclidean) norm. Now we can apply vectorization. Consider the following matrix identity (renaming $\bf X$ to $\bf Y$ to avoid confusion):

$${\bf AYB = C} \Leftrightarrow ({\bf B}^T {\bf \otimes A}) \, \mbox{vec} ({\bf Y}) = \mbox{vec} ({\bf C})$$
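This identity is easy to check numerically (a small NumPy sketch; note that $\mbox{vec}$ stacks columns, which in NumPy is column-major flattening):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
Y = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

# vec stacks the columns of a matrix: flatten in column-major ("F") order.
vec = lambda M: M.flatten(order="F")

lhs = vec(A @ Y @ B)
rhs = np.kron(B.T, A) @ vec(Y)
print(np.allclose(lhs, rhs))  # True
```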

Now, how should we match $\bf Q$ and $\bf X$ with $\bf A, Y, B$ to get something we can use? We want to solve for $\bf Q$, so it plays the role of $\bf Y$. The multiplication by $\bf X$ is from the right, so $\bf X$ plays the role of $\bf B$; what remains, $\bf A$ on the left, must then be an identity matrix.

We still need to handle the transpose on $\bf Q$: either express the transposition itself through the vectorization machinery, or simply solve for ${\bf Q}^T$ directly and transpose the result afterwards, whichever is more convenient.
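Putting the pieces together, a NumPy sketch (sizes and $\lambda$ are arbitrary choices) confirms that this vectorized ridge solve recovers the closed form stated in the question:

```python
import numpy as np

rng = np.random.default_rng(2)
n, r, m, lam = 4, 3, 10, 0.5       # Q is n x r, m data pairs
X = rng.standard_normal((n, m))    # columns are the x_i
B = rng.standard_normal((r, m))    # columns are the b_i

vec = lambda M: M.flatten(order="F")

# With A = I_r, the identity gives vec(Q^T X) = (X^T kron I_r) vec(Q^T),
# and since ||Q^T||_F = ||Q||_F the problem is an ordinary ridge
# regression in the unknown vector vec(Q^T).
M = np.kron(X.T, np.eye(r))
q = np.linalg.solve(M.T @ M + lam * np.eye(n * r), M.T @ vec(B))
Q_vec = q.reshape((r, n), order="F").T   # un-vec, then undo the transpose

# Closed form from the question: Q = (X X^T + lam I)^{-1} X B^T
Q_closed = np.linalg.solve(X @ X.T + lam * np.eye(n), X @ B.T)

print(np.allclose(Q_vec, Q_closed))  # True
```

The Kronecker route is quadratically larger ($nr \times nr$ system instead of $n \times n$), so in practice one would use the closed form; the vectorization is useful for the derivation, not the computation.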