Explicit formula for a 'translated' least squares problem


When penalizing the problem $$\begin{array}{r@{} l@{}} \text{minimize }& {c}^{T} x \\ \text{subject to } & {A} x = {b}, \end{array}$$ the following optimization problem appears: $$\text{minimize } c^{T} x + \rho \|Ax-b\|^2, $$ for some $\rho>0.$ Since this function is coercive, it attains a minimum, hence it has a minimizer, say $x^{*}$ -- see this link, for example. What is an expression for this minimizer $x^{*}$?



Best answer

Let $f(x) = c^Tx + \rho \|Ax - b\|^2$. The gradient of this function is given by $$\nabla f(x) = c - 2\rho A^Tb + 2\rho A^TAx.$$ Setting this to zero and solving, we have \begin{align} 0 = c - 2\rho A^Tb + 2\rho A^TAx & \iff A^TAx = A^Tb - \frac{1}{2\rho}c \\ & \iff x = (A^TA)^{-1}\left (A^T b - \frac{1}{2\rho}c \right ) \end{align} which is the solution, assuming that $A^TA$ is invertible (this is guaranteed when the columns of $A$ are linearly independent).
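The formula above is easy to check numerically. A minimal numpy sketch, on a small made-up instance where the columns of $A$ are linearly independent (so $A^TA$ is invertible):

```python
import numpy as np

# Hypothetical small instance: A is 5x3 with linearly independent
# columns, so A^T A is invertible and the closed-form formula applies.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
c = rng.standard_normal(3)
rho = 2.0

# x* = (A^T A)^{-1} (A^T b - c / (2 rho)), computed via a linear solve
# rather than an explicit inverse.
x_star = np.linalg.solve(A.T @ A, A.T @ b - c / (2 * rho))

# Sanity check: the gradient c - 2 rho A^T b + 2 rho A^T A x vanishes at x*.
grad = c - 2 * rho * A.T @ b + 2 * rho * A.T @ (A @ x_star)
print(np.allclose(grad, 0))  # True
```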

A great resource for computing gradients involving vectors and matrices is this website.

Edit: If $A^TA$ is not invertible there are two possibilities for what can happen, and what you can do about it. Some important background facts to remember are

  • $A^TA$ is invertible if and only if the columns of $A$ are linearly independent (see this post for further discussion). In particular, if $A^TA$ is not invertible, then the matrix $A$ has a non-trivial null space.

  • If $U \subset \mathbb{R}^d$ is a subspace and $U^\perp = \{x \in \mathbb{R}^d \ | \ x^Tu = 0 \ \forall \ u \in U \}$ is the orthogonal subspace to $U$, then any $x \in \mathbb{R}^d$ can be written in a unique way as $x = u + v$ where $u \in U$ and $v \in U^\perp$.
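The orthogonal decomposition in the second bullet can be computed explicitly with a projection matrix. A small numpy illustration (the subspace $U = \operatorname{range}(M)$ and the vector $x$ are arbitrary choices for the example):

```python
import numpy as np

# Illustrative subspace U = range(M), for an arbitrary full-column-rank M.
M = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
x = np.array([3.0, -1.0, 2.0])

# Orthogonal projector onto U: P = M (M^T M)^{-1} M^T.
P = M @ np.linalg.solve(M.T @ M, M.T)
u = P @ x   # component of x in U
v = x - u   # component of x in U^perp

print(np.allclose(u + v, x))    # the decomposition recovers x
print(np.allclose(M.T @ v, 0))  # v is orthogonal to every column of M
```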

With these in mind, let $c^\perp$ be the space of vectors that are orthogonal to $c$ and let $N$ be the null-space of $A$. Using $N$ and $c^\perp$ we can address the two possible cases:

  1. $N \subseteq c^\perp$. Here any vector $x$ can be written uniquely in the form $x = y + z$ where $y \in N$ and $z \in N^\perp$. The objective then becomes $$ c^Tx + \rho\|Ax - b\|^2 = c^T(y + z) + \rho\|A(y+z) - b\|^2 = c^Tz + \rho\|Az - b\|^2, $$ where we used that $y \in N$ implies $Ay = 0$, and that $y \in N \subseteq c^\perp$ implies $c^Ty = 0$. This shows that the only vectors that matter are those that belong to $N^\perp$, and that there is no unique minimizer, since we can add any $y \in N$ without changing the objective.

This is not all that can be done in this case however. We can still find a minimizer by performing a trick which reduces the problem to the case above where $A^TA$ is invertible. Assume that the original problem takes in vectors of $\mathbb{R}^n$ and that $\text{dim}(N^\perp) = n'$. Now introduce a new matrix $B \in \mathbb{R}^{n \times n'}$ such that the range of $B$ equals $N^\perp$. This is most easily done by letting $B$ be the transpose of $n'$ linearly independent rows of $A$.

Using this matrix, solve the new problem $$\min_{z \in \mathbb{R}^{n'}} c^T(Bz) + \rho \|ABz - b\|^2 = \min_{z \in \mathbb{R}^{n'}} (B^Tc)^Tz + \rho \|(AB)z - b\|^2, $$ which is of the same form as the original problem with $c' = B^Tc$ and $A' = AB$. Importantly, in this new problem it must be the case that $(AB)^T(AB) = B^TA^TAB$ is invertible, since $AB$ has a trivial null space: if $z \neq 0$, then $Bz$ is a non-zero vector in $N^\perp$, so $ABz \neq 0$.

To recover a minimizer $x^*$ in the original problem from a minimizer of the new problem $z^*$, they are related by $x^* = Bz^*$. Interestingly, this $x^*$ is the unique minimizer which is contained in $N^\perp$.

The intuition for why this approach works is that since the null space does not contribute to the objective, we really just want to parameterize the subspace that is orthogonal to the null space. This is done by using $\mathbb{R}^{n'}$ and the matrix $B$ which transforms $\mathbb{R}^{n'}$ into $N^\perp$.
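This reduction can be sketched numerically. In the example below the matrix $A$, the data $b$, and the choice of $c$ inside $\operatorname{range}(A^T)$ (which is how case 1 is arranged) are all illustrative assumptions; $B$ is built from linearly independent rows of $A$ as described above:

```python
import numpy as np

# A has a non-trivial null space: the third column = first + second,
# so N = span{(1, 1, -1)}.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
rho = 1.0
# Case 1 requires N inside c^perp, i.e. c in range(A^T); we simply
# construct such a c (an assumption of this example).
c = A.T @ np.array([1.0, -1.0, 0.5])

# B = transpose of n' = 2 linearly independent rows of A (rows 1 and 2),
# so range(B) = row space of A = N^perp.
B = A[:2, :].T                       # shape (3, 2)

Ap, cp = A @ B, B.T @ c              # reduced data: A' = AB, c' = B^T c
z_star = np.linalg.solve(Ap.T @ Ap, Ap.T @ b - cp / (2 * rho))
x_star = B @ z_star                  # minimizer of the original problem in N^perp

# The full gradient vanishes at x*, so it is a minimizer of the original problem.
grad = c - 2 * rho * A.T @ b + 2 * rho * A.T @ (A @ x_star)
print(np.allclose(grad, 0))  # True
```

Note that `x_star` is the particular minimizer lying in $N^\perp$; adding any multiple of $(1, 1, -1)$ leaves the objective unchanged, as argued above.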

  2. $N \not \subseteq c^\perp$. In this case there is no minimum, because there is a vector $y$ such that $Ay = 0$ and $c^Ty \neq 0$. Multiplying this by a scalar $\alpha$ gives $$c^T(\alpha y) + \rho\|A(\alpha y) - b\|^2 = \alpha c^Ty + \rho\|0 - b\|^2 = \alpha c^Ty + \rho\|b\|^2,$$ and since $c^Ty \neq 0$ we can choose $|\alpha|$ as large as we want (with $\alpha$ positive or negative, depending on the sign of $c^Ty$) and make the objective arbitrarily negative. Overall, in this case there is no minimizer.
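The unbounded case is also easy to observe numerically. With a hypothetical $A$ whose null space contains a vector $y$ with $c^Ty \neq 0$, the objective along $\alpha y$ is affine in $\alpha$ and decreases without bound:

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])   # null space spanned by y = (1, 1, -1)
b = np.array([1.0, 2.0])
rho = 1.0
y = np.array([1.0, 1.0, -1.0])
c = np.array([1.0, 0.0, 0.0])     # c^T y = 1 != 0, so N is not contained in c^perp

f = lambda x: c @ x + rho * np.linalg.norm(A @ x - b) ** 2

# Along x = alpha * y the objective equals alpha * c^T y + rho * ||b||^2,
# which tends to -infinity as alpha -> -infinity (here c^T y > 0).
for alpha in [0.0, -1e3, -1e6]:
    print(f(alpha * y))
```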
Another answer

Let $f(x) = c^Tx + \rho \|Ax - b\|^2$ for all $x \in \mathbb{R}^n$. This function is convex, hence finding a point where the gradient vanishes is sufficient -- and also necessary -- for finding a minimizer. The gradient of this function is given by $$\nabla f(x) = c - 2\rho A^Tb + 2\rho A^TAx.$$ Setting this to zero, we have to find an $x$ such that \begin{align} 0 = c - 2\rho A^Tb + 2\rho A^TAx & \iff A^TAx = A^Tb - \frac{1}{2\rho}c. \end{align}

As commented by others, if $c \not\in (\ker A)^{\perp}$, then we can find a $v \in \ker A$ such that $c^{T} v < 0$ (replacing $v$ by $-v$ if necessary). Hence $f(n v) = n c^{T} v + \rho \|b\|^2$, which converges to $-\infty$ as $n$ goes to $\infty$, so no minimizer exists. From now on, therefore, we can assume that $c \in (\ker A)^{\perp} = \operatorname{range}(A^{T})$. Thus $c = A^{T} x^{0}$ for some $x^{0}$ -- one can use least squares to determine whether or not such an $x^{0}$ exists. After this step, we need to find $x$ such that $$A^TAx = A^T\left(b - \frac{1}{2\rho} x^{0}\right), $$ which can be solved via the least squares problem $$\min_x \left\|A x -\left(b-\dfrac{1}{2\rho} x^{0}\right)\right\|^2.$$
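The two least-squares steps above can be sketched with numpy's solver. The specific $A$, $b$, and $c$ below are illustrative; $c$ is constructed inside $\operatorname{range}(A^T)$ so that a minimizer exists:

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])
rho = 1.0
c = A.T @ np.array([2.0, -1.0])   # guarantees c is in range(A^T)

# Step 1: find x^0 with A^T x^0 = c via least squares
# (a near-zero residual confirms that such an x^0 exists).
x0, *_ = np.linalg.lstsq(A.T, c, rcond=None)

# Step 2: solve min_x || A x - (b - x^0 / (2 rho)) ||^2.
x_star, *_ = np.linalg.lstsq(A, b - x0 / (2 * rho), rcond=None)

# Check: x_star satisfies the normal equations, so the gradient of f vanishes
# there; by convexity of f, x_star is a minimizer.
grad = c - 2 * rho * A.T @ b + 2 * rho * A.T @ (A @ x_star)
print(np.allclose(grad, 0))  # True
```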