When penalizing the problem $$\begin{array}{r@{} l@{}} \text{minimize }& {c}^{T} x \\ \text{subject to } & {A} x = {b}, \end{array}$$ one arrives at the following optimization problem: $$\text{minimize } c^{T} x + \rho \|Ax-b\|^2, $$ for some $\rho>0.$ Since this function is coercive, it achieves a minimum, hence it has a minimizer, say $x^{*}$ -- see this link, for example. What is an expression for this minimizer $x^{*}$?
Explicit formula for a 'translated' least square problem
Asked by Bumbble Comm. There are 2 best solutions below.
Let $f(x) = c^Tx + \rho \|Ax - b\|^2$, for all $x \in \mathbb{R}^n$. This function is convex and differentiable, so a point is a minimizer if and only if it zeroes the gradient. The gradient is given by $$\nabla f(x) = c - 2\rho A^Tb + 2\rho A^TAx.$$ Setting this to zero, we have to find an $x$ such that \begin{align} 0 = c - 2\rho A^Tb + 2\rho A^TAx & \iff A^TAx = A^Tb - \frac{1}{2\rho}c. \end{align}
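The gradient formula above can be checked numerically. Below is a minimal sketch with NumPy; the matrices, vector sizes, and seed are made up for illustration:

```python
import numpy as np

# Hypothetical small instance to check the gradient formula numerically.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
c = rng.standard_normal(3)
rho = 2.0

f = lambda x: c @ x + rho * np.linalg.norm(A @ x - b) ** 2
grad = lambda x: c - 2 * rho * A.T @ b + 2 * rho * A.T @ A @ x

# Compare against a central finite-difference approximation of the gradient.
x = rng.standard_normal(3)
eps = 1e-6
num = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                for e in np.eye(3)])
print(np.allclose(num, grad(x), atol=1e-4))
```

Since $f$ is quadratic, the central difference agrees with the exact gradient up to rounding error.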
As commented by others, if $c \not\in (\text{kern}(A))^{\perp},$ then we can find a $v \in \text{kern}(A)$ such that $c^{T} v < 0$ (replacing $v$ by $-v$ if necessary). Hence $f(n v) = n c^{T} v + \rho \|b\|^2,$ which converges to $-\infty$ as $n$ goes to $\infty$, so no minimizer exists. From now on, we can therefore assume that $c \in (\text{kern}(A))^{\perp} = \text{range}(A^{T})$. Thus $c= A^{T} x^{0}$, for some $x^0$ -- one can solve a least-squares problem to determine whether or not such an $x^0$ exists. Hence, after this step, we need to find $x$ such that $$A^TAx = A^T\left(b - \frac{1}{2\rho} x^{0}\right), $$ which can be solved by solving the least-squares problem $$\min \left\|A x -\left(b-\dfrac{1}{2\rho} x^{0}\right)\right\|^2.$$
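This recipe can be sketched numerically even when $A^TA$ is singular. The example below is a hypothetical rank-deficient instance (all names and the seed are mine): find $x^0$ with $A^Tx^0 = c$ by least squares, then solve the shifted least-squares problem.

```python
import numpy as np

# Made-up rank-deficient example: the third column of A is zero, so
# A^T A is singular.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3)) @ np.diag([1.0, 1.0, 0.0])  # rank 2
b = rng.standard_normal(4)
rho = 0.5

# Choose c in range(A^T) so that the objective is bounded below.
c = A.T @ rng.standard_normal(4)

# Find x0 with A^T x0 = c by least squares; a zero residual certifies
# that c really lies in range(A^T).
x0, *_ = np.linalg.lstsq(A.T, c, rcond=None)
assert np.allclose(A.T @ x0, c)

# Solve min ||A x - (b - x0/(2 rho))||^2.
target = b - x0 / (2 * rho)
x_star, *_ = np.linalg.lstsq(A, target, rcond=None)

# x_star satisfies the normal equations A^T A x = A^T b - c/(2 rho).
print(np.allclose(A.T @ A @ x_star, A.T @ b - c / (2 * rho)))
```

Note that `lstsq` returns the minimum-norm solution; any element of $\text{kern}(A)$ can be added to it and the objective value is unchanged.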
Let $f(x) = c^Tx + \rho \|Ax - b\|^2$. The gradient of this function is given by $$\nabla f(x) = c - 2\rho A^Tb + 2\rho A^TAx.$$ Setting this to zero and solving, we have \begin{align} 0 = c - 2\rho A^Tb + 2\rho A^TAx & \iff A^TAx = A^Tb - \frac{1}{2\rho}c \\ & \iff x = (A^TA)^{-1}\left (A^T b - \frac{1}{2\rho}c \right ), \end{align} which is the solution assuming that $A^TA$ is invertible (and this is guaranteed when the columns of $A$ are linearly independent).
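A quick sanity check of this closed form, on a made-up instance where $A$ is tall and random (hence has full column rank almost surely):

```python
import numpy as np

# Minimal check of the closed-form solution when A has full column rank.
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))  # tall random matrix: full column rank a.s.
b = rng.standard_normal(6)
c = rng.standard_normal(3)
rho = 1.5

# x* = (A^T A)^{-1} (A^T b - c/(2 rho))
x_star = np.linalg.solve(A.T @ A, A.T @ b - c / (2 * rho))

# The gradient c - 2 rho A^T b + 2 rho A^T A x vanishes at x_star.
grad = c - 2 * rho * A.T @ b + 2 * rho * A.T @ A @ x_star
print(np.allclose(grad, 0))
```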
A great resource for computing gradients involving vectors and matrices is this website.
Edit: If $A^TA$ is not invertible there are two possibilities for what can happen, and what you can do about it. Some important background facts to remember are
$A^TA$ is invertible if and only if the columns of $A$ are linearly independent (see this post for further discussion). In particular, if $A^TA$ is not invertible, then $A$ has a non-trivial null-space.
If $U \subset \mathbb{R}^d$ is a subspace and $U^\perp = \{x \in \mathbb{R}^d \mid x^Tu = 0 \ \forall \ u \in U \}$ is the orthogonal complement of $U$, then every $x \in \mathbb{R}^d$ can be written in a unique way as $x = u + v$ where $u \in U$ and $v \in U^\perp$.
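The second fact can be illustrated numerically with an orthogonal projector. In the sketch below, $U$ is the range of an arbitrary made-up matrix `M` (names and seed are mine):

```python
import numpy as np

# Illustrate x = u + v with u in U and v in U^perp, where U = range(M).
rng = np.random.default_rng(3)
M = rng.standard_normal((5, 2))          # U = range(M), a 2-dim subspace
x = rng.standard_normal(5)

P = M @ np.linalg.solve(M.T @ M, M.T)    # orthogonal projector onto U
u, v = P @ x, x - P @ x                  # u in U, v in U^perp

print(np.allclose(u + v, x))    # the decomposition recovers x
print(np.allclose(M.T @ v, 0))  # v is orthogonal to every column of M
```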
With these in mind, let $c^\perp$ be the space of vectors that are orthogonal to $c$ and let $N$ be the null-space of $A$. Using $N$ and $c^\perp$ we can address the two possible cases: if $c \not\in N^\perp$, then there is a $v \in N$ with $c^Tv < 0$, and $f(nv) = nc^Tv + \rho\|b\|^2 \to -\infty$, so no minimizer exists; if $c \in N^\perp$, the normal equations are consistent, but a solution is not unique, since adding any element of $N$ to a solution gives another.
When $c \in N^\perp$, more can be done: we can still find a minimizer by performing a trick which reduces the problem to the case above where $A^TA$ is invertible. Assume that the original problem takes in vectors of $\mathbb{R}^n$ and that $\text{dim}(N^\perp) = n'$. Now introduce a new matrix $B \in \mathbb{R}^{n \times n'}$ such that the range of $B$ equals $N^\perp$. This is most easily done by letting $B$ be the transpose of $n'$ linearly independent rows of $A$.
Using this matrix, solve the new problem $$\min_{z \in \mathbb{R}^{n'}} c^T(Bz) + \rho \|ABz - b\|^2 = \min_{z \in \mathbb{R}^{n'}} (B^Tc)^Tz + \rho \|(AB)z - b\|^2, $$ which is of the same form as the original problem with $c' = B^Tc$ and $A' = AB$. Importantly, in this new problem it must be the case that $(AB)^T(AB) = B^TA^TAB$ is invertible, since $AB$ has trivial null space: for $z \neq 0$ we have $Bz \neq 0$ and $Bz \in N^\perp$, hence $ABz \neq 0$.
To recover a minimizer $x^*$ of the original problem from a minimizer $z^*$ of the new problem, set $x^* = Bz^*$. Interestingly, this $x^*$ is the unique minimizer which is contained in $N^\perp$.
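The whole reduction can be sketched end to end. The concrete matrix below (rank 2, so $A^TA$ is singular) and the seed are made up for illustration:

```python
import numpy as np

# End-to-end sketch: parameterize N^perp = range(A^T) with B, solve the
# reduced (invertible) problem, and recover x* = B z*.
rng = np.random.default_rng(4)
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # = 2 * row 0, so A has rank 2
              [0.0, 1.0, 1.0]])
b = rng.standard_normal(3)
rho = 1.0

# B: columns are the transposes of n' = 2 linearly independent rows of A.
B = A[[0, 2], :].T               # shape (3, 2), range(B) = N^perp

# Pick c in N^perp = range(A^T) so the objective is bounded below.
c = A.T @ rng.standard_normal(3)

# Reduced problem: minimize (B^T c)^T z + rho ||(A B) z - b||^2,
# where (AB)^T (AB) is invertible.
Ap, cp = A @ B, B.T @ c
z_star = np.linalg.solve(Ap.T @ Ap, Ap.T @ b - cp / (2 * rho))
x_star = B @ z_star

# x_star solves the original normal equations A^T A x = A^T b - c/(2 rho).
print(np.allclose(A.T @ A @ x_star, A.T @ b - c / (2 * rho)))
```

Note that `x_star` lies in $N^\perp$ by construction, since it is in the range of $B$.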
The intuition for why this approach works is that since the null space does not contribute to the objective, we really just want to parameterize the subspace that is orthogonal to the null space. This is done by using $\mathbb{R}^{n'}$ and the matrix $B$, which maps $\mathbb{R}^{n'}$ onto $N^\perp$.