Simple Quadratic Minimization Problem

196 Views Asked by At

Let $A \in\mathbb{R}^{m\times n}$, $b,c\in\mathbb{R}^m$, $x\in \mathbb{R}^n$. Consider the following minimization problem: $$ \min_{x} f(x):= \frac{1}{2}\|Ax-c\|^2 + b^\top Ax. $$ Since the function is quadratic, and in case $A^\top A$ is invertible, the solution is found by imposing $$ \nabla_x f(x) = 0 \iff x^\star = (A^\top A)^{-1}(A^\top c - A^\top b). $$ However, in my case, $A^\top A$ is not invertible. Is the problem guaranteed to have a solution in this case? I feel like this is a convex quadratic program and it should have at least a solution. I am able to find one numerically through CVXpy but I don't know how to find one in closed form.

Related question: Quadratic Minimization Problem with positivity constraint

2

There are 2 best solutions below

5
On BEST ANSWER

Alternative solution:

We may use the continuity argument.

We have $$f(x) = \frac12 x^\mathsf{T}A^\mathsf{T} A x - \frac12 (c - b)^\mathsf{T}Ax - \frac12 x^\mathsf{T}A^\mathsf{T}(c - b) + \frac12 c^\mathsf{T} c.$$

Let $\epsilon > 0$ and let $$f_{\epsilon}(x) := \frac12 x^\mathsf{T}(A^\mathsf{T} A + \epsilon I_n) x - \frac12 (c - b)^\mathsf{T}Ax - \frac12 x^\mathsf{T}A^\mathsf{T}(c - b) + \frac12 c^\mathsf{T} c.$$

The unique global minimizer of $f_{\epsilon}(x)$ is given by $$x_{\epsilon} = (A^\mathsf{T} A + \epsilon I_n)^{-1} A^\mathsf{T}(c - b).$$

Fact 1: $\lim_{\epsilon\to 0^{+}} x_{\epsilon}$ exists (finite).
(The proof is given at the end.)

Let $$x_0 = \lim_{\epsilon\to 0^{+}} x_{\epsilon}.$$

Fact 2: $x_0$ is a global minimizer of $f(x)$.
(The proof is given at the end.)

Thus, $\lim_{\epsilon\to 0^{+}} (A^\mathsf{T} A + \epsilon I_n)^{-1} A^\mathsf{T}(c - b)$ is a solution. When $A^\mathsf{T} A$ is invertible, it is just $(A^\mathsf{T} A)^{-1} A^\mathsf{T}(c - b)$.


Proof of Fact 1:

WLOG, assume that $A\ne 0$.

Using SVD decomposition $A = UDV^\mathsf{T}$ where $$D = \left(\begin{array}{ll} D_1 & 0_{r\times (n-r)} \\ 0_{(m-r)\times r} & 0_{(m-r)\times (n-r)} \end{array} \right)$$ and $D_1$ is an invertible $r\times r$ diagonal matrix, we have $x_{\epsilon} = V (D^\mathsf{T}D + \epsilon I_n)^{-1}D^\mathsf{T} U^\mathsf{T}(c - b)$.

We have $$(D^\mathsf{T}D + \epsilon I_n)^{-1}D^\mathsf{T} = \left(\begin{array}{ll} (D_1^\mathsf{T}D_1 + \epsilon I_r)^{-1}D_1^T & 0_{r\times (m-r)} \\ 0_{(n-r)\times r} & 0_{(n-r)\times (m-r)} \end{array} \right).$$

Thus, we have $$\lim_{\epsilon \to 0^{+}} x_{\epsilon} = V \left(\begin{array}{ll} (D_1^\mathsf{T}D_1)^{-1}D_1^T & 0_{r\times (m-r)} \\ 0_{(n-r)\times r} & 0_{(n-r)\times (m-r)} \end{array} \right)U^\mathsf{T}(c - b).$$


Proof of Fact 2:

For all $x$ and $\epsilon > 0$, we have $f_{\epsilon}(x) \ge f_{\epsilon}(x_{\epsilon})$ where $$f_{\epsilon}(x_{\epsilon}) = \frac12 x_{\epsilon}^\mathsf{T}(A^\mathsf{T} A + \epsilon I_n) x_{\epsilon} - \frac12 (c - b)^\mathsf{T}Ax_{\epsilon} - \frac12 x_{\epsilon}^\mathsf{T}A^\mathsf{T}(c - b) + \frac12 c^\mathsf{T} c.$$

Let $g(\epsilon) := f_{\epsilon}(x) - f_{\epsilon}(x_{\epsilon})$. We have $\lim_{\epsilon \to 0^{+}} g(\epsilon) = f(x) - f(x_0)$. We have $f(x) - f(x_0) \ge 0$.

We are done.

1
On

Note that

$$ \frac{1}{2}\|Ax - c\|^2 +b^TAx = \frac{1}{2}x^TA^TAx - c^TAx + \frac{1}{2}c^Tc + b^TAx$$

Now since $c, b$ are constant, changing the objective to

$$ \frac{1}{2}\|Ax - c\|^2 +b^TAx + \frac{1}{2}b^T b - c^Tb $$

doesn't change the answer. But now using the observation, we can rewrite this as

$$ \begin{align} \frac{1}{2}\|Ax - c\|^2 + b^TAx + \frac{1}{2}b^T b +c^Tb &= \frac{1}{2}x^TA^TAx - c^TAx + \frac{1}{2}c^Tc + b^TAx + \frac{1}{2}b^T b - c^Tb \\ &= \frac{1}{2}x^TA^tAx + (b-c)^TAx + \frac{1}{2}(b-c)^T(b-c) \\ &= \frac{1}{2}\|Ax + b - c\|^2. \end{align}$$

Then from the standard theory on the ordinary least squares problem if $A^TA$ is invertible then the answer is

$$ x = (A^TA)^{-1}A^T(b-c). $$

Which is your answer.

However, when $A^TA$ is not invertible then the uniqueness of the solution depends on whether $A$ has a null space. In particular, if we let

$$ x^* = A^+(b-c).$$

Where $A^+$ is the Moore-Penrose pseudo inverse then any solution to the problem can be written as

$$ x^* + v$$

where $v \in \mathcal{N}(A)$. We also know that $x^*$ is the minimum norm solution.