Finding the minimum of $\frac{1}{2} \|x\|_2^2 + c^Tx$

86 Views Asked by At

Let's assume that $AA^T$ is invertible. I would like to show that the minimum of

$$\begin{array}{ll} \text{minimize} & \frac{1}{2} \|x\|_2^2 + c^Tx\\ \text{subject to} & Ax = 0\end{array}$$

where both $x$ and $c$ are from $\mathbb{R}^n$, is given by

$$x^* = -(I - A^T(AA^T)^{-1}A)c.$$

There is a tip that projecting $-c$ onto the subspace $\{x: Ax = 0 \}$.

I don't know how can I deal with the problem above. I would appreciate any hints or tips.

1

There are 1 best solutions below

0
On

Here is a non-calculus solution, but it requires some linear algebra knowledge. However, if you want to use calculus, apply the Lagrange multiplier technique with the Lagrange function $$\Lambda(x,\lambda):=\frac{1}{2}\,\|x\|_2^2+c^\top x-\lambda^\top A x\,,$$ where $x\in\mathbb{R}^n$ and $\lambda\in\mathbb{R}^m$ (provided that $A$ is an $m$-by-$n$ matrix). You will get $$\frac{\partial}{\partial x}\,\Lambda(x,\lambda)=x+c-A^\top\lambda\text{ and }\frac{\partial}{\partial \lambda}\,\Lambda(x,\lambda)=-Ax\,.$$
If $(x^*,\lambda^*)\in\mathbb{R}^n\times\mathbb{R}^m$ is the solution to this Lagrangian problem, then you can show that $$\lambda^*=(AA^\top)^{-1}A\,c\text{ and }x^*=-c+A^\top\lambda^*\,.$$

Rewrite the function to be optimized as $$f(x)=\frac{1}{2}\,\left\|x+c\right\|_2^2-\frac{1}{2}\,\|c\|^2_2\text{ for all }x\in\mathbb{R}^n\,.$$ Thus, this is optimization problem can be expressed as: $$\begin{array}{ll}\text{minimize}&\|v\|_2^2\\ \text{subject to}&v=c+k\text{ for some }k\in\ker(A)\,.\end{array}$$

If $P$ is the orthogonal projection onto $\ker(A)$ and $u\in\mathbb{R}^n$, then $$\|u-Pu\|_2^2\leq \|u\|_2^2\,.\tag{*}$$ The equality occurs if and only if $u$ is in the ortogonal complement $\big(\ker(A)\big)^\perp$ of $\ker(A)$.

The above inequality is true because $$\|u-Pu\|_2^2=\|u\|_2^2-u^\top Pu-u^\top P^\top u+uP^\top Pu\,,$$ and since $P$ is symmetric and idempotent (i.e., $P^2=P$), we obtain $$\|u-Pu\|_2^2=\|u\|_2^2-u^\top P^\top Pu-u^\top P^\top Pu+u^\top P^\top Pu= \|u\|_2^2-u^\top P^\top P u\,.$$ Therefore, $$\|u-Pu\|_2^2=\|u\|_2^2-\|Pu\|_2^2\leq \|u\|_2^2\,,$$ with equality if and only if $u\in \ker(P)$. However, $\ker(P)$ is the ortogonal complement of $\ker(A)$.

Using (*), $$\|v\|_2^2\geq \|v-Pv\|_2^2\,.$$ Now, $v=c+k$ for some $k\in\ker(A)$. Therefore, $$v-Pv=(c+k)-P(c+k)=(c-Pc)+(k-Pk)\,.$$ As $k\in \ker(A)$, we get $k=Pk$, so $v-Pv=c-Pc$. Hence, $$\|v\|_2^2\geq \|c-Pc\|_2^2\,.\tag{#}$$ We claim that the equality is achievable at a unique point $v^*\in\mathbb{R}^n$. Then, $x^*:=v^*-c$ is a minimizing point of the original optimization problem.

Here is the proof of the uniqueness of $v^*$. If $v_1\in\mathbb{R}^n$ and $v_2\in\mathbb{R}^n$ are two feasible points such that (#) is an equality, then by the equality condition of (*), both $v_1$ and $v_2$ are in $\big(\ker(A)\big)^\perp$, so $v_1-v_2\in \big(\ker(A)\big)^\perp$. However, as $v_1=c+k_1$ and $v_2=c+k_2$ for some $k_1,k_2\in\ker(A)$, we obtain $v_1-v_2=k_1-k_2\in\ker(A)$. Therefore, $$v_1-v_2\in\big(\ker(A)\big)^\perp\cap\ker(A)=\{0\}\,.$$ This means $v_1=v_2$.

Now, for the existence, we note that $v^*\in\big(\ker(A)\big)^\perp=\ker(P)$ for (#) to be an equality. Now, note that $$P(c-Pc)=Pc-P^2c=Pc-Pc=0\,,$$ we get $c-Pc\in\ker(P)$. Since $P$ is the orthogonal projection onto $\ker(A)$, $Pc\in\ker(A)$. Therefore, $c-Pc$ is a feasible point. Hence, $v^*:=c-Pc$ is a feasible point that makes (#) an equality.

After this long calculation, we get $x^*=v^*-c=-Pc$. Therefore, we need to only verify that $$Q:=I-A^\top(AA^\top)^{-1}A$$ is an orthogonal projection onto $\ker(A)$. Since such a linear transformation is unique, $Q=P$, and the problem is solved. You only have to verify that $Q$ is a symmetric matrix (which is quite obvious), that $Q^2=Q$ (which is not too difficult), and that $\text{im}(Q)=\ker(A)$. The part where $AQ=0$ is trivial, so $\text{im}(Q)\subseteq \ker(A)$. However, $Qk=k$ for all $k\in\ker(A)$, so $\text{im}(Q)=\ker(A)$.

If $AA^\top$ is not invertible, we can use $P=I-A^{\dagger}A$, where $A^\dagger$ is the Moore-Penrose pseudoinverse of $A$. That is, $$x^*=-Pc=-\left(I-A^\dagger A\right)\,c\,.$$ Note that $A^\dagger=A^\top(AA^\top)^{-1}$ when $AA^\top$ is invertible.