I am having trouble understanding min/max problems in Linear Algebra. I am normally used to solving these types of problems using some form of calculus. The question I am stuck on is minimizing:
$$b^{T}x + \frac{1}{2}x^{T}x$$
subject to
$$Ax = 0$$
I think it's a fairly straightforward question when solving it with gradients and Lagrange multipliers, but I am stuck on how to even start questions like this.
Find a basis for the null space of $A$. If the matrix $N$ has the vectors of this basis as its columns, you can express $x$ as follows
$x = N u $
where $N$ is $n \times k$ and $u $ is a $k \times 1 $ vector. So now the optimization problem becomes unconstrained as follows. Find $u \in \mathbb{R}^k $ such that
$ f = \dfrac{1}{2} u^T N^T N u + b^T N u \hspace{15pt}(1) $
is minimized. And the solution of this problem is well-known. It can be found by completing the square. We want to absorb the linear term $b^T N u = u^T N^T b$, so write $f$ as follows
$ f = \dfrac{1}{2} (u - u_0)^T N^T N (u - u_0) - \dfrac{1}{2} u_0^T N^T N u_0 \hspace{15pt}(2) $
Expanding the form we just wrote (the two constant terms cancel), we get
$ f = \dfrac{1}{2} u^T N^T N u - u^T N^T N u_0 $
Comparing this to the original form (Eq. $(1)$), we must select $u_0$ such that
$ - u^T N^T N u_0 = u^T N^T b $
Therefore,
$ u_0 = - (N^T N)^{-1} N^T b $
From Eq. $(2)$, and since $N^T N$ is positive definite (the columns of $N$ form a basis, hence are linearly independent), the minimum value of $f$ is
$ f_{min} = - \dfrac{1}{2} u_0^T (N^T N) u_0 = - \dfrac{1}{2} b^T N (N^T N)^{-1} N^T b $
The optimal value of vector $x$ is
$ x = N u_0 = - N (N ^T N )^{-1} N^T b $
Note that $f_{min} = \dfrac{1}{2} b^T x $, since $b^T x = - b^T N (N^T N)^{-1} N^T b$.
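As a quick numerical sanity check of this derivation, here is a minimal NumPy sketch (the random $A$ and $b$ and the dimensions $m, n$ are arbitrary choices for illustration; the null-space basis is extracted from the SVD):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 2, 5
A = rng.standard_normal((m, n))          # a generic draw has full row rank m
b = rng.standard_normal(n)

# Columns of N: an orthonormal basis of null(A), from the last n - m rows of V^T
_, _, Vt = np.linalg.svd(A)
N = Vt[m:].T                             # n x (n - m)

u0 = -np.linalg.solve(N.T @ N, N.T @ b)  # u0 = -(N^T N)^{-1} N^T b
x = N @ u0                               # candidate minimizer

f = b @ x + 0.5 * x @ x
print(np.allclose(A @ x, 0))             # constraint A x = 0 holds
print(np.isclose(f, -0.5 * b @ N @ np.linalg.solve(N.T @ N, N.T @ b)))  # matches f_min
```

Perturbing $u_0$ by any direction in $\mathbb{R}^{n-m}$ can only increase $f$, which is an easy extra check to add.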
Another way to proceed is directly through Lagrange multipliers
The Lagrangian to be minimized is
$g (x, \lambda) = \frac{1}{2} x^T x + b^T x + \lambda^T (A x) $
The gradient with respect to $x$ is
$ \nabla_x g = x + b + A^T \lambda $
And the gradient with respect to $\lambda$ is
$\nabla_\lambda g = A x $
From $\nabla_x g = 0 $ , it follows that $ x = - b - A^T \lambda $
From $\nabla_\lambda g = 0 $ , it then follows that
$A (- b - A^T \lambda) = 0 $
Therefore,
$ A A^T \lambda = - A b $
At this point we have to assume that $A A^T$ is invertible, i.e. if $A$ is $m \times n$ with $m \le n$, then $A$ must have full row rank $m$.
Then,
$\lambda = - (A A^T)^{-1} A b = - (A^T)^{\#} b $
where $(A^T)^{\#} = (A A^T)^{-1} A$ is the Moore–Penrose pseudo-inverse of $A^T$ (this is the full-column-rank formula applied to $A^T$, valid since $A$ has full row rank).
And, therefore,
$ x = - b - A^T \lambda = - ( I - A^T (A A^T)^{-1} A ) b $
And finally the minimum value of $f$ is
$ f_{min} = - \frac{1}{2} b^T ( I - A^T (A A^T)^{-1} A ) b $
The advantage of this method is that we can express the minimizing $x$ and $f_{min} $ directly in terms of $A$ and $b$.
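The closed-form expressions can likewise be checked numerically; below is a NumPy sketch (again with randomly drawn $A$ and $b$, assumed to give $A$ full row rank, which holds for a generic draw):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 6
A = rng.standard_normal((m, n))          # generic: rank(A) = m, so A A^T is invertible
b = rng.standard_normal(n)

lam = -np.linalg.solve(A @ A.T, A @ b)   # solves A A^T lam = -A b
x = -b - A.T @ lam                       # stationarity: x + b + A^T lam = 0

P = np.eye(n) - A.T @ np.linalg.solve(A @ A.T, A)
print(np.allclose(x, -P @ b))            # x = -(I - A^T (A A^T)^{-1} A) b
print(np.allclose(A @ x, 0))             # feasibility
f_min = -0.5 * b @ P @ b
print(np.isclose(b @ x + 0.5 * x @ x, f_min))
```

Using `solve` on $A A^T$ rather than forming the explicit inverse is the numerically preferable way to evaluate these formulas.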
Finally, one notes that if $A$ is $m \times n$ with $ m \lt n $ and has rank $m$, then $N$ is of dimension $n \times (n - m)$; therefore any vector $y \in \mathbb{R}^n$ can be written as
$ y = [ A^T : N ] [u_1 , u_2 ]^T $
where $u_1 \in \mathbb{R}^m , u_2 \in \mathbb{R}^{n-m} $
And we will have
$(I - A^T (A A^T)^{-1} A) y = [ 0 : N] [u_1 , u_2]^T $
and also,
$ (N (N^T N)^{-1} N^T ) y = [ 0 : N ] [u_1 , u_2]^T $
Thus both matrices send every $y \in \mathbb{R}^n$ to the same vector $N u_2$: each is the projector onto the null space of $A$, and since they agree on all of $\mathbb{R}^n$, they are equal.
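This equality of the two projectors is also easy to confirm numerically; a short NumPy sketch (random $A$, null-space basis from the SVD as in the earlier check):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 2, 4
A = rng.standard_normal((m, n))          # generic draw: full row rank

_, _, Vt = np.linalg.svd(A)
N = Vt[m:].T                             # columns: basis of null(A)

P1 = np.eye(n) - A.T @ np.linalg.solve(A @ A.T, A)
P2 = N @ np.linalg.solve(N.T @ N, N.T)

print(np.allclose(P1, P2))               # the two projectors coincide
print(np.allclose(P1 @ P1, P1))          # idempotent, as a projector must be
print(np.allclose(A @ P1, 0))            # range of P1 lies in null(A)
```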