I've faced a totally unexpected situation while I'm studying on FEA and optimization. I have the following system of linear equations (according to FEA) with constraints. $$ \begin{aligned} \mathbf K \mathbf u &= \mathbf f, \\ \mathbf Q \mathbf u &= \mathbf b, \end{aligned} $$
where $\mathbf K$ is a symmetric, banded, and positive semi-definite system matrix, $\mathbf u$ is the field variable, $\mathbf f$ is the vector of external loads, $\mathbf Q$ is a sparse constraint matrix, and $\mathbf b$ is the vector of constrained values.
I can solve the above system equations in two ways:
Lagrange multiplier method $$ \begin{bmatrix}\mathbf u \\ \boldsymbol\lambda\end{bmatrix}= \begin{bmatrix}\mathbf K & \mathbf Q^\mathsf T \\ \mathbf Q & \mathbf 0\end{bmatrix}^{-1}\begin{bmatrix}\mathbf f \\ \mathbf b\end{bmatrix}. $$
Or, just directly using eliminated system $$ \bar{\mathbf u} = \bar{\mathbf K}^{-1}\bar{\mathbf f}, $$ where bar indicates the vectors or matrices that are eliminated. Of course, both provide the same solutions.
I thought that it's equivalent to solve an optimization problem minimizing the residual, i.e., $$ \mathbf u^* = \mathrm{arg}\min_{\mathbf Q\mathbf u=\mathbf b} f\left(\mathbf u\right), $$ where $$ f\left(\mathbf u\right) = \frac 1 2 \mathbf r^\mathsf T\mathbf r, \\ \mathbf r = \mathbf K\mathbf u - \mathbf f. $$ However, it seems not working at all.
I've also tried an unconstrained version it, i.e., $$ \bar{\mathbf u}^* = \mathrm{arg}\min \bar f\left(\bar{\mathbf u}\right), $$ where $$ \bar f\left(\bar{\mathbf u}\right) = \frac 1 2 \bar{\mathbf r}^\mathsf T\bar{\mathbf r}, \\ \bar{\mathbf r} = \bar{\mathbf K}\bar{\mathbf u} - \bar{\mathbf f}. $$
The eliminated version seems to give a bit of an improved solution, but it's still wrong.
These are FE solutions of heat conduction on 10x10 grid mesh; from left, solving $\mathbf K\mathbf u = \mathbf f$, minimizing residual, and eliminated version of minimizing residual. Although it seems that the third one results in the correct answer, it goes wrong when the discretization gets bigger.

After a bit of investigation, I found that there was a trivial solution of the optimization problem, that is, a vector $\mathbf u$ with zeros for most of its components except near the constrained and loaded DOFs. It's because $\mathbf f$ is also a very sparse vector.
Finally, here's my question:
Is my hypothesis that solving a system of linear equations directly and minimizing residual are equivalent to each other wrong? What am I missing?
Is the problem related to one of the following facts?
- Ill-conditioned and narrowly banded Hessian (c.f. $\mathbf H = \mathbf K^\mathsf T\mathbf K$, $\mathrm{cond}\left(\mathbf H\right) = \mathcal O\left(10^{5}\right)$)
- A very sparse load vector
- Is there any way to get a nontrivial solution by optimization?
Any comment will be helpful, thanks

