Consider $M = \{ S \in \mathbb R^{2,2}: S = S^T\}$ and $\langle A,B \rangle := \operatorname{tr}(A^T B)$, which is a scalar product on $\mathbb R^{2,2}$ and induces $\|A\|_{F} = \sqrt{\langle A,A \rangle}$. Given a fixed $A \in \mathbb R^{2,2}$, the task is to find $\min_{B \in M} \|A - B\|_F$.
Apparently, one forms an orthonormal basis of $M$, let's say $\{B_1,B_2,B_3\}$, and then solves the normal equation $\sum_{j=1}^3 \alpha_j \langle B_i,B_j \rangle = \langle A,B_i \rangle$ for $A^* = \alpha_1 B_1 + \alpha_2 B_2 + \alpha_3 B_3$ which is our candidate.
Q: I don't understand why this approach works because the normal equation I know is given by $A^TAx = A^Tb$. There is also the Theorem of Mirsky-Schmidt which gives the best rank $s$ approximation but that requires the singular value decomposition.
Note that $M$ is a subspace, hence a closed convex set in a Hilbert space, so a unique minimiser $B^* \in M$ exists.
Note that minimising $\|A-B\|$ is equivalent to minimising ${1 \over 2} \|A-B\|^2$, and if $f(B) = {1 \over 2} \|A-B\|^2$, then expanding $f(B+H)-f(B)$ using the inner product shows that $Df(B)(H) = - \langle A-B, H \rangle$.
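As a sanity check on the derivative formula, here is a minimal sketch in plain Python that compares $-\langle A-B, H \rangle$ with a central finite difference of $f$. The matrices `A`, `B`, `H` and the step `eps` are illustrative choices of mine, not taken from the question.

```python
# Matrices are plain nested lists; no third-party packages are assumed.

def inner(X, Y):
    """Frobenius inner product tr(X^T Y), i.e. the sum of entrywise products."""
    return sum(x * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def sub(X, Y):
    """Entrywise difference X - Y."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def add_scaled(X, t, Y):
    """Return X + t*Y."""
    return [[x + t * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def f(A, B):
    """f(B) = 1/2 * ||A - B||_F^2."""
    D = sub(A, B)
    return 0.5 * inner(D, D)

# Illustrative data (assumed for this example only).
A = [[1.0, 2.0], [4.0, 3.0]]
B = [[0.5, -1.0], [-1.0, 2.0]]
H = [[1.0, 3.0], [3.0, -2.0]]

eps = 1e-6
# Central difference approximating the directional derivative Df(B)(H).
numeric = (f(A, add_scaled(B, eps, H)) - f(A, add_scaled(B, -eps, H))) / (2 * eps)
# Closed form from the expansion of f(B+H) - f(B).
exact = -inner(sub(A, B), H)

print(numeric, exact)
```

Since $f$ is quadratic, the central difference agrees with the closed form up to rounding error.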
To think in terms of normal equations, we need to consider the restriction $B \in M$.
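Since $B^*$ minimises $f$ over $M$ and $M$ is a subspace, we have $Df(B^*)(H) \ge 0$ for all $H \in M$. Because $-H \in M$ whenever $H \in M$, applying this to $-H$ gives the reverse inequality, so $Df(B^*) (H) = 0$ for all $H \in M$ (and $B^* \in M$). Since $f$ is convex, this condition is also sufficient. These are the normal equations which incorporate the restriction on $B$: writing $B^* = \sum_j \alpha_j B_j$ in a basis $B_1,...,B_d$ of $M$ and taking $H = B_i$ gives $\sum_j \alpha_j \langle B_i,B_j \rangle = \langle A,B_i \rangle$, which is the system in the question.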
In particular, $A-B^* \bot H$ for all $H \in M$ and so $A-B^* \in M^\bot$.
To find $B^*$, suppose $B_1,...,B_d$ is an orthonormal basis for $M$. Then we can write $A = B^* + (A-B^*) = \sum_{k \le d} \lambda_k B_k + (A-B^*)$. Taking the inner product with $B_i$ and using $A-B^* \bot B_i$ gives $\langle B_i, A \rangle = \lambda_i$, $i=1,...,d$, hence $B^* = \sum_{k \le d} \langle B_k, A \rangle B_k$.
With $E_{ij}$ denoting the matrix with a $1$ in position $(i,j)$ and zeros elsewhere, it is not too hard to see that $E_{11}, E_{22}, {1 \over \sqrt{2}}(E_{12}+E_{21})$ is an orthonormal basis for $M$.
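To make the orthonormal-basis formula concrete, the following sketch (plain Python, with an example matrix `A` that I picked for illustration) checks that the Gram matrix of this basis is the identity, forms $B^* = \sum_k \langle B_k, A \rangle B_k$, and confirms that the residual $A - B^*$ is orthogonal to every basis element.

```python
import math

def inner(X, Y):
    """Frobenius inner product tr(X^T Y), i.e. the sum of entrywise products."""
    return sum(x * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

s = 1.0 / math.sqrt(2.0)
basis = [
    [[1.0, 0.0], [0.0, 0.0]],  # E_11
    [[0.0, 0.0], [0.0, 1.0]],  # E_22
    [[0.0, s], [s, 0.0]],      # (E_12 + E_21) / sqrt(2)
]

# Gram matrix <B_i, B_j>; for an orthonormal basis it is the identity,
# so the normal equations reduce to alpha_i = <A, B_i>.
gram = [[inner(Bi, Bj) for Bj in basis] for Bi in basis]

# Illustrative, non-symmetric input (assumed for this example only).
A = [[1.0, 2.0], [4.0, 3.0]]

# Coefficients lambda_k = <B_k, A> and the projection B* = sum_k lambda_k B_k.
coeffs = [inner(Bk, A) for Bk in basis]
B_star = [
    [sum(c * Bk[i][j] for c, Bk in zip(coeffs, basis)) for j in range(2)]
    for i in range(2)
]

# The residual A - B* should be orthogonal to every element of the basis.
residual = [[A[i][j] - B_star[i][j] for j in range(2)] for i in range(2)]
residual_products = [inner(Bk, residual) for Bk in basis]

print(B_star)
print(residual_products)
```

If the basis were not orthonormal, `gram` would no longer be the identity and one would have to solve the full $3 \times 3$ system instead of reading off the coefficients.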
An equivalent (essentially the dual, and computationally less demanding in this instance) approach would be to note that $M^\bot$ is spanned by $C={1 \over \sqrt{2}}(E_{12}-E_{21})$, so we could look for $t$ such that $A = B^* + t C$. Taking the inner product with $C$ and using $\langle C, B^* \rangle = 0$ and $\langle C, C \rangle = 1$, we see that $t = \langle C, A \rangle$, and so $B^* = A-\langle C, A \rangle C$.
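Working this out entrywise, $\langle C, A \rangle = {1 \over \sqrt{2}}(a_{12}-a_{21})$, so $\langle C, A \rangle C = {1 \over 2}(A - A^T)$ and the minimiser is just the symmetric part $B^* = {1 \over 2}(A + A^T)$. A small sketch in plain Python, with an example matrix `A` chosen by me for illustration, compares the two expressions.

```python
import math

def inner(X, Y):
    """Frobenius inner product tr(X^T Y), i.e. the sum of entrywise products."""
    return sum(x * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

s = 1.0 / math.sqrt(2.0)
C = [[0.0, s], [-s, 0.0]]  # (E_12 - E_21) / sqrt(2), spans the complement of M

# Illustrative, non-symmetric input (assumed for this example only).
A = [[1.0, 2.0], [4.0, 3.0]]

# Component of A along C, then remove it to land in M.
t = inner(C, A)
B_star = [[A[i][j] - t * C[i][j] for j in range(2)] for i in range(2)]

# Symmetric part of A, computed directly for comparison.
sym_part = [[0.5 * (A[i][j] + A[j][i]) for j in range(2)] for i in range(2)]

print(B_star)
print(sym_part)
```

Only one inner product is needed here, compared with three in the orthonormal-basis approach, which is why the dual route is cheaper in this instance.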