There is a finite set of features $F = \{f_1, \ldots, f_n\}$. The system registers a signal $s$ that is a vector from the linear span of $F$ $(1)$.
There is a set of "unit signals" that are vectors from $(1)$: $u_1, \ldots, u_m$, with $m \ll n$. The signal $s$ can be decomposed into a kind of linear combination:
$s = \alpha_1 u_1 + ... + \alpha_m u_m + \theta\ (2)$
where $\theta$ is a compensation vector (background noise).
The challenge is to find an optimal set of parameters $\alpha_1, ..., \alpha_m$ so that $\theta$ has a minimal $L_2$ norm.
First, if the $u_i$ are not linearly independent, replace them with a maximal linearly independent subset (apply the sifting algorithm).
Since the $L_2$ norm comes from an inner product $\langle -,-\rangle$, we can orthonormalise $(u_1,\ldots,u_m)$ by the Gram-Schmidt process into $(v_1,\ldots,v_m)$. Extend this to an orthonormal basis $(v_1,\ldots,v_n)$ of the whole space. This is easy: extend it by the standard basis vectors, sift, and Gram-Schmidt whatever's left. (If you're doing this calculation lots of times and need it to be very efficient, the extended part of the basis doesn't actually need to be orthonormal in an actual implementation.) Then there are unique coefficients $\beta_1,\ldots,\beta_n$ such that $s = \sum\limits_{i=1}^n \beta_i v_i$; by orthonormality, $\beta_i = \langle s, v_i\rangle$. Choose $\theta = \sum\limits_{i=m+1}^n\beta_iv_i$. Then $s - \theta = \sum\limits_{i=1}^m\beta_iv_i$ lies in the linear span of $v_1,\ldots,v_m$, which is exactly the linear span of the $u_i$, so there are unique $\alpha_1,\ldots,\alpha_m$ such that $s-\theta = \sum\limits_{i=1}^m\alpha_iu_i$. Obtaining these from the $\beta_i$ is easy, since we get the transition matrix from the $u_i$ to the first $m$ of the $v_i$ out of the Gram-Schmidt process for free.
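The construction above can be sketched in a few lines of plain Python. This is only an illustration under some assumptions of mine: the inner product is the standard dot product, vectors are lists of floats, the function name `decompose` and the tolerance `tol` are made up for the example, and sifting is folded into the Gram-Schmidt loop (a $u_i$ whose remainder is numerically zero is dropped and gets coefficient $0$). It also never builds $v_{m+1},\ldots,v_n$: it computes $\theta$ as $s$ minus its projection onto the span of the $u_i$, which is the same vector as $\sum_{i>m}\beta_iv_i$.

```python
from math import sqrt


def dot(x, y):
    """Standard inner product (an assumption; any inner product would do)."""
    return sum(a * b for a, b in zip(x, y))


def decompose(s, units, tol=1e-12):
    """Return (alphas, theta) with s = sum(alphas[i] * units[i]) + theta
    and theta of minimal L2 norm. `tol` is a hypothetical absolute cutoff
    used to detect linearly dependent unit signals."""
    basis = []   # orthonormal vectors v_j
    coords = []  # coords[j][i] = coefficient of units[i] in v_j
    for i, u in enumerate(units):
        w = list(u)
        c = [0.0] * len(units)
        c[i] = 1.0
        # Gram-Schmidt step: remove the components along earlier v_j,
        # tracking how w is written in terms of the original units.
        for v, cv in zip(basis, coords):
            p = dot(w, v)
            w = [wk - p * vk for wk, vk in zip(w, v)]
            c = [ck - p * cvk for ck, cvk in zip(c, cv)]
        norm = sqrt(dot(w, w))
        if norm <= tol:
            continue  # sifting: u depends on the units already kept
        basis.append([wk / norm for wk in w])
        coords.append([ck / norm for ck in c])

    alphas = [0.0] * len(units)
    theta = list(s)
    for v, cv in zip(basis, coords):
        beta = dot(s, v)  # beta_j = <s, v_j>
        theta = [t - beta * vk for t, vk in zip(theta, v)]
        alphas = [a + beta * ck for a, ck in zip(alphas, cv)]
    return alphas, theta


alphas, theta = decompose([1.0, 2.0, 3.0], [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
print(alphas, theta)
```

For real data a relative tolerance, or a library QR or least-squares routine, would probably be more robust than this hand-rolled loop.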
Now, $\|\theta\|_2^2 = \sum\limits_{i=m+1}^n|\beta_i|^2$, since the $v_i$ are orthonormal. Further, suppose there is some $\varphi\neq\theta$ and some other choice of coefficients $\alpha_i'$ such that $s = \sum\limits_{i=1}^m\alpha_i'u_i + \varphi$. Then $\varphi - \theta = \sum\limits_{i=1}^m(\alpha_i - \alpha_i')u_i$ lies in the span of $v_1,\ldots,v_m$, so $\varphi$ must be of the form $\theta + \sum\limits_{i=1}^m\gamma_iv_i$ for some scalars $\gamma_i$. Hence $\left\|\varphi\right\|^2_2 = \left\|\sum\limits_{i=1}^m\gamma_iv_i+\sum\limits_{i=m+1}^n\beta_iv_i\right\|^2_2 = \sum\limits_{i=1}^m|\gamma_i|^2+\sum\limits_{i=m+1}^n|\beta_i|^2$ by orthonormality of the $v_i$, which is strictly greater than $\sum\limits_{i=m+1}^n|\beta_i|^2 = \|\theta\|_2^2$. The inequality is strict because equality holds only when $|\gamma_i| = 0$ for all $i$, and in that case $\theta = \varphi$, a contradiction. Thus, our chosen $\alpha_i$ are those that minimise the associated $\theta$, as required.
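As a small sanity check of the minimality argument, here is a toy instance of my own (not from the question), using the standard inner product on $\mathbb{R}^3$ with $s = (1,2,3)$, $u_1 = (1,0,0)$ and $u_2 = (1,1,0)$. Gram-Schmidt gives $v_1 = (1,0,0)$, $v_2 = (0,1,0)$, and we can extend by $v_3 = (0,0,1)$, so $\beta = (1,2,3)$ and $\theta = (0,0,3)$. For arbitrary coefficients $a_1, a_2$ the residual is

$$\varphi = s - a_1u_1 - a_2u_2 = (1 - a_1 - a_2,\ 2 - a_2,\ 3),$$

$$\|\varphi\|_2^2 = (1 - a_1 - a_2)^2 + (2 - a_2)^2 + 9 \;\geq\; 9 = \|\theta\|_2^2,$$

with equality exactly when $a_2 = 2$ and $a_1 = -1$, which are the $\alpha_i$ the construction produces: $s - \theta = (1,2,0) = -u_1 + 2u_2$.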