A PDE problem of the form
$$-\nabla\cdot(k(x)\nabla u)=f$$
can be written as $Au=f$, where the linear operator $A$ is defined by
$$Au=-\nabla\cdot(k(x)\nabla u).$$
In the Bubnov-Galerkin method, $u$ is approximated by $\bar{u}$, a linear combination of basis functions $\phi_1, \dots, \phi_n$ spanning a finite-dimensional subspace, which I'll also call $\phi$:
$$\bar{u} = c_1\phi_1 + c_2\phi_2 + \dots + c_n\phi_n,$$
and the problem is restated in the weak form
$$\int_\Omega -\nabla\cdot(k(x)\nabla\bar{u})\,\phi_i\,d\Omega=\int_\Omega f\,\phi_i\,d\Omega, \qquad i = 1, \dots, n.$$
Writing $\phi^T g$ for the vector of inner products $\int_\Omega \phi_i\,g\,d\Omega$, this can be expressed in terms of the operator $A$ as
$$\phi^TA\bar{u} = \phi^Tf = \phi^TAu$$
or
$$\phi^TA(\bar{u}-u) = 0.$$
This means that the residual $A\bar{u} - f$ is orthogonal to $\phi$. If $A\bar{u}$ itself lies in $\phi$, then the image of $\bar{u}$ under $A$ is the orthogonal projection of $f$, the image of $u$ under $A$, onto $\phi$, and in that sense $\bar{u}$ is the best approximation of $u$.
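To make the construction concrete, here is a minimal Python sketch of a Bubnov-Galerkin solve in one dimension. It assumes several things the question does not fix: the domain $\Omega = (0, 1)$, boundary conditions $u(0) = u(1) = 0$, piecewise-linear "hat" basis functions $\phi_i$ on a uniform mesh, and midpoint quadrature on each element. Because hat functions have no classical second derivative, the sketch uses the form obtained from the weak form above by integrating by parts, $\int_\Omega k\,\phi_i'\,\bar{u}'\,dx = \int_\Omega f\,\phi_i\,dx$ (the boundary term vanishes because each $\phi_i$ is zero at $x = 0$ and $x = 1$). The resulting tridiagonal system $Kc = F$ is the discrete version of $\phi^T(A\bar{u} - f) = 0$. The function name `galerkin_1d` is hypothetical.

```python
def galerkin_1d(k, f, n):
    """Bubnov-Galerkin solve of -(k u')' = f on (0, 1) with u(0) = u(1) = 0.

    Uses n piecewise-linear hat functions on a uniform mesh (test functions =
    trial functions) and midpoint quadrature on each element.
    Returns (interior nodes, coefficients c_1..c_n).
    """
    h = 1.0 / (n + 1)
    x = [(i + 1) * h for i in range(n)]  # interior nodes, where each phi_i peaks

    # Stiffness K_ij = integral of k * phi_i' * phi_j' (symmetric, tridiagonal)
    # and load F_i = integral of f * phi_i.
    diag = [0.0] * n
    off = [0.0] * max(n - 1, 0)  # off[i] = K[i][i+1] = K[i+1][i]
    rhs = [0.0] * n
    for e in range(n + 1):  # element e spans [e*h, (e+1)*h]
        xm = (e + 0.5) * h
        ke = k(xm) / h          # k(xm) * (1/h)**2 * h, since |phi'| = 1/h here
        fe = f(xm) * h / 2      # each hat integrates to h/2 over one element
        left, right = e - 1, e  # interior indices of the element's end nodes
        if left >= 0:
            diag[left] += ke
            rhs[left] += fe
        if right < n:
            diag[right] += ke
            rhs[right] += fe
        if left >= 0 and right < n:
            off[left] -= ke     # phi_left' * phi_right' = -1/h**2

    # Thomas algorithm for the tridiagonal system K c = F.
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side
    for i in range(n):
        sub = off[i - 1] if i > 0 else 0.0
        denom = diag[i] - (sub * cp[i - 1] if i > 0 else 0.0)
        cp[i] = off[i] / denom if i < n - 1 else 0.0
        dp[i] = (rhs[i] - (sub * dp[i - 1] if i > 0 else 0.0)) / denom
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = dp[i] - (cp[i] * c[i + 1] if i < n - 1 else 0.0)
    return x, c
```

For example, with $k = 1$ and $f = 1$ the exact solution is $u = x(1 - x)/2$, and `x, c = galerkin_1d(lambda t: 1.0, lambda t: 1.0, 9)` returns coefficients (the nodal values of $\bar{u}$) that match it at the nodes up to rounding.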
I'm familiar with basic linear algebra but not with functional analysis, and I get confused when I try to picture the subspaces involved in this method. Specifically, it's not clear to me whether $\bar{u} \in \phi \implies A\bar{u} \in \phi$. I believe that's not the case for $A$, because it isn't generally the case for the differentiation operator. That's my first question. The second one is: assuming that $A\bar{u} \notin \phi$ but $A\bar{u} \in \psi$ for some other subspace $\psi$, wouldn't finding $\bar{u}$ such that $\psi^TA(\bar{u}-u) = 0$ give a better approximation?
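One way to probe the first question numerically is to pick a concrete basis and measure how much of $A\phi_j$ lies outside $\phi$. The Python sketch below assumes a global sine basis $\phi_j = \sin(j\pi x)$ on $(0, 1)$ rather than any particular finite-element basis, and a coefficient $k$ whose derivative $k'$ is supplied by hand; the helper names `simpson`, `apply_A` and `out_of_span` are hypothetical. For constant $k$ the sines are eigenfunctions of $A$, so $A\phi_j$ stays in the span. For a variable coefficient such as $k(x) = 1 + x$, terms like $-k'\phi_j'$ (a cosine) appear that a finite sine basis cannot represent exactly.

```python
import math


def simpson(g, a=0.0, b=1.0, m=2000):
    """Composite Simpson rule for the integral of g over [a, b]; m must be even."""
    h = (b - a) / m
    s = g(a) + g(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3


def apply_A(k, dk, j):
    """Return the function A phi_j = -(k phi_j')' for phi_j(x) = sin(j pi x).

    Expanded with the product rule as -k phi_j'' - k' phi_j'; dk must be k'.
    """
    w = j * math.pi
    return lambda x: k(x) * w * w * math.sin(w * x) - dk(x) * w * math.cos(w * x)


def out_of_span(g, n):
    """L2 norm on (0, 1) of the part of g orthogonal to span{sin(j pi x)}, j = 1..n."""
    # The sines are mutually orthogonal with squared norm 1/2, so the
    # orthogonal-projection coefficients are 2 * <g, phi_j>.
    coeffs = [2 * simpson(lambda x, j=j: g(x) * math.sin(j * math.pi * x))
              for j in range(1, n + 1)]

    def residual(x):
        # g minus its orthogonal projection onto the sine subspace
        return g(x) - sum(c * math.sin(j * math.pi * x)
                          for j, c in enumerate(coeffs, start=1))

    return math.sqrt(simpson(lambda x: residual(x) ** 2))
```

With $k = 1$, `out_of_span(apply_A(lambda x: 1.0, lambda x: 0.0, 2), 5)` is zero up to rounding, while with `k = lambda x: 1.0 + x`, `dk = lambda x: 1.0` and `j = 1` it is clearly nonzero. This is consistent with the intuition that $\bar{u} \in \phi$ does not by itself guarantee $A\bar{u} \in \phi$; whether it holds depends on both the basis and the coefficient $k$.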
