Let's consider the classic elastostatics case where the strong form of the PDE is:
$\sigma_{ij,j} + b_i = 0$ in $V$
By multiplying through by weighting functions and integrating, we can create an equivalent weak form of the problem, required to hold for all admissible weighting functions $\lambda_i$:
$\int_V (\sigma^u_{ij,j} + b_i)\,\lambda_i \, dV = 0$
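As a minimal sketch of what this looks like in practice, here is a 1D analogue, assuming a uniform bar governed by $-(E u')' = b$ on $(0, L)$ with $u(0) = u(L) = 0$ and constant $E$, $b$ (the function `galerkin_bar` and its parameters are hypothetical, not part of the question). After integrating by parts, the weak form reads $\int_0^L E\,u'\lambda'\,dx = \int_0^L b\,\lambda\,dx$. Using each hat function $\phi_i$ of the trial space as a weighting function in turn gives one equation per unknown, i.e. the linear system $K\mathbf{u} = \mathbf{f}$.

```python
import numpy as np


def galerkin_bar(n_elems: int, E: float = 1.0, b: float = 1.0, L: float = 1.0):
    """Solve -(E u')' = b on (0, L), u(0) = u(L) = 0, with linear (hat) elements.

    Galerkin choice: the weighting functions are the same hat functions
    phi_i that span the trial space.
    """
    h = L / n_elems
    n_nodes = n_elems + 1
    K = np.zeros((n_nodes, n_nodes))
    f = np.zeros(n_nodes)
    # Element stiffness: integral of E * phi_a' * phi_b' over one element.
    k_e = (E / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])
    # Element load: integral of b * phi_a over one element (b constant).
    f_e = b * h / 2.0 * np.array([1.0, 1.0])
    for e in range(n_elems):
        idx = [e, e + 1]
        K[np.ix_(idx, idx)] += k_e
        f[idx] += f_e
    # Impose u = 0 at both ends by solving only for the interior nodes.
    interior = slice(1, n_nodes - 1)
    u = np.zeros(n_nodes)
    u[interior] = np.linalg.solve(K[interior, interior], f[interior])
    x = np.linspace(0.0, L, n_nodes)
    return x, u


x, u = galerkin_bar(8)
exact = x * (1.0 - x) / 2.0  # b x (L - x) / (2E) with E = b = L = 1
print(np.max(np.abs(u - exact)))  # round-off level: nodal values match
```

For this particular 1D problem with a constant load, linear elements reproduce the exact solution at the nodes, which is why the printed error is at round-off level.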
My question is: what is the intuition behind this weak formulation? What is the point of multiplying through by these weighting functions $\lambda_i$? And why choose our weighting functions from the same basis as our solution space?
My guess is that these weighting functions act as a "correction factor," but I'm a little confused about their purpose. Is there a more mathematical way of explaining this formulation?
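One standard mathematical reading, offered only as a hedged illustration on the same kind of 1D bar stand-in (an assumption, not the 3D problem above): for a symmetric, positive definite problem such as linear elastostatics, requiring the weighted residual to vanish for every weighting function drawn from the trial space is exactly the stationarity condition of the discrete total potential energy $\Pi(\mathbf{v}) = \tfrac12 \mathbf{v}^T K \mathbf{v} - \mathbf{f}^T\mathbf{v}$, since $\partial\Pi/\partial\mathbf{v} = K\mathbf{v} - \mathbf{f}$. Because $K$ is positive definite, that stationary point is the minimum. The sketch below (helper names `interior_system` and `potential_energy` are hypothetical) checks that random perturbations of the Galerkin solution always raise the energy.

```python
import numpy as np


def interior_system(n_elems: int, E: float = 1.0, b: float = 1.0, L: float = 1.0):
    """Interior stiffness K and load f for -(E u')' = b, u(0) = u(L) = 0,
    using linear hat functions as both trial and weighting functions."""
    h = L / n_elems
    m = n_elems - 1  # number of interior nodes
    K = (E / h) * (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1))
    f = b * h * np.ones(m)
    return K, f


def potential_energy(K, f, v):
    """Discrete total potential energy: strain energy minus work of the load."""
    return 0.5 * v @ K @ v - f @ v


K, f = interior_system(6)
u_h = np.linalg.solve(K, f)  # Galerkin solution: weighted residuals vanish
rng = np.random.default_rng(0)
increases = []
for _ in range(5):
    d = rng.standard_normal(u_h.size)  # arbitrary admissible variation
    increases.append(potential_energy(K, f, u_h + d) - potential_energy(K, f, u_h))
print(all(dp > 0 for dp in increases))  # True: u_h minimizes the energy
```

Each increase equals $\tfrac12 \mathbf{d}^T K \mathbf{d}$, which is positive for any nonzero variation. In this reading, taking the weighting functions from the trial space is what makes the discrete system symmetric and equivalent to minimizing energy.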
As I am completely ignorant of elastostatics, I can only say something about the intuition behind finite element methods in the context of fluid flow and heat transfer. My main conclusion is this: the kind of intuition the OP is asking for is not really there.
To be honest, I have seldom found the Galerkin method useful, let alone insightful. For convection-diffusion problems, for example, it is indeed weak in the true sense of the word: it gives the wrong scheme for the convective terms (central differences instead of upwind differences).
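A minimal sketch of this effect, assuming the model problem $a u' - \nu u'' = 0$ on $(0, 1)$ with $u(0) = 0$, $u(1) = 1$, $a > 0$ and a uniform mesh (the function `solve_convection_diffusion` and its parameters are hypothetical). With linear elements, the Galerkin convection row for node $i$ assembles to $\tfrac{a}{2}(u_{i+1} - u_{i-1})$, i.e. a central difference; it is compared with a first-order upwind stencil $a(u_i - u_{i-1})$. At a cell Péclet number $a h / (2\nu) > 1$ the central scheme produces non-physical oscillations, while the exact solution is monotone.

```python
import numpy as np


def solve_convection_diffusion(n_elems: int, a: float, nu: float, scheme: str):
    """Nodal solution of a*u' - nu*u'' = 0 on (0, 1), u(0) = 0, u(1) = 1.

    scheme="galerkin": linear-element Galerkin, whose convective stencil on a
    uniform mesh is the central difference a*(u[i+1] - u[i-1]) / 2.
    scheme="upwind": first-order upwind stencil a*(u[i] - u[i-1]), for a > 0.
    Both rows use the Galerkin scaling (multiplied through by h).
    """
    h = 1.0 / n_elems
    D = nu / h
    m = n_elems - 1  # interior unknowns
    if scheme == "galerkin":
        lower, diag, upper = -(a / 2 + D), 2 * D, a / 2 - D
    elif scheme == "upwind":
        lower, diag, upper = -(a + D), a + 2 * D, -D
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    A = diag * np.eye(m) + lower * np.eye(m, k=-1) + upper * np.eye(m, k=1)
    rhs = np.zeros(m)
    rhs[-1] -= upper * 1.0  # move the known boundary value u(1) = 1 to the right
    return np.concatenate(([0.0], np.linalg.solve(A, rhs), [1.0]))


a, nu, n = 1.0, 0.01, 10  # cell Peclet number a*h/(2*nu) = 5
u_gal = solve_convection_diffusion(n, a, nu, "galerkin")
u_up = solve_convection_diffusion(n, a, nu, "upwind")
print(u_gal.min() < 0)  # True: Galerkin (central) undershoots below 0
print(np.all(np.diff(u_up) >= 0))  # True: upwind stays monotone
```

The oscillation follows from the central stencil's characteristic root $(1 + \mathrm{Pe})/(1 - \mathrm{Pe})$, which turns negative once $\mathrm{Pe} > 1$; the upwind root $1 + a h/\nu$ stays positive, at the price of first-order accuracy.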
Therefore, I think it is advantageous to somehow relax the predominance of the weak formulation.
One of the best alternatives for my own applications has been the resistor network paradigm. See: