In first-order languages, ${\forall}$ is allowed to quantify only over variables, so that ${\forall}v(P)$, where $v$ is some variable and $P$ is a WFF is the only kind of a WFF concering universal quantifiers allowed in such languages.
Why is that? No texts I read on this subject (which due to the limitations I have due to the place I live have only been sites like Wikipedia, Proofwiki or Wolfram MathWorld) provided an explanation why quantifying over WFFs is not allowed. I don't see any advantages to this and it leads to annoying problems like ZFC being an infinite list of axioms, which don't avoid the concept of ${\forall}P(Q)$, $P$ and $Q$ being WFFs, just phrase it in another way.
So what's the justification of not quantifying over WFFs? What are the advantages of such approach?
It's because when we do allow broader scopes of quantification, things get extremely ugly.
One extremely important fact about first-order logic is that it has a reasonable notion of proof: if I want to know whether $\varphi$ is true whenever each sentence in $\Gamma$ is true - that is, if $\Gamma\models\varphi$ - I just look for a proof of $\varphi$ from $\Gamma$. The crucial facts about proofs are:
A proof is finite (in particular, can be represented by a single natural number).
I can identify proofs (so the set of numbers corresponding to valid proofs is computable).
Passing to stronger logics tends to kill this property: in particular, second-order logic - which I think is the natural setting for what you're describing - provably has no good proof system! In fact, I can write down a second-order sentence which is valid (= true in every model) iff the Continuum Hypothesis is true! So, determining what constitutes a proof in second-order logic requires us to first make strong set-theoretic commitments - which is putting the cart before the horse a bit!
This is not to say that stronger logics are uninteresting (see Barwise and Feferman's book Model-Theoretic Logics), merely that they lack nice properties which make first-order logic manageable (the other main property being the Lowenheim-Skolem property, which is a bit more technical). In fact, we can characterize first-order logic as the strongest logic satisfying a couple nice properties (this is Lindstrom's Theorem). So there's a trade-off to be made, in choosing what logic to use.