I'm wondering what exactly is meant when people say "Skolemization preserves satisfiability but not validity". I'm having trouble wrapping my brain around it because I think of Skolemization, when considered as an inference rule, to simply "be an okay thing to do" when working with a second-order formula.
Skolemization is said to preserve satisfiability but not validity. What exactly do satisfiability and validity mean in the context? Does "co-Skolemization" (dual to Skolemization as a second-order equivalence) have similar properties as Skolemization?
Skolemization, as far as I can tell, is based on the equivalence between the first-order formula (and therefore degenerate second-order formula)
$$ \forall x_1 \cdots x_n . \exists y . P(x_1, \cdots, x_n, y) $$
and the following second order formula, that moves the $\exists$ in front of the $\forall$.
$$ \exists f. \forall x_1 \cdots x_n . P(x_1, \cdots, x_n, f(x_1, \cdots, x_n))$$
It makes perfect sense when the expression is thought of as a game. In the first expression, the opponent first makes $n$ moves and then the player makes 1. In the second expression, the player moves first but commits to a response function of sorts rather than committing to a single move. That's not a proof, but it seems like good intuitive justification for why rewriting an expression like this is okay in the first place.
It seems to me that performing this translation at all only makes sense if we're talking about a closed expression. That is, $P(\cdots)$ must have no free variables or, equivalently, $P$ must be an $n+1$-place predicate (primitive or defined).
In propositional logic, satisfiability and validity can be though of as describing what happens to free variables. All variables are free because there's no way to bind them.
$ P \lor P $ is satisfiable because the assignment $ \{ P = \top \} $ satisfies it.
$ P \lor \lnot P $ is a tautology for obvious reasons. I don't know whether it's right to say $ P \lor \lnot P $ is valid.
An inference rule that maps $P$ to $P \lor Q$ is valid (disjunction introduction).
A rule that maps $P$ to $P \land Q$ preserves satisfiability. We can take any old assignment $\mu$ and construct $\mu \cup \{ Q = \top \}$ which satisfies the new expression.
In the context of Skolemization, are the meanings of satisfiability and validity similar?
Let "co-Skolemization" (probably has a real name) be the translation from
$$ \exists x_1 \cdots x_n . \forall y . P(x_1, \cdots, x_n, y) $$
to
$$ \forall f . \exists x_1 \cdots x_n . P(x_1, \cdots, x_n, f(x_1, \cdots, x_n)) $$
That seems just as legitimate as Skolemization, considered as an inference rule in a second-order setting. There's a caveat here that in a first-order theory definitions are provided for all fully saturated combinations of function symbols and arguments. Therefore "co-Skolemization" definitely takes us out of first-order land, but I don't know how relevant that is.
Since Skolemization adds new symbols to the logical language, I think it is useful to introduce concepts of relative validity and satisfiability:
(Beware that "relative" validity and satisfiability is probably not standard terminology; I made it up for the purpose of this explanation).
Now, when we Skolemize an $\mathcal L$-sentence $\varphi$, we get an extended language $\mathcal L'\supseteq \mathcal L$ and an $\mathcal L'$-sentence $\varphi'$. This trades truth for satisfiability in the sense that
In other words, if $\varphi$ is true in $\mathfrak A$, then we can find some interpretation of the Skolem functions that makes $\varphi'$ true, and vice versa. But we have no guarantee that every possible interpretation of the Skolem functions will preserve the truth of $\varphi$ -- indeed, usually they won't (as Bram28's counterexample shows).
This means that if we just have $\varphi$ and is not looking at a particular $\mathfrak A$, the Skolemization preserves validity:
If we try to replicate this argument with "valid" instead of "satisfiable", we get stuck after step 3:
Whereas the original (3) had two "some" that we could collapse to one in (4), here we have a combination of "every" and "some". And that cannot be collapsed to a simple statement about interpretations of $\mathcal L'$ and $\varphi'$ -- in particular it is not equivalent to $\varphi'$ being valid.