Currently going through Simpson's "Subsystems of second order arithmetic", which I believe is the ultimate reference in reverse mathematics, after having completed (more like peeked) Stillwell's "Reverse Mathematics". However, I'm having some troubles following his foundations and his approach to set theory.
After some (slightly odd) remarks about the distinction between what he calls set-theoretic and non-set-theoretic (or ordinary) mathematics, he states
In this book we want to restrict our attention to ordinary, non-set-theoretic mathematics. The reason for this restriction is that the set existence axioms which are needed for set-theoretic mathematics are likely to be much stronger than those which are needed for ordinary mathematics.
which is true in light of the approach he's going to take to distinguish the different axiomatic systems. But, right after that, he starts talking about sets, set variables, set memberships, the natural numbers set, etc. without giving a construction or explaining how these are supposed to work, since obviously this cannot be ZF.
The question is how can we formalize this idea of sets and their properties without giving them too much power to actually be unable to create different axiomatic systems that are not equivalent. My bet is ZF without the axiom of specification (and possibly replacement, but I might be totally wrong about this) should suffice, since then we can add the suitable comprehension axioms without them being already "true" by the set theory axiomatic, but I haven't found any discussion about this topic.
The answer is in the name of the book.
The system described is equivalent to a particular (monadic) second-order theory using Henkin semantics.
Instead of talking about "sets", Simpson could have instead talked about predicate variables (which would be better in my opinion...) Using uppercase to refer to predicate variables and lowercase to refer to number variables to disambiguate, Simpson could have written comprehension, say, as: $$\exists P.\forall n.P(n)\leftrightarrow \varphi(n)$$ where $\varphi$ is a formula not containing $P$ freely. It is clear what operations you can perform on predicate variables. Namely, given a predicate variable $P$, the only thing you can do with it is apply it to a number term. There's no need to provide a "construction" of these anymore than there is a need to provide a "construction" of "number" variables (which Simpson also doesn't do).
As usual for a logic, Simpson provides a semantics which interprets this formal syntax into set theoretic terms. In this semantics, you get to choose what "number" variables refer to by choosing the semantic domain which could have nothing to do with natural numbers. The question remains on how to interpret predicate variables. The common options are Henkin semantics and full (or standard) semantics. In both of the semantics, we interpret predicate variables as subsets of the domain and $P(n)$, for example, if the interpretation of $n$ is a member of the interpretation of $P$. The difference between Henkin semantics and full semantics is simply whether you get to choose which subsets are allowable interpretations for predicate variables (Henkin semantics), or if any subset of the domain is always allowed (full semantics). In other words, for Henkin semantics you select a subset $\mathcal S\subseteq \mathcal P(D)$ where $D$ is the domain (for "number" variables), and for full semantics we require $\mathcal S = \mathcal P(D)$.
This more logical rendition makes it more clear that the "sets" of $Z_2$ have nothing a priori to do with set-theoretic sets just like the "numbers" have no a priori relation to natural numbers. The semantics connects it to set theory, but we could use other semantics such as categorical semantics that don't interpret things as (sub)sets, or we could eschew semantics entirely and only work with the proof theory.