Both set theory and second-order arithmetic are general enough to embed any Turing machine including of course universal Turing machines (UTM). Since any theorem is provable via a UTM, why are the theories not equivalent?
Perhaps using an embedded UTM to prove a theorem outside the scope of set theory involves the use of the fantasy rule. Hence it can at best prove theorems as; assume $A$ then $A \to B$? Where $A$ is the program required to be run on the UTM to prove $B$.
There are a few issues here. First of all, UTMs don't prove things. What a UTM can do is verify that a computable first-order theory proves something, and so theories capable of studying UTMs can prove instances of provability in other theories. So for instance, if $T$ is a computable first-order theory which proves a sentence $\varphi$, then ZFC proves "$T$ proves $\varphi$." Moreover, if we assume ZFC is "reasonably true" (= $\Sigma^0_1$-sound), then ZFC proves "$T$ proves $\varphi$" iff $T$ in fact proves $\varphi$.
(To see that the soundness assumption on ZFC is needed, think about the theory ZFC+$\neg$Con(ZFC) ...)
So we have to be careful in defining "equivalent" here. A reasonable choice here is to do two things. First, restrict attention to $\Sigma^0_1$-sound theories to avoid the issue above. Second, say that theories $T_1, T_2$ are equivalent if they each correctly simulate each other: when we code the expressions "$T_i$ proves $\varphi$" appropriately, we have that for each sentence $\varphi$, $T_i$ proves $\varphi$ iff $T_{3-i}$ proves "$T_i$ proves $\varphi$."
(Note that if we didn't restrict attention to $\Sigma^0_1$-sound theories, we'd have a counterexample to your guess: ZFC+$\neg$Con(ZFC) proves that ZFC proves everything, so doesn't correctly simulate ZFC.)
So your question can now be rephrased as:
The answer comes down to those all-important words "first-order" versus "second-order." First-order logic has a recursive and complete proof system: if $T$ is a computably enumerable theory, the set of first-order consequences of $T$ is also computably enumerable (and we can go from a computable enumeration of $T$ to a computable enumeration of its consequences uniformly). By a "first-order consequence" of $T$ I mean a first-order sentence true in every model of $T$. The fact that the consequence relation in first-order logic can be captured by looking at finite combinatorial objects in a computable way is usually what justifies our use of the word "proves:" it's not just that a sentence is a consequence of a theory, but we can even produce a thing convincing us that the sentence is a consequence of the theory.
By contrast, this fails wildly for second-order logic. Second-order logic isn't even compact, so the mere idea of a "proof" in the sense of a finite sequence of sentences verifying that a certain sentence is a consequence of a certain set of sentences breaks down. And as it happens, this difference extends into the computability-theoretic realm:
(In fact it's much worse: it's not even in the analytic hierarchy!)
As a special case of this:
(In fact, it isn't even in the arithmetic hierarchy!)
So the issue here is:
Incidentally, there's a terminological subtlety here. The phrase "second-order arithmetic" is more commonly these days used to refer to the first-order theory Z$_2$, which consists of basic axioms about natural numbers and sets of natural numbers. The reason for this shift in terminology is that the second-order theory of arithmetic is so terrible, we almost never actually study it. But this does mean you might here contradictory statements - the resolution is that they mean different things by "second-order arithmetic, and explains why I've said "second-order theory of arithmetic" above.