There are two versions I know for mathematical induction, as well as structural induction. One says for all subset $S$ of $\Bbb N$, $1\in S\wedge (\forall n,~n\in S\rightarrow s(n)\in S)\rightarrow S=\Bbb N$. Another says for all predicate $P$ for natural numbers, $P(1)\wedge\forall n\in \Bbb N,~P(n)\rightarrow P(n+1)$ then $P$ is true for all $n\in\Bbb N$. I know traditionally the first is the original form. But some questions arise:
Why some people preferred stated the latter version? Is it simply because of the pedagogical reason that mid-school students are more likely to understand the predicate version?
In mathematical logic, which form is preferred? I roughly guess that the latter can not be easily written in first-order language? What is the role of axiom schema of separation playing here?
How to prove the equivalence of these two? What axioms and theorems do we need? Do we need the second-order language?