I am interested in Ramsey's original motivation for proving the Ramsey theorem (on finding some set for which the coloring on its subsets are constant, for any given coloring of subsets). This link https://cs.du.edu/~ndobrine/Dobrinen_NotreDameLogicSeminar_September2018_talk.pdf says that
This theorem appears in Ramsey’s paper, On a problem of formal logic, and is motivated by Hilbert’s Entscheidungsproblem: Find a procedure for determining whether any given formula is valid. Ramsey applied his theorem to solve this problem for formulas with only universal quantifiers in front ($\Pi_1$ formulas)
https://www.cantorsparadise.com/ramsey-theorys-surprising-roots-from-logic-to-combinatorics-95af58d27a13 article says that
The result of the paper still stands true of what is now called the Bernays–Schönfinkel–Ramsey class of first-order logic [from what I'm interpreting, these are the $\exists \forall$-formulas with no function symbols?].
Are there any notes/expositions of Ramsey's $\Pi_1/\exists\forall$ theorem(s), and why the Ramsey subset coloring theorems naturally come up in this context? I've searched around, and the only resource seems to be Ramsey's original paper https://www.cs.umd.edu/~gasarch/BLOGPAPERS/ramseyorig.pdf. I'm surprised that for a subject which gets so much attention (the halting problem), no one's (say) computability theory course notes go into this "positive" result for the halting problem, even though it was the original motivation for Ramsey's subset coloring theorem, which has grown into a full life of its own in the past century (e.g. extensions/generalizations like Galvin-Prikry, Paris-Harrington, which continue to bear fruit today).
Let me start with a little background. The Entscheidungsproblem is the validity problem for first-order logic: Let $\varphi$ be a sentence of first-order logic in the language $L$. Determine whether $\varphi$ is valid (i.e. true in every $L$-structure).
The dual problem is the satisfiability problem: Given an $L$-sentence $\varphi$, determine whether $\varphi$ is satisfiable (i.e. true in some $L$-structure).
Note that these problems are equivalent in a sense, since $\varphi$ is satisfiable if and only if $\lnot \varphi$ is not valid.
Ramsey was interested in a special case of the satisfiability problem. We assume $L$ is a finite relational language (no function symbols or constant symbols) and focus on universal sentences: those of the form $\forall x_1 \dots \forall x_n\,\varphi(x_1,\dots,x_n)$, where $\varphi(x_1,\dots,x_n)$ is quantifier-free.
It's easy to see that satisfaction of universal sentences is preserved under substructure: If $\psi$ is a universal sentence, $A\subseteq B$, and $B\models\psi$, then $A\models \psi$.
As a consequence, a universal sentence $\psi$ has a model if and only if $\psi$ has a model of size $1$. (Like Ramsey, I will assume in this answer that all $L$-structures are non-empty, so "has a model" means "has a non-empty model".) Indeed, if $A\models \psi$ and $a\in A$, then $\{a\}\models \psi$. So the satisfiability problem for universal sentences is trivial: all we have to do is enumerate all $L$-structures of size $1$ and check whether each is a model of $\psi$.
Now Ramsey's paper is actually about a refinement of the satisfiability problem, called the spectrum problem: Given a sentence $\psi$, for which finite numbers $\ell$ does $\psi$ have a model of size $\ell$? The set of such $\ell$ is called the spectrum of $\psi$. Ramsey writes: "The consistency of a formula may, of course, depend on the number of individuals in the universe considered, and we shall have to distinguish between formulae which are consistent in every universe and those which are only consistent in universes with some particular numbers of members."
Note that if $\psi$ is a universal sentence in a relational language, then since satisfaction of $\psi$ is preserved under substructure, the spectrum of $\psi$ is downwards-closed subset of $\mathbb{N}_+$. But it is not trivial to determine the spectrum of $\psi$ by inspecting $\psi$. Towards this end, Ramsey proved the following theorem. (It is not stated in this form in his paper, but this is what he proves.)
Theorem (Ramsey): For every $n\in \mathbb{N}$ and every finite relational language $L$, there exists $N\in \mathbb{N}_+$ (which can be effectively computed from $n$ and $L$) such that for every universal $L$-sentence $\psi$ with $n$ universally quantified variables, if $\psi$ has a model of size $N$, then $\psi$ has a model of size $\ell$ for all $\ell\in \mathbb{N}_+$.
The theorem provides an algorithm for determining the spectrum of a universal sentence $\psi$: Enumerate all $L$-structures of size $N$. If one of them is a model of $\psi$, then the spectrum of $\psi$ is all of $\mathbb{N}_+$. If not, then $\psi$ has bounded spectrum $[1,k]$ for some $k<N$, and we can determine $k$ by enumerating all $L$-structures of size $\ell$ for all $\ell<N$.
To prove the theorem, let me introduce some terminology. Let $a_1,\dots,a_k$ be a sequence of elements from some $L$-structure $A$. The sequence $(a_i)$ is literally indiscernible if for every atomic formula $\theta(x_1,\dots,x_\ell)$ and all increasing sequences $1\leq i_1<\dots<i_\ell\leq k$ and $1\leq j_1<\dots<j_\ell\leq k$ of length $\ell$, $A\models \theta(a_{i_1},\dots,a_{i_\ell})$ if and only if $A\models \theta(a_{j_1},\dots,a_{j_\ell})$. The reason for the terminology (which I made up) is that atomic formulas and their negations are sometimes called literals.
Let $a_1,\dots,a_k\in A$ be a literally indiscernible sequence which is non-constant (so the $a_i$ are all distinct). Then for any linear order $(I,<)$, we can use the sequence $(a_i)$ as a template for defining a structure with domain $I$. The idea is that for any $\ell\leq k$ and any increasing sequence $i_1<\dots<i_\ell$ from $I$, we interpret the relation symbols in $L$ in such a way that the map $i_j\mapsto a_j$ is an isomorphism $\{i_1,\dots,i_\ell\}\cong \{a_1,\dots,a_\ell\}$. The indiscernibility condition ensures that this is consistent, i.e., all these substructures of $I$ agree on their overlaps. This construction is called stretching indiscernibles.
More precisely, we define an $L$-structure $M_{I,(a_i)}$ with domain $I$ as follows: for every atomic formula $\theta(x_1,\dots,x_\ell)$ with $\ell\leq k$ and any increasing sequence $i_1<\dots<i_\ell$ from $I$, set $\theta(i_1,\dots,i_\ell)$ to be true in $M_{I,(a_i)}$ if and only if $A\models \theta(a_{1},\dots,a_{\ell})$. If the maximum arity of a relation symbol in $L$ is $\leq k$, this completely determines a structure with domain $I$, since no atomic formula contains more than $k$ variables. If not, we can simply declare that no relation symbol has an interpretation in $M_{I,(a_i)}$ containing a tuple with $>k$ distinct elements.
Claim: Let $\psi$ be a universal sentence with $n$ universally quantified variables. If $a_1,\dots,a_k\in A$ is a non-constant literally indiscernible sequence with $k\geq n$, and $A\models \psi$, then for any linear order $(I,<)$, $M_{I,(a_i)}\models \psi$.
Proof: The point is that to check that $M_{I,(a_i)}\models \psi$, it suffices to check that every substructure of size $\leq n$ satisfies $\psi$, and this is true because we used $(a_i)$ as a template. Let $\psi$ be $\forall x_1,\dots,x_n\,\varphi(x_1,\dots,x_n)$. Let $b_1,\dots,b_n\in M_{I,(a_i)}$. Enumerate $B = \{b_1,\dots,b_n\}$ in increasing order as $i_1<\dots<i_m$. Note that $m\leq k$. Now the map $i_j\mapsto a_j$ is isomorphism between $B$ and a substructure of $A$. Since $A\models \psi$, $B\models \psi$, so $M_{I,(a_i)}\models \varphi(b_1,\dots,b_n)$. Since $b_1,\dots,b_n$ were arbitrary, $M_{I,(a_i)}\models \psi$.
Proof of Theorem: Recall that the finite Ramsey theorem says that for all $a,b,c\in \mathbb{N}$, there exists $d\in \mathbb{N}$ such that $d\to (a)^b_c$, i.e., for every coloring of the $b$-element subsets of a set of size $a$ by $c$ colors, there is a homogeneous set of size $d$.
For each $1\leq \ell < n$, let $r_\ell$ be the number of atomic $L$-formulas in $\ell$ variables. Note that $r_\ell$ is finite. Now define a sequence of natural numbers $k_1,\dots,k_{n}$ by recursion: Let $k_1 = n$, and set $k_{i+1}$ to be the least natural number such that $k_{i+1}\to (k_i)^i_{2^{r_i}}$. Let $N = k_n$.
Now suppose $\psi$ has a model $A$ of size $N = k_n$. Linearly order the elements of $A$. Now color each subset of $A$ of size $(n-1)$ by $2^{r_{n-1}}$ colors according to which subset of the $r_{n-1}$-many atomic $L$-formulas it satisfies when enumerated in increasing order. Since $k_n\to (k_{n-1})^{n-1}_{2^{r_{n-1}}}$, we can find a subset $A'\subseteq A$ of size $k_{n-1}$ such that every increasing sequence of size $(n-1)$ satisfies the same set of atomic $L$-formulas.
Repeat, extracting from $A'$ a subset of size $k_{n-2}$ such that every increasing sequence of size $(n-2)$ satisfies the same set of atomic $L$-formulas (and the same remains true for increasing sequences of size $(n-1))$. Continuing in this way, we arrive at a subset of size $k_1 = n$ which, when enumerated in increasing order as $(b_i)$, is literally indiscernible.
Now for any $\ell\in \mathbb{N}_+$, letting $[\ell]$ be the linearly ordered set $\{0,\dots,\ell-1\}$, the $L$-structure $M_{[\ell],(b_i)}$ is a model of $\psi$ of size $\ell$.
The primary use of Ramsey's theorem in modern model theory is also to build indiscernible sequences. By an indiscernible sequence, I now mean an infinite sequence $(a_i)_{i\in I}$ from a model $M\models T$ such that for any formula $\varphi(x_1,\dots,x_n)$ and any increasing sequences $i_1<\dots<i_n$ and $j_1<\dots<j_n$ from $I$, we have $M\models \varphi(a_{i_1},\dots,a_{i_n})$ if and only if $M\models \varphi(a_{j_1},\dots,a_{j_n})$.
Given an arbitrary infinite sequence $(b_i)_{i\in \mathbb{N}}$ from $M$, we can use the finite Ramsey theorem in exactly the way we did above to extract arbitrarily large subsequences which are indiscernible for any finitely many formulas. We then apply the compactness theorem to find a sequence which is indiscernible for all formulas. Another application of the compactness theorem allows us to "stretch" any infinite indiscernible sequence to one indexed by an arbitrary linear order.
Ramsey did not use the terminology "indiscernible" - the notion of an indiscernible sequence was introduced by model theorists years later. But it's remarkable how similar his original application of the finite Ramsey theorem was to the modern theory of extracting and stretching indiscernibles.
In terms of other modern references, Harvey Friedman revisited Ramsey's work in his 1984 paper On the spectra of universal relational sentences. Here he gives a brief summary of Ramsey's results and analyzes the optimal runtime for the algorithm to determine the spectrum of $\psi$.
Ramsey's work on the spectrum problem must also appear in some textbooks on finite model theory, but I don't have a reference.