I’ve been interested in Graham’s number for a while now however I've been unable to figure out how little graham was derived in the paper “RAMSEY'S THEOREM FOR n-PARAMETER SETS” BY R. L. GRAHAM AND B. L. ROTHSCHILD. In the case of N(1, 2, 2) (a two-dimensional graph of one-dimensional subcomponents made from two colors I believe) the upper bound is calculated to F(F(F(F(F(F(F(12, 3), 3), 3), 3), 3), 3), 3). How was 12 gotten? How was 3 gotten? Why is repeated 7 times? Maybe the answer is in the original paper however I'm not yet at a level yet to understand the math behind it. Help would be appreciated :-)
Here’s a link to the paper: https://mathweb.ucsd.edu/~fan/ron/papers/71_04_n_ramsey.pdf `
The final value of the upper bound of the Graham–Rothschild theorem is stated in its general form in the next-to-last paragraph of p. 274 of the paper:
Here, we are solving the version of the theorem defining Graham's number, in terms of some easier previous versions of the theorem. The quantity $v$ asks: "what if, instead of looking for a $2$-dimensional monochromatic sub-hypercube in each color, we allowed one color to win by just getting a $1$-dimensional sub-hypercube - a line segment?" The answer in that case is $2$: $v=2$. (In $2$ dimensions, either the color that wants a line segment gets it, or all line segments are the other color and then the other color wins.)
Then we define $z = \binom{tv}{t^{k+1}} = \binom{2 \cdot 2}{2^{0+1}} = 6$, and define $m$ in terms of a simpler instance of the theorem that in our case is just a fancy way to encode the pigeonhole principle: we'll get $m=z+1=7$. This is where we find out that the function $F(-,3)$ must be nested $7$ times.
The nesting itself happens in the definition of $v_1, v_2, \dots, v_m$, which occurs earlier, on page 273:
The whole proof is an induction, and so here we are applying a simpler case of the Graham–Rothschild theorem: with $k$-parameter sets rather than $(k+1)$-parameter sets. In the case of Graham's number, $k+1$ is $1$, so $k=0$ and we are just coloring points: each step where we go from $v_i$ to $v_{i+1}$ is an application of the Hales–Jewett theorem. At the time of writing, the best known bound on the Hales–Jewett theorem (proved again for completeness in Lemma 4 of the paper) was indeed an "ackermanic" bound of the same type as the function $F(-, 3)$.
The actual bound in Lemma 4 looks nothing like $F(-,3)$. That proof shows that the Hales–Jewett number $N(r,t)$ is at most $c_r + c_{r-1} + \dots + c_1$, where $c_r = N(r,t-1)$, $c_{r-1} = N(r^{t^{c_r}}, t-1)$, and so forth with $c_k = N(r^{t^{c_r + \dots + c_{k+1}}}, t-1)$ (see the top of p. 278 in the paper). If we were defining $c_{r-1} = N(c_r, t-1)$ and $c_{r-2} = N(c_{r-1}, t-1)$ and so on, and only taking $N(r,t)$ to be the last value $c_1$ rather than the sum, then we would get the function $F$ as defined at the end of the paper.
Ultimately, the difference is insignificant and can be ignored. Replacing $c_k$ by $r^{t^{c_k}}$ at one level of nesting is a very weak change: even in a simple power tower, it would only change the height of the power tower by a constant, which means changing the previous level of nesting by a small amount. Our expressions are going to be much worse than power towers, so the differences between different nesting functions are overwhelmed by a tiny difference in the inmost input. Therefore the most reasonable thing to do at all levels past that first level is replace the nesting function by $F(-,3)$: the simplest-to-express function with its growth rate.
At the first level, we want to be more careful, and so the way the number $12$ is picked is to ensure that the quantity $v_1 = N(L, \overline C, H, 0, 2^{2^6}, 2, 2, \dots, 2)$ is comfortably less than $F(12,3)$. I am not sure what the reasoning was there: in the modern day, we know that this quantity is much less than $F(12,3)$, but I don't know how Graham and Rothschild did it. It should be possible to get a bound like $F(16,3)$ by following the proof of Lemma 4 in the paper, and maybe with some care we can reduce the constant.
(The specific instance of the Hales–Jewett theorem taken here is that $v_1$ is the least dimension of a $4 \times 4 \times \dots \times 4$ grid such that, whenever we color it by $2^{2^6}$ colors, we can find a plane whose $16$ points are all the same color. This is equivalent to finding a monochromatic line in a coloring of a $16\times 16 \times \dots \times 16$ grid of half the dimension; in the notation of Lemma 4, we can take $v_1 = 2 \cdot N(2^{2^6}, 16)$.)