Let $X$ be a set of $n$ vertices, and let $H$ be a hypergraph with vertex set £X£.
Call $H$ a $k$-uniform hypergraph if all hyperedges of $H$ are subsets of $X$ of size $k$.
Given a $k$-uniform hypergraph $H$ with vertex set $X$, call $Y \subset X$ a simplex in $H$ if:
$|Y| = k+1$ and $\forall Z \subset Y, |Z| = k \Rightarrow Z \in H$.
The simplex removal lemma then states that:
$\forall \epsilon > 0, \; \exists \delta >0 ; $ if $H$ is any $k$-uniform hypergraph on vertex set $X$, and at most $\delta n^{k+1}$ simplices, then one can remove at most $\epsilon n^k$ hyperedges from $H$ to obtain a hypergraph with no simplices.
I am asked to use this lemma to deduce Szemerédi's Theorem, given to me as:
$\forall \theta >0, \; r \in \mathbb N$, $\exists n$ such that $\forall X \subset \{1, \dots, n\}, \; |X| \geq \theta n,\; X$ contains an arithmetic progression of length $r$.
I am very unsure of how to go about proving this.
Obviously I need to think of a nice way to represent $\{1, \dots, n\}$, large enough subsets and arithmetic progressions, in terms of $k$-uniform hypergraphs and simplices.
My first thought is to take $\{1, \dots, n\}$ to be the vertex set of an $(r)$-uniform hypergraph, and let the hyperedges be the collections of $r$ numbers such that they are an arithmetic progression of length $r$.
Then for each integer $1 \leq d \leq \frac{n-1}{r-1}$, there are at most ${n - (r-1)d}$ arithmetic progressions in $X = \{1, \dots, n\}$ with common difference $d$.
Therefore, our hypergraph $H$ has $N = \sum^{\lfloor \frac{n-1}{r-1}\rfloor}_{d=1}{(n - (r-1)d)}$ edges.
I am a bit unsure how to proceed now because I don't know how to calculate how many simplices this hypergraph has. More importantly, thinking more about it I'm not actually sure where my approach is headed.
I can't really tell what the strategy for this solution is meant to be right now. I would appreciate any help that may be offered, thank you!
I'll describe one method, with $r=4$ which will hopefully be instructive enough that you can see how it could generalize.
A standard trick/corollary of the simplex removal lemma is that if a hypergraph has at least $\epsilon n^k$ edge-disjoint simplices, then it contains at least $\delta n^{k+1}$ simplices. Proof: use the contrapositive of the simplex removal lemma and observe that it's impossible to remove all the simplices by removing fewer than $\epsilon n^k$ hyperedges.
I'm just going to say "edge" instead of "hyperedge" from now on. Set $k=3.$ The challenge is to get each trivial AP $(a,a,a,a)\in X$ to correspond to $\Theta(n^2)$ simplices such that all these simplices are edge-disjoint. And, of course, such that any other simplices correspond to non-trivial APs.
We will use a $4$-partite hypergraph $H.$ The vertex set is the disjoint union of four copies $A,B,C,D$ of the positive integers up to $6n.$ Each AP $a,b,2b-a,3b-2a$ will end up encoded as a simplex on the vertices $(y_1,y_2,k_1,k_2)=(a+2k_1+3k_2,b+k_1+2k_2,k_1,k_2)$ for each choice $1\leq k_1,k_2\leq n.$ This linear combination is chosen such that we can recover each of $a,b,2b-a,3b-2a$ after forgetting $y_2,y_1,k_1,k_2$ respectively. Accordingly, declare:
This recovery of $a,b,2b-a,3b-2a,$ for the trivial AP case $a=b,$ proves that the edges corresponding to different trivial APs are distinct. And you can check that the $4n^2$ edges corresponding to a single choice of $a=b$ are all distinct. And any other simplices imply a non-trivial AP $(a,b,2b-a,3b-2a)\in X^4$ (possibly with $b<a,$ but then you can reverse it). So this graph does what we set out to do.