Im looking to understand how to bound the following. Let $S\subseteq \{0,1\}^n$ be an arbitrary subset of the Boolean cube. What is the largest $|S|$ such that for every $x,y\in S$, we have that $d(x,y)\geq n\cdot \delta$ (for arbitrary ranges of $\alpha$), where $d(x,y)=\sum_i [x_i\neq y_i]$.
Plotkin bound seems to give something for $S$ which is a linear code (also it applies only for restricted ranges of distance, not generically for all $\delta$), but apart from that, i dont see what else bounds we have.
Well, this is the general coding theory problem and doesn't have a closed form solution, though some bounds do exist.
By the way your arbitrary subset $S$ is a code, though not necessarily linear. So the Plotkin bound applies.
A subset $S$ which is a subspace of $\mathbb{F}_2^n$ is a binary linear code. The best upper bound will depend on $\alpha$ and also whether you are interested in asymptotics.
There is a nice short chapter describing what is known here. The quantity $\alpha_2(x)$ is what you are interested in. In general unless $\alpha$ is very small the best upper bound is the so-called MRRW (4 Americans) bound of McEliece-Rumsey-Rudemich-Welch obtained by linear programming. Here is a plot from that Chapter, the lower bound there is Gilbert-Varshamov, and all the others are upper bounds.
At low rates, Hamming is the best. The bound is on the exponent therefore it is $$ |S|\leq 2^{\alpha_2(\delta)\cdot n}, $$ as you point out.