I found the first-order two-scale approximation of $\ddot y + y + \epsilon y^3=0$ (with $y(0)=\alpha$, $\dot y(0)=0$) to be $\alpha \cos(t(1+\frac{3\alpha^2}{8}\epsilon))$. Then I put that equation into periodic standard form using $u=y$, $v=\dot y$, $u=a \cos t + b \sin t$, $v=-a\sin t +b\cos t$, yielding: $$\dot a=\epsilon\sin t(a\cos t + b\sin t)^3$$ $$\dot b=-\epsilon\cos t(a\cos t + b\sin t)^3$$
Then I averaged out one cycle to get: $$\frac1{2\pi}\int_0^{2\pi}\dot a \,dt = \epsilon\frac{ 3 b}{8}(a^2+b^2)$$ $$\frac1{2\pi}\int_0^{2\pi}\dot b \,dt = \epsilon\frac{- 3 a}{8}(a^2+b^2)$$
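As a sanity check on these two averages, here is a minimal numerical sketch. It assumes the factor $\epsilon$ has been divided out, and the function name `averaged_rhs` and the sample values of $a$ and $b$ are only illustrative. It averages the right-hand sides over one period with an equally spaced sum and compares against the closed forms above.

```python
import math

def averaged_rhs(a, b, n=2000):
    """Average sin(t)*u^3 and -cos(t)*u^3 over [0, 2*pi], with u = a*cos(t) + b*sin(t).

    The factor epsilon is omitted. An equally spaced sum is used, which is
    very accurate for smooth periodic integrands such as these.
    """
    h = 2 * math.pi / n
    sum_a = 0.0
    sum_b = 0.0
    for k in range(n):
        t = k * h
        u3 = (a * math.cos(t) + b * math.sin(t)) ** 3
        sum_a += math.sin(t) * u3
        sum_b -= math.cos(t) * u3
    # (1 / (2*pi)) * h * sum reduces to sum / n
    return sum_a / n, sum_b / n

# Arbitrary sample point (hypothetical values, chosen only for the check)
a, b = 0.7, -1.3
num_a, num_b = averaged_rhs(a, b)
exact_a = 3 * b / 8 * (a * a + b * b)
exact_b = -3 * a / 8 * (a * a + b * b)
print(num_a, exact_a)
print(num_b, exact_b)
```

If the two printed pairs agree to many digits, the averaged equations are consistent with the periodic standard form.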
So if we take $a$ and $b$ to be $\sin$ and $\cos$ respectively, the $a^2 +b^2$ goes away and we can quickly solve to see $b=\alpha \cos(\frac38\epsilon t)$ and $a=0$ because of the initial conditions.
So my two results from the two sets of coordinates are $\alpha\cos(t(1+\frac{3\alpha^2}8\epsilon))$ and $\alpha\cos(\frac38\epsilon t)$.
These are very similar. But why aren't my two estimates equal?
To be a bit more precise: you're using a particular form of multi-scale approximation, namely the Poincaré-Lindstedt method. The leading order term of the expansion is, as you calculated, equal to \begin{equation} \alpha\,\cos\left( \left(1+\epsilon \frac{3 \alpha^2}{8}\right)t\right). \end{equation} The second method you're using is known as (periodic) averaging. Denoting the slow variable as $\tau$, so $\tau = \epsilon t$, we get the averaged dynamical system \begin{align} \frac{\text{d} a}{\text{d} \tau} &= \frac{3}{8} b (a^2+b^2),\\ \frac{\text{d} b}{\text{d} \tau} &= -\frac{3}{8} a (a^2+b^2). \end{align} So far, so good. Now, I don't understand what you mean by 'we take $a$ and $b$ to be $\sin$ and $\cos$, so the $a^2+b^2$ goes away'. You might mean something like the following.
We see that the dynamical system contains terms of the form $a^2+b^2$. Therefore, it seems a good idea to try the coordinate transformation \begin{align} a(\tau) &= r(\tau)\,\cos(\phi(\tau)),\\ b(\tau) &= r(\tau)\,\sin(\phi(\tau)), \end{align} because then $a^2 + b^2 = r^2$. In these coordinates, the dynamical system indeed simplifies considerably, and we get \begin{align} \frac{\text{d} r}{\text{d} \tau} &= 0,\\ \frac{\text{d} \phi}{\text{d} \tau} &= - \frac{3}{8} r^2. \end{align} The original initial conditions $y(0) = \alpha, y'(0) = 0$ imply $a(0) = \alpha$ and $b(0) = 0$, which implies $r(0) = \alpha$ and $\phi(0) = 0$ (or $r(0) = -\alpha$ and $\phi(0) = \pi$, but you can show that it boils down to the same thing). Since $r(\tau)$ is constant, we have \begin{equation} r(\tau) = \alpha, \end{equation} which implies \begin{equation} \phi(\tau) = -\frac{3 \alpha^2}{8}\tau. \end{equation} Putting all this information together, we obtain the leading order expression \begin{align} y = u = a(\tau) \cos(t) + b(\tau) \sin(t) &= \alpha \cos(t) \cos\left(-\frac{3 \alpha^2}{8} \epsilon t\right) + \alpha \sin(t) \sin\left(-\frac{3 \alpha^2}{8} \epsilon t\right) \\ &= \alpha \cos \left(t + \frac{3 \alpha^2}{8} \epsilon t \right) \end{align} by the angle difference formula for $\cos$. This is exactly the Poincaré-Lindstedt expression, so the two methods agree at leading order once the initial conditions $a(0)=\alpha$, $b(0)=0$ are used and the factor $r^2 = \alpha^2$ is kept in the phase.
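To see how well the leading order expression tracks the actual solution, here is a minimal numerical sketch. It assumes a hand-written classical Runge-Kutta (RK4) integrator, and the function name `duffing_rk4` and the values $\alpha = 1$, $\epsilon = 0.05$ are only illustrative. It integrates $\ddot y + y + \epsilon y^3 = 0$ up to $t = 1/\epsilon$ and measures the maximum deviation from $\alpha\cos\left(\left(1+\frac{3\alpha^2}{8}\epsilon\right)t\right)$ and from the unperturbed $\alpha\cos t$.

```python
import math

def duffing_rk4(alpha, eps, t_end, dt=0.01):
    """Integrate y'' + y + eps*y^3 = 0, y(0)=alpha, y'(0)=0, with classical RK4."""
    def f(y, v):
        # First-order system: y' = v, v' = -y - eps*y^3
        return v, -y - eps * y ** 3

    n = round(t_end / dt)
    y, v = alpha, 0.0
    ts, ys = [0.0], [y]
    for k in range(n):
        k1y, k1v = f(y, v)
        k2y, k2v = f(y + 0.5 * dt * k1y, v + 0.5 * dt * k1v)
        k3y, k3v = f(y + 0.5 * dt * k2y, v + 0.5 * dt * k2v)
        k4y, k4v = f(y + dt * k3y, v + dt * k3v)
        y += dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6
        v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
        ts.append((k + 1) * dt)
        ys.append(y)
    return ts, ys

# Illustrative parameter values (assumptions, not taken from the question)
alpha, eps = 1.0, 0.05
ts, ys = duffing_rk4(alpha, eps, t_end=1 / eps)

# Leading order frequency from averaging / Poincare-Lindstedt
omega = 1 + 3 * alpha ** 2 * eps / 8
err_avg = max(abs(y - alpha * math.cos(omega * t)) for t, y in zip(ts, ys))
err_naive = max(abs(y - alpha * math.cos(t)) for t, y in zip(ts, ys))
print("max error, averaged approximation:", err_avg)
print("max error, unperturbed cos(t):    ", err_naive)
```

On the time scale $t \sim 1/\epsilon$ the frequency-corrected cosine should stay much closer to the numerical solution than $\alpha\cos t$, whose phase drifts by roughly $\frac{3\alpha^2}{8}$ radians over that interval.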