So my textbook says that if $X \sim B(n,p)$ with $np>5$ and $nq>5$ (where $q=1-p$), then $X$ can be approximated by a normal distribution $X\sim N(\mu,\sigma^2)$ with $\mu = E(X) = np$ and $\sigma^2 = Var(X) = npq$.
I understand that if $n$ is very large then $X$ will be roughly normally distributed with $E(X) = np$ and $Var(X) = npq$, but why must $np>5$ and $nq>5$? And furthermore, what does $nq$ represent?
The CLT says the normal approximation is good for a fixed distribution when $n$ is large enough. But when you have another parameter to play with, tweaking that other parameter can slow down the convergence rate (meaning that $n$ must get larger to achieve a given error tolerance). In the case of the binomial distribution, there is a sort of complete classification: the normal approximation is good precisely when both $np$ (the expected number of successes) and $nq$ (the expected number of failures) are large, while if $np$ stays bounded (so $p \to 0$ as $n \to \infty$) the Poisson approximation with mean $np$ is the appropriate one instead, and symmetrically for $nq$. The cutoff of $5$ is just a conventional rule of thumb for "large enough".
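Here is a minimal numerical sketch of this effect, assuming `scipy` is available (the function name `max_cdf_error` is mine). It measures the worst-case CDF discrepancy between $B(n,p)$ and $N(np, npq)$: for fixed $n$, shrinking $p$ makes the approximation worse, and growing $n$ repairs it.

```python
# Sketch: worst-case CDF error of the normal approximation to Binomial(n, p),
# evaluated at the integer points k = 0..n (no continuity correction).
import numpy as np
from scipy.stats import binom, norm

def max_cdf_error(n, p):
    """Largest |F_Binomial(k) - F_Normal(k)| over k = 0, ..., n."""
    k = np.arange(n + 1)
    mu, sigma = n * p, np.sqrt(n * p * (1 - p))
    return np.max(np.abs(binom.cdf(k, n, p) - norm.cdf(k, mu, sigma)))

for n, p in [(100, 0.5), (100, 0.05), (2000, 0.05)]:
    print(f"n={n}, p={p}: max CDF error = {max_cdf_error(n, p):.4f}")
```

With $n=100$ fixed, dropping $p$ from $0.5$ to $0.05$ (so $np = 5$, right at the textbook cutoff) visibly inflates the error, while raising $n$ to $2000$ at the same $p$ brings it back down.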
One way to anticipate this in advance is to use a quantitative refinement of the CLT such as the Berry-Esseen theorem. For the binomial distribution, the Berry-Esseen theorem bounds the difference of the CDFs by $C \frac{1}{\sqrt{n}} \frac{pq^3+qp^3}{(pq)^{3/2}}$, where $0.4<C<0.5$ is a constant. The important thing is the ratio involving $p$ and $q$, which behaves as $p^{-1/2}$ as $p \to 0$ and as $q^{-1/2}$ as $q \to 0$. Thus the Berry-Esseen theorem, roughly speaking, bounds the error by $\frac{C'}{\sqrt{n \min \{ p,q \}}}$, where $C'$ is a new constant. If you plot the actual error you see this kind of scaling, although the $C'$ given by the theorem is significantly bigger than the optimal one.
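The bound above is easy to check numerically. Here is a sketch, assuming `scipy` is available and taking $C = 0.5$ (the upper end of the range quoted above; function names are mine):

```python
# Sketch: compare the Berry-Esseen bound for Binomial(n, p) against the
# actual worst-case CDF error at the integer points k = 0..n.
import numpy as np
from scipy.stats import binom, norm

def actual_error(n, p):
    """Largest |F_Binomial(k) - F_Normal(k)| over k = 0, ..., n."""
    k = np.arange(n + 1)
    mu, sigma = n * p, np.sqrt(n * p * (1 - p))
    return np.max(np.abs(binom.cdf(k, n, p) - norm.cdf(k, mu, sigma)))

def berry_esseen_bound(n, p, C=0.5):
    """C/sqrt(n) * (p*q^3 + q*p^3)/(p*q)^(3/2), with C = 0.5 by default."""
    q = 1 - p
    return C * (p * q**3 + q * p**3) / ((p * q) ** 1.5 * np.sqrt(n))

n, p = 500, 0.02
print(f"bound: {berry_esseen_bound(n, p):.4f}, actual: {actual_error(n, p):.4f}")
```

As the answer notes, the bound does dominate the actual error but is noticeably loose: the true error exhibits the $1/\sqrt{n \min\{p,q\}}$ scaling with a smaller constant.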
Intuitively, what the Berry-Esseen theorem is capturing is that the normal approximation to a distribution is symmetric about its mean, whereas the original distribution in general is not. Thus if a distribution (with the standard deviation scaled out) is highly skewed, then $n$ must become quite large in order to mitigate the effect of this skew.