So imagine we take $n$ random samples from a Bernoulli trial, so our sample consists of binary random variables $X_1, X_2, \ldots, X_n$. By the central limit theorem we know that the distribution of $Z=\frac{\overline{X}-p}{\sigma/\sqrt{n}}$, where $\overline{X}=\frac{X_1+X_2+\cdots+X_n}{n}$, approximates a standard normal pdf when $n$ is large enough. So the probability that $Z$ lies between $-1.96$ and $1.96$ is:
$$P(-1.96\le Z\le 1.96)=P(-1.96\le \frac{\overline{X}-p}{\sigma/\sqrt{n}} \le 1.96) = 0.95$$
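As a quick sanity check of the $0.95$ figure, the standard normal CDF can be written with the error function, $\Phi(z)=\frac12\bigl(1+\operatorname{erf}(z/\sqrt2)\bigr)$, which only needs the Python standard library:

```python
import math

def std_normal_cdf(z):
    # Phi(z) expressed through the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# P(-1.96 <= Z <= 1.96) for a standard normal Z
prob = std_normal_cdf(1.96) - std_normal_cdf(-1.96)
print(round(prob, 4))  # ~0.95
```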
We also know that the standard deviation of a Bernoulli random variable is $\sigma=\sqrt{p(1-p)}$. Thus:
$$P(-1.96\le \frac{\overline{X}-p}{\sqrt\frac{p(1-p)}{n}} \le 1.96) = 0.95$$
The book I'm using just replaces $p$ by its estimate $\overline{X}$ without further explanation. Why can we do that? That gives:
$$P(-1.96\le \frac{\overline{X}-p}{\sqrt\frac{\overline{X}(1-\overline{X})}{n}} \le 1.96) = 0.95$$
So transforming a little we have that:
$$P(\overline{X} -1.96\sqrt\frac{\overline{X}(1-\overline{X})}{n} \le p \le \overline{X} + 1.96 \sqrt\frac{\overline{X}(1-\overline{X})}{n}) = 0.95$$
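To make the last formula concrete, here is how the interval is computed for a single hypothetical sample (the counts below are made up for illustration):

```python
import math

def wald_interval(successes, n, z=1.96):
    """Interval of the form xbar +/- z * sqrt(xbar*(1-xbar)/n)."""
    xbar = successes / n
    se = math.sqrt(xbar * (1 - xbar) / n)
    return xbar - z * se, xbar + z * se

# hypothetical data: 520 successes in 1000 Bernoulli trials
lo, hi = wald_interval(520, 1000)
print(round(lo, 4), round(hi, 4))
```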
How can we still say that this holds with 95% confidence? What justifies replacing $p$ by $\overline{X}$? I mean, the 95% confidence interval is exact when we use the population standard deviation, not some estimate of it. Using an unbiased estimator of the population variance would only tell us that we get a 95% confidence interval in the long run. Is the estimator $\overline{X}(1-\overline{X})$ for $\sigma^2$ even unbiased?
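(For the last part of the question, one can check numerically how $E[\overline{X}(1-\overline{X})]$ compares with $p(1-p)$; a standard calculation gives $E[\overline{X}(1-\overline{X})]=\bigl(1-\frac1n\bigr)p(1-p)$. A simulation sketch, with $n=10$ and $p=0.3$ chosen arbitrarily for illustration:)

```python
import random

random.seed(0)
n, p, reps = 10, 0.3, 100_000

# average X̄(1 - X̄) over many simulated Bernoulli samples
acc = 0.0
for _ in range(reps):
    xbar = sum(random.random() < p for _ in range(n)) / n
    acc += xbar * (1 - xbar)
empirical = acc / reps

print(empirical)    # close to (1 - 1/n) * p * (1 - p) = 0.189
print(p * (1 - p))  # true variance p(1-p) = 0.21
```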

Yes, you can do that, but with observations rather than random variables. So you don't have $\overline X$, but $\overline x$ (lowercase $x_i$'s). To estimate the parameter $p$ you use $\hat p=\frac1{n}\cdot \sum\limits_{i=1}^n x_i=\overline x$. We start with $$P\left(-1.96\le \frac{\overline{X}-\mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$$
Then we replace $\overline{X}$ by $p$ (the interval is symmetric about $0$, so swapping the two does not change the probability statement), and we replace $\mu$ by $\hat p$ and $\sigma$ by $\sqrt{\hat p\cdot (1-\hat p)}$:
$$P\left(-1.96\le \frac{p-\hat p}{\sqrt{\frac{ \hat p\cdot (1-\hat p)}{ n}}} \le 1.96\right) = 0.95$$
$$P\left(\hat p-1.96\cdot \sqrt{\frac{ \hat p\cdot (1-\hat p)}{ n}}\le p \le \hat p+1.96\cdot \sqrt{\frac{ \hat p\cdot (1-\hat p)}{ n}}\right) = 0.95$$
As written above, you can replace the estimator $\hat p$ by the mean of the observations $\overline x$, although it is not the usual notation:
$$P\left(\overline x-1.96\cdot \sqrt{\frac{ \overline x\cdot (1-\overline x)}{ n}}\le p \le \overline x+1.96\cdot \sqrt{\frac{ \overline x\cdot (1-\overline x)}{ n}}\right) = 0.95$$
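The "in the long run" reading of this statement can itself be checked by simulation: generate many samples, build the interval from each, and count how often the interval covers the true $p$. A sketch (with $n=500$, $p=0.4$, and the replication count chosen arbitrarily):

```python
import math
import random

random.seed(1)
n, p, z, reps = 500, 0.4, 1.96, 2000

covered = 0
for _ in range(reps):
    # one simulated sample and its estimated standard error
    xbar = sum(random.random() < p for _ in range(n)) / n
    se = math.sqrt(xbar * (1 - xbar) / n)
    covered += (xbar - z * se) <= p <= (xbar + z * se)

print(covered / reps)  # empirical coverage, near 0.95
```

The fact that the empirical coverage comes out close to $0.95$ even though $\sigma$ was estimated from each sample is exactly the asymptotic justification the question is asking about (by the law of large numbers $\hat p \to p$, so the estimated standard error converges to the true one).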