As I understand it, a point estimate with 95% margin of error gives you an interval centered at the point estimator with half length equal to 1.96*the standard error. And if we construct a 95% confidence interval using the same estimator, we will end up with the same interval.
But the book totally lost me when it says:
You may have noticed that the point estimator with its 95% margin of error looks very similar to a 95% confidence interval for the same parameter. This close relationship exists for most of the parameters estimates in this book, but it is not true in general. Sometimes the best point estimator for a parameter does not fall in the middle of the best confidence interval; the best confidence interval may not even be a function of the best point estimator.
Any example illustrating this point?
The term '95% margin of error' should be considered informal. More precisely, for a symmetrical 95% confidence interval (CI) centered on the point estimate, one can call the half-length of the CI the 'margin of error'; the modifier '95%' indicates that it belongs to a 95% CI.
The italicized sentence in your quote has to do with CIs that are not symmetrical. One example is the CI for the unknown population standard deviation $\sigma$ of a normal sample.
Theory for an asymmetrical CI for $\sigma.$ The background for finding a 95% CI for $\sigma$ follows. You can read it or skip to the interval itself below.
The sample variance $S^2$ is a point estimate of $\sigma^2$ and $$\frac{(n-1)S^2}{\sigma^2} \sim \mathsf{Chisq}(df = n-1).$$ This means that one can use software or printed tables of the chi-squared distribution to find $U$ that cuts off the top 2.5% of the distribution and $L$ that cuts off the bottom 2.5%. Thus $$P\left( L \le \frac{(n-1)S^2}{\sigma^2} \le U \right) = 0.95.$$ After some manipulation of inequalities, we get $$P\left( \frac{(n-1)S^2}{U} \le \sigma^2 \le \frac{(n-1)S^2}{L} \right) = 0.95,$$ so that a 95% CI for $\sigma^2$ is of the form $\left( \frac{(n-1)S^2}{U},\, \frac{(n-1)S^2}{L} \right)$ and $\left(\sqrt{ \frac{(n-1)S^2}{U}},\, \sqrt{\frac{(n-1)S^2}{L}} \right)$ is a 95% CI for $\sigma.$
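The 'manipulation of inequalities' is short. As a sketch: every quantity involved is positive (with probability 1), so taking reciprocals reverses the inequalities, and multiplying through by $(n-1)S^2$ then isolates $\sigma^2$: $$L \le \frac{(n-1)S^2}{\sigma^2} \le U \iff \frac{1}{U} \le \frac{\sigma^2}{(n-1)S^2} \le \frac{1}{L} \iff \frac{(n-1)S^2}{U} \le \sigma^2 \le \frac{(n-1)S^2}{L}.$$ Because the larger cutoff $U$ ends up in the denominator of the lower limit, the limits are not equally far from $S^2,$ which is where the asymmetry comes from.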
Computations for this CI from R statistical software are shown below:
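As an illustration of the same computation (not the original R session), here is a minimal sketch in Python using only the standard library. The sample size `n = 25` and sample standard deviation `s = 4.0` are made-up values chosen for the example, and `chi2_cdf` and `chi2_quantile` are hypothetical helper names; in R one would normally get the cutoffs from `qchisq` instead.

```python
import math

def chi2_cdf(x, df):
    """Chi-squared CDF via the power series for the regularized
    lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 0
    while term > 1e-16 * total and k < 10_000:
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_quantile(p, df):
    """Invert chi2_cdf by bisection (slow but dependency-free)."""
    lo, hi = 0.0, df + 10.0
    while chi2_cdf(hi, df) < p:      # widen the bracket if needed
        hi *= 2.0
    for _ in range(80):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical data: n = 25 observations with sample SD s = 4.0.
n, s = 25, 4.0
df = n - 1

L = chi2_quantile(0.025, df)   # cuts off the bottom 2.5%
U = chi2_quantile(0.975, df)   # cuts off the top 2.5%

# The larger cutoff U gives the LOWER limit, and vice versa.
lo = math.sqrt(df * s**2 / U)
hi = math.sqrt(df * s**2 / L)

print(f"L = {L:.3f}, U = {U:.3f}")
print(f"95% CI for sigma: ({lo:.3f}, {hi:.3f}); point estimate S = {s}")
print(f"distance below S: {s - lo:.3f}, distance above S: {hi - s:.3f}")
```

With these made-up numbers, the point estimate $S$ lies inside the interval but noticeably closer to the lower limit than to the upper one, so there is no single 'margin of error' to quote.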
Addendum, prompted by comment. So far, I have been talking about 'probability-symmetric' CIs, of level $1-\alpha,$ in which $\alpha/2$ is cut from each tail of the distribution. The situation can be a little different, if you allow unequal tail probabilities:
(1) 95% CI for normal $\mu:$ If $\sigma$ is known, the usual 95% CI is of the form $\bar X \pm 1.96 \sigma/\sqrt{n},$ where $\sigma/\sqrt{n}$ is the 'standard error of the mean'. However, any cutoffs $L$ and $U$ with $P\left(L \le \frac{\bar X - \mu}{\sigma/\sqrt{n}} \le U\right) = 0.95$ can be used as the basis for a 95% CI. One choice is $L = -1.88$ and $U = 2.05,$ which cuts about 3% from the lower tail and about 2% from the upper tail and gives rise to the 95% CI $\left(\bar X - U\frac{\sigma}{\sqrt{n}},\, \bar X - L\frac{\sigma}{\sqrt{n}}\right) = \left(\bar X - 2.05\frac{\sigma}{\sqrt{n}},\, \bar X + 1.88\frac{\sigma}{\sqrt{n}} \right).$ This is seldom done because it is a little messier to compute than the standard interval and because this CI is a little longer than the usual one with $\pm 1.96$ (total length $3.93\,\sigma/\sqrt{n}$ rather than $3.92\,\sigma/\sqrt{n}$). But it is possible, and that's one reason careful authors say 'a 95% CI' instead of 'the 95% CI' for $\mu.$
(2) 95% CI for normal $\sigma:$ By trial and error, one could find tail probabilities giving $L$ and $U$ such that $P\left( L \le \frac{(n-1)S^2}{\sigma^2} \le U \right) = 0.95$ and $\left|S-\sqrt{(n-1)S^2/U}\right| = \left|S-\sqrt{(n-1)S^2/L}\right|,$ so that the interval is centered on $S,$ but that would require unequal tail probabilities. That's not usually done either. [What is sometimes done is to find, by trial and error, tail probabilities adding to 5% that make the shortest CI for $\sigma;$ the probability-symmetric interval is seldom the very shortest CI possible.]
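Both items can be checked numerically. The following is a rough Python sketch (standard library only), not a standard procedure: the degrees of freedom `df = 24` (that is, a hypothetical $n = 25$), the grid of tail splits, and the helper names `chi2_cdf`, `chi2_quantile`, `z_cutoffs`, and `sigma_limits` are all assumptions made for the illustration. Limits for $\sigma$ are expressed in units of $S,$ so no data are needed.

```python
import math
from statistics import NormalDist

def chi2_cdf(x, df):
    """Chi-squared CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 0
    while term > 1e-16 * total and k < 10_000:
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_quantile(p, df):
    """Invert chi2_cdf by bisection."""
    lo, hi = 0.0, df + 10.0
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(80):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# (1) Normal mean: cutoffs for a given lower-tail probability.
Z = NormalDist()  # standard normal

def z_cutoffs(lower_tail, level=0.95):
    """(L, U) with `lower_tail` below L and 1 - level - lower_tail above U."""
    return Z.inv_cdf(lower_tail), Z.inv_cdf(lower_tail + level)

L_eq, U_eq = z_cutoffs(0.025)   # about -1.96 and 1.96
L_un, U_un = z_cutoffs(0.03)    # about -1.88 and 2.05
print(f"equal tails:   L={L_eq:.2f}, U={U_eq:.2f}, length factor={U_eq - L_eq:.3f}")
print(f"unequal tails: L={L_un:.2f}, U={U_un:.2f}, length factor={U_un - L_un:.3f}")

# (2) Normal sigma: limits in units of S for a given lower-tail probability.
def sigma_limits(lower_tail, df, level=0.95):
    L = chi2_quantile(lower_tail, df)
    U = chi2_quantile(lower_tail + level, df)
    return math.sqrt(df / U), math.sqrt(df / L)

df = 24  # hypothetical sample size n = 25
eq_lo, eq_hi = sigma_limits(0.025, df)

# Trial and error over tail splits adding to 5%: which gives the shortest CI?
grid = [k / 1000 for k in range(1, 50)]          # lower tails 0.001 .. 0.049
limits = {p: sigma_limits(p, df) for p in grid}
shortest_p = min(grid, key=lambda p: limits[p][1] - limits[p][0])

# Tail split that centers the CI on S: solve lo + hi = 2 by bisection
# (lo + hi decreases as more probability is cut from the lower tail).
a, b = 0.025, 0.0499
for _ in range(40):
    mid = (a + b) / 2.0
    if sum(sigma_limits(mid, df)) > 2.0:
        a = mid
    else:
        b = mid
centered_p = (a + b) / 2.0

print(f"equal-tail length:   {eq_hi - eq_lo:.4f} S")
print(f"shortest on grid:    lower tail {shortest_p:.3f}, "
      f"length {limits[shortest_p][1] - limits[shortest_p][0]:.4f} S")
print(f"centered on S:       lower tail {centered_p:.4f}")
```

In this setup both the shortest interval and the interval centered on $S$ put more than 2.5% in the lower chi-squared tail, which is the sense in which they require unequal tail probabilities.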
I'm not criticizing the italicized statement from your book. I'd classify it as incomplete rather than incorrect. But almost all such statements are incomplete to some degree. In formulating such a statement, an author has to take into account the mathematical level of the expected students, whether the textbook is applied or theoretical, and whether the statement comes near the beginning, middle, or end of the book. I brought up CIs that are not probability-symmetric only because you asked a (good) question with a slightly messy answer.