I'm reading the following from page 60 of *Information Theory and Machine Learning*. My questions are:
- Under the assumption that $\lambda \ll 20$, why is $\bar{x}-1$ a good estimator? Could someone fill in the detail explaining this?
- What kind of ad hoc binning techniques would work for $\lambda\gg20$ ?
I know the author merely introduces his thought process so that the supervisor eventually leads him to a Bayesian way of thinking, but I'm interested in why his logic for these particular cases works, even if the solution is not a unifying one. Thanks!

If $\lambda$ is small, so that $\mathbb P(X \ge 20) \approx 0$, you can say $$\mathbb E[X \mid 1 \lt X \lt 20] \approx \mathbb E[X \mid 1 \lt X ] = \lambda +1$$ by the memoryless property of the exponential distribution, and that suggests $\hat{\lambda} = \overline{x}-1$ as an estimator from this truncated data.
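To see this numerically, here is a quick simulation sketch (my own illustration, not from the book — the value $\lambda = 2$ and the use of NumPy are my choices): draw exponential lifetimes with a small mean, keep only those falling in the observable window $(1, 20)$, and compare $\overline{x} - 1$ with the true $\lambda$.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0  # true mean, small compared to 20

# Draw exponential samples and keep only the observable window (1, 20)
x = rng.exponential(scale=lam, size=1_000_000)
x_obs = x[(x > 1) & (x < 20)]

# Memoryless-property estimator: truncated sample mean minus 1
lam_hat = x_obs.mean() - 1.0
print(lam, lam_hat)
```

With $\lambda = 2$ almost no mass lies above 20, so the right-truncation barely matters and $\hat\lambda$ lands very close to 2.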
In general $$\mathbb E[X \mid 1 \lt X \lt 20] = \dfrac{\int_1^{20} \frac{x}{\lambda} e^{-x / \lambda} dx}{\int_1^{20} \frac{1}{\lambda} e^{-x / \lambda} dx}= \lambda + 1 -\dfrac{19}{e^{19/\lambda}-1}$$
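As a sanity check on that closed form (my own numerics, standard library only), a simple trapezoidal integration of the two integrals should agree with $\lambda + 1 - \frac{19}{e^{19/\lambda}-1}$:

```python
import math

def truncated_mean(lam, n=200_000):
    """E[X | 1 < X < 20] for an exponential with mean lam, by the trapezoid rule."""
    a, b = 1.0, 20.0
    h = (b - a) / n
    num = den = 0.0
    for i in range(n + 1):
        x = a + i * h
        w = 0.5 if i in (0, n) else 1.0  # trapezoid endpoint weights
        f = math.exp(-x / lam) / lam     # exponential density with mean lam
        num += w * x * f
        den += w * f
    return num / den  # the step h cancels in the ratio

for lam in (2.0, 10.0, 100.0):
    closed = lam + 1 - 19 / math.expm1(19 / lam)
    print(lam, truncated_mean(lam), closed)
```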
which for very large $\lambda$ is close to $\dfrac{21}{2} - \dfrac{361}{12\lambda}$, so it might suggest something like $\hat{\lambda} = \dfrac{361}{126 - 12\overline{x}}$ as a possible approximate estimator using this truncated data, though note that this produces nonsense when $\overline{x}\ge 10.5$.
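Again as my own numerical illustration (the value $\lambda = 100$ is an arbitrary choice): for large $\lambda$ the truncated sample mean sits just below $21/2$, and inverting the approximation recovers $\lambda$ reasonably well, though the estimator is quite noisy because $\hat\lambda$ is very sensitive to $\overline{x}$ near $10.5$.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 100.0  # true mean, large compared to 20

x = rng.exponential(scale=lam, size=5_000_000)
x_obs = x[(x > 1) & (x < 20)]  # only ~17% of decays land in the window

xbar = x_obs.mean()
lam_hat = 361 / (126 - 12 * xbar)  # nonsense if xbar >= 10.5
print(xbar, lam_hat)
```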
As an illustration of binning: if you observe $Y$ lengths in a bin between $1$ and $10.5$ and $Z$ in a bin from $10.5$ to $20$, then $\mathbb{E}[Y] = e^{19/(2\lambda)}\mathbb{E}[Z]$, so a possible estimator for $\lambda$ is $\hat{\lambda} = \dfrac{19}{2(\log_e(Y) - \log_e(Z))}$, though this will produce nonsense if $Y=0$, $Z=0$, or $Y \le Z$.
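A simulation sketch of that binning estimator (again my own illustration, with $\lambda = 100$ chosen arbitrarily): count decays in the two half-windows and invert the log-ratio of counts.

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 100.0

x = rng.exponential(scale=lam, size=4_000_000)
Y = np.sum((x > 1) & (x < 10.5))   # count in the lower bin
Z = np.sum((x > 10.5) & (x < 20))  # count in the upper bin

# E[Y]/E[Z] = exp(19/(2*lam)), so invert the log-ratio of the counts;
# nonsense if Y = 0, Z = 0, or Y <= Z
lam_hat = 19 / (2 * (np.log(Y) - np.log(Z)))
print(Y, Z, lam_hat)
```

Because the two bin probabilities are nearly equal when $\lambda$ is large, the log-ratio is small and the estimator is again very noisy; large counts are needed for a usable estimate.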
You can avoid these risks of nonsense with a proper Bayesian prior, though observations that would otherwise lead to nonsense are likely to give posterior distributions constrained by that prior, in particular by any prior upper limit on $\lambda$.
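A minimal grid sketch of that Bayesian alternative (my own illustration, not the book's code; the uniform prior on $(0, \lambda_{\max})$ with $\lambda_{\max} = 1000$ and the toy data are assumptions): each observation $x \in (1,20)$ contributes the truncated likelihood $\frac{(1/\lambda)e^{-x/\lambda}}{e^{-1/\lambda}-e^{-20/\lambda}}$, and the posterior is always proper even for data that break the ad hoc estimators.

```python
import numpy as np

def posterior(data, lam_max=1000.0, n_grid=10_000):
    """Posterior over lam on a grid, with a uniform prior on (0, lam_max).

    Likelihood of one observation x in (1, 20):
        (1/lam) * exp(-x/lam) / (exp(-1/lam) - exp(-20/lam))
    """
    lam = np.linspace(lam_max / n_grid, lam_max, n_grid)
    z = np.exp(-1 / lam) - np.exp(-20 / lam)  # probability of the window
    log_post = np.zeros_like(lam)
    for x in data:
        log_post += -np.log(lam) - x / lam - np.log(z)
    log_post -= log_post.max()  # stabilize before exponentiating
    w = np.exp(log_post)
    return lam, w / w.sum()

# Toy data with mean 15 > 10.5, which breaks the plug-in estimators above:
# the posterior is still well defined, pushed up against the prior's upper limit.
lam, p = posterior([12.0, 15.0, 18.0])
print(lam[np.argmax(p)], np.sum(lam * p))
```

For $\overline{x} > 10.5$ the truncated likelihood is increasing in $\lambda$, so the posterior mode sits at the prior's upper limit, which is exactly the "constrained by that prior" behaviour described above.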