Let $X, Y$ be jointly distributed random variables, where $X$ has finite second moment and is the quantity we wish to estimate, and $Y$ is the random variable we have observed.
Then a function $H(Y)$ with finite second moment is called an optimal estimator if it satisfies $\mathbb{E}[(X - H(Y))^2] \leq \mathbb{E}[(X - \hat{H}(Y))^2]$ for every other function $\hat{H}$ of $Y$ with finite second moment.
It is well known that $$H(Y) = \mathbb{E}(X|Y)$$
However, I was unable to find any intuitive examples of this optimal estimator.
For example, suppose $X, Y$ have the joint density $$f_{XY}(x,y) = 2\exp(-x)\exp(-y), \quad 0 \leq y \leq x < \infty$$
One can show $$f_{X|Y}(x|y) = \exp(-x)\exp(y), \quad x \geq y,$$ and hence $$\mathbb{E}(X|Y) = Y+1$$
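Spelling out that last step (substituting $t = x - y$): $$\mathbb{E}(X|Y=y) = \int_y^\infty x\,\exp(-(x-y))\,dx = \int_0^\infty (y+t)\exp(-t)\,dt = y + 1.$$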
It is unclear to me why $\mathbb{E}(X|Y) = Y+1$ (i.e., a linear function of $Y$ plus an offset of $1$). The marginal densities are $$f_X(x) = 2\exp(-x)(1-\exp(-x)), \quad f_Y(y) = 2\exp(-2y),$$
and these distributions also don't seem to tell me why $\mathbb{E}(X|Y) = Y+1$.
Is there an intuitive way of explaining why the conditional expectation takes on the form of an affine function?
You can gain some intuition about why $\mathbb{E}(X|Y)$ is the minimum mean squared error (MMSE) estimator by first considering the case where you want to estimate $X$ without any observation. In this case the estimate $\hat{X}$ of $X$ must clearly be a fixed value. How should this value be chosen to minimize the MSE?
Well, the latter is equal to (real-valued case) \begin{align} \mathsf{MSE} &= \mathbb{E}[(X-\hat{X})^2]\\ &= \mathbb{E}(X^2) - 2 \hat{X} \mathbb{E}(X) + \hat{X}^2. \end{align}
The last expression is easily seen to be minimized by $\hat{X} = \mathbb{E}(X)$; i.e., in the absence of observations, the MMSE estimate of $X$ is simply its mean.
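Equivalently, completing the square, $$\mathsf{MSE} = \mathrm{Var}(X) + \big(\hat{X} - \mathbb{E}(X)\big)^2,$$ which is minimized precisely at $\hat{X} = \mathbb{E}(X)$, with minimum value $\mathrm{Var}(X)$.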
Given the above, it may now come as no big surprise that, once an observation $Y = y$ is available, the same reasoning applied to the conditional distribution of $X$ given $Y = y$ yields the MMSE estimator $H(Y) = \mathbb{E}(X|Y)$.
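Slightly more formally: by the tower property, $$\mathbb{E}[(X - \hat{H}(Y))^2] = \mathbb{E}\Big[\,\mathbb{E}\big[(X - \hat{H}(Y))^2 \,\big|\, Y\big]\Big],$$ and for each fixed value $Y = y$ the inner expectation is exactly the no-observation problem above, with the distribution of $X$ replaced by its conditional distribution given $Y = y$. It is therefore minimized pointwise by $\hat{H}(y) = \mathbb{E}(X|Y=y)$.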
Regarding the form of $\mathbb{E}(X|Y)$ in your specific example, I refer you to the comment by @spaceisdarkgreen.
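If a numerical sanity check helps, here is a small Monte Carlo sketch (assuming NumPy; the bin widths and alternative estimators are just illustrative). It uses the fact that the joint density $2\exp(-x)\exp(-y)$ on $0 \leq y \leq x$ is exactly that of the minimum and maximum of two independent $\mathrm{Exp}(1)$ draws, which makes sampling easy. It checks both that the empirical mean of $X$ given $Y \approx y$ is close to $y + 1$ and that the estimator $Y + 1$ attains a smaller MSE than a couple of alternatives.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# f_{XY}(x, y) = 2 exp(-x) exp(-y) on 0 <= y <= x is the joint density of the
# order statistics (min, max) of two i.i.d. Exp(1) variables, so sample that way.
u = rng.exponential(size=(n, 2))
y = u.min(axis=1)   # plays the role of Y
x = u.max(axis=1)   # plays the role of X

# Empirical E(X | Y ~= y) versus y + 1 on a few narrow bins of Y.
for lo in (0.0, 0.25, 0.5, 1.0):
    sel = (y >= lo) & (y < lo + 0.05)
    print(f"Y in [{lo:.2f}, {lo + 0.05:.2f}): mean of X = {x[sel].mean():.3f}, "
          f"bin midpoint + 1 = {lo + 0.025 + 1:.3f}")

# MSE of the conditional-mean estimator Y + 1 versus a few alternatives.
for label, est in [("Y + 1", y + 1), ("2Y", 2 * y), ("E(X)", np.full(n, x.mean()))]:
    print(f"MSE of {label:>6}: {np.mean((x - est) ** 2):.4f}")
```

The MSE of $Y + 1$ comes out near $1$ (the conditional variance of $X$ given $Y$), while the constant estimator and $2Y$ come out noticeably larger.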