Here's the intuition I have so far. When we have two random variables $X$ and $Y$ defined on a sample space $\Omega$, conditioning on $Y=y$ means that we are 'zooming in' on the sample space (that is, the set $\{Y=y\} \subseteq \Omega$ is now the sample space we are working with).
The formula $f_{X|Y}(x|y) = \frac{f(x,y)}{f_Y(y)}$ makes complete sense to me: we need to rescale the density of $X$ since our sample space has become smaller. An analogy is that 2 apples out of 5 fruits should feel lighter than 2 apples out of 3 fruits.
Now, the definition $\mathbb{E}[X|Y=y] = \int x \cdot f_{X|Y}(x|y) \, dx$ makes sense: we are again finding the weighted average of $X$ restricted (and scaled) to the set $\{Y=y\} \subseteq \Omega$. Similarly, the definition $\mathbb{D}^2[X|Y] = \mathbb{E}[(X-\mathbb{E}[X|Y])^2|Y]$ makes total sense: we are once again measuring the 'spread' of $X$ (by measuring the distance of $X$ from its conditional expectation), restricted (and scaled) to the set $\{Y=y\} \subseteq \Omega$.
Now, we have the Tower Formula: $\mathbb{E}[\mathbb{E}[X|Y]] = \mathbb{E}[X]$. I do not really have much to say about the tower formula. Right now I'm simply thinking that $\mathbb{E}[X|Y]$ is a function of $Y$, and that the weighted average of $\mathbb{E}[X|Y]$ over the values of $Y$ should somehow give back $\mathbb{E}[X]$.
However, it is not clear to me why we should not have $\mathbb{E}[\mathbb{D}^2[X|Y]] = \mathbb{D}^2[X]$ (the correct identity is instead $\mathbb{E}[\mathbb{D}^2[X|Y]] = \mathbb{D}^2[X] - \mathbb{D}^2[\mathbb{E}[X|Y]]$).
Think of the joint distribution of $(X, Y)$ as a two-stage experiment: you first sample $Y = y$ from the measure $P(Y \in \cdot)$, and then sample $X$ from the measure $P(X \in \cdot \mid Y = y)$. Then for any nonnegative measurable function $f$,
\begin{align} E(f(X, Y)) &= \int P(Y \in dy) \int P(X \in dx \mid Y = y) f(x, y) \\ &= \int P(Y \in dy) E(f(X, y) \mid Y = y) \\ &= E(E(f(X, Y) \mid Y)). \end{align}
So if you can write the joint distribution as above, then the tower property falls out. Being able to write it this way amounts to asking whether there is a regular conditional distribution for $X$ given $Y$. Such a distribution exists if $X, Y$ take values in Polish spaces. To get an even more general tower property, you can use the "modern" definition of conditional expectation described in chapter 4 of https://services.math.duke.edu/~rtd/PTE/pte.html
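To make the two-stage picture concrete, here is a minimal sketch in Python with exact rational arithmetic. The distributions `p_y` and `p_x_given_y` are made-up illustrative numbers (they are not from the question), and the helper names are hypothetical. The sketch checks the tower property on this toy model, and it also shows why averaging the conditional variances falls short of the full variance: the missing piece is the variance of the conditional means.

```python
from fractions import Fraction as F

# Hypothetical two-stage model (illustrative numbers only):
# stage 1 draws Y from p_y, stage 2 draws X from p_x_given_y[y].
p_y = {0: F(1, 2), 1: F(1, 2)}
p_x_given_y = {
    0: {0: F(1, 2), 2: F(1, 2)},  # given Y=0, X is 0 or 2
    1: {4: F(1, 2), 6: F(1, 2)},  # given Y=1, X is 4 or 6
}

def cond_mean(y):
    """E[X | Y=y]: average of X under the second-stage distribution."""
    return sum(x * p for x, p in p_x_given_y[y].items())

def cond_var(y):
    """D^2[X | Y=y]: spread of X around E[X | Y=y], still given Y=y."""
    m = cond_mean(y)
    return sum((x - m) ** 2 * p for x, p in p_x_given_y[y].items())

def mean_over_y(g):
    """E[g(Y)]: average a function of Y under the first-stage distribution."""
    return sum(g(y) * p for y, p in p_y.items())

# Joint pmf obtained by multiplying the two stages.
joint = {(x, y): p_y[y] * px
         for y in p_y
         for x, px in p_x_given_y[y].items()}

# Unconditional mean and variance of X, computed directly from the joint pmf.
mean_x = sum(x * p for (x, _), p in joint.items())
var_x = sum((x - mean_x) ** 2 * p for (x, _), p in joint.items())

tower = mean_over_y(cond_mean)    # E[E[X|Y]]
within = mean_over_y(cond_var)    # E[D^2[X|Y]]
between = mean_over_y(lambda y: (cond_mean(y) - tower) ** 2)  # D^2[E[X|Y]]

print("E[X] =", mean_x, " E[E[X|Y]] =", tower)
print("D^2[X] =", var_x, " E[D^2[X|Y]] =", within, " D^2[E[X|Y]] =", between)
```

In this toy model each conditional variance measures spread around its own center $\mathbb{E}[X|Y=y]$, so the average of the conditional variances ignores how far those centers sit from the overall mean $\mathbb{E}[X]$; adding the variance of the centers recovers $\mathbb{D}^2[X]$.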