I am trying to understand the visual intuition behind this formula, beyond its formal demonstration. If I think for a moment that $X$ and $Y$ are both equal to $ \mathbb{R}^2 $, I visualize this formula like this:
I wonder then: why does Taylor's expansion have that form? If I stop at the first order, the visual idea is clear, because we go back to the definition of the derivative. If I stop at the second order, I have also found an explanation that feels intuitive to me, which is the following (using non-rigorous notation):
$$\left\{\begin{matrix} f(x_0+h)-f(x_0)\approx f'(x_0)(h)\\ f'(x_0+h)-f'(x_0)\approx f''(x_0)(h) \end{matrix}\right.\Rightarrow \\ \Rightarrow f(x_0)+f'(x_0)(h)+\frac{f''(x_0)(h,h)}{2}\approx f(x_0)+\frac{f'(x_0)(h)+f'(x_0+h)(h)}{2}$$
that is, by also using the second derivative it is as if we were averaging the first derivative evaluated at two different points in order to reduce the approximation error. This makes a lot of sense, but if I go to the third order I get a less intuitive expression (and I don't know whether it is correct), namely:
$$\left\{\begin{matrix} f(x_0+h)-f(x_0)\approx f'(x_0)(h)\\ f'(x_0+h)-f'(x_0)\approx f''(x_0)(h)\\ f''(x_0+h)-f''(x_0)\approx f'''(x_0)(h) \end{matrix}\right.\Rightarrow \\ \Rightarrow f(x_0)+f'(x_0)(h)+\frac{f''(x_0)(h,h)}{2}+\frac{f'''(x_0)(h,h,h)}{6}\approx \\ \approx f(x_0)+f'(x_0)(h)+\frac{f''(x_0)(h,h)}{2}+\frac{f''(x_0+h)(h,h)-f''(x_0)(h,h)}{6}\approx \\ \approx f(x_0)+f'(x_0)(h)+\frac{f'(x_0+h)(h)-f'(x_0)(h)}{2}+\frac{f'(x_0+2h)(h)-f'(x_0+h)(h)-(f'(x_0+h)(h)-f'(x_0)(h))}{6} = \\ = f(x_0)+\frac{2}{3}\cdot f'(x_0)(h)+\frac{1}{6}\cdot f'(x_0+h)(h)+\frac{1}{6}\cdot f'(x_0+2h)(h)$$
whose meaning I can't quite understand.
Is my reasoning wrong?


Your way of rewriting the Taylor expansion highlights the successive approximations obtained by including higher derivatives. As you correctly noted, the Taylor expansion up to the second derivative can be rewritten with your approach as $$f(x_0+h)\approx f(x_0)+\frac{h}{2}\left[f'(x_0)+ f'(x_0+h)\right]$$
where the last term is $h$ times the average of the first derivative at $x_0$ and at $x_0+h$; using this average instead of $f'(x_0)$ alone can reduce the approximation error.
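As a quick numerical illustration of the averaged form, here is a minimal Python sketch. The choice $f=\exp$, $x_0=0$, $h=0.1$ and the helper names `first_order` and `averaged` are assumptions made only for this example; they are not part of the question.

```python
import math

def first_order(f, df, x0, h):
    # f(x0) + h * f'(x0): the first-order (tangent line) approximation
    return f(x0) + h * df(x0)

def averaged(f, df, x0, h):
    # f(x0) + (h / 2) * [f'(x0) + f'(x0 + h)]: the averaged form above
    return f(x0) + 0.5 * h * (df(x0) + df(x0 + h))

# Assumed test case: f = exp, which is its own derivative
f = df = math.exp
x0, h = 0.0, 0.1

exact = f(x0 + h)
err_first = abs(exact - first_order(f, df, x0, h))
err_avg = abs(exact - averaged(f, df, x0, h))

print("first-order error:", err_first)
print("averaged error:   ", err_avg)
```

For this particular function and step, the averaged form gives the smaller error of the two.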
If we consider the expansion up to the third derivative, the value of the first derivative at $x_0+h$ is itself replaced by an average. In fact, your method gives
$$ f(x_0+h) \approx f(x_0)+\frac{h}{2}f'(x_0)+\frac{h}{2} \cdot \frac{1}{3}\left[f'(x_0) +f'(x_0+h)+ f'(x_0+2h) \right] $$
Again, the bracketed term can be interpreted as an average of the first derivative, now taken over the three points $x_0$, $x_0+h$ and $x_0+2h$, which can reduce the approximation error.
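The three-point form can be checked numerically in the same spirit. The sketch below assumes $f=\exp$, $x_0=0$, $h=0.1$, and the helper names `three_point` and `weighted` are hypothetical. It confirms that the formula above agrees with the weights $\frac{2}{3},\frac{1}{6},\frac{1}{6}$ from the question, and prints the errors next to those of the first-order approximation and of the third-order Taylor polynomial. Each forward difference used in the derivation is itself only a first-order approximation, so the rewritten form need not be as accurate as the Taylor polynomial it was derived from; the printout lets you compare them for this particular $f$.

```python
import math

def three_point(f, df, x0, h):
    # f(x0) + (h/2) f'(x0) + (h/2) * average of f' at x0, x0+h, x0+2h
    avg = (df(x0) + df(x0 + h) + df(x0 + 2 * h)) / 3
    return f(x0) + 0.5 * h * df(x0) + 0.5 * h * avg

def weighted(f, df, x0, h):
    # Same expression written with the weights 2/3, 1/6, 1/6 from the question
    return f(x0) + h * (2 / 3 * df(x0) + 1 / 6 * df(x0 + h) + 1 / 6 * df(x0 + 2 * h))

# Assumed test case: f = exp, so every derivative of f is exp as well
f = df = math.exp
x0, h = 0.0, 0.1

exact = f(x0 + h)
first = f(x0) + h * df(x0)
# Third-order Taylor polynomial; valid here because f'' = f''' = exp
taylor3 = f(x0) * (1 + h + h**2 / 2 + h**3 / 6)

err_first = abs(exact - first)
err_three = abs(exact - three_point(f, df, x0, h))
err_taylor3 = abs(exact - taylor3)

print("first-order error:        ", err_first)
print("three-point form error:   ", err_three)
print("third-order Taylor error: ", err_taylor3)
```

Trying other functions or smaller values of `h` shows how the gap between the rewritten form and the Taylor polynomial behaves.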