I am starting to read papers about Padé approximation of the matrix exponential $\exp(A)$, namely $\exp(A) \approx P_{n,m}(A)Q^{-1}_{n,m}(A)$.
I am now looking for a good motivation for Padé approximation.
The problem with approximating $\exp(A)$ numerically by a truncated Taylor series is, efficiency aside, that the terms can differ greatly in magnitude in floating-point arithmetic. In particular, they can be very large, of similar size, and of opposite sign, so that catastrophic cancellation is possible, making the whole method unreliable. This problem remains with Padé approximation, and there is even an additional one: how well conditioned the "inverse part" $Q_{n,m}(A)$ of the rational approximation is.
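To make the cancellation concern concrete, here is a minimal sketch in the scalar ($1 \times 1$) case, assuming plain double precision. The function name `taylor_exp` and the choices $x = -30$ and 200 terms are only illustrative assumptions of mine.

```python
import math

def taylor_exp(x, n_terms=200):
    """Sum the Taylor series of exp(x) term by term in double precision."""
    total = 0.0
    term = 1.0  # x^0 / 0!
    for k in range(1, n_terms + 1):
        total += term
        term *= x / k  # next term x^k / k!
    return total

x = -30.0
exact = math.exp(x)

# Alternating terms as large as ~1e11 must cancel down to ~1e-13.
naive = taylor_exp(x)
rel_err_naive = abs(naive - exact) / exact

# Same series with all-positive terms, then one reciprocal: no cancellation.
via_reciprocal = 1.0 / taylor_exp(-x)
rel_err_reciprocal = abs(via_reciprocal - exact) / exact

print("naive Taylor:      ", naive, "relative error", rel_err_naive)
print("1 / taylor_exp(30):", via_reciprocal, "relative error", rel_err_reciprocal)
```

The direct sum loses essentially all significant digits, while the reciprocal of the all-positive series does not, which is the kind of effect I have in mind for the matrix case.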
Can someone give me a good motivation and an intuition why Padé really can do better numerically and why it pays off to even consider inversion?
My ideas so far: $P_{n,m}(A) \approx \exp(\frac{A}{2})$ and $Q_{n,m}(A) \approx \exp(-\frac{A}{2})$, and the Taylor series of $\exp(A)$ works well for small $\|A\|$. Is this factor of $2$ the only motivation? The benefit of scaling and squaring applies to both Taylor and Padé approximation.
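A quick scalar check of this idea, as a sketch under some assumptions: I take the diagonal case $n = m$ and use the standard closed form for its coefficients, $c_j = \frac{(2m-j)!\,m!}{(2m)!\,j!\,(m-j)!}$, with $Q_{m,m}(x) = P_{m,m}(-x)$. The helper names `pade_coeffs` and `pade_parts` and the values $x = 1$, $m = 6$ are just for illustration.

```python
import math

def pade_coeffs(m):
    """Coefficients c_0..c_m of the diagonal [m/m] Pade numerator of exp(x)."""
    return [
        math.factorial(2 * m - j) * math.factorial(m)
        / (math.factorial(2 * m) * math.factorial(j) * math.factorial(m - j))
        for j in range(m + 1)
    ]

def pade_parts(x, m):
    """Return (P(x), Q(x)) for the diagonal case, where Q(x) = P(-x)."""
    c = pade_coeffs(m)
    p = sum(cj * x**j for j, cj in enumerate(c))
    q = sum(cj * (-x) ** j for j, cj in enumerate(c))
    return p, q

x, m = 1.0, 6
p, q = pade_parts(x, m)

print("P(x)      =", p, " exp(x/2)  =", math.exp(x / 2))
print("Q(x)      =", q, " exp(-x/2) =", math.exp(-x / 2))
print("P(x)/Q(x) =", p / q, " exp(x)    =", math.exp(x))
```

In this experiment $P$ and $Q$ individually match $\exp(\pm x/2)$ only roughly (to about a percent), while the quotient matches $\exp(x)$ far more closely, so the "factor $2$" picture seems to be at best part of the story.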