I am reading 'Spectral Analysis and Time Series' by M.B. Priestley. Chapter 3 discusses autoregressive processes. I am having difficulty understanding the use of the backshift operator. For example, a second-order autoregressive process may be written as
$X_t+a_1X_{t-1}+a_2X_{t-2}=\epsilon_t$
where $X_t$ is the stochastic process and $\epsilon_t$ is white noise. The above equation can alternatively be written as
$(1+a_1B+a_2B^2)X_t=\epsilon_t$
I understand that $BX_t$ and $B^2X_t$ mean $X_{t-1}$ and $X_{t-2}$. Maurice further assumes that the equation $(1+a_1B+a_2B^2)$ has two solutions $\mu_1$ and $\mu_2$ and writes the above equation as
$X_t=\frac{1}{(1-\mu_1B)(1-\mu_2B)}\epsilon_t$
$X_t=\frac{1}{\mu_1-\mu_2}[\frac{\mu_1}{1-\mu_1B}-\frac{\mu_2}{1-\mu_2B}]\epsilon_t$
$X_t=\frac{1}{\mu_1-\mu_2}[\sum_{s=0}^{\infty}(\mu_1^{s+1}-\mu_2^{s+1})B^s]\epsilon_t$.
How are the last three equations derived? Specifically, I don't understand why we can put the term $(1-\mu_1B)(1-\mu_2B)$ in the denominator. And why can $\frac{1}{1-\mu_1B}$ be written as $\sum_{s=0}^{\infty}\mu_1^sB^s$ unless we know that $|\mu_1B|$ is less than 1?
I have not used the backshift operator before and only recently read about it in the Wikipedia article.
This answer tries to shine some operator-theoretic light on the issue. I do make two key assumptions, which can probably be verified by reading the text you are referencing.
Let's consider the operator $(1+a_1B+a_2B^2)$. If we (or Maurice) assume that there exist solutions $\mu_1,\mu_2$ to $a_2=\mu_1 \mu_2$ and $a_1 = -\mu_1 - \mu_2$, then we can write $$(1+a_1B+a_2B^2)=(1-\mu_1B)(1-\mu_2B).$$
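As a quick illustration of this step, here is a minimal Python sketch that solves for $\mu_1,\mu_2$ given $a_1,a_2$. The two conditions say that the $\mu_i$ are the roots of $z^2+a_1z+a_2=0$. The function name `ar2_mus` and the coefficient values are made up for the example; they are not from the book.

```python
import cmath

def ar2_mus(a1, a2):
    """Return (mu1, mu2) with mu1 + mu2 = -a1 and mu1 * mu2 = a2.

    Equivalently, the two roots of z**2 + a1*z + a2 = 0 (possibly complex).
    """
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return (-a1 + disc) / 2.0, (-a1 - disc) / 2.0

# Hypothetical coefficients, chosen so that the roots come out near 0.5 and 0.3.
mu1, mu2 = ar2_mus(-0.8, 0.15)
print(mu1, mu2)
# Check the two defining relations numerically.
print(abs((mu1 + mu2) - 0.8), abs(mu1 * mu2 - 0.15))
```

The roots may be complex, which is why `cmath.sqrt` is used instead of `math.sqrt`.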
If furthermore $\|\mu_i B\| < 1$ (this is an operator norm), then we get that $(I-\mu_i B)$ is an invertible operator and that $(I-\mu_i B)^{-1}=\sum_{k=0}^\infty (\mu_i B)^k$. This is the Neumann series, a generalization of the geometric series to operators. Writing the inverse as a fraction is somewhat sloppy notation.
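To make the series concrete, here is a small sketch on finite sequences. It assumes a simplified setting that is not the one in the book: $B$ acts on a list of length $n$ with the value before the start taken as 0. In that setting $B^n=0$, so the series $\sum_{k\ge 0}(\mu B)^k$ terminates and the inverse can be checked exactly. All function names are hypothetical.

```python
def backshift(x):
    """Apply B to a finite sequence: (Bx)_t = x_{t-1}, with x_{-1} taken as 0."""
    return [0.0] + list(x[:-1])

def neumann_apply(mu, eps):
    """Apply sum_{k>=0} (mu B)^k to eps; the sum stops because B^n = 0 here."""
    total = [0.0] * len(eps)
    term = list(eps)                      # k = 0 term: eps itself
    for _ in range(len(eps)):
        total = [t + u for t, u in zip(total, term)]
        term = [mu * v for v in backshift(term)]   # apply (mu B) once more
    return total

def one_minus_mu_b(mu, x):
    """Apply (1 - mu B) to x."""
    return [xi - mu * bi for xi, bi in zip(x, backshift(x))]

eps = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5]     # arbitrary test input
x = neumann_apply(0.5, eps)
recovered = one_minus_mu_b(0.5, x)         # should reproduce eps
print(max(abs(r - e) for r, e in zip(recovered, eps)))
```

On infinite sequences the series no longer terminates, which is where the condition $\|\mu_i B\| < 1$ is needed.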
Furthermore, for $\mu_1 \neq \mu_2$, the first resolvent identity provides us with $(I-\mu_1 B)^{-1}(I-\mu_2 B)^{-1} = \frac{1}{\mu_1 - \mu_2}(\mu_1(1-\mu_1B)^{-1} - \mu_2(1-\mu_2B)^{-1}).$
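For what it's worth, this identity can also be checked by hand, assuming $\mu_1 \neq \mu_2$ and using only that the two factors commute (both are polynomials in $B$). Start from
$$\mu_1(1-\mu_2B) - \mu_2(1-\mu_1B) = (\mu_1 - \mu_2)I$$
and multiply both sides by $(1-\mu_1B)^{-1}(1-\mu_2B)^{-1}$:
$$\mu_1(1-\mu_1B)^{-1} - \mu_2(1-\mu_2B)^{-1} = (\mu_1-\mu_2)(1-\mu_1B)^{-1}(1-\mu_2B)^{-1}.$$
Dividing by $\mu_1-\mu_2$ gives the partial-fraction form.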
To put it all together: If the $\mu_i$s exist and $\|\mu_i B\| < 1$ then $(I-\mu_iB)$ is invertible and we get from
$(1+a_1B+a_2B^2)X_t= (1-\mu_1B)(1-\mu_2B)X_t = \epsilon_t$ that
$$X_t = (I-\mu_1 B)^{-1}(I-\mu_2 B)^{-1}\epsilon_t = \frac{1}{\mu_1 - \mu_2}(\mu_1(1-\mu_1B)^{-1} - \mu_2(1-\mu_2B)^{-1})\epsilon_t = \frac{1}{\mu_1-\mu_2}[\sum_{s=0}^{\infty}(\mu_1^{s+1}-\mu_2^{s+1})B^s]\epsilon_t.$$
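A numerical sanity check of the final formula, as a sketch with made-up values $\mu_1=0.5$, $\mu_2=0.3$: if $\epsilon_t$ is a unit impulse at $t=0$, the formula says $X_s$ equals the coefficient $(\mu_1^{s+1}-\mu_2^{s+1})/(\mu_1-\mu_2)$ of $B^s$, and this should agree with running the recursion $X_t=-a_1X_{t-1}-a_2X_{t-2}+\epsilon_t$ directly from zero initial values. The function names are hypothetical.

```python
def psi_weights(mu1, mu2, n):
    """Coefficients of B^s, s = 0..n-1, in the series expansion."""
    return [(mu1 ** (s + 1) - mu2 ** (s + 1)) / (mu1 - mu2) for s in range(n)]

def ar2_impulse_response(a1, a2, n):
    """Run X_t = -a1*X_{t-1} - a2*X_{t-2} + eps_t for a unit impulse eps_0 = 1."""
    x = []
    for t in range(n):
        eps = 1.0 if t == 0 else 0.0
        x1 = x[t - 1] if t >= 1 else 0.0   # zero values before the start
        x2 = x[t - 2] if t >= 2 else 0.0
        x.append(eps - a1 * x1 - a2 * x2)
    return x

mu1, mu2 = 0.5, 0.3                        # illustrative values with |mu_i| < 1
a1, a2 = -(mu1 + mu2), mu1 * mu2           # matching AR(2) coefficients
closed_form = psi_weights(mu1, mu2, 10)
recursive = ar2_impulse_response(a1, a2, 10)
max_gap = max(abs(c - r) for c, r in zip(closed_form, recursive))
print(max_gap)
```

The two lists should agree up to floating-point rounding.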
Unfortunately, I cannot provide a proof of why $\|\mu_i B\| < 1$ (which in particular implies $\frac{1}{\mu_i}\in \rho(B)$), since this depends on the choice and properties of your $a_i$s and is likely related to the stability mentioned in the other answer.