Suppose that we have a sequence of random square matrices $\{A_n\}_{n\ge1}$ that converge to $0$ in probability as $n\to\infty$, i.e. $\|A_n\|\to0$ in probability as $n\to\infty$, where $\|\cdot\|$ is the operator norm (or it might be any other norm since all norms are equivalent in the finite dimensional setting). I would like to use the following identity in my computations $$ (I-A_n)^{-1}=\sum_{k=0}^\infty A_n^k. $$ However, the series on the right hand side has to converge (this is a Neumann series) because otherwise the identity does not make sense. Intuitively, since $A_n\to0$ in probability as $n\to\infty$, it seems that for large values of $n$ the identity should hold with a high probability. However, I am not sure if it is possible to make this argument rigorous.
Is it possible to rigorously justify the use of the identity in this setting? For example, can we say that the identity holds for large values of $n$ since $\|A_n\|<1$ almost surely for large values of $n$?
Any help is much appreciated!
It's not true that $\|A_n\| < 1$ almost surely.
But if $\|A_n\| \to 0$ in probability, for any $\delta > 0$ there is some $N$ such that $\mathbb P(\|A_n\| < 1) > 1 - \delta$ for all $n > N$. Thus if $n > N$, the series $\sum_{k=0}^{\infty} A_n^k$ converges with probability $> 1-\delta$.
However, though the probability of convergence for any particular $n > N$ is large, it could be that almost surely there are infinitely many $n$ for which the series does not converge. For example, this would be the case if $A_n$ were independent with $\mathbb P(\sum_k A_n^k\ \text{diverges}) = 1/n$. See second Borel-Cantelli lemma