I'm trying to use the ideas of power iteration to approximate stationary distributions of Markov chains. Let's say I have an irreducible column-stochastic matrix $A$ (i.e., non-negative and each column sums to 1), such that $1$ is a simple eigenvalue of $A$ and no other eigenvalue has modulus $1$. Then, there will be a unique positive vector $u$ with $||u||_1 = 1$ and $Au = u$, such that for each non-negative vector $b$ with $||b||_1 = 1$, $$\lim_{k\to\infty} A^kb = u.$$
The rate of convergence is governed by the second-largest eigenvalue (in modulus) of $A$, which I don't necessarily have a way of computing in my case.
Let's say I have a vector $b$ such that $||Ab - b||_1 \leq \varepsilon$, for some known $\varepsilon > 0$. This is used as a sort-of-good stopping condition in some textbooks, but I would like to know: is there a bound on the distance $||b - u||_1$ (depending on $\varepsilon$ or on other computable quantities) that can actually be proved?
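For concreteness, here is a minimal Python/NumPy sketch of the setup; the $3\times 3$ matrix is a made-up example and `stationary_by_power_iteration` is just a hypothetical helper name:

```python
import numpy as np

def stationary_by_power_iteration(A, eps=1e-10, max_iter=100_000):
    """Iterate b <- A b until ||A b - b||_1 <= eps.

    A is assumed column-stochastic, so each iterate stays a
    probability vector and no renormalization is needed.
    """
    n = A.shape[0]
    b = np.full(n, 1.0 / n)            # start from the uniform distribution
    for _ in range(max_iter):
        Ab = A @ b
        if np.linalg.norm(Ab - b, 1) <= eps:
            return Ab
        b = Ab
    raise RuntimeError("no convergence within max_iter iterations")

# made-up irreducible column-stochastic example (columns sum to 1)
A = np.array([[0.6, 0.1, 0.2],
              [0.3, 0.8, 0.1],
              [0.1, 0.1, 0.7]])
b = stationary_by_power_iteration(A)
```

The stopping test is exactly the $||Ab - b||_1 \leq \varepsilon$ condition above; the question is what that condition actually guarantees about $||b - u||_1$.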
Update 1:
I have found some references that do the following: if we let $e = Ab - b$ be the "defect vector" (i.e., the failure of $b$ to be an eigenvector of $A$), then $b$ is an exact eigenvector of the perturbed matrix $$A + E,$$ where $E = -e\mathbf 1^T$: indeed, since $\mathbf 1^T b = ||b||_1 = 1$, we get $(A + E)b = Ab - e(\mathbf 1^T b) = Ab - e = b$. From here, there are a couple of books that apply perturbation techniques to approximate the error $||b - u||$: writing the perturbation as $\varepsilon E$ with $||E|| = 1$, we may view the vector $b$ as a function of $\varepsilon$ (this applies to the perturbed eigenvalue as well, but in our case it is always $1$): $$(A + \varepsilon E)b(\varepsilon) = b(\varepsilon).$$
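As a quick sanity check of this construction (made-up matrix; any probability vector $b$ works, since the identity only needs $\mathbf 1^T b = 1$):

```python
import numpy as np

# made-up column-stochastic matrix and a non-stationary probability vector
A = np.array([[0.6, 0.1, 0.2],
              [0.3, 0.8, 0.1],
              [0.1, 0.1, 0.7]])
b = np.array([0.4, 0.35, 0.25])        # ||b||_1 = 1

e = A @ b - b                          # defect vector
E = -np.outer(e, np.ones(3))           # E = -e 1^T

# (A + E) b = A b - e (1^T b) = A b - e = b, since 1^T b = 1
residual = np.linalg.norm((A + E) @ b - b, 1)
```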
With slightly different notation, in the book Numerical Mathematics (Quarteroni et al., 2007) they mention the bound

However, what I want is a bound in the $||\cdot||_1$ norm. So, additional question: is it "legal" to replace all the norms in these bounds by the norm that I want? I suspect it might be (maybe with some modification), since this other book, An Introduction to Numerical Analysis (Atkinson, 1989), states:

There are two levels of approximation that are usually considered in these situations. One is to suppose that we know something about the magnitudes of the other eigenvalues. Assuming $A$ is diagonalizable, we can write $$ A = \sum_i \lambda_i u_i v^T_i $$ where $\lambda_i$ is the $i$th eigenvalue and $u_i$ and $v_i$ are its associated right and left eigenvectors, respectively. In the question, the right eigenvectors $u_i$ are normalized so that $||u_i||_1 = 1$, and it is convenient to normalize the left eigenvectors so that $v_i^T u_i = 1$. Since the left and right eigenvectors associated to different eigenvalues are orthogonal ($v_i^T u_j = 0$ for $i \neq j$), we can write $$ A^k = \sum_i \lambda_i^k u_i v_i^T $$ and therefore $$ A^k b = \sum_i \lambda_i^k ( v_i \cdot b )\, u_i. $$ The first term (with $\lambda_1 = 1$ and $u_1 = u$ as in the question) is the limit for large $k$: $$ \lim_{k \to \infty} A^k b = ( v_1 \cdot b )\, u. $$ For a column-stochastic matrix $v_1 = \mathbf 1$, so $v_1 \cdot b = ||b||_1 = 1$ and the limit is $u$ itself. Subsequent terms, which can basically be considered error terms, have a 1-norm of $$ | \lambda_i |^k \, | v_i \cdot b | \, || u_i ||_1. $$
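A numerical illustration of this expansion (Python/NumPy, with a made-up diagonalizable column-stochastic matrix; the rows of the inverse of the eigenvector matrix give the left eigenvectors with exactly the normalization $v_i^T u_j = \delta_{ij}$):

```python
import numpy as np

A = np.array([[0.6, 0.1, 0.2],          # made-up column-stochastic example
              [0.3, 0.8, 0.1],
              [0.1, 0.1, 0.7]])
b = np.array([1.0, 0.0, 0.0])           # a probability vector

lam, U = np.linalg.eig(A)               # columns of U: right eigenvectors u_i
W = np.linalg.inv(U)                    # rows of W: left eigenvectors v_i (v_i . u_j = delta_ij)

k = 25
# A^k b = sum_i lam_i^k (v_i . b) u_i, compared against direct computation
expansion = sum(lam[i]**k * (W[i] @ b) * U[:, i] for i in range(3))
direct = np.linalg.matrix_power(A, k) @ b

# the dominant term alone is the stationary distribution: (v_1 . b) u_1
i0 = int(np.argmax(lam.real))
limit = (W[i0] @ b) * U[:, i0]
```

Note that the product $(v_1 \cdot b)\, u_1$ is invariant under the arbitrary scaling that `eig` applies to its eigenvectors, so no manual renormalization is needed here.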
You can try to generate a useful inequality from the last expression, but almost all of the value in such an inequality will come from an assumption about how much smaller $|\lambda_2|$ is than $|\lambda_1| = 1$. For example, if we let $\lambda_2 = 1 - \delta$ for sufficiently small $\delta$, we can make the error as bad as we want. So the most fruitful way to proceed is to look at the size of $\lambda_2$.
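To make the "as bad as we want" point concrete, here is a 2-state chain (made up for illustration) where the defect $||Ab - b||_1$ is tiny but $||b - u||_1$ is as large as possible:

```python
import numpy as np

delta = 1e-6                               # lambda_2 = 1 - delta
A = np.array([[1 - delta/2, delta/2],
              [delta/2,     1 - delta/2]]) # column-stochastic, eigenvalues 1 and 1 - delta
u = np.array([0.5, 0.5])                   # stationary distribution
b = np.array([1.0, 0.0])                   # trial vector

defect = np.linalg.norm(A @ b - b, 1)      # = delta
error = np.linalg.norm(b - u, 1)           # = 1
```

So no bound of the form $||b - u||_1 \leq f(\varepsilon)$ can hold uniformly over all irreducible stochastic matrices; any such bound must involve the spectral gap.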
It is obviously not possible to say anything rigorous about $\lambda_2$ for arbitrary matrices, because we can always construct a matrix with a value of $\lambda_2$ that will break whatever assumptions we have made. But there are very strong results about the size of $\lambda_i$ for random matrices. And generally speaking, these results apply extremely well to most matrices when $n$ is large, so they are very fruitful for computation.
The most basic result in this field is Wigner's semicircle law. The short version is that if you have a large random symmetric matrix, then the largest eigenvalue will scale as $O(n)$ and the rest of the eigenvalues will have magnitude at most $O(\sqrt{n})$; the constants in the two estimates depend on the mean and the variance of the entries of the random matrix, respectively. Wigner's original results only apply to symmetric/Hermitian matrices, but they have been extended to the asymmetric case. There are also extensions that give a distribution for entries in the eigenvectors and so forth, but I don't recall whether there is anything useful there for your question.
In your case, this means that, if your matrix is large and you feel comfortable with thinking of it as being "approximately" random in some sense, then the rest of your eigenvalues should have magnitude less than or equal to $\frac{c}{\sqrt{n}}$, where $c$ can be found by looking at the mean and variance of the entries in your matrix. This gives you a fantastic convergence result (errors scaling like $n^{-k/2}$) pretty quickly.
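A rough empirical check of this heuristic (Python/NumPy; the constant $c \approx \sigma/\mu = 1/\sqrt{3} \approx 0.577$ for i.i.d. Uniform(0,1) entries before column normalization is my own back-of-envelope estimate, not a rigorous bound):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
M = rng.random((n, n))                 # i.i.d. Uniform(0,1) entries
A = M / M.sum(axis=0)                  # normalize columns -> column-stochastic

mods = np.sort(np.abs(np.linalg.eigvals(A)))[::-1]
# mods[0] should be exactly 1; the bulk should sit below roughly (sigma/mu)/sqrt(n)
c = (1 / np.sqrt(12)) / 0.5            # sd/mean of Uniform(0,1), about 0.577
heuristic = c / np.sqrt(n)
```

For a matrix like this, `mods[1]` comes out on the order of `heuristic`, i.e., a few percent, so the power iteration contracts the error by that factor at every step.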
I will mention two more things. First, if your matrix does have a $\lambda_2$ that is very close to $\lambda_1$, then this says that it has some very interesting structure. For example, in the community of people who study physics on complex networks, we would say that such a matrix might represent a random walk on a network with very strong "community structure." Second, there are rigorous methods for approximating the eigenvalues and eigenvectors of such matrices (see for example this paper and some of the references therein). But to apply these methods you have to make some kind of assumption about what the special structure is in your matrix, which will depend on the application you're working on.