When I was reading a paper on distributed optimization, I came across a lemma stating the following: suppose $X^{k+1} = AX^k$, where $X^k$ is a vector in $\mathbb{R}^n$ whose entries are $x_i^k$ and $A$ is a doubly stochastic matrix (its entries are non-negative and each of its rows and columns sums to 1). Then \begin{equation} ||X^k - \frac{\sum_{i=1}^n x_i^0}{n}\textbf{1}||_2 \leq \sigma_2(A)^k ||X^0 - \frac{\sum_{i=1}^n x_i^0}{n}\textbf{1}||_2 , \end{equation} where $\textbf{1}$ is the all-ones vector and $\sigma_2(A)$ is the second-largest singular value of $A$.
My thought is that \begin{equation} ||X^k - \frac{\sum_{i=1}^n x_i^0}{n}\textbf{1}||_2 = ||A^kX^0 - \frac{\sum_{i=1}^n x_i^0}{n}\textbf{1}||_2 = ||A^kX^0 - \frac{\sum_{i=1}^n x_i^0}{n}A^k\textbf{1}||_2 = ||A^k(X^0 - \frac{\sum_{i=1}^n x_i^0}{n}\textbf{1})||_2 \end{equation}
The second equality comes from $A\textbf{1} = \textbf{1}$.
But I don't know how to proceed, or where the $\sigma_2(A)$ comes from. Why isn't it the largest singular value (which is 1)? Any hints? Thanks.
Since $A$ is doubly stochastic, $A^TA\mathbf 1=A^T\mathbf 1=\mathbf 1$, so $\mathbf 1$ is an eigenvector of $A^TA$ with eigenvalue $1$; moreover $\|A\|_2\le\sqrt{\|A\|_1\|A\|_\infty}=1$, so $1$ is in fact the largest eigenvalue of $A^TA$. Hence $A$ has a singular value decomposition $USV^T$ in which $\mathbf v_1$, the first column of $V$, is parallel to $\mathbf 1$ and orthogonal to the other $n-1$ columns $\mathbf v_2,\mathbf v_3,\ldots,\mathbf v_n$.
Now let $\mathbf x=X^0-\frac{\sum_ix_i^0}{n}\mathbf1$ and $\mathbf y_k=A^k\mathbf x$ for every integer $k\ge0$. Then $\mathbf 1^T\mathbf y_k=(\mathbf 1^TA^k)\mathbf x=\mathbf1^T\mathbf x=0$, i.e. $\mathbf y_k\in\mathbf1^\perp=\operatorname{span}\{\mathbf v_2,\ldots,\mathbf v_n\}$. Writing $\mathbf y_k=\sum_{i\ge2}c_i\mathbf v_i$, we get $\|A\mathbf y_k\|_2^2=\left\|\sum_{i\ge2}c_i\sigma_i\mathbf u_i\right\|_2^2=\sum_{i\ge2}c_i^2\sigma_i^2\le\sigma_2(A)^2\|\mathbf y_k\|_2^2$, i.e. $\|A\mathbf y_k\|_2\le\sigma_2(A)\|\mathbf y_k\|_2$. So, inductively, we have $\|A^k\mathbf x\|_2\le\sigma_2(A)^k\|\mathbf x\|_2$, which is exactly the claimed bound.
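As a sanity check, the bound is easy to verify numerically on a small example. Here is a minimal pure-Python sketch (the particular matrix and initial vector are just illustrative choices, not anything from the lemma) using $A=\begin{pmatrix}1-a&a\\a&1-a\end{pmatrix}$, which is symmetric and doubly stochastic with singular values $1$ and $|1-2a|$, so $\sigma_2(A)=|1-2a|$:

```python
import math

# Illustrative 2x2 symmetric doubly stochastic matrix:
# A = [[1-a, a], [a, 1-a]] has singular values 1 and |1-2a|.
a = 0.3
A = [[1 - a, a], [a, 1 - a]]
sigma2 = abs(1 - 2 * a)

def matvec(M, v):
    # plain matrix-vector product
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

x0 = [5.0, -1.0]                 # arbitrary initial vector X^0
avg = sum(x0) / len(x0)          # consensus value (1/n) * sum_i x_i^0

def dist_to_consensus(x):
    # || x - avg * 1 ||_2
    return math.sqrt(sum((xi - avg) ** 2 for xi in x))

x = x0
d0 = dist_to_consensus(x0)
for k in range(1, 11):
    x = matvec(A, x)
    # the lemma: ||X^k - avg*1||_2 <= sigma_2(A)^k * ||X^0 - avg*1||_2
    assert dist_to_consensus(x) <= sigma2 ** k * d0 + 1e-12

print("bound holds; distance after 10 steps:", dist_to_consensus(x))
```

For $n=2$ the orthogonal complement of $\mathbf 1$ is one-dimensional, so the bound actually holds with equality at every step; for larger $n$ it is generally a strict inequality.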