How to solve system of linear equations with random walk?


I'm reading an article in which the authors use a random walk to solve a linear system (in this paragraph).

Briefly, we try to solve $Ax=b$, where $A\in M_n$. First we assume that $A=I-\gamma P$, where $I$ is the identity matrix. The solution is $x=A^{-1}b=(I-\gamma P)^{-1}b$, and we can use a property of the Neumann series: $$(I-\gamma P)^{-1}=\sum^\infty_{s=0}\gamma^sP^s.$$

Therefore, we can rewrite the solution as $$x=\sum^\infty_{s=0}\gamma^sP^sb,$$ and the $i_0$-th component of $x$, truncated to $c$ terms, is $$x_{i_0}^{(c)}=\sum_{s=0}^c \gamma^s \sum_{i_1,\ldots,i_s=1}^n P_{i_0,i_1}\cdots P_{i_{s-1},i_s} b_{i_s}.$$
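As a sanity check, the truncated series can be compared against a direct solve on a small example. This is my own illustration: the randomly generated $P$, the choices $\gamma = 0.5$ and $c = 30$, and all variable names are assumptions, not from the paper.

```python
import numpy as np

# Build A = I - gamma * P with P row-stochastic, so the Neumann series
# (I - gamma P)^{-1} = sum_s gamma^s P^s converges for |gamma| < 1.
rng = np.random.default_rng(0)
n, gamma, c = 5, 0.5, 30

P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)   # normalize rows so sum_j P[i, j] = 1
b = rng.random(n)
A = np.eye(n) - gamma * P

# Truncated series: x^(c) = sum_{s=0}^{c} gamma^s P^s b
x_trunc = np.zeros(n)
term = b.copy()                     # holds gamma^s P^s b at step s
for s in range(c + 1):
    x_trunc += term
    term = gamma * (P @ term)

x_exact = np.linalg.solve(A, b)
print(np.max(np.abs(x_trunc - x_exact)))  # tiny: the tail is O(gamma^{c+1})
```

Since the tail of the series is bounded by a geometric series in $\gamma$, the truncation error shrinks exponentially in $c$.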

They say that this expression can be evaluated by a random walk on a graph of $n$ nodes, with the probability of going from node $i$ to node $j$ of the graph given by the matrix element $P_{i,j}$.

But I don't understand why. How can a random walk evaluate this expression?

Best answer:

First, note that the matrix $P$ isn't arbitrary: its entries $P_{i,j}$ are non-negative and $\sum_j P_{i,j} = 1$ for every $i$, i.e. $P$ is row-stochastic. So it makes sense to talk about a random walk on the graph associated with $P.$ I assume you understand how they've derived the expression for $x_{i_0}^{(c)}$ algebraically, so let's discuss random walks and how they relate. The upshot is that $x$ can be numerically approximated by running many random walks and keeping track of an associated 'score.' With a good source of randomness it's easy to keep track of this score and run many walks.

When we're talking about the random walk associated with an $n \times n$ matrix $P,$ we imagine a complete weighted graph with $n$ vertices, labeled $1, 2, \ldots, n.$ The weight of the edge from vertex $i$ to vertex $j$ is $P_{i,j}.$ A picture appears on page $3$ of the linked paper.

Our walk is a sequence of vertices $i_0, i_1, \ldots$ To start the walk, we pick an initial vertex $i_0.$ Then, to determine $i_1,$ we randomly sample an edge $(i_0, j),$ choosing $(i_0, j)$ with probability $P_{i_0,j};$ if we sample the edge $(i_0, j)$ then we set $i_1 = j.$ We then repeat for $i_2, i_3, \ldots$ for as long as we like, each time taking a random edge from the current position. It is perhaps helpful to visualize an ant walking on the graph.
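A minimal sketch of such a walk in Python; the function name and the $2 \times 2$ example matrix are my own, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def walk(P, i0, steps):
    """Return the sequence of visited vertices i0, i1, ..., i_steps,
    where each next vertex is drawn according to the current row of P."""
    path = [i0]
    for _ in range(steps):
        path.append(rng.choice(len(P), p=P[path[-1]]))
    return path

P = np.array([[0.2, 0.8],
              [0.5, 0.5]])
print(walk(P, 0, 5))  # a list of 6 vertices, each in {0, 1}
```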

Now, in the context of $x_{i_0}^{(c)},$ we consider a walk that starts at $i_0,$ and before each step we flip a coin that is heads with probability $\gamma.$ If the coin is tails, we stop our walk forever. If the coin is heads, we continue to the next step as usual. If we make $c$ steps successfully, we also stop our walk (this is the truncation). If our path is $I = (i_0, i_1, i_2, \ldots, i_s),$ where $len(I) = s$ is the length of the walk, then we call the score for this path $$Score(I) = \sum_{j=0}^{len(I)} b_{i_j}.$$ That is, we gain $b_{j}$ points whenever we land on vertex $j.$
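The stopped, scored walk described above can be sketched as follows; the function name is my own choice.

```python
import numpy as np

rng = np.random.default_rng(2)

def scored_walk(P, b, gamma, c, i0):
    """One walk starting at i0: before each step, flip a gamma-coin;
    stop on tails or after c successful steps.
    The score is the sum of b over every visited vertex."""
    score = b[i0]          # points for the starting vertex (the s = 0 term)
    i = i0
    for _ in range(c):
        if rng.random() >= gamma:   # tails, with probability 1 - gamma
            break
        i = rng.choice(len(P), p=P[i])
        score += b[i]
    return score
```

Note the edge cases: with $\gamma = 0$ the walk stops immediately and the score is just $b_{i_0}$, and with $\gamma = 1$ the walk always makes exactly $c$ steps.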

(Side note: this is not the only score/walk combination that works. One could instead make every walk exactly $c$ steps long and weight the points gained on the $s$th step by $\gamma^s.$)

$x_{i_0}^{(c)}$ is then the expected score of this walk. Note that $\mathbb P(len(I) \geq s) = \gamma^s,$ as we must get $s$ heads in a row. Further note that the probability of taking a particular path $v_1, \ldots, v_k,$ given that $len(I) \geq k,$ is $$\mathbb P(i_1 = v_1, \ldots, i_k = v_k) = \prod_{j=1}^k \mathbb P (\text{we step from } v_{j-1} \text{ to } v_j) = \prod_{j=1}^k P_{v_{j-1}, v_j}, $$

where $v_0 = i_0.$ With those out of the way, the expected score of our walk is then:

$$\sum_{\text{paths $I$ starting at $i_0$ }} \mathbb{P}(I)\, Score(I) = \sum_{\text{paths $I$ starting at $i_0$ }} \mathbb{P}(I) \sum_{j=0}^{len(I)} b_{i_j}$$ $$ = \sum_{s=0}^c \sum_{\text{paths $v$ of length $\geq s$ starting at $i_0$ }}\mathbb P (i_1 = v_1, i_2 = v_2, \ldots, i_s = v_s,\, len(I) \geq s)\, b_{v_s}$$ (here we've swapped the order of summation, counting all points gained on the $s$th step) $$= \sum_{s=0}^c \gamma^s \sum_{i_1, \ldots, i_s = 1}^n P_{i_0, i_1} \cdots P_{i_{s-1}, i_s} b_{i_s}.$$ (relabeling and using independence)

Finally, this is presumably interesting/useful because instead of performing a potentially unstable matrix inversion we can simulate many of these walks and keep track of their scores. The sample average tends to the expected value, i.e. the value $x_{i_0}^{(c)},$ which in turn tends to the $i_0$-th component of the solution vector $x$ as $c \to \infty.$
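Putting the pieces together, here is a minimal Monte Carlo sketch of the whole procedure. The randomly generated $P$, the parameter choices ($\gamma = 0.5$, $c = 25$, $10^5$ walks), and the function name are all my own assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, gamma, c, n_walks = 4, 0.5, 25, 100_000

# Random row-stochastic P and right-hand side b
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)
b = rng.random(n)
i0 = 0  # component of x we want to estimate

def scored_walk():
    """One gamma-stopped walk from i0; score = sum of b over visited vertices."""
    score, i = b[i0], i0
    for _ in range(c):
        if rng.random() >= gamma:   # tails: stop the walk
            break
        i = rng.choice(n, p=P[i])
        score += b[i]
    return score

# Average the score over many walks; compare with the direct solve.
estimate = np.mean([scored_walk() for _ in range(n_walks)])
exact = np.linalg.solve(np.eye(n) - gamma * P, b)[i0]
print(estimate, exact)  # agree up to Monte Carlo error ~ 1/sqrt(n_walks)
```

The Monte Carlo error decays like $1/\sqrt{\text{number of walks}}$, so this trades accuracy for the ability to estimate a single component of $x$ without ever forming $A^{-1}$.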