Explanation of Google’s PageRank: Webpages as Eigenvectors


Help me understand what the matrix $A$ and the vector $x$ discussed below are.

Google uses the eigenvector corresponding to the maximal eigenvalue of a matrix $A$ to determine the rank of a page for search. The idea for the PageRank algorithm, developed at Stanford University by Larry Page and Sergey Brin in 1996, was that the importance of any web page can be approximated by the importance of pages that link to it. For this, they write down all web sites as a huge directed graph that shows which page links to which. PageRank computes the weight (importance) $x_i > 0$ of a web site $a_i$ by counting the number of pages pointing to $a_i$. Moreover, PageRank takes into account the importance of the web sites that link to $a_i$. The navigation behavior of a user is then modeled by a transition matrix $A$ of this graph that tells us with what (click) probability somebody will end up on a different web site. The matrix $A$ has the property that for any initial rank/importance vector $x$ of a web site the sequence $x, Ax, A^2x, \ldots$ converges to a vector $x^*$. This vector is called the PageRank and satisfies $Ax^* = x^*$, i.e., it is an eigenvector (with corresponding eigenvalue $1$) of $A$. After normalizing $x^*$ such that $\|x^*\| = 1$, we can interpret the entries as probabilities. More details and different perspectives on PageRank can be found in the original technical report (Page et al., 1999).

I suppose $A$ encodes links from a source web page (FROM) to the pages it links to (TO). Is $x$ like a one-hot encoding identifying a specific page, so that $Ax$ results in multiple pages? And would the next step, $A(Ax)$, then be multiple concurrent steps?

Does "a web page is an eigenvector" mean that the matrix $A$ has as many eigenvectors as there are web pages in the world?


The element $x_i$ of the vector $x$ is the probability that a random web surfer is visiting web page $i$ on a given time step. The elements of $x$ should sum to $1$. The $i$th element of $Ax$ is the probability that the random web surfer is on page $i$ on the next time step. The columns of $A$ are also probability vectors, and each should sum to $1$: the $j$th element of column $i$ is a transition probability, the probability that a random surfer visiting page $i$ on one time step will find herself on page $j$ on the next time step. In the simplest model, this probability is $1/L_i$ if there is a link from page $i$ to page $j$ and $0$ otherwise; $L_i$ is the number of links on page $i$. This simple model has some defects, so some tweaks are made to give the matrix better properties.
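As a concrete illustration of the construction above, here is a minimal NumPy sketch. The four-page graph and the names `links` and `x1` are my own invented example, not from the question:

```python
import numpy as np

# links[i] = pages that page i links to (hypothetical 4-page graph)
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n = len(links)

A = np.zeros((n, n))
for i, outlinks in links.items():
    for j in outlinks:
        A[j, i] = 1.0 / len(outlinks)   # P(next page = j | current page = i)

x = np.full(n, 1.0 / n)                 # surfer equally likely to be anywhere
x1 = A @ x                              # distribution after one click
print(A.sum(axis=0))                    # each column sums to 1
print(x1, x1.sum())                     # x1 is still a probability vector
```

Note that $Ax$ mixes the probabilities of all pages at once, which is why a one-hot $x$ spreads out over several pages after a single multiplication.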

A matrix $A$, as described, will have at least one eigenvalue $1$ and all eigenvalues less than or equal to $1$ in absolute value. Corresponding to eigenvalue $1$ is an eigenvector $s$ satisfying $As=s$. Assuming $A$ satisfies suitable conditions (and one purpose of tweaking the model above is to ensure that it satisfies those conditions), the $i$th element of $s$ represents the probability that the random surfer ends up on page $i$ after many clicks. The idea is that this probability will be higher for more important web pages. That is, the link structure of the web is being used to determine which pages are important; if random clicking leads you to a certain page with high probability, that page is important.
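The convergence to $s$ can be seen by running the power iteration $x, Ax, A^2x, \ldots$ directly. The sketch below uses the usual damping tweak, $G = dA + \frac{1-d}{n}\mathbf{1}$, as one example of the "suitable conditions" fix; the matrix and the damping factor $d = 0.85$ are illustrative assumptions, not values from the answer:

```python
import numpy as np

# Hypothetical 4-page link matrix: column i is the distribution of the
# surfer's next page when currently on page i (each column sums to 1).
A = np.array([[0.0, 0.0, 1.0, 0.5],
              [0.5, 0.0, 0.0, 0.0],
              [0.5, 1.0, 0.0, 0.5],
              [0.0, 0.0, 0.0, 0.0]])
n = A.shape[0]

d = 0.85                                    # damping factor (assumed value)
G = d * A + (1 - d) / n * np.ones((n, n))   # all entries > 0, columns still sum to 1

s = np.full(n, 1.0 / n)                     # start from the uniform distribution
for _ in range(100):
    s = G @ s                               # one "click": s stays a probability vector

# s now approximates the PageRank: G @ s == s (eigenvalue 1), and
# larger entries of s mark more important pages.
print(s, np.argsort(-s))
```

Note that there is only one such stationary eigenvector $s$ (up to scaling) once the damping makes every entry of the matrix positive, which answers the question above: there are not as many relevant eigenvectors as web pages, just one PageRank vector whose entries rank all the pages.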