The Use of Linear Algebra by Web Search Engines - Carl Meyer


where $H$ is a row-normalized hyperlink matrix, i.e., $h_{ij} = 1/|O_i|$ if there is a link from page $i$ to page $j$, and $0$ otherwise. Unfortunately, this iterative procedure has convergence problems: it can cycle, or the limit may depend on the starting vector.

To fix these problems, Brin and Page revised their basic PageRank concept. Still using the hyperlink structure of the Web, they build an irreducible aperiodic Markov chain characterized by a primitive (irreducible with only one eigenvalue on the spectral circle) transition probability matrix. The irreducibility guarantees the existence of a unique stationary distribution vector $\pi^T$, which becomes the PageRank vector. The power method with a primitive stochastic iteration matrix will always converge to $\pi^T$ independent of the starting vector, and the asymptotic rate of convergence is governed by the magnitude of the subdominant eigenvalue $\lambda_2$ of the transition matrix [19].

Here is how Google turns the hyperlink structure of the Web into a primitive stochastic matrix. If there are $n$ pages in the Web, let $H$ be the $n \times n$ matrix whose element $h_{ij}$ is the probability of moving from page $i$ to page $j$ in one click of the mouse.
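The cycling problem is easy to exhibit on a toy graph. The following sketch (assuming NumPy, with a hypothetical two-page web in which each page links only to the other) shows the naive iteration $\pi^{(k+1)T} = \pi^{(k)T} H$ oscillating forever:

```python
import numpy as np

# Two pages that link only to each other: H is a permutation matrix,
# so the iteration pi^(k+1) = pi^(k) H cycles with period 2.
H = np.array([[0.0, 1.0],
              [1.0, 0.0]])

pi = np.array([1.0, 0.0])  # starting vector
iterates = []
for _ in range(4):
    pi = pi @ H
    iterates.append(pi.copy())

# iterates alternate between [0, 1] and [1, 0]: no limit exists.
```

Starting instead from the uniform vector $[1/2, 1/2]$ gives a fixed point immediately, which illustrates the second pathology: the "limit" depends on the starting vector.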
<strong>The</strong> simplest model is to take h ij =1/|O i |, which meansthat starting from any <strong>Web</strong> page we assume that it is equally likely to follow any <strong>of</strong> <strong>the</strong> outgoing linksto arrive at ano<strong>the</strong>r page.However, some rows <strong>of</strong> H may contain all zeros, so H is not necessarily stochastic. This occurswhenever a page contains no outlinks; many such pages exist on <strong>the</strong> <strong>Web</strong> and are called dangling nodes.An easy fix is to replace all zero rows with e T /n, where e T is <strong>the</strong> row vector <strong>of</strong> all ones. <strong>The</strong> revised(now stochastic) matrix S can be written as a rank-one update to <strong>the</strong> sparse H. Let a be <strong>the</strong> danglingnode vector in which{ 1 if page i is a dangling node,a i =0 o<strong>the</strong>rwise.<strong>The</strong>n,S = H + ae T /n.Actually, any probability vector p T > 0 with p T e = 1 can be used in place <strong>of</strong> <strong>the</strong> uniform vector e T /n.We’re not home yet because <strong>the</strong> adjustment that produces <strong>the</strong> stochastic matrix S isn’t enough toinsure <strong>the</strong> existence <strong>of</strong> a unique stationary distribution vector (needed to make PageRank well defined).Irreducibility on top <strong>of</strong> stochasticity is required. But <strong>the</strong> link structure <strong>of</strong> <strong>the</strong> <strong>Web</strong> is reducible—<strong>the</strong> <strong>Web</strong>graph is not strongly connected. Consequently, an adjustment to make S irreducible is needed. This lastadjustment brings us to <strong>the</strong> Google matrix, which is defined to beG = αS +(1− α)E,where 0 ≤ α ≤ 1 and E = ee T /n. 
Google eventually replaced the uniform vector $e^T/n$ with a more general probability vector $v^T$ (so that $E = e v^T$) to allow them the flexibility to make adjustments to PageRanks as well as to personalize them. See [10, 15] for more about the personalization vector $v^T$.

Because $G$ is a convex combination of the two stochastic matrices $S$ and $E$, it follows that $G$ is both stochastic and irreducible. Furthermore, every node is now directly connected to every other node (although the probability of transition may be very small in some cases), so $G > 0$. Consequently, $G$ is a primitive matrix, and this ensures that the power method $\pi^{(k+1)T} = \pi^{(k)T} G$ will converge, independent of the starting vector, to a unique stationary distribution $\pi^T$ [19]. This is the mathematical part of Google's PageRank vector.

3.1 The Power Method

While it doesn't always excite numerical analysts, the power method has been Google's computational method of choice, and there are some good reasons for this. First, consider iterates of the power method applied to $G$ (a completely dense matrix, were it to be formed explicitly). If we take $E = e v^T$, then
$$\pi^{(k)T} = \pi^{(k-1)T} G = \alpha \pi^{(k-1)T} S + (1 - \alpha) v^T = \alpha \pi^{(k-1)T} H + \left( \alpha \pi^{(k-1)T} a + (1 - \alpha) \right) v^T,$$
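The practical point of the identity above is that each power-method step needs only the sparse $H$, the dangling vector $a$, and $v^T$; the dense $G$ never has to be formed. A minimal sketch, assuming NumPy, a hypothetical four-page web (page 3 dangling), $\alpha = 0.85$, and a uniform personalization vector:

```python
import numpy as np

# Hypothetical 4-page web; page 3 is a dangling node.
n = 4
H = np.zeros((n, n))
H[0, [1, 2]] = 0.5
H[1, 2] = 1.0
H[2, [0, 3]] = 0.5
a = (H.sum(axis=1) == 0).astype(float)   # dangling-node vector

alpha = 0.85
v = np.full(n, 1.0 / n)                  # personalization vector v^T (uniform here)

# Power iteration using only sparse pieces:
#   pi^(k) = alpha * pi^(k-1) H + (alpha * pi^(k-1) a + (1 - alpha)) * v
pi = np.full(n, 1.0 / n)
for _ in range(100):
    pi = alpha * (pi @ H) + (alpha * (pi @ a) + (1 - alpha)) * v

# Dense cross-check: iterate with G = alpha*(H + a v^T) + (1 - alpha)*e v^T.
e = np.ones(n)
G = alpha * (H + np.outer(a, v)) + (1 - alpha) * np.outer(e, v)
pi_dense = np.full(n, 1.0 / n)
for _ in range(100):
    pi_dense = pi_dense @ G
```

Both iterations converge to the same stationary vector, but the sparse form costs one sparse matrix-vector product plus a dot product per step, which is what makes the power method viable at Web scale.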
