Applying theory of Markov Chains to the problem of sports ranking.

Applying theory of Markov Chains to the problem of sports ranking.Applying theory of Markov Chains to theproblem of sports ranking.A. Govan C. MeyerDepartment of MathematicsNorth Carolina State UniversityAMS Southeastern Section Meeting, March 2007

Applying theory of Markov Chains to the problem of sports ranking.OutlineGoogle’s ranking algorithm.Ranking NFL.Results and current work.

Applying theory of Markov Chains to the problem of sports ranking.Google’s ranking algorithm.Basics of PageRank.◮ Basic Idea: r(P ) =∑ r(Q)Q∈B Pdeg − (Q)where r(P ) is the rank of a webpage P , B P is the set ofweb pages pointing to P , and deg − (Q) is the outdegree ofa webpage Q.

Applying theory of Markov Chains to the problem of sports ranking.Google’s ranking algorithm.Web digraph.

Applying theory of Markov Chains to the problem of sports ranking.Google’s ranking algorithm.Web digraph adjacency matrix.WWW digraph is represented by an adjacency matrix A.A =P⎛ 1 P 2 P 3 · · · P n⎞P 1 0 1 0 · · · 1P 2 0 0 0 · · · 0P 3 1 1 1 · · · 0⎜⎟. ⎝ . . . . . ⎠P n 1 0 1 · · · 1

Applying theory of Markov Chains to the problem of sports ranking.Google’s ranking algorithm.Web digraph hyperlink matrix.H =P⎛ 1 P 2 P 3 · · · P n⎞11P 1 0deg − (P 10 · · ·) deg − (P 1 )P 2 0 0 0 · · · 0111P 3 deg − (P 3 ) deg − (P 3 ) deg − (P 3· · · 0) ⎜⎟. ⎝ . . . . . ⎠111P n deg − (P n)0deg − (P n)· · ·deg − (P n)

Applying theory of Markov Chains to the problem of sports ranking.Google’s ranking algorithm.PageRank problem statement.◮ Basic Idea: r(P ) =∑Q∈B Pr(Q)deg − (Q)◮ Problem restated:◮ π - vector containing the rank scores.

Applying theory of Markov Chains to the problem of sports ranking.Google’s ranking algorithm.PageRank problem statement.◮ Basic Idea: r(P ) =∑Q∈B Pr(Q)deg − (Q)◮ Problem restated:◮ π - vector containing the rank scores.◮ π(0) - initial rank vector

Applying theory of Markov Chains to the problem of sports ranking.Google’s ranking algorithm.PageRank problem statement.◮ Basic Idea: r(P ) =∑Q∈B Pr(Q)deg − (Q)◮ Problem restated:◮ π - vector containing the rank scores.◮ π(0) - initial rank vector◮ π T (k) =π T (k − 1)H

Applying theory of Markov Chains to the problem of sports ranking.Google’s ranking algorithm.PageRank problem statement.◮ Basic Idea: r(P ) =∑Q∈B Pr(Q)deg − (Q)◮ Problem restated:◮ π - vector containing the rank scores.◮ π(0) - initial rank vector◮ π T (k) =π T (k − 1)H◮ π T (k) =π T (0)H k

Applying theory of Markov Chains to the problem of sports ranking.Google’s ranking algorithm.PageRank problem statement.◮ Basic Idea: r(P ) =∑Q∈B Pr(Q)deg − (Q)◮ Problem restated:◮ π - vector containing the rank scores.◮ π(0) - initial rank vector◮ π T (k) =π T (k − 1)H◮ π T (k) =π T (0)H k◮ π T (0)H k → π ?

Applying theory of Markov Chains to the problem of sports ranking.Google’s ranking algorithm.Google matrix.◮ Adjacency Matrix A.

Applying theory of Markov Chains to the problem of sports ranking.Google’s ranking algorithm.Google matrix.◮ Adjacency Matrix A.◮ Hyperlink Matrix H.

Applying theory of Markov Chains to the problem of sports ranking.Google’s ranking algorithm.Google matrix.◮ Adjacency Matrix A.◮ Hyperlink Matrix H.◮ Stochastic matrix S.◮ Replace the zero rows of H with (1/n)e T , where e is acolumn vector of ones.

Applying theory of Markov Chains to the problem of sports ranking.Google’s ranking algorithm.Google matrix.◮ Adjacency Matrix A.◮ Hyperlink Matrix H.◮ Stochastic matrix S.◮ Replace the zero rows of H with (1/n)e T , where e is acolumn vector of ones.◮ Google Matrix G.◮ Convex combination: G = αS + (1 − α)ev T ,α ∈ (0, 1), v T > 0 and v T e = 1.

Applying theory of Markov Chains to the problem of sports ranking.Google’s ranking algorithm.PageRank vector π.◮ G is the transition probability matrix.

Applying theory of Markov Chains to the problem of sports ranking.Google’s ranking algorithm.PageRank vector π.◮ G is the transition probability matrix.◮ G is irreducible (and aperiodic).

Applying theory of Markov Chains to the problem of sports ranking.Google’s ranking algorithm.PageRank vector π.◮ G is the transition probability matrix.◮ G is irreducible (and aperiodic).◮ Markov Chains theory implies:π T (0)G k → π Tsuch that π T =π T G

Applying theory of Markov Chains to the problem of sports ranking.Google’s ranking algorithm.PageRank vector π.◮ G is the transition probability matrix.◮ G is irreducible (and aperiodic).◮ Markov Chains theory implies:π T (0)G k → π Tsuch that π T =π T G◮ π is a unique probability distribution vector.

Applying theory of Markov Chains to the problem of sports ranking.Google’s ranking algorithm.PageRank vector π.◮ G is the transition probability matrix.◮ G is irreducible (and aperiodic).◮ Markov Chains theory implies:π T (0)G k → π Tsuch that π T =π T G◮ π is a unique probability distribution vector.◮ π i is the PageRank score of the web page i.

Applying theory of Markov Chains to the problem of sports ranking.Ranking NFL.NFL weighted digraph.

Applying theory of Markov Chains to the problem of sports ranking.Ranking NFL.NFL adjacency matrix.A =Arz · · · Car Chi NO Pit TB · · ·⎛⎞Arz 0 · · · 4 0 0 0 0 · · ·· · · · · · · · · · · · · · · · · · · · · · · ·Car 0 · · · 0 10 3 0 20 · · ·Chi 0 · · · 0 0 0 12 0 · · ·· · · · · · · · · · · · · · · · · · · · · · · ·NO 0 · · · 3 0 0 0 14 · · ·· · · · · · · · · · · · · · · · · · · · · · · ·Pit 0 · · · 0 0 0 0 0 · · ·· · · · · · · · · · · · · · · · · · · · · · · ·⎜⎟TB ⎝ 0 · · · 10 3 0 0 0 · · · ⎠· · · · · · · · · · · · · · · · · · · · · · · ·

Applying theory of Markov Chains to the problem of sports ranking.Ranking NFL.GeM (Generalized Markov Method).◮ Adjacency matrix A.◮ Hyperlink matrix H(i, j) = ∑ tw t ij /(∑ j( ∑ tw t ij ))where wij t is the weight on the edge from team i to team jduring week t.◮ Stochastic matrix S, dealing with undefeated teams.◮ GeM matrix G = α 0 S + α 1 ev T 1 + ... + α kev T kwhere k ≥ 1.

Applying theory of Markov Chains to the problem of sports ranking.Ranking NFL.Feature vectors v 1 , ..., v k .◮ Based on the statistical data of the given season.

Applying theory of Markov Chains to the problem of sports ranking.Ranking NFL.Feature vectors v 1 , ..., v k .◮ Based on the statistical data of the given season.◮ Must be nonnegative.

Applying theory of Markov Chains to the problem of sports ranking.Ranking NFL.Feature vectors v 1 , ..., v k .◮ Based on the statistical data of the given season.◮ Must be nonnegative.◮ Problem: What statistical data corresponds the most to theperformance?

Applying theory of Markov Chains to the problem of sports ranking.Ranking NFL.Feature vectors v 1 , ..., v k .◮ Based on the statistical data of the given season.◮ Must be nonnegative.◮ Problem: What statistical data corresponds the most to theperformance?◮ Start with a matrix containing statistical data for a givenseason.

Applying theory of Markov Chains to the problem of sports ranking.Ranking NFL.Feature vectors v 1 , ..., v k .◮ Based on the statistical data of the given season.◮ Must be nonnegative.◮ Problem: What statistical data corresponds the most to theperformance?◮ Start with a matrix containing statistical data for a givenseason.◮ SVD → no guaranty on nonnegativity.

Applying theory of Markov Chains to the problem of sports ranking.Ranking NFL.Feature vectors v 1 , ..., v k .◮ Based on the statistical data of the given season.◮ Must be nonnegative.◮ Problem: What statistical data corresponds the most to theperformance?◮ Start with a matrix containing statistical data for a givenseason.◮ SVD → no guaranty on nonnegativity.◮ NMF (nonnegative matrix factorization)

Applying theory of Markov Chains to the problem of sports ranking.Ranking NFL.Feature vectors via NMFNonnegative matrix factorization: Given M m×n ≥ 0,such that W ≥ 0, and H ≥ 0M = W m×k H k×nPossible uses of NMF:M j = ∑ W i h ij◮ Given appropriate M matrix (e.g. teams by stats) featurevectors could be the nonnegative “basis” of columns of M,i.e. columns of W.

Applying theory of Markov Chains to the problem of sports ranking.Results and current work.Results.GeM ranking method:

Applying theory of Markov Chains to the problem of sports ranking.Results and current work.Results.GeM ranking method:◮ (without first two weeks) Basic GeM predicts 70% of thegames played correctly during 2004 NFL regular season.

Applying theory of Markov Chains to the problem of sports ranking.Results and current work.Results.GeM ranking method:◮ (without first two weeks) Basic GeM predicts 70% of thegames played correctly during 2004 NFL regular season.◮ (without first two weeks) Basic GeM predicts 75.9% of thegames played correctly during 2005 NFL regular season.

Applying theory of Markov Chains to the problem of sports ranking.Results and current work.Results.GeM ranking method:◮ (without first two weeks) Basic GeM predicts 70% of thegames played correctly during 2004 NFL regular season.◮ (without first two weeks) Basic GeM predicts 75.9% of thegames played correctly during 2005 NFL regular season.◮ (without first two weeks) Basic GeM predicts 62% of thegames played correctly during 2006 NFL regular season.

Applying theory of Markov Chains to the problem of sports ranking.Results and current work.Summary◮ Expanding to bigger data set - NCAA men’s basketball.◮ Experimenting with NMF to obtain feature vectors.◮ Moving beyond sports (recommendation systems).

Applying theory of Markov Chains to the problem of sports ranking.

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?