Applying theory of Markov Chains to the problem of sports ranking.

A. Govan, C. Meyer
Department of Mathematics
North Carolina State University
AMS Southeastern Section Meeting, March 2007


Outline
Google’s ranking algorithm.
Ranking NFL.
Results and current work.


Google’s ranking algorithm.

Basics of PageRank.
◮ Basic Idea: $r(P) = \sum_{Q \in B_P} \frac{r(Q)}{\deg^-(Q)}$, where $r(P)$ is the rank of a webpage $P$, $B_P$ is the set of web pages pointing to $P$, and $\deg^-(Q)$ is the outdegree of a webpage $Q$.


Web digraph.
[Figure: web digraph]


Web digraph adjacency matrix.
WWW digraph is represented by an adjacency matrix $A$.

$$
A = \begin{array}{c|ccccc}
       & P_1 & P_2 & P_3 & \cdots & P_n \\ \hline
P_1    & 0 & 1 & 0 & \cdots & 1 \\
P_2    & 0 & 0 & 0 & \cdots & 0 \\
P_3    & 1 & 1 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
P_n    & 1 & 0 & 1 & \cdots & 1
\end{array}
$$


Web digraph hyperlink matrix.

$$
H = \begin{array}{c|ccccc}
       & P_1 & P_2 & P_3 & \cdots & P_n \\ \hline
P_1    & 0 & \frac{1}{\deg^-(P_1)} & 0 & \cdots & \frac{1}{\deg^-(P_1)} \\
P_2    & 0 & 0 & 0 & \cdots & 0 \\
P_3    & \frac{1}{\deg^-(P_3)} & \frac{1}{\deg^-(P_3)} & \frac{1}{\deg^-(P_3)} & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
P_n    & \frac{1}{\deg^-(P_n)} & 0 & \frac{1}{\deg^-(P_n)} & \cdots & \frac{1}{\deg^-(P_n)}
\end{array}
$$
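
The step from the adjacency matrix to the hyperlink matrix is a row normalization by outdegree. A minimal Python sketch follows. It assumes a 4-page web that keeps only the rows shown on the slide and drops the elided columns, and the function name `hyperlink_matrix` is hypothetical.

```python
from fractions import Fraction

def hyperlink_matrix(A):
    """Divide each row of the adjacency matrix by its outdegree.

    Rows with no outgoing links (dangling pages) are left as zero rows.
    """
    H = []
    for row in A:
        outdeg = sum(row)
        if outdeg == 0:
            H.append([Fraction(0)] * len(row))
        else:
            H.append([Fraction(a, outdeg) for a in row])
    return H

# Assumed 4-page example: A[i][j] = 1 if page i links to page j.
A = [
    [0, 1, 0, 1],
    [0, 0, 0, 0],   # second page has no outgoing links
    [1, 1, 1, 0],
    [1, 0, 1, 1],
]
H = hyperlink_matrix(A)
for row in H:
    print([str(x) for x in row])
```

Exact fractions are used only to make the entries easy to compare with $1/\deg^-(P_i)$; floats would work just as well.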


PageRank problem statement.
◮ Basic Idea: $r(P) = \sum_{Q \in B_P} \frac{r(Q)}{\deg^-(Q)}$
◮ Problem restated:
  ◮ $\pi$ - vector containing the rank scores.
  ◮ $\pi(0)$ - initial rank vector
  ◮ $\pi^T(k) = \pi^T(k-1) H$
  ◮ $\pi^T(k) = \pi^T(0) H^k$
  ◮ $\pi^T(0) H^k \to \pi^T$ ?
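
The iteration $\pi^T(k) = \pi^T(k-1)H$ can be tried directly on a small example. The sketch below assumes a hypothetical 4-page hyperlink matrix with one zero row and a uniform starting vector. It illustrates one reason the convergence question is left open at this point: with a zero row in $H$, rank leaks out of the system at every step.

```python
def step(pi, H):
    """One update pi^T(k) = pi^T(k-1) H, with pi stored as a plain list."""
    n = len(pi)
    return [sum(pi[i] * H[i][j] for i in range(n)) for j in range(n)]

# Assumed example: row-normalized links, second page is dangling.
H = [
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],   # zero row: no outgoing links
    [1/3, 1/3, 1/3, 0.0],
    [1/3, 0.0, 1/3, 1/3],
]

pi = [0.25] * 4             # pi(0): uniform initial rank vector
for k in range(1, 51):
    pi = step(pi, H)

# Total rank remaining after 50 steps; it is far below the initial total of 1.
print(sum(pi))
```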


Google matrix.
◮ Adjacency Matrix $A$.
◮ Hyperlink Matrix $H$.
◮ Stochastic matrix $S$.
  ◮ Replace the zero rows of $H$ with $(1/n)e^T$, where $e$ is a column vector of ones.
◮ Google Matrix $G$.
  ◮ Convex combination: $G = \alpha S + (1 - \alpha) e v^T$, $\alpha \in (0, 1)$, $v^T > 0$ and $v^T e = 1$.
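
The two fixes above, zero rows replaced by $(1/n)e^T$ and the convex combination with $e v^T$, can be sketched as follows. The 4-page matrix, the value $\alpha = 0.85$, and the uniform $v$ are assumptions chosen for illustration, and the function names are hypothetical.

```python
def stochastic_matrix(H):
    """Replace each zero row of H with the uniform row (1/n) e^T."""
    n = len(H)
    return [row[:] if any(row) else [1.0 / n] * n for row in H]

def google_matrix(S, alpha, v):
    """G = alpha * S + (1 - alpha) * e v^T; every row of e v^T equals v."""
    n = len(S)
    return [[alpha * S[i][j] + (1 - alpha) * v[j] for j in range(n)]
            for i in range(n)]

# Assumed example hyperlink matrix with one dangling page.
H = [
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],
    [1/3, 1/3, 1/3, 0.0],
    [1/3, 0.0, 1/3, 1/3],
]
S = stochastic_matrix(H)
v = [0.25] * 4                       # assumed: v > 0 with v^T e = 1
G = google_matrix(S, alpha=0.85, v=v)

for row in G:
    print([round(x, 4) for x in row], "row sum:", round(sum(row), 12))
```

Because $v > 0$ and $\alpha < 1$, every entry of $G$ is strictly positive, which is what makes the chain irreducible and aperiodic.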


PageRank vector $\pi$.
◮ $G$ is the transition probability matrix.
◮ $G$ is irreducible (and aperiodic).
◮ Markov Chains theory implies: $\pi^T(0) G^k \to \pi^T$ such that $\pi^T = \pi^T G$
◮ $\pi$ is a unique probability distribution vector.
◮ $\pi_i$ is the PageRank score of the web page $i$.
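
The limit $\pi^T(0)G^k \to \pi^T$ suggests computing $\pi$ by repeated multiplication. The sketch below assumes a hypothetical 3-page Google matrix (positive entries, rows summing to one) and a simple stopping rule on the change between iterates. The tolerance and iteration cap are illustrative choices, not values from the talk.

```python
def power_iteration(G, tol=1e-12, max_iter=1000):
    """Iterate pi^T <- pi^T G from a uniform start until the update is tiny."""
    n = len(G)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * G[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

# Assumed 3-page Google matrix: strictly positive, each row sums to 1.
G = [
    [0.05, 0.475, 0.475],
    [0.90, 0.05, 0.05],
    [1/3, 1/3, 1/3],
]
pi = power_iteration(G)

# Stationarity check: pi^T G should reproduce pi^T.
pi_G = [sum(pi[i] * G[i][j] for i in range(3)) for j in range(3)]
print("pi      =", [round(x, 6) for x in pi])
print("pi^T G  =", [round(x, 6) for x in pi_G])
print("ranking =", sorted(range(3), key=lambda i: -pi[i]))
```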


Ranking NFL.

NFL weighted digraph.
[Figure: NFL weighted digraph]


NFL adjacency matrix.

$$
A = \begin{array}{c|cccccccc}
       & \text{Arz} & \cdots & \text{Car} & \text{Chi} & \text{NO} & \text{Pit} & \text{TB} & \cdots \\ \hline
\text{Arz} & 0 & \cdots & 4 & 0 & 0 & 0 & 0 & \cdots \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \\
\text{Car} & 0 & \cdots & 0 & 10 & 3 & 0 & 20 & \cdots \\
\text{Chi} & 0 & \cdots & 0 & 0 & 0 & 12 & 0 & \cdots \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \\
\text{NO} & 0 & \cdots & 3 & 0 & 0 & 0 & 14 & \cdots \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \\
\text{Pit} & 0 & \cdots & 0 & 0 & 0 & 0 & 0 & \cdots \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \\
\text{TB} & 0 & \cdots & 10 & 3 & 0 & 0 & 0 & \cdots \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{array}
$$


GeM (Generalized Markov Method).
◮ Adjacency matrix $A$.
◮ Hyperlink matrix $H(i, j) = \dfrac{\sum_t w^t_{ij}}{\sum_j \left( \sum_t w^t_{ij} \right)}$, where $w^t_{ij}$ is the weight on the edge from team $i$ to team $j$ during week $t$.
◮ Stochastic matrix $S$, dealing with undefeated teams.
◮ GeM matrix $G = \alpha_0 S + \alpha_1 e v_1^T + \dots + \alpha_k e v_k^T$, where $k \ge 1$.
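
A rough sketch of how the GeM matrix could be assembled from weekly edge weights is given below. Several details are assumptions, since the slide does not spell them out: zero rows (undefeated teams) are replaced by a uniform row as in PageRank, the coefficients $\alpha_0, \dots, \alpha_k$ are nonnegative and sum to one, and each feature vector sums to one. The three-team data and the name `gem_matrix` are made up for illustration.

```python
def gem_matrix(weekly_weights, alphas, feature_vectors):
    """Build G = alpha_0 S + alpha_1 e v_1^T + ... + alpha_k e v_k^T."""
    n = len(weekly_weights[0])
    # total[i][j] = sum over weeks t of w^t_ij
    total = [[sum(W[i][j] for W in weekly_weights) for j in range(n)]
             for i in range(n)]
    S = []
    for row in total:
        s = sum(row)
        # Assumption: a team with no outgoing weight gets a uniform row.
        S.append([x / s for x in row] if s > 0 else [1.0 / n] * n)
    G = [[alphas[0] * S[i][j] for j in range(n)] for i in range(n)]
    for a, v in zip(alphas[1:], feature_vectors):
        for i in range(n):
            for j in range(n):
                G[i][j] += a * v[j]   # row i of e v^T is v
    return G

# Hypothetical data: 3 teams, 2 weeks; w[i][j] is the weight on edge i -> j.
week1 = [[0, 3, 0], [0, 0, 0], [0, 7, 0]]
week2 = [[0, 0, 0], [0, 0, 0], [10, 0, 0]]
alphas = [0.7, 0.2, 0.1]                 # assumed: sum to 1
v1 = [0.5, 0.3, 0.2]                     # assumed feature vectors, each sums to 1
v2 = [1/3, 1/3, 1/3]

G = gem_matrix([week1, week2], alphas, [v1, v2])
for row in G:
    print([round(x, 4) for x in row])
```

Under these assumptions every row of $G$ sums to one, so the same stationary-vector computation used for PageRank applies.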


Feature vectors $v_1, \dots, v_k$.
◮ Based on the statistical data of the given season.
◮ Must be nonnegative.
◮ Problem: Which statistical data correspond most closely to performance?
◮ Start with a matrix containing statistical data for a given season.
◮ SVD → no guarantee of nonnegativity.
◮ NMF (nonnegative matrix factorization)


Feature vectors via NMF
Nonnegative matrix factorization: Given $M_{m \times n} \ge 0$, find $W \ge 0$ and $H \ge 0$ such that

$$M = W_{m \times k} H_{k \times n}.$$

Column by column, $M_j = \sum_i W_i h_{ij}$, where $M_j$ is the $j$-th column of $M$ and $W_i$ is the $i$-th column of $W$.
Possible uses of NMF:
◮ Given an appropriate matrix $M$ (e.g. teams by stats), the feature vectors could be the nonnegative “basis” of the columns of $M$, i.e. the columns of $W$.
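
The slides do not say which NMF algorithm is used. As one possibility, the sketch below applies the well-known multiplicative update rules of Lee and Seung to a made-up teams-by-stats matrix. The data, the random initialization, the iteration count, and the final rescaling of the columns of $W$ to sum to one are all assumptions.

```python
import random

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def frob_error(M, W, H):
    """Frobenius norm of M - W H."""
    WH = matmul(W, H)
    return sum((m - a) ** 2
               for rm, ra in zip(M, WH) for m, a in zip(rm, ra)) ** 0.5

def nmf(M, k, iters=200, seed=0, eps=1e-9):
    """Multiplicative updates; W and H stay nonnegative throughout."""
    rng = random.Random(seed)
    m, n = len(M), len(M[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T M) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, M), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)]
             for a in range(k)]
        # W <- W * (M H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(M, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(m)]
    return W, H

# Hypothetical teams-by-stats matrix: 4 teams, 3 statistics.
M = [[3, 1, 2], [6, 2, 4], [1, 4, 2], [2, 8, 4]]
W, H = nmf(M, k=2)
print("reconstruction error:", frob_error(M, W, H))

# Assumption: rescale each column of W to sum to 1 so it can serve as a v_i.
features = []
for col in transpose(W):
    s = sum(col)
    features.append([x / s for x in col])
```

Unlike the SVD, the updates never change the sign of an entry, so the columns of $W$ remain nonnegative as the feature vectors require.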


Results and current work.

Results.
GeM ranking method:
◮ Excluding the first two weeks, basic GeM correctly predicts 70% of the games played during the 2004 NFL regular season.
◮ Excluding the first two weeks, basic GeM correctly predicts 75.9% of the games played during the 2005 NFL regular season.
◮ Excluding the first two weeks, basic GeM correctly predicts 62% of the games played during the 2006 NFL regular season.
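
A percentage of this kind can be computed by checking, game by game, whether the higher-rated team won. The sketch below shows only that arithmetic, on made-up ratings and results. The exact evaluation protocol behind the figures above is not described in the slides, so the rule "pick the higher-rated team" is an assumption.

```python
def prediction_accuracy(scores, games):
    """Fraction of games in which the higher-rated team was the winner."""
    correct = sum(1 for winner, loser in games
                  if scores[winner] > scores[loser])
    return correct / len(games)

# Hypothetical ratings and results; each game is a (winner, loser) pair.
scores = {"A": 0.40, "B": 0.35, "C": 0.25}
games = [("A", "B"), ("A", "C"), ("C", "B"), ("B", "C")]

print(prediction_accuracy(scores, games))   # 3 of 4 games -> 0.75
```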


Summary
◮ Expanding to a bigger data set: NCAA men’s basketball.
◮ Experimenting with NMF to obtain feature vectors.
◮ Moving beyond sports (recommendation systems).
