A Bradley-Terry Artificial Neural Network Model for Individual ...

Joshua Menke, Tony Martinez

rate is low enough. This is because the delta rule uses gradient descent on an error surface that is strictly convex. Notice that the formulation of the delta rule in (5) does not include the derivative of the sigmoid. The sigmoid’s derivative, namely Output(1 − Output), is also the inverse of the derivative of a different objective function called cross-entropy. Therefore, this version of the delta rule minimizes the cross-entropy instead of the squared error. The cross-entropy of a model is also known as the negative log-likelihood. This is appealing because the resulting fit will produce the maximum likelihood estimator. Despite not strictly minimizing the squared error, this method will still converge to the global minimum because the direction of the gradient for minimizing either function is always the same. In summary, the standard Bradley-Terry model can be fit by reparameterizing it as a single-layer ANN and then training it with the delta rule.

3.2 Individual Ratings from Groups

In order to extend the ANN model given in Section 3.1 to learn individual ratings, the weights for each group are obtained by averaging the ratings of the individuals in each group. Huang et al. (2005) used the sum of each group, which in the ANN model is analogous to giving each individual an input of 1 if they are on the winning team, −1 if they are on the losing team, and 0 if they are absent. This model assumes that when the teams have unequal numbers of individuals, the effect of one individual is the same in all situations. Here the average is used instead so
that the effect of a difference in the number of individuals can be handled separately. For example, the home field advantage formulation in Section 3.4 uses the relative number of individuals per group as an explicit input into the ANN. Averaging instead of summing the ratings is also analogous to normalizing the inputs, a common practice in efficiently training ANNs. Neither method changes the convergence properties because the error surface is still strictly convex and the weight updates are still in the correct direction. As stated, the weights for the ANN model in 1 are obtained as follows:

\[ w_A = \frac{\sum_{i \in A} \theta_i}{N_A} \tag{10} \]

\[ w_B = \frac{\sum_{i \in B} \theta_i}{N_B} \tag{11} \]

where $i \in A$ means player $i$ belongs to group $A$ and $N_A$ is the number of individuals in group $A$. Combining individual ratings in this manner means that the difference in performance between two players within a single competition cannot be distinguished. However, after two players have competed within several different teams, their ratings can diverge. The weight update for each player $\theta_i$ is equal to the update for $\Delta w_{team}$ given in (8):

\[ \forall i \in A,\quad \Delta\theta_i = \eta(1 - Output) \tag{12} \]

\[ \forall i \in B,\quad \Delta\theta_i = -\eta(1 - Output) \tag{13} \]

where $A$ is the winning team and $B$ is the losing team. In this model, instead of updating $w_A$ and $w_B$ directly, each individual receives the same
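
The averaging in (10) and (11) and the per-player updates in (12) and (13) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names `team_weight`, `predict`, and `update`, the player names, and the learning rate value are hypothetical, and the form Output = sigmoid(w_A − w_B) is assumed here because the defining equation falls outside this excerpt.

```python
import math


def team_weight(ratings, team):
    """Average rating of the players in `team`, as in (10) and (11)."""
    return sum(ratings[p] for p in team) / len(team)


def predict(ratings, team_a, team_b):
    """Assumed model output: sigmoid of the difference in team weights."""
    diff = team_weight(ratings, team_a) - team_weight(ratings, team_b)
    return 1.0 / (1.0 + math.exp(-diff))


def update(ratings, winners, losers, eta=0.1):
    """Apply (12) and (13) in place; return the Output computed before the update."""
    output = predict(ratings, winners, losers)
    delta = eta * (1.0 - output)
    for p in winners:   # (12): every player on the winning team moves up
        ratings[p] += delta
    for p in losers:    # (13): every player on the losing team moves down
        ratings[p] -= delta
    return output


# Hypothetical players, all starting from the same rating.
ratings = {"ann": 0.0, "bob": 0.0, "cat": 0.0, "dan": 0.0}

# Game 1: ann and bob are teammates and win, so they receive the same update.
update(ratings, winners=["ann", "bob"], losers=["cat"])

# Game 2: ann and bob are now on opposite teams, so their ratings diverge.
update(ratings, winners=["ann", "dan"], losers=["bob"])
print(ratings)
```

After the first game the two teammates are indistinguishable, which matches the observation that performance differences within a single competition cannot be separated. Only the second game, with a different team composition, pulls their ratings apart.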
