Neurogammon: a neural-network backgammon program - Neural ...

appear to have learned a great deal of expert backgammon knowledge, and the resulting program plays at a substantially higher level than conventional backgammon programs, as demonstrated by its decisive 5-0 win at the First Computer Olympiad.

In the following, we examine in detail some of the key ingredients that went into the development of Neurogammon. In section 2, a "comparison paradigm" for learning expert preferences is introduced, in which the network is told that the move preferred by the expert in a given situation should score higher than each of the other legal moves not selected by the expert. The comparison training paradigm was probably the most crucial ingredient in the achievement of Neurogammon's performance level, and in its victory at the Computer Olympiad.

Section 3 describes a separate network used to make doubling decisions, which was trained on a separate database. Each position in the database was ranked according to a crude nine-point scale of doubling strength in money games. Doubling decisions were made by comparing a win-probability estimate, based on network judgement and on racing considerations, with a theoretically optimal doubling point depending on the match score. The whole approach to doubling can only be described as a quick and dirty "hack" done in a limited time before the Computer Olympiad, yet the consensus of expert opinion at the Olympiad was that Neurogammon's doubling algorithm was probably the strongest part of the program. Thus this was an instance where supervised learning lived up to the (sometimes naive) expectation that it can be used to quickly develop a high-performance system with little effort on the part of the human programmer.

In section 4, we examine Neurogammon's performance at the Computer Olympiad.
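The doubling logic described in section 3, comparing a win-probability estimate against a theoretically optimal doubling point, can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's actual code: the function names are invented, and the default 25% take point is the classical dead-cube money-game value, whereas the paper's doubling points depend on the match score.

```python
# Hypothetical sketch of a doubling rule in the style described above.
# Both thresholds would come from network judgement, racing
# considerations, and match-equity theory in the real program.

def should_double(win_prob: float, doubling_point: float) -> bool:
    """Offer a double once the estimated win probability reaches the
    theoretically optimal doubling point for the current match score."""
    return win_prob >= doubling_point


def should_take(win_prob: float, take_point: float = 0.25) -> bool:
    """Accept an offered double while the win probability stays above
    the take point (25% is the classical dead-cube money-game figure)."""
    return win_prob >= take_point
```

The point of the sketch is only the decision structure: the learning problem is pushed entirely into producing a good `win_prob` estimate, which is why a crude comparison against a fixed threshold could still perform well.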
The concluding section discusses the significance of the work, and some future directions likely to yield further progress.

2. Comparison training of expert move preferences

In [11], networks were trained to score single moves in isolation, based on examples of human expert scores of moves. The input was a description of an initial board position and a transition to a final position, and the desired output was an expert's judgement of the quality of the transition from initial to final position. The expert judgement recorded in the training data ranged from +100 for the best possible move to -100 for the worst possible move in a given situation. This approach is hereafter referred to as the "relative score" paradigm (RSP).

The RSP approach gave surprisingly good average performance, but suffered from a number of fundamental limitations. The training data itself is problematic, due to the arbitrary normalization, incompleteness (the human expert cannot comment on every possible legal move), difficulty of scoring bad moves (necessary for the network, but unusual for the human expert), and possible human errors. Furthermore, in this scheme, a sophisticated scheme for coding the transition from initial to final board positions was
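To make the contrast with RSP concrete, the comparison paradigm can be sketched as follows: instead of regressing each move's absolute expert score, training only enforces that the expert's chosen move outscores every other legal move in the same position. The linear scorer, feature vectors, margin, and perceptron-style update below are all illustrative assumptions for the sake of a runnable example, not Neurogammon's actual network or training rule.

```python
import numpy as np

def train_comparison(situations, epochs=50, lr=0.1, margin=0.1):
    """Train a linear move scorer under the comparison paradigm.

    situations: list of (expert_features, [other_move_features, ...]),
    where each entry is one board situation with the expert's chosen
    move and the remaining legal moves, all as feature vectors.
    """
    dim = len(situations[0][0])
    w = np.zeros(dim)
    for _ in range(epochs):
        for expert, others in situations:
            for other in others:
                # Update only when a non-expert move scores too close
                # to (or above) the expert's move -- no absolute score
                # targets are ever needed.
                if w @ expert - w @ other < margin:
                    w += lr * (np.asarray(expert) - np.asarray(other))
    return w

# Toy usage: one situation where the expert's move is distinguished
# by its first feature.
data = [(np.array([1.0, 0.0]), [np.array([0.0, 1.0])])]
w = train_comparison(data)
assert w @ data[0][0] > w @ data[0][1][0]
```

Note how this sidesteps the RSP problems listed above: no arbitrary normalization is imposed, and the expert never has to grade bad moves, only to choose the best one.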