13.07.2015 Views

Neurogammon: a neural-network backgammon program - Neural ...

Neurogammon: a neural-network backgammon program - Neural ...

Neurogammon: a neural-network backgammon program - Neural ...

SHOW MORE
SHOW LESS
  • No tags were found...

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Type of RSPnet CP nettest set (651- 12- 1) (289-12- 1)bearoff .82 .83bearin .49 .63opp. bearoff .56 .59opp. bearin .49 .61other .55 .69Table 1: Performance of nets of indicated sizes trained in the relativescore paradigm (RSP) and in the comparison paradigm (CP) on respectivetest sets, as measured by fraction of positions for which netagrees with human expert choice of best move.required, using human expert concepts such as “slotting” and “stripping”points. More fundamentally, however, in the RSP approach, the <strong>network</strong>does not learn an intrinsic understanding of the position itself, but insteadonly learns what kinds of transitions between positions are desirable. Furthermore,the <strong>network</strong> can only judge one move at a time in isolation, withoutany knowledge of what other alternative are available. The latter problemis the most serious, since from the human expert point of view, the “goodness”or “badness” of a given move often depends on what other moves areavailable.In [lo], an alternative training paradigm called the “comparison paradigm”(CP) was reported. In this paradigm, the input to the <strong>network</strong> is two of theset of possible final board states, and the desired output represents the humanexpert’s judgement as to which of these two states is better. This curesthe most fundamental problem of the RSP method, as the <strong>network</strong> alwayslearns to judge a particular move in the context of the other moves thatare available. The comparison paradigm also has a number of other advantages.There are no problems due to the expert entering an incorrect score,no problems due to arbitrary score normalization, and no problems with uncommentedmoves. Also, one can present only final board positions and thusdoes not need a sophisticated scheme to encode the transitions from initialto final positions. Finally, it was shown in [lo] that symmetry and transitivityof comparisons can be exactly enforced by a constrained architecturein which the left half of the <strong>network</strong> only looks at board position a, and theright half only looks at board position b. The constrained architecture forceseach half-<strong>network</strong> to implement a real-valued absolute evaluation functionof a single final board position. The <strong>network</strong> thus does develop an intrinsicunderstanding of the position in this training methodology.Test-set performance of RSP nets and CP nets are compared in table 1.(The nets were trained according to the methodology of [lo].) The CP netshave a much smaller and simpler input coding scheme, and have significantlyfewer connections. The CP nets also have much stronger game performance.In particular, worst-case performance is dramatically improved by comparisontraining.I11 - 35Authorized licensed use limited to: University of Illinois. Downloaded on August 15, 2009 at 19:24 from IEEE Xplore. Restrictions apply.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!