
Monte-Carlo Tree Search and Billiards - VideoLectures


European Conference on Machine Learning 2009

Active Learning is a Game: Monte-Carlo Tree Search and Billiards

P. Rolet, ML&O Team, Paris-Sud University
Joint work with M. Sebag and O. Teytaud


Why do we bother?

Machine Learning tackles a wide range of problems.
For some of them, data is expensive: each label costs money (label = $$).
We consider a supervised learning problem.


A major Machine Learning goal

Reduce sample complexity while keeping generalization error low.
Motivating application: numerical engineering
=> Learn simplified models with only ~100 examples.


That's what Active Learning does!

Learning a threshold on [0, 1]:
PASSIVE: ~1/eps uniformly drawn examples to reach error eps.
ACTIVE: ~log(1/eps) examples, by always querying the midpoint of the current interval.
Exponential improvement (Freund et al. '97; Dasgupta '04, '05, ...).
This can be generalized (cf. Billiards later on).
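
To make the exponential gap concrete, here is a minimal sketch (not from the talk; the oracle's threshold and the tolerance eps are illustrative) comparing passive uniform sampling with active binary search on the threshold problem:

```python
import random

def oracle(x, theta=0.6180339887):
    """Label is 1 iff x lies above the hidden threshold theta."""
    return int(x >= theta)

def passive_queries(eps):
    """Uniform sampling: on the order of 1/eps labels to pin the threshold down to width eps."""
    n, lo, hi = 0, 0.0, 1.0
    while hi - lo > eps:
        x = random.random()
        n += 1
        if oracle(x):
            hi = min(hi, x)
        else:
            lo = max(lo, x)
    return n

def active_queries(eps):
    """Binary search: each query halves the version space, ~log2(1/eps) labels."""
    n, lo, hi = 0, 0.0, 1.0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        n += 1
        if oracle(mid):
            hi = mid
        else:
            lo = mid
    return n

if __name__ == "__main__":
    eps = 1e-4
    print("passive:", passive_queries(eps), "queries")
    print("active: ", active_queries(eps), "queries")
```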


How AL typically works

Find a way to measure the "information" brought by instances, then greedily choose the most informative ones.
Examples: version-space split, Query-by-Committee (classification; Seung et al. '97).
=> Good, but not optimal.
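
As a purely illustrative instance of the greedy recipe, a minimal Query-by-Committee sketch: train a small committee on bootstrap resamples of the labeled set and query the pool point on which the votes split most. The perceptron learner and the vote-count disagreement measure are stand-ins, not the talk's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def perceptron(X, y, epochs=50):
    """Train a linear separator on labels in {-1, +1}; returns a weight vector."""
    w = rng.normal(size=X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:
                w += yi * xi
    return w

def qbc_pick(X_lab, y_lab, X_pool, k=5):
    """Query-by-Committee: train k perceptrons on bootstrap resamples and
    return the index of the pool point with the largest vote split."""
    votes = np.zeros(len(X_pool))
    for _ in range(k):
        idx = rng.integers(0, len(X_lab), len(X_lab))  # bootstrap resample
        w = perceptron(X_lab[idx], y_lab[idx])
        votes += (X_pool @ w > 0)
    disagreement = np.minimum(votes, k - votes)  # 0 = unanimous, k/2 = even split
    return int(np.argmax(disagreement))
```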


A different perspective

Active learning can be seen as a game between the Learner (with learning strategy S) and the target concept h* (a.k.a. the Oracle):
the Learner plays x1, the Oracle answers h*(x1); the Learner plays x2, the Oracle answers h*(x2); and so on.
After T rounds (T = finite horizon), this yields the T-size training set S_T(h*) = {(x1, h*(x1)), ..., (xT, h*(xT))}.
Score: generalization error.
This is a Reinforcement Learning problem.
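
In code, the game is a plain finite-horizon loop; `strategy.pick_query` and `strategy.fit` below are hypothetical interface names, a sketch rather than the authors' implementation:

```python
def al_game(strategy, h_star, T, test_set):
    """Play the active-learning game for a finite horizon T: the learner
    picks queries, the oracle h* labels them, and the score is the
    generalization error of the final hypothesis."""
    history = []                             # the growing training set S_t(h*)
    for _ in range(T):
        x = strategy.pick_query(history)     # learner's move
        history.append((x, h_star(x)))       # oracle's answer
    h_hat = strategy.fit(history)            # final hypothesis
    errors = sum(h_hat(x) != h_star(x) for x in test_set)
    return errors / len(test_set)            # score: generalization error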


Train the Active Learning Player

Inspiration from Go (Coulom '06; Chaslot et al. '06; Gelly & Silver '07):
explore the game tree with MCTS, estimating the values of moves by Monte-Carlo simulations.
For AL: train against surrogate hypotheses.


Monte-Carlo & UCT for games

Simulation-based planning with multi-armed bandits:
assess the values of child moves, growing the tree asymmetrically;
explore the moves with better value more often: UCT.
=> the Baal algorithm (Bandit-based Active Learning).
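
A generic UCT selection-and-backup loop of the kind this slide describes (illustrative only; `state.play`, `moves_fn` and `rollout_fn` are assumed interfaces, not Baal's actual code):

```python
import math

class Node:
    """One state in the game tree; children are indexed by move."""
    def __init__(self):
        self.children = {}   # move -> Node
        self.visits = 0
        self.value = 0.0     # running mean of simulation rewards

def uct_select(node, moves, c=math.sqrt(2)):
    """Pick the child maximizing the UCB score (Kocsis & Szepesvari '06)."""
    def score(m):
        child = node.children.get(m)
        if child is None or child.visits == 0:
            return float("inf")          # unvisited moves are tried first
        return child.value + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(moves, key=score)

def simulate(node, state, moves_fn, rollout_fn, depth):
    """One MCTS iteration: descend with UCT, then evaluate by random rollout.
    The tree grows asymmetrically toward moves with better estimated value."""
    if depth == 0:
        return rollout_fn(state)
    move = uct_select(node, moves_fn(state))
    child = node.children.setdefault(move, Node())
    reward = simulate(child, state.play(move), moves_fn, rollout_fn, depth - 1)
    child.visits += 1
    child.value += (reward - child.value) / child.visits  # incremental mean
    node.visits += 1
    return reward
```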


UCT: Exploration vs. Exploitation

UCB balances exploration and exploitation (Auer '03).
UCT = UCB for trees (Kocsis & Szepesvari '06).
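
For reference, the UCB1 rule behind UCT, where $\bar{X}_j$ is the empirical mean reward of move $j$, $n_j$ its visit count, and $n$ the parent's visit count:

```latex
j^\ast \;=\; \arg\max_{j}\ \left( \bar{X}_j + C \sqrt{\frac{\ln n}{n_j}} \right),
\qquad C = \sqrt{2} \ \text{for UCB1}
```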


The Baal Algorithm


The Baal Algorithm

The value function for states converges to the optimal value
=> Optimal strategy (proof based on the Markov Decision Process model).
BUT:
- Infinite action space
- How to draw surrogate hypotheses?
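
The formula on this slide did not survive extraction; as a hedged reconstruction, the standard finite-horizon Bellman optimality equation such a proof rests on reads as follows, with $V^\ast$ the optimal value of a partial training set $s$, $x$ a candidate query, $h$ a hypothesis consistent with $s$, and the terminal value at horizon $T$ the negated generalization error:

```latex
V^\ast(s) \;=\; \max_{x}\ \mathbb{E}_{h \,\sim\, \mathcal{H}\mid s}
\!\left[ V^\ast\!\big(s \cup \{(x,\, h(x))\}\big) \right],
\qquad
V^\ast(s_T) \;=\; -\,\mathrm{Err}\big(\hat{h}(s_T)\big)
```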


Baal: Infinite action space

UCB is for finite action spaces; here, the action space is R^D.
Progressive widening: add instances as the number of simulations grows,
# instances ~ (# visits)^(1/4) (Coulom '07),
➢ in a random order,
➢ in an educated order.
This allows coupling with existing AL criteria such as VS split (see the sketch below).
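
A minimal sketch of that widening schedule (reusing the `Node` class from the UCT sketch above; `candidate_stream` is an assumed iterator yielding instances in random or "educated" order):

```python
def allowed_actions(node_visits, exponent=0.25):
    """Progressive widening (Coulom '07): the number of candidate
    actions considered at a node grows as (# visits)^(1/4)."""
    return max(1, int(node_visits ** exponent))

def widen(node, candidate_stream):
    """If the visit count warrants it, admit the next candidate instance
    as a new move; otherwise keep the current move set."""
    if len(node.children) < allowed_actions(node.visits):
        x = next(candidate_stream)   # next instance to add as a move
        node.children[x] = Node()    # Node as in the MCTS sketch above
    return list(node.children)
```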


Baal: draw surrogate hypotheses

Billiard algorithms (Rujan '97; Comets et al. '09, ...):
constraints = labeled instances, point = hypothesis, domain = version space.
Sound: provably converges to a uniform draw.
Scalable w.r.t. dimension and # constraints.
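
The slides do not spell out the billiard dynamics, so as a stand-in with the same goal, approximately uniform draws from the version space of linear separators, here is a hit-and-run sampler sketch (hit-and-run, not billiards; `w` must start at a point known to lie in the version space, e.g. a perceptron solution scaled into the unit ball):

```python
import numpy as np

rng = np.random.default_rng(0)

def in_version_space(w, X, y):
    """Hypothesis w must classify every labeled instance correctly
    (constraints = labeled instances) and stay in the unit ball."""
    return np.all(y * (X @ w) > 0) and np.linalg.norm(w) <= 1.0

def hit_and_run(w, X, y, steps=1000):
    """Hit-and-run random walk over the version space: pick a random
    direction, then a uniform point on the feasible chord through w.
    Converges to the uniform distribution over this convex domain."""
    for _ in range(steps):
        d = rng.normal(size=w.shape)
        d /= np.linalg.norm(d)
        # Find the feasible segment w + t*d by bisection on each side;
        # feasibility is an interval along the chord since the domain is convex.
        ts = []
        for sign in (+1.0, -1.0):
            lo, hi = 0.0, 2.0   # chord length is at most the ball diameter
            for _ in range(30):
                mid = (lo + hi) / 2
                if in_version_space(w + sign * mid * d, X, y):
                    lo = mid
                else:
                    hi = mid
            ts.append(sign * lo)
        w = w + rng.uniform(min(ts), max(ts)) * d
    return w
```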


Some results

Setting: linear separators of R^D; dimension D = 4, 8; # queries = 15, 20.
[Figure: generalization error (y-axis) vs. log(# simulations) (x-axis) for (D=4, # queries=15) and (D=8, # queries=20), with passive learning and an almost-optimal QbC-based AL as baselines.]


Some results - 2

Combining with AL criteria (inspired from QbC): best of both worlds!
[Figure: same axes and settings (D=4, # queries=15 and D=8, # queries=20), comparing the combined method with the almost-optimal QbC-based AL.]


To sum up ...

A new approach to Active Learning:
- AL as a game
- An approximation of the optimal strategy (provably)
- An anytime algorithm
Perspectives:
- Kernelized Baal
- Numerical engineering application


Thanks for listening
