Product distribution theory for control of multi-agent systems

where the delta function forces x'_i = x_i in the usual way.

Now given any initial q, one may use gradient descent to search for the q optimizing L(q). Taking the appropriate partial derivatives, the descent direction is given by

$$\Delta q_i(x_i) = \frac{\delta L}{\delta q_i(x_i)} = [G \mid x_i] + \beta^{-1} \log q_i(x_i) + C \qquad (9)$$

where C is a constant set to preserve the norm of the probability distribution after the update, i.e., set to ensure that

$$\int dx_i\, q_i(x_i) = \int dx_i\, \big( q_i(x_i) + \Delta q_i(x_i) \big) = 1. \qquad (10)$$

Evaluating, we find that

$$C = -\frac{1}{\int dx_i\, 1} \int dx_i\, \big\{ [G \mid x_i] + \beta^{-1} \log q_i(x_i) \big\}. \qquad (11)$$

(Note that for finite X, these integrals are just sums.)

To follow this gradient, we need an efficient scheme for estimating the conditional expected G for each value of x_i. Here we do this via Monte Carlo sampling, i.e., by repeatedly IID sampling q and recording the resultant private utility values. After using those samples to form an estimate of the gradient for each agent, we update the agents' distributions accordingly. We then start another block of IID sampling to generate estimates of the next gradients.

In large systems, the sampling within any given block can be slow to converge. One way to speed up the convergence is to replace each [G | x_i] with [g_i | x_i], where g_i is set to minimize bias plus variance [9, 11]:

$$g_i(x) := G(x) - \int dx_i\, \frac{L_{x_i}^{-1}}{\int dx_i'\, L_{x_i'}^{-1}}\, G(x_{-i}, x_i), \qquad (12)$$

where L_{x_i} is the number of times the particular value x_i arose in the most recent block of L samples. This is called the Aristocrat Utility (AU).
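For finite move sets the integrals above become sums, and the team-game version of the block update can be sketched in a few lines. The following is a minimal illustration, assuming NumPy; the function names (`estimate_conditionals`, `update_agent`) and the step size are our own choices for illustration, not from the paper:

```python
import numpy as np

def estimate_conditionals(qs, G, n_samples, rng):
    """Monte Carlo estimate of [G | x_i] for every agent:
    IID-sample the product distribution q and average G per move."""
    sums = [np.zeros(len(q)) for q in qs]
    counts = [np.zeros(len(q)) for q in qs]
    for _ in range(n_samples):
        # One joint sample: each agent draws its move independently.
        x = [rng.choice(len(q), p=q) for q in qs]
        g = G(np.array(x))
        for i, xi in enumerate(x):
            sums[i][xi] += g
            counts[i][xi] += 1
    # Average G over the samples in which agent i played each move.
    return [s / np.maximum(c, 1) for s, c in zip(sums, counts)]

def update_agent(q_i, G_given_xi, beta, step=0.1):
    """One descent step on agent i's distribution (Eqs. 9-11)."""
    grad = G_given_xi + np.log(q_i) / beta   # delta L / delta q_i(x_i)
    C = -grad.mean()                         # Eq. 11: keeps q_i normalised
    q_new = q_i - step * (grad + C)          # move down the Lagrangian
    q_new = np.clip(q_new, 1e-12, None)      # guard against negative mass
    return q_new / q_new.sum()
```

Swapping the estimated AU or WLU values in for the conditional estimates of G leaves `update_agent` unchanged; only the private utility fed to each agent differs.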
It is a correction to the utility of the same name investigated in [13] and the references therein.

Note that evaluating AU for agent i requires knowing G for all possible x_i, with $x_{-i}$ held fixed. Accordingly, we consider an approximation, called the Wonderful Life Utility (WLU), in which we replace the weights $L_{x_i}^{-1} / \int dx_i'\, L_{x_i'}^{-1}$ defining agent i's AU with a delta function about the least likely (according to q_i) of that agent's moves. (This is a version of the utility of the same name investigated in [13] and the references therein.)

Below we present computer experiments validating the theoretical predictions that AU converges faster than the team game, and that the WLU defined here converges faster than its reverse, in which the delta function is centered on the most likely of the agent's moves.

Both WLU and AU require recording not just G(x) for the Monte Carlo sample x, but also G at a set of points related in a particular way to x. When the functional form of G is known, there is often a cancellation of terms that obviates this need; indeed, what one does need to record in these cases is often easier to evaluate than G itself. However, when the functional form of G is not known, using such private utilities would require rerunning the system, i.e., evaluating G at many points besides x.

PD theory provides an alternative way to improve the convergence of the sampling. This alternative exploits the fact that the joint distribution of all the agents is a product distribution.
Accordingly, we can have all the agents announce their separate distributions {q_i} at the end of each block. By itself, this is of no help. However, say that x as well as G(x) is recorded for all the samples taken so far (not just those in the preceding block). We can use this information as a training set for a supervised learning algorithm that estimates G. Again, this piece of information is of no use by itself. But if we combine it with the announced {q_i}, we can form an estimate of each [G | x_i]. This estimate is in addition to the estimate based on the Monte Carlo samples; here the Monte Carlo samples from all blocks are used to approximate G(x), rather than to directly estimate the various [G | x_i]. Accordingly, we can combine these two estimates. Below we present computer experiments validating this technique.

4. Experiments

4.1. Known world utilities

We first consider the case where the functional form of the world utility is known. Specifically, the problem that we consider is the equilibration of a spin glass in an external field, where each spin has a total angular momentum of 3/2. The problem consists of 50 spins in a circular formation, where every spin interacts with the three spins on its right, the three spins on its left, as well as with itself. There are also external fields which interact differently with each individual spin. The world utility is thus of the following form:

$$G(x) = \sum_i h_i x_i + \sum_{\langle i,j \rangle} J_{ij} x_i x_j, \qquad (13)$$

where $\sum_{\langle i,j \rangle}$ means summing over each interacting pair once.
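A world utility of this ring-coupled form is straightforward to instantiate. Below is a sketch in NumPy; the constructor name `make_world_utility` and the exact way the pairs and couplings are enumerated are our own assumptions, guided only by the description in the text (n spins on a ring, k neighbours on each side plus a self-interaction, h_i and J_ij uniform on [-0.5, 0.5]):

```python
import numpy as np

def make_world_utility(n=50, k=3, seed=0):
    """Build a G(x) of the form of Eq. 13 for n spins on a ring, each
    interacting with the k spins on either side and with itself."""
    rng = np.random.default_rng(seed)
    h = rng.uniform(-0.5, 0.5, n)
    # Collect each interacting pair exactly once: (i, i) for the
    # self-interaction, and (i, (i + d) mod n) for d = 1..k so that
    # every left/right neighbour pair appears a single time.
    pairs = [(i, i) for i in range(n)]
    pairs += [(i, (i + d) % n) for i in range(n) for d in range(1, k + 1)]
    J = {p: rng.uniform(-0.5, 0.5) for p in pairs}

    def G(x):
        field = np.dot(h, x)                                   # sum_i h_i x_i
        coupling = sum(Jij * x[i] * x[j] for (i, j), Jij in J.items())
        return field + coupling

    return G
```

Because G splits into a linear field term and a quadratic coupling term, identities such as G(2x) = 3 G(x) + G(-x) hold for any x, which gives a cheap sanity check on an implementation.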
In our problem, the elements of the sets {h_i} and {J_ij} are generated uniformly at random from -0.5 to 0.5.

The algorithm for the Lagrangian estimation goes as follows:

1. Each spin is treated as an agent which possesses a probability distribution over its set of actions, {q_i(x_i) | x_i ∈ σ_i ≡ {-1, 1}}, which is initially set to be uniform.
