12.07.2015 Views

Product distribution theory for control of multi-agent systems



2. Each agent picks its state according to its probability distribution $L$ times in sequence, where $L$ is the Monte Carlo block size. We denote the number of times agent $i$ picks state $x_i$ by $L_{x_i}$. We require $L_{x_i}$ to be nonzero for all $x_i \in \omega_i$; i.e., if some $L_{x_i} = 0$, we randomly pick a sample $x'$ and set $x_i = x'_i$ so that $L_{x_i} = 1$. This ensures that we can estimate the conditional expected values $[G|x_i]$ for all $x_i \in \sigma_i$. It should be noted, though, that this violates the assumption of IID sampling underpinning the derivation of the private utilities that minimize bias plus variance.

3. The gradient for each individual component is calculated from the $L$ samples taken in the previous step (cf. eq. 9), and gradient descent is performed for all $i$ simultaneously. Since all probabilities must be positive, for each component $i$ the magnitude of the descent step is halved if $q_i(x_i)$ is no longer positive for some $x_i$.

4. Repeat steps 2 and 3.

Figure 1 compares three different ways of estimating the descent direction in step 3 above. "Team game" means that we use $[G|x_i]$ to get the descent directions, "weighted Aristocratic Utility" corresponds to using the formula in eq. 12 to get the descent directions, and "uniform Aristocratic Utility" corresponds to simplifying the functions $\{g_i\}$ to

\[ \hat{g}_i(x) := G(x) - \frac{1}{|\sigma_i|} \sum_{x_i \in \sigma_i} G(x_{-i}, x_i). \tag{14} \]

In figure 1, we see that weighted AU outperforms uniform AU except at $\beta^{-1} = 0.2$.
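Steps 2 and 3 above can be sketched in code. The following is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`sample_block`, `repair_counts`, `conditional_means`, `halved_step`) are hypothetical. The sketch reads the repair rule as overwriting agent $i$'s entry in a randomly chosen sample with the missing move, and it additionally assumes that the chosen sample is one whose current move occurs more than once, so that repairing one count does not empty another. The gradient of eq. 9 is not reproduced here and is passed in as `direction`; the sketch also assumes the halving is repeated until all probabilities are positive.

```python
import random
from collections import Counter


def sample_block(q, L, rng):
    """Draw L joint samples; q[i][m] is agent i's probability of move m."""
    return [[rng.choices(range(len(qi)), weights=qi)[0] for qi in q]
            for _ in range(L)]


def repair_counts(block, i, n_moves, rng):
    """Make every move of agent i occur at least once in the block (in place).

    Returns the repaired counts L_{x_i}. Assumption: the overwritten sample
    is drawn only from samples whose current move occurs more than once.
    """
    assert len(block) >= n_moves, "block too small to cover every move"
    counts = Counter(sample[i] for sample in block)
    for move in range(n_moves):
        if counts[move] == 0:
            donors = [s for s in block if counts[s[i]] > 1]
            chosen = rng.choice(donors)
            counts[chosen[i]] -= 1
            chosen[i] = move      # overwrite agent i's entry with the missing move
            counts[move] = 1
    return counts


def conditional_means(block, i, n_moves, G):
    """Estimate [G | x_i] as the mean of G over samples where agent i played x_i."""
    totals = [0.0] * n_moves
    counts = [0] * n_moves
    for sample in block:
        totals[sample[i]] += G(sample)
        counts[sample[i]] += 1
    return [t / c for t, c in zip(totals, counts)]


def halved_step(q_i, direction, step):
    """Take q_i - step * direction, halving step until all entries are positive.

    Assumes every entry of q_i is positive on entry, so the loop terminates.
    """
    while True:
        trial = [p - step * d for p, d in zip(q_i, direction)]
        if all(p > 0.0 for p in trial):
            return trial, step
        step *= 0.5
```

Because the repaired samples are no longer draws from the product distribution, the conditional means computed this way inherit the IID violation noted in step 2.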
This unexpected result at $\beta^{-1} = 0.2$ may be due to the limitation on the size of $L$. (Recall that we have required that $L_{x_i} \neq 0$, and if it ever equals zero, we randomly pick a sample $x'$ and set $x_i = x'_i$ so that $L_{x_i} = 1$.) As shown in figure 3, the number of counts with $L_{x_i} = 1$ is greater when $\beta^{-1} = 0.2$ than when $\beta^{-1} = 0.6$. This demonstrates that at $\beta^{-1} = 0.2$ quite a few redistributions of the samples are happening, and hence $L$ has to be enlarged to get decent statistics. The speculation is further strengthened by comparing correct WLU (where the weights $L_{x_i}^{-1} / \int dx'_i \, L_{x'_i}^{-1}$ defining agent $i$'s AU are replaced with a delta function about the least likely (according to $q_i$) of that agent's moves) and incorrect WLU (where the same quantities are replaced with a delta function about the most likely (according to $q_i$) of that agent's moves) with different sample sizes $L$. As shown in figures 4 and 5, increasing the sample size does alleviate the problem caused by resampling.

5. Unknown world utilities

We now consider the case where the explicit formula for the world utility is not known, and hence the calculations for WLU, uniform AU and weighted AU are not possible. Recall that for this case we require that each player submit not only her choices of actions during each Monte Carlo block, but her probability distribution as well. Although this adds a constant overhead to the transmission, the overhead becomes negligible when $L$ is large.

The problem we consider here is a 100-agent 4-night bar problem [14]. In this problem, each agent's strategy set consists of four elements: $\{1, 2, 3, 4\}$.
The world utility is of the form:

\[ G(x) = -50 \times \sum_{k=1}^{4} e^{-f_k(x)/6} \tag{15} \]

where $f_k(x) = \sum_i \delta(x_i - k)$, i.e., $f_k(x)$ is the number of agents attending the bar at night $k$. The precise algorithm is as follows:

1. Each agent possesses a probability distribution on her set of actions, $\{q_i(x_i) \mid x_i \in \omega_i\}$, which is initially set to be uniform.

2. Each agent picks her state according to her probability distribution $L$ times in sequence, where $L$ is the Monte Carlo block size, and submits these choices together with her probability distribution $\{q_i(x_i) \mid x_i \in \omega_i\}$. Again, we require $L_{x_i}$ to be nonzero for all $x_i \in \omega_i$; i.e., if some $L_{x_i} = 0$, we randomly pick a sample $x'$ and set $x_i = x'_i$ so that $L_{x_i} = 1$.

3. Denote the set of samples in the block of $L$ Monte Carlo steps by $S$. Each agent generates a set of artificial data points according to the agents' probability distributions; denote this set by $A_i$. Then we define the following quantity:

\begin{align}
\bar{G}_{x_i} := {} & \frac{1 - \alpha}{|S|} \sum_{x' \in S} \delta(x_i - x'_i) \, G(x') \tag{16} \\
& + \frac{\alpha}{|A_i|} \sum_{x' \in A_i} \delta(x_i - x'_i) \, \hat{G}_S(x') \tag{17}
\end{align}

where $\alpha$ is a weighting parameter between 0 and 1 and $\hat{G}_S$ is defined by:

\[ \hat{G}_S(x) := \frac{\sum_{x' \in S} d(x, x') \, G(x')}{\sum_{x' \in S} d(x, x')} \tag{18} \]

where $d(\cdot, \cdot)$ is some appropriate metric. In the present 100-agent 4-night bar problem, $d(x, x') := e^{-2 \times \sum_{k=1}^{4} |f_k(x) - f_k(x')|}$, where the functions $\{f_k(\cdot)\}$ are as defined in eq. 15.

4. Each agent updates her probability distribution according to the gradients calculated as in eq. 9 but with $[G|x_i]$ replaced by $\bar{G}_{x_i}$. Again, for each agent $i$, the
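The quantities in eqs. 15 to 18 can be illustrated with a short sketch. It is a hedged reading rather than the authors' code. The function names (`attendance`, `world_utility`, `similarity`, `g_hat`, `g_bar`) are hypothetical. Eq. 15 is read as $-50$ times the sum over nights of $e^{-f_k(x)/6}$, and the sums in eqs. 16 and 17 are taken over the sampled points $x'$. The observed utilities for the samples in $S$ are passed in as `G_values`, since in this setting the formula for $G$ is assumed unknown to the agents; `world_utility` is included only to simulate those observations.

```python
import math

N_NIGHTS = 4  # nights are labelled 1..4, as in the strategy set {1, 2, 3, 4}


def attendance(x):
    """f_k(x) for k = 1..4: number of agents in joint move x attending night k."""
    return [sum(1 for xi in x if xi == k) for k in range(1, N_NIGHTS + 1)]


def world_utility(x):
    """G(x) under the reading of eq. 15 stated above (an assumption)."""
    return -50.0 * sum(math.exp(-f / 6.0) for f in attendance(x))


def similarity(x, y):
    """d(x, x') = exp(-2 * sum_k |f_k(x) - f_k(x')|)."""
    fx, fy = attendance(x), attendance(y)
    return math.exp(-2.0 * sum(abs(a - b) for a, b in zip(fx, fy)))


def g_hat(x, S, G_values):
    """Eq. 18: d-weighted average of the observed utilities over the samples S."""
    weights = [similarity(x, s) for s in S]
    return sum(w * g for w, g in zip(weights, G_values)) / sum(weights)


def g_bar(i, move, S, G_values, A_i, alpha):
    """Eqs. 16-17: blend of observed and artificial data for agent i's move."""
    observed = sum(g for s, g in zip(S, G_values) if s[i] == move) / len(S)
    artificial = sum(g_hat(a, S, G_values) for a in A_i if a[i] == move) / len(A_i)
    return (1.0 - alpha) * observed + alpha * artificial
```

With `alpha = 0` the estimate uses only the observed samples, and with `alpha = 1` it uses only the artificial points scored by `g_hat`. For 100 agents the exponent in `similarity` is at most 400 in magnitude, so the weights stay representable in double precision; much larger populations would need a log-domain version.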
