Product distribution theory for control of multi-agent systems


$$= \sum_i \beta_i \Big[ \int dx \prod_j q_j(x_j)\, g_i(x) - \epsilon_i \Big] - S(q) \qquad (2)$$

where the subscript on the expectation value indicates that it is evaluated under distribution $q$, and the $\{\beta_i\}$ are "inverse temperatures" implicitly set by the constraints on the expected utilities.

Solving, we find that the mixed strategies minimizing the Lagrangian are related to each other via

$$q_i(x_i) \propto e^{-E_{q_{(i)}}(G \mid x_i)} \qquad (3)$$

where the overall proportionality constant for each $i$ is set by normalization, and $G \equiv \sum_i \beta_i g_i$.² In Eq. 3 the probability of player $i$ choosing pure strategy $x_i$ depends on the effect of that choice on the utilities of the other players. This reflects the fact that our prior knowledge concerns all the players equally.

If we wish to focus only on the behavior of player $i$, it is appropriate to modify our prior knowledge. To see how to do this, first consider the case of maximal prior knowledge, in which we know the actual joint strategy of the players, and therefore all of their expected costs. For this case, trivially, the maxent principle says we should "estimate" $q$ as that joint strategy (it being the $q$ with maximal entropy that is consistent with our prior knowledge). The same conclusion holds if our prior knowledge also includes the expected cost of player $i$.

Modify this maximal set of prior knowledge by removing from it the specification of player $i$'s strategy. So our prior knowledge is the mixed strategies of all players other than $i$, together with player $i$'s expected cost. We can incorporate prior knowledge of the other players' mixed strategies directly, without introducing Lagrange parameters.
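As a numerical illustration of Eq. 3, the sketch below computes one player's Boltzmann mixed strategy under the combined utility $G \equiv \sum_i \beta_i g_i$. The 2×2 cost matrices, the inverse temperatures, and the opponent's mixed strategy are hypothetical values chosen for this example, not taken from the paper.

```python
import numpy as np

# Hypothetical 2-player game: rows index player 1's move x1,
# columns index player 2's move x2.
g1 = np.array([[2.0, 0.0],
               [1.0, 3.0]])   # player 1's private cost
g2 = np.array([[0.0, 1.0],
               [2.0, 0.0]])   # player 2's private cost
b1, b2 = 1.5, 0.5             # inverse temperatures beta_i (assumed)

G = b1 * g1 + b2 * g2         # G = sum_i beta_i g_i

q2 = np.array([0.5, 0.5])     # player 2's current mixed strategy

# Eq. 3: q_1(x1) is proportional to exp(-E_{q_(1)}(G | x1)),
# the expectation taken over player 2's strategy.
E_G_given_x1 = G @ q2
q1 = np.exp(-E_G_given_x1)
q1 /= q1.sum()
```

Note that player 1's update is driven by $G$, so it depends on player 2's private cost $g_2$ as well as its own, reflecting the observation that the prior knowledge here concerns all the players equally.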
The resultant maxent Lagrangian is

$$L_i(q_i) \equiv \beta_i [\epsilon_i - E(g_i)] - S_i(q_i) = \beta_i \Big[ \epsilon_i - \int dx \prod_j q_j(x_j)\, g_i(x) \Big] - S_i(q_i)$$

solved by a set of coupled Boltzmann distributions:

$$q_i(x_i) \propto e^{-\beta_i E_{q_{(i)}}(g_i \mid x_i)}. \qquad (4)$$

Following Nash, we can use Brouwer's fixed point theorem to establish that for any non-negative values $\{\beta_i\}$, there must exist at least one product distribution given by the product of these Boltzmann distributions (one term in the product for each $i$).

The first term in $L_i$ is minimized by a perfectly rational player. The second term is minimized by a perfectly irrational player, i.e., by a perfectly uniform mixed strategy $q_i$. So $\beta_i$ in the maxent Lagrangian explicitly specifies the balance between the rational and irrational behavior of the player. In particular, for $\beta \to \infty$, by minimizing the Lagrangians we recover the Nash equilibria of the game. More formally, in that limit the set of $q$ that simultaneously minimize the Lagrangians is the same as the set of delta functions about the Nash equilibria of the game. The same is true for Eq. 3.

Eq. 3 is just a special case of Eq. 4, where all players share the same private utility, $G$. (Such games are known as team games.) This relationship reflects the fact that for this case, the difference between the maxent Lagrangian and the one in Eq. 2 is independent of $q_i$.

² The subscript $q_{(i)}$ on the expectation value indicates that it is evaluated according to the distribution $\prod_{j \neq i} q_j$.
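The coupled Boltzmann distributions of Eq. 4 can be solved numerically by fixed-point iteration. The following sketch does this for a small team game; the shared 3×3 cost matrix, the inverse temperature, and the parallel-update scheme are illustrative assumptions rather than anything prescribed by the text.

```python
import numpy as np

# Hypothetical team game: both players share the cost G(x1, x2),
# a 3x3 matrix chosen for illustration (minimum at x1 = x2 = 1).
G = np.array([[3.0, 1.0, 4.0],
              [1.0, 0.0, 2.0],
              [5.0, 2.0, 1.0]])
beta = 2.0                     # common inverse temperature (assumed)

q1 = np.ones(3) / 3            # start from uniform (fully irrational)
q2 = np.ones(3) / 3

# Iterate the coupled Boltzmann distributions of Eq. 4 in parallel:
# q_i(x_i) proportional to exp(-beta * E_{q_(i)}(G | x_i)).
for _ in range(200):
    E1 = G @ q2                # E(G | x1) under player 2's strategy
    E2 = G.T @ q1              # E(G | x2) under player 1's strategy
    q1_new = np.exp(-beta * E1); q1_new /= q1_new.sum()
    q2_new = np.exp(-beta * E2); q2_new /= q2_new.sum()
    q1, q2 = q1_new, q2_new
```

With $\beta = 2$ the iteration settles on strategies concentrated on the joint move minimizing the shared cost; at small $\beta$ the fixed point stays close to uniform, matching the rational/irrational trade-off that $\beta_i$ controls.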
Due to this relationship, our guarantee of the existence of a solution to the set of maxent Lagrangians implies the existence of a solution of the form of Eq. 3.

Typically players will be closer to minimizing their expected cost than maximizing it. For prior knowledge consistent with such a case, the $\beta_i$ are all non-negative.

For each player $i$ define

$$f_i(x, q_i(x_i)) \equiv \beta_i g_i(x) + \ln[q_i(x_i)]. \qquad (5)$$

Then the maxent Lagrangian for player $i$ can be written as

$$L_i(q) = \int dx\, q(x)\, f_i(x, q_i(x_i)). \qquad (6)$$

Now in a bounded rational game every player sets its strategy to minimize its Lagrangian, given the strategies of the other players. In light of Eq. 6, this means that we interpret each player in a bounded rational game as being perfectly rational for a utility that incorporates its computational cost. To do so we simply need to expand the domain of "cost functions" to include probability values as well as joint moves.

Often our prior knowledge will not consist of an exact specification of the expected costs of the players, even if that knowledge arises from watching the players make their moves. Such alternative kinds of prior knowledge are addressed in [10, 11]. Those references also demonstrate the extension of the formulation to allow multiple utility functions of the players, and even variable numbers of players. Also discussed there are semi-coordinate transformations, under which, intuitively, the moves of the agents are modified to set binding contracts.

3. Optimizing the Lagrangian

First we introduce the shorthand

$$[G \mid x_i] \equiv E(G \mid x_i) \qquad (7)$$
$$= \int dx'\, \delta(x'_i - x_i)\, G(x') \prod_{j \neq i} q_j(x'_j), \qquad (8)$$
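The shorthand of Eqs. 7–8 can be checked numerically for a discrete joint move space: clamping player $i$'s move (the role of the delta function) and averaging $G$ over $\prod_{j \neq i} q_j$ gives $[G \mid x_i]$, and averaging that over $q_i$ must recover the unconditional $E(G)$. The cost tensor and mixed strategies below are arbitrary choices for illustration.

```python
import itertools

import numpy as np

# Hypothetical discrete 3-player example: two moves per player,
# arbitrary shared cost tensor G(x_0, x_1, x_2) and strategies q_j.
rng = np.random.default_rng(0)
G = rng.standard_normal((2, 2, 2))
q = [np.array([0.3, 0.7]),
     np.array([0.5, 0.5]),
     np.array([0.9, 0.1])]

def cond_expectation(i, x_i):
    """[G | x_i]: clamp player i's move to x_i (the delta function
    in Eq. 8) and average G over the other players' strategies."""
    total = 0.0
    for x in itertools.product(range(2), repeat=3):
        if x[i] != x_i:
            continue
        weight = np.prod([q[j][x[j]] for j in range(3) if j != i])
        total += weight * G[x]
    return total

# Averaging [G | x_0] over q_0 should recover the unconditional E(G).
EG_via_cond = sum(q[0][a] * cond_expectation(0, a) for a in range(2))
EG_full = sum(np.prod([q[j][x[j]] for j in range(3)]) * G[x]
              for x in itertools.product(range(2), repeat=3))
```

The agreement of `EG_via_cond` and `EG_full` is just the tower property of expectation, but it is a useful sanity check when implementing the conditional-expectation shorthand.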
