From One to Many: Planning for Loosely Coupled Multi-Agent Systems

Ronen I. Brafman
Department of Computer Science
Ben-Gurion University
brafman@cs.bgu.ac.il

Carmel Domshlak
Faculty of Industrial Engineering and Management
Technion
dcarmel@ie.technion.ac.il

Abstract

Loosely coupled multi-agent systems are perceived as easier to plan for because they require less coordination between agent sub-plans. In this paper we set out to formalize this intuition. We establish an upper bound on the complexity of multi-agent planning problems that depends exponentially on two parameters quantifying the level of agents' coupling, and on these parameters only. The first parameter is problem-independent, and it measures the inherent level of coupling within the system. The second is problem-specific, and it has to do with the minmax number of action-commitments per agent required to solve the problem. Most importantly, the direct dependence on the number of agents, on the overall size of the problem, and on the length of the agents' plans is only polynomial. This result is obtained using a new algorithmic methodology which we call "planning as CSP+planning". We believe this to be one of the first formal results to both quantify the notion of agents' coupling and to demonstrate a tractable planning algorithm for fixed coupling levels.

Introduction

Suppose that we seek a plan for a system consisting of a cooperative set of agents, each with its own capabilities. To what extent would (centralized) planning for such a multi-agent (MA) system be harder than solving individual planning problems over the domains of each of the agents in isolation? Intuitively, the answer to this question should depend both on the actual problem at hand and on the design of the MA system. Clearly, if the agents are tightly coupled, then planning for a MA system can become exponentially harder than individual, internal planning for each agent, because we must basically treat the system as a single, large entity. At the other extreme, planning for a completely decoupled system of agents merely requires solving a few independent single-agent planning problems, and would thus incur at most a linear factor over planning for the individual agents themselves. But what lies in between?

Intuitively, we would expect planning to become easier the more loosely coupled the system is, and we seek an algorithm that can take advantage of such loose coupling. However, "loose coupling" itself is a loose concept, and to concretize it, we need to identify a set of formal parameters quantifying the "coupling level" of MA systems. Then, we must show either that the worst-case time complexity of planning for such systems can be formulated in terms of these parameters, or that the empirical run-time complexity of planning for such systems correlates with these parameters (or, of course, both).
The former is what we set out to do in this paper.

A discussion of planning problems and their complexity must occur within some formal model. In this work we consider a minimalistic state-transition model expressed via the STRIPS classical planning language (Fikes & Nilsson 1971), slightly extended to associate actions with agents. To capture the level of interaction between agents we define and exploit the agent interaction (di)graph, in which two agents are connected if one agent's action affects the functionality of the other agent. We show that the worst-case time complexity of planning for a MA system can be tied to the tree-width of this graph: the lower it is, the less dependent agents are on one another when they desire to change their state.

However, as we will see, the situation is a bit more complex. Besides a dependence on the structure of the system, there is also a dependence on the properties of the concrete problem the system must solve. Some problems cannot be solved without much coordination between the agents, even if each agent interacts with just a few, or even one, other agents. The latter is a problem-specific parameter that corresponds to the number of actions executed by each agent that influence or are influenced by other agents in the system. Thus, one coupling parameter, denoted by ω, roughly corresponds to a measure of the number of agents that each agent must coordinate with (because they have the potential to influence her), and the other, denoted by δ, corresponds to the number of actions involving other agents that an agent must insert into its plan.

The above two parameters are intuitive, but they are insufficient to formalize the worst-case time complexity of MA planning, because specific problem and domain properties may affect the cost of single-agent planning in each domain. Thus, our result is formulated in terms of the overhead of planning for a multi-agent system as a function of planning for each single agent in isolation. As noted above, this overhead can move from exponential to linear. Our main contribution is to provide an algorithm that can gracefully move between these two extremes as a function of the coupling level of the system.
We provide an algorithm for planning for a MA system that is (worst-case) harder than planning for each of its agents in isolation by a factor exponential only in the tree-width of the agent interaction graph and the maximal number of coordination points an agent must have. Most importantly, the direct dependence on (i) the overall size of the planning problem, (ii) the number of agents, and (iii) the length of the joint, and even individual, plans, is only polynomial. In other words, if the coupling parameters remain fixed while the number of agents increases, the cost of planning will increase only polynomially!

In our work, we build upon, combine, and extend the ideas underlying two recent proposals for factored single-agent planning by Brafman and Domshlak (2006) and Amir and Engelhardt (2003). Our key extension corresponds to a new algorithmic methodology for planning, which we refer to as "planning as CSP+planning". In particular, in contrast to the above works on factored planning, this methodology allows us to handle efficiently MA planning problems that require arbitrarily long individual agent plans, provided the number of coordination points per agent is kept fixed. Overall, this gives us some of the first tractability results for non-hierarchical MA planning, and a formal characterization of coupling level and its effect on the hardness of MA planning. Moreover, although our discussion is in terms of centralized planning, the algorithm we provide is based on solving an inherently distributed CSP. Thus, using any of the many algorithms for distributed constraint satisfaction (Yokoo 2001), one obtains a distributed planning algorithm. And the underlying ideas go well beyond the simple STRIPS action model.

The paper is structured as follows. We start by defining the basic multi-agent planning model used. This definition naturally induces the notions of private/internal vs. public actions of an agent. It also leads to the definition of the agent interaction graph of the MA planning domain. Then, in the main section of this paper, we show how to solve MA planning using an enhancement of planning as CSP, which we call planning as CSP+Planning. In this context, the problem of planning with landmarks arises naturally, and we show how to reduce this problem to a standard planning problem. After describing our planning algorithm, we analyze its complexity. Following this, we re-examine the algorithm and modify it somewhat to obtain improved complexity.

Multi-Agent "Classical Planning" Model

We consider planning for cooperative MA systems in which agents act under complete information and via deterministic actions. Specifically, we consider problems expressible in a minimalistic MA-extension of the STRIPS language (Fikes & Nilsson 1971). In particular, the problems considered here comprise the seminal automata-based multi-entity models (Moses & Tennenholtz 1995).
In what follows, we formalize this extension of STRIPS, as well as some of its useful derivatives that we then employ in the problem-solving part of the paper.

Definition 1 An MA-STRIPS problem for a system of agents Φ = {ϕ_i}_{i=1}^k is given by a quadruple Π = ⟨P, {A_i}_{i=1}^k, I, G⟩, where:
• P is a finite set of atoms (also called propositions), I ⊆ P encodes the initial situation, and G ⊆ P encodes the goal conditions,
• For 1 ≤ i ≤ k, A_i is the set of actions that the agent ϕ_i is capable of performing. Each action a ∈ A = ∪ A_i has the standard STRIPS syntax and semantics, that is, a = ⟨pre(a), add(a), del(a)⟩ is given by its preconditions, add effects, and delete effects.

Clearly, MA-STRIPS reduces to STRIPS exactly when k = 1. For ease of presentation, we assume that the individual action sets of the agents are disjoint, i.e., no two agents share an identical action. This assumption is easy to eliminate, as we explain later in the paper.

To illustrate the MA-STRIPS model, consider the well-known Logistics domain, in which a set of packages should be moved on a (possibly complex) roadmap from their initial to their target locations using a given fleet of vehicles such as trucks, airplanes, etc. The packages can be loaded onto and unloaded off the vehicles, and each vehicle can move along a certain subset of road segments. It is quite natural to model this domain using MA-STRIPS by associating an atom with each package location on the map and in the vehicles, and with each truck location on the map. The action schemas are move, load, and unload, with the suitable parameters (e.g., move(truck, origin, destination) and load(package, truck, at-location)). Associating each truck with an agent, we might assign to this agent all the move, load, and unload actions in which it is involved. (Note that disjointness of the agents' action sets is not problematic here, as load(P, T, L) and load(P, T', L) are two different actions in A.)

We now focus on the dependencies that such a MA-STRIPS problem Π induces on the agents Φ. In what follows, we use eff(a) as a shortcut for add(a) ∪ del(a). Let P_i = ∪_{a∈A_i} pre(a) ∪ eff(a) be the set of all atoms affected by and/or affecting the actions of the agent ϕ_i. By internal atoms and public atoms of agent ϕ_i we refer to the subsets P_i^int = P_i \ ∪_{ϕ_j∈Φ\{ϕ_i}} P_j and P_i^pub = P_i \ P_i^int, respectively. That is, if p ∈ P_i^int, then other agents can neither achieve nor destroy nor even require p. Clearly, the internal atoms of all the agents are pairwise disjoint, and there might be certain atoms that are internal to no agent. In our example, all possible truck locations are atoms internal to the truck agent, while package locations are public if the package can be loaded/unloaded in these locations into/from more than one vehicle.

Using this notion of an agent's internal atoms, we can now define the partition A_i = A_i^int ∪ A_i^pub of agent actions into its internal and public actions, respectively, where

A_i^int = {a | a ∈ A_i, pre(a) ∪ eff(a) ⊆ P_i^int}.

That is, A_i^int is the set of all actions whose description contains only internal atoms of ϕ_i, while all other actions of ϕ_i are public.
In our example, all the move actions are certainly internal to the respective vehicle agents, while load and unload actions are public just if they affect the position of a package in some of its public locations. Given an action a of agent ϕ_i, we use a|_int to denote the projection of a onto its private conditions, that is, a|_int = ⟨pre(a) ∩ P_i^int, add(a) ∩ P_i^int, del(a) ∩ P_i^int⟩. If a ∈ A_i^int, then a = a|_int, but otherwise a|_int may have fewer conditions.
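
For concreteness, the notions just defined can be computed directly from the action sets. The following is a minimal illustrative Python sketch; the Action class and all helper names are ours, not part of the paper.

    from dataclasses import dataclass
    from typing import Dict, FrozenSet, List

    @dataclass(frozen=True)
    class Action:
        name: str
        pre: FrozenSet[str]
        add: FrozenSet[str]
        dele: FrozenSet[str]          # 'del' is a Python keyword

        @property
        def eff(self) -> FrozenSet[str]:
            return self.add | self.dele

    def atoms_of(actions: List[Action]) -> FrozenSet[str]:
        """P_i: all atoms mentioned by some action of the agent."""
        out = set()
        for a in actions:
            out |= a.pre | a.eff
        return frozenset(out)

    def internal_public_split(A: Dict[str, List[Action]]):
        """Per agent: internal atoms, public atoms, internal actions, public actions."""
        P = {i: atoms_of(acts) for i, acts in A.items()}
        result = {}
        for i, acts in A.items():
            others = frozenset().union(*[P[j] for j in A if j != i])
            P_int = P[i] - others
            P_pub = P[i] - P_int
            A_int = [a for a in acts if (a.pre | a.eff) <= P_int]
            A_pub = [a for a in acts if a not in A_int]
            result[i] = (P_int, P_pub, A_int, A_pub)
        return result

    def project_internal(a: Action, P_int: FrozenSet[str]) -> Action:
        """a|_int: restriction of an action to the agent's internal atoms."""
        return Action(a.name + "|int", a.pre & P_int, a.add & P_int, a.dele & P_int)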


Finally, we introduce the notion of the agent interaction digraph IG_Π, which plays a key role in the algorithmic part of the story. The nodes of IG_Π correspond to the system's agents Φ. There is a directed edge from node ϕ_i to node ϕ_j in IG_Π if there exist actions a_i ∈ A_i and a_j ∈ A_j such that eff(a_i) ∩ pre(a_j) ≠ ∅. That is, an edge from ϕ_i to ϕ_j indicates that ϕ_i either supplies or destroys a condition required by ϕ_j. It is possible, of course, that there are edges in both directions between ϕ_i and ϕ_j.

It is worth noting the connection between the agent interaction graph and the well-known causal graph, which plays an important role in the work of Brafman and Domshlak (2006) on factored planning. The nodes of the causal graph correspond to domain variables, and an edge connects node p to q if there exists an action a such that p ∈ pre(a) and q ∈ eff(a). Thus, the causal graph is a special instance of the agent interaction graph in which each agent is associated with a proposition and its set of actions contains all actions that influence the value of this proposition.

Planning as CSP+Planning

We now proceed to consider the algorithmic alternatives for solving a given MA-STRIPS problem Π = ⟨P, {A_i}_{i=1}^k, I, G⟩. Obviously, one can simply compile it into an equivalent "single-agent" STRIPS planning problem ⟨P, A, I, G⟩ and apply some state-of-the-art algorithm for this task. This compilation, however, hides away the original problem decomposition induced by the agents Φ. In particular, the worst-case time complexity of solving Π this way is independent of the structure and of other properties that may naturally be induced by the agent coalition Φ over the planning problem at hand. Specifically, the worst-case time complexity of the leading approaches to STRIPS planning is either unbounded (for local search procedures), or exponential in the size of the problem description (for standard planning-as-CSP approaches), or exponential in the length of the shortest plan (for BFS-style procedures). The exceptions are only some recently proposed algorithms for factored planning (Amir & Engelhardt 2003; Brafman & Domshlak 2006; Kelareva et al. 2007) that we build upon in our work here. The MA-STRIPS solving framework we propose combines technical ideas underlying the two factored planning algorithms of (Brafman & Domshlak 2006) and (Amir & Engelhardt 2003), and extends them to target loose agents' coupling, which we believe to be a natural property of practical MA systems.

Coordination-Centric Planning

Consider some plan ρ for Π and an agent ϕ_i involved in it. Let the individual sub-plan ρ_i of ϕ_i be the order-preserving projection of ρ onto ϕ_i's set of actions A_i. Let a_{i_1}, ..., a_{i_m} be the public actions in ρ_i (in their order of appearance); between each pair of adjacent actions a_{i_j}, a_{i_{j+1}} we have a (possibly empty) sequence of internal actions of ϕ_i.
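
As a reading aid, here is a small illustrative Python sketch (our names, not the paper's implementation) of projecting a joint plan onto one agent and splitting the result at its public actions.

    from typing import List, Tuple

    def sub_plan(rho: List[Tuple[str, str]], agent: str) -> List[str]:
        """Order-preserving projection of the joint plan onto one agent.
        rho is a list of (agent_id, action_name) pairs."""
        return [act for (who, act) in rho if who == agent]

    def split_at_coordination_points(rho_i: List[str], public: set):
        """Return the public actions of rho_i in order and, for each gap
        (before, between, and after them), the run of internal actions."""
        coord, runs, current = [], [], []
        for act in rho_i:
            if act in public:
                runs.append(current)
                coord.append(act)
                current = []
            else:
                current.append(act)
        runs.append(current)          # internal tail after the last public action
        return coord, runs

    # Example: truck T's sub-plan in a Logistics-like problem.
    rho_T = ["move(T,A,B)", "load(P,T,B)", "move(T,B,C)", "move(T,C,D)", "unload(P,T,D)"]
    coord, runs = split_at_coordination_points(rho_T, {"load(P,T,B)", "unload(P,T,D)"})
    # coord == ["load(P,T,B)", "unload(P,T,D)"]
    # runs  == [["move(T,A,B)"], ["move(T,B,C)", "move(T,C,D)"], []]
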
While, in principle, it is possible that the agent has no internal actions, one would expect to encounter many such actions in a system of substantially autonomous agents. Thus, we can view each agent's plan as a sequence of coordination (or commitment) points, i.e., points at which it executes actions that possibly influence or are influenced by other agents directly, and in between them, actions that do not affect other agents directly.

As an example, consider the Logistics domain described earlier, and recall that move actions are internal to the vehicle-operating agents. If the vehicles move on a complex map, requiring many map-point to map-point movements in between load and unload actions, then between every two actions requiring coordination there would be many internal move actions. Another example is the Rovers domain that models NASA's exploration rovers (Bresina et al. 2002). Imagine a set of rovers that explore a particular region. The public actions would be actions that carry out an experiment at a location, such as taking a measurement or a photo. These actions are public because they affect (goal) propositions that can be affected by many other rovers (e.g., some other rovers can also take these measurements or pictures). However, the individual plans of the rovers consist mostly of actions like moving from one location to another, tracking an object, extending the arm, warming up devices, placing instruments, calibrating instruments, etc. All these are internal actions that affect only the rover's internal state, and typically many of them come in between each pair of public actions.

Given this expectation from MA planning problems, a promising idea is to shift emphasis to the coordination points in the search, and let the agents "fill in the details" on their own. In fact, this intuitive principle is already adopted one way or another in many domain-specific multi-agent (and, in particular, multi-robot) systems (Durfee 1999). Likewise, this principle lies at the heart of (both domain-specific and general-purpose) hierarchical planning systems (e.g., (Erol, Hendler, & Nau 1994; Knoblock 1994; Clement, Durfee, & Barrett 2007)). Our objective here is to operationalize this principle in a generic, domain-independent manner in systems that do not necessarily exhibit substantial hierarchy among the agents. We now explain how this works.

First, suppose that we know how many coordination points each agent requires in order to solve the planning problem. In that case, we can
(1) guess what these coordination points look like, that is, what public actions are executed in them and when, and
(2) for each agent, add internal actions between its coordination points to provide their respective internal preconditions, obtaining a legal joint plan for the system.

The latter task requires each agent to plan "in between" its coordination points, adding internal actions that take it from the state following one public action to a state in which the next public action can be executed.
In addition, different agents' individual sub-plans must be consistent: if an agent's sub-plan calls for executing an action that requires some precondition to hold, then (i) either this or another agent must produce this precondition in time, and (ii) no agent is
allowed to destroy this precondition. For example, if the agent of truck T decides to load a package P in location L, then P should either be in L from the beginning, or be somehow brought to L in time, and, in any case, no other vehicle should be allowed to grab this package from L before T.

Of course, we have no way of guessing correctly how many coordination points there are and what their content is. On the other hand, we can try searching over all possible guesses, checking whether a guess can be extended into a complete plan. The time complexity is directly related to this: to find coordination points we perform iterative deepening over the number of coordination points, which requires time exponential in that number. If we are not careful, such a naive iterative deepening is exponential in the total number of coordination points among all agents; that would be very problematic, as this parameter is expected to grow at least as fast as the number of agents in the system. The good news is that, with care, we can reduce the time complexity to be exponential only in the number of coordination points required by a single agent. This number is dominated by the agent that requires the most coordination points, and of course, we will seek to minimize it. Note that this parameter is problem-specific because it depends on both the initial state and the goal of the MA system.

CSP and (Intra-Agent) Planning with Landmarks

We now describe a concrete procedure for extending a choice of coordination points into a globally consistent plan; it corresponds to a certain combination of constraint satisfaction and planning.

In general, a constraint satisfaction problem (Dechter 2003) is defined via a set of variables, U = {u_i}_{i=1}^n, with respective domains {D_i}_{i=1}^n, and a set of constraints {c_i}_{i=1}^m. Each constraint c_i is associated with a subset of variables {u_{i_1}, ..., u_{i_{l(i)}}}, and defines a subset of tuples C_i ⊆ D_{i_1} × ··· × D_{i_{l(i)}} to be the set of allowable joint assignments to these variables. An assignment ⟨θ_1, ..., θ_n⟩ to U is a satisfying assignment if its projection onto the scope of each constraint satisfies that constraint, that is, if ⟨θ_{i_1}, ..., θ_{i_{l(i)}}⟩ ∈ C_i.

Now, assume that we allow each agent at most δ ≥ 0 coordination points. Thus, the total number of coordination points across the system is at most kδ (recall that k is the number of agents). Given this explicit constraint on solving Π, we define a constraint satisfaction problem CSP_Π;δ over k variables U = {u_i}_{i=1}^k, one for each agent ϕ_i. Each such variable, u_i, represents the agent's choice of coordination points. That is, its domain consists of the different choices the agent could make for its coordination points. Each such choice consists of an action to execute and a time at which to execute this action. Thus, it is most convenient to view u_i as a vector of length δ, representing a sequence of δ coordination points. Each entry in this vector is either empty (because the agent may need fewer than δ coordination points), or is assigned a pair of the form (a, t), where a is a public action of ϕ_i, and t ∈ {1, 2, ..., kδ} is an abstract time point at which ϕ_i commits to performing a. (The "abstractness" of these time points is crucial, but it is explained and motivated later; for now, the reader may consider them as regular time points on some discrete scale.)
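
For concreteness, the domain D_i just described can be enumerated as follows. This is an illustrative Python sketch (our names); a practical encoding would of course never materialize this set explicitly.

    from itertools import combinations, product

    def domain_of_u(public_actions, k, delta):
        """Lazily enumerate candidate values of u_i: sequences of at most delta
        coordination points (a, t), with the d chosen abstract times drawn in
        increasing order from {1, ..., k*delta}."""
        times = range(1, k * delta + 1)
        for d in range(1, delta + 1):
            for ts in combinations(times, d):            # choose d time points
                for acts in product(public_actions, repeat=d):
                    yield tuple(zip(acts, ts))           # ((a_1,t_1), ..., (a_d,t_d))
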
Our next step is to pose constraints on the u_i such that any solution to the CSP defined by these variables and constraints can be extended into a legal plan for Π just if there exists a legal plan satisfying the explicit δ-bound on each agent's coordination. To make things simpler and more uniform, we assume the existence of a dummy agent that has a pair of actions producing the initial state at abstract time 0 and consuming the goal state at abstract time kδ + 1.

Our first constraint takes care of verifying the consistency of an agent's commitments with those of other agents affecting its non-private values.

(C1) Coordination Constraint.
An assignment ⟨θ_1, ..., θ_k⟩ to U satisfies C1 iff, for 1 ≤ i ≤ k, (a, t) ∈ θ_i implies that, for each public precondition p ∈ P_i^pub of a:
• for some u_j and some (a', t') ∈ θ_j, we have p ∈ add(a') and t' < t (i.e., "someone supplies p before t"), and
• for no u_l do we have (a'', t'') ∈ θ_l with p ∈ del(a'') and t' ≤ t'' ≤ t (i.e., "no one destroys p between t' and t").

For example, if u_T represents a truck T and a = load(P, T, L), then, if (a, t) appears in the sequence of u_T, either the agent of T or some other agent should make sure that package P gets to location L at some t' < t, and no other agent picks it up from there within the corresponding abstract time interval [t', t].

Our second constraint is posed over the internal part of the coordination-point actions to ensure that the agent is capable of supporting its own commitments. That is, the agent must be able to generate internal actions ensuring that the internal preconditions of the (public) actions it has committed to are achieved, and in the right order. To specify this constraint, we begin by formalizing a special type of single-agent planning problem which we call a STRIPS problem with action landmarks.

Definition 2 A STRIPS problem with action landmarks is given by a tuple Π_L = ⟨P, A, I, G, σ⟩ where
• P, A, I, and G have the standard STRIPS semantics of atoms, actions, initial state, and goal, respectively.
• σ = ⟨a_1, ..., a_|σ|⟩ is a sequence of action instances from A', where A' is defined (similarly to A) in terms of P.

A sequence ρ of actions from A ∪ A' is a plan for Π_L just if (i) ρ is a plan for the regular STRIPS problem ⟨P, A ∪ A', I, G⟩, and (ii) σ is a subsequence of ρ.

Informally, our objective in a STRIPS problem with action landmarks is to solve it in the standard sense while ensuring that the solution contains a certain sequence of actions. This sequence of actions may be disjoint from the regular actions in A, though it does not have to be. In our case, the
actions of σ will be projections of public actions onto their internal preconditions. Note that planning with action landmarks is meaningful even in the absence of a clear end-goal G. In fact, this is exactly our usage of planning with action landmarks in the specification of the internal-planning constraint below.

(C2) Internal-Planning Constraint.
An assignment ⟨θ_1, ..., θ_k⟩ to U satisfies C2 iff, for each θ_i = ⟨(a_1^{θ_i}, t_1), ..., (a_δ^{θ_i}, t_δ)⟩, the STRIPS problem with action landmarks
⟨P_i, A_i^int, I ∩ P_i, ∅, ⟨a_1^{θ_i}|_int, ..., a_δ^{θ_i}|_int⟩⟩
is solvable.

Notice that C2 induces a set of unary constraints over U: it constrains each agent's coordination-point sequence in isolation, and it does not depend on the actions of other agents. However, unlike typical unary constraints, these are procedural unary constraints over each u_i, in the form of a single-agent planning problem of a certain form. We now see clearly how CSP and planning are combined: CSP is used to ensure inter-variable consistency, while planning is used to ensure intra-variable consistency (i.e., legal values for each u_i).

Putting things together, the high-level skeleton of our algorithm for MA planning problems is depicted below.

    procedure MA-planning (Π over agents ϕ_1, ..., ϕ_k)
        δ := 1
        loop
            Construct CSP_Π;δ over u_1, ..., u_k.
            if ( solve-csp(CSP_Π;δ) ) then
                Reconstruct a plan ρ from a solution for CSP_Π;δ.
                return ρ
            else
                δ := δ + 1
        endloop

The MA-planning algorithm performs an infinite loop. In each iteration, it increments the (upper bound on the) length δ of the coordination sequences. Within the loop, the algorithm constructs the constraint satisfaction problem CSP_Π;δ along the constraints C1 and C2, and checks its satisfiability. Flow-wise, this algorithm is similar to the iterative-deepening algorithm for (single-agent) factored planning of Brafman and Domshlak (2006), with the (as shown below, crucial) difference being in the constraint satisfaction problems checked within the loop. Theorems 1 and 2 provide the correctness properties of the algorithm.

Theorem 1 (Soundness) Given a MA-STRIPS problem Π = ⟨P, {A_i}_{i=1}^k, I, G⟩ and an upper bound δ on the number of coordination points per agent, if an assignment ⟨θ_1, ..., θ_k⟩ is a satisfying assignment to CSP_Π;δ, then it can be extended into a legal plan for Π.

Theorem 2 (Completeness) Given a solvable MA-STRIPS problem Π, there exists δ ≥ 0 such that CSP_Π;δ is solvable.

The proof of Theorem 1 requires taking care of numerous technical details, but conceptually it is quite straightforward. Satisfaction of the planning-based constraint C2 implies "conditional" validity of individual agents' plans, while these conditions are verified by the constraint C1. The latter corresponds to the standard partial-order causal-link (POCL) constraints of flaw prevention, while the standard ordering constraints of POP are replaced by associating actions with explicit time points (as is done, e.g., in temporal POCL algorithms such as CPT (Vidal & Geffner 2006)). Finally, goal achievement of the action sequence induced by ⟨θ_1, ..., θ_k⟩ is ensured by our schematic addition of the dummy "goal-achieving" agent. The same line of reasoning underlies the (simpler) proof of Theorem 2.
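
To make C1 concrete, the following illustrative Python check (reusing the Action class from the earlier sketch; all names are ours, and the dummy initial/goal agent is assumed to be already included in the assignment) tests a candidate assignment against the two bullets of C1.

    def satisfies_C1(theta, public_pre_of):
        """theta: dict agent_id -> list of (Action, t) commitments.
        public_pre_of(a): the public preconditions of action a (a set of atoms).
        Returns True iff every public precondition of every committed action is
        supplied before its use and not destroyed in between."""
        all_commitments = [(t, a) for pts in theta.values() for (a, t) in pts]
        for pts in theta.values():
            for (a, t) in pts:
                for p in public_pre_of(a):
                    # "someone supplies p before t": the latest such supplier
                    # suffices, since its window [t', t] is the tightest one.
                    supply_times = [ts for (ts, b) in all_commitments
                                    if p in b.add and ts < t]
                    if not supply_times:
                        return False
                    t_sup = max(supply_times)
                    # "no one destroys p between t' and t"
                    if any(p in b.dele and t_sup <= td <= t
                           for (td, b) in all_commitments):
                        return False
        return True
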
Complexity

We now proceed to consider the time complexity of the MA-planning algorithm. Informally, this complexity corresponds to the number of times we need to verify that a certain choice of coordination-sequence length forms a basis for a solution, times the complexity of the verification process. In other words, the time complexity of MA-planning is captured by the time complexity of solving the CSP+planning problems CSP_Π;δ. CSPs are a well-studied problem, and we have a relatively good understanding of their complexity. The most relevant result for our purpose is that CSPs can be solved in time polynomial in the problem size and exponential in the tree-width of the induced constraint graph (Dechter 2003). The constraint graph is an undirected graph whose nodes correspond to the CSP variables, and there is an edge between u_i and u_j just if both participate in some constraint c. Informally, the tree-width of a graph is a measure of its "cliquishness", or how tightly coupled its nodes are (Seymour & Thomas 1993). For example, the tree-width of a tree is 1, regardless of its size, whereas the tree-width of a complete graph over n nodes is n − 1.

Let δ denote the minimal coordination-sequence length under which a solution exists. Given that, there are at most δ coordination points for each of the k agents, which might all be executed at different time points, and each such coordination point corresponds to a public action of one of the agents. Thus, the domain D_i of each CSP variable u_i of CSP_Π;δ captures

$$|D_i| = \sum_{d=1}^{\delta} \binom{k\delta}{d} \cdot |A_i^{pub}|^d = O\big((k\delta\,|A_i^{pub}|)^{\delta+1}\big) \qquad (1)$$

possible coordination sequences, where the first multiplicative term within the summation captures the choice of d ≤ δ time points, and the second term captures the choice of a public-action sequence of length d.

The complexity of enforcing the unary internal-planning constraints C2 is O(f(I) · Σ_{i=1}^k |D_i|), where I is the maximal complexity of the individual planning for each agent in Φ, and f(·) captures the cost of switching from regular planning to planning with action landmarks. If we let D denote max_{i=1}^k |D_i|, then this can be written as O(f(I)kD), where D as well satisfies D = O((kδ|A_i^pub|)^{δ+1}). Note that the C2 constraints are unary constraints and they could be enforced "offline", resulting in an equivalent CSP with reduced variable domains.
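
As a quick sanity check on Eq. 1, the exact count it bounds can be computed directly; a small helper of our own, purely illustrative:

    from math import comb

    def domain_size(k: int, delta: int, n_pub: int) -> int:
        """Exact number of coordination sequences counted by Eq. 1: choose d of
        the k*delta abstract time points and a public action for each of the d
        chosen points, for every d = 1..delta."""
        return sum(comb(k * delta, d) * n_pub ** d for d in range(1, delta + 1))

    # e.g. 3 agents, at most 2 coordination points each, 4 public actions per agent:
    # domain_size(3, 2, 4) == comb(6,1)*4 + comb(6,2)*16 == 24 + 240 == 264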


In turn, if CG_Π;δ is the constraint graph of CSP_Π;δ, then checking the coordination constraint C1 can be done in time O(kD^{ω+1}), where D = max_{i=1}^k |D_i| and ω is the tree-width of CG_Π;δ (Dechter 2003). Hence, we can conclude:

Theorem 3 The overall complexity of solving CSP_Π;δ is

$$O\big(f(I)\cdot k(k\delta\,|A^{pub}|)^{\delta+1} + k(k\delta\,|A^{pub}|)^{\delta\omega+\epsilon}\big) \qquad (2)$$

The first term of the summation is the cumulative complexity of the single-agent sub-problems, and the second term is the complexity of extending single-agent plans into a joint MA plan, with ε = δ + ω + 1 being the dominated factor in the exponent.

Finally, we would like to establish a concrete connection between the tree-width ω of the constraint graph CG_Π;δ and the topology of the MA system. In Lemma 1 below we do exactly that by connecting the structure of CG_Π;δ with that of the agent interaction graph IG_Π. The implication is that this parameter can already be known to us at system design time and does not depend on the particular planning problem solved.

Lemma 1 For any MA-STRIPS problem Π, and any δ > 0, the constraint graph CG_Π;δ induced by the constraints C1-C2 is independent of δ, and is isomorphic to the moral graph of IG_Π.

The moral graph of a digraph G is obtained by removing the edge directions and adding an edge between each pair of (original) parents of each node of G. Sketching the proof of Lemma 1, note that the edges of the constraint graph CG_Π;δ are only due to the coordination constraints C1. Thus, there is an edge between ϕ_i and ϕ_j either (A) if ϕ_i has public actions affecting preconditions of some public actions of ϕ_j (or vice versa), or (B) if ϕ_i and ϕ_j both have public actions affecting (either positively or negatively) preconditions of (possibly different) public actions of some third agent ϕ_l ∈ Φ. Given that, the bijective node mapping ∀i: u_i ↦ ϕ_i establishes an isomorphism between CG_Π;δ and the moral graph of IG_Π; edges of type (A) and (B) in CG_Π;δ are mapped to the original edges of IG_Π and to the edges connecting the nodes' parents, respectively.

Discussion

Considering the worst-case time complexity of MA-STRIPS planning as a function of the time complexity I of STRIPS planning for each of the system's agents, we have shown that the former can be upper-bounded by

f(I) · exp(δ) + exp(δω),

that is, by the
• factor f(·) induced by requesting each agent to plan while committing to a certain sequence of actions,
• multiplicative factor exponential only in δ, the minmax number of per-agent commitments, and
• additive (!) factor exponential only in δω, where ω is the tree-width of the moral graph of the agent interaction graph.

Here, ω and δ provide quantitative measures of the coupling "levels" of the system in general and of the concrete problem instance, respectively.
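
Lemma 1 makes ω a design-time quantity: it can be read off the moral graph of IG_Π. As a rough illustration (our own helpers, not part of the paper), the following Python sketch moralizes a digraph and computes a min-degree elimination bound, which upper-bounds the tree-width.

    def moralize(digraph):
        """digraph: dict node -> set of successor nodes (IG_Pi).
        Drop edge directions and connect every pair of parents of a common child."""
        nodes = set(digraph) | {v for succ in digraph.values() for v in succ}
        und = {v: set() for v in nodes}
        for u, succs in digraph.items():
            for v in succs:
                und[u].add(v); und[v].add(u)
        parents = {v: {u for u, ss in digraph.items() if v in ss} for v in nodes}
        for v in nodes:
            for p in parents[v]:
                for q in parents[v]:
                    if p != q:
                        und[p].add(q); und[q].add(p)
        for v in nodes:
            und[v].discard(v)
        return und

    def treewidth_upper_bound(und):
        """Min-degree elimination heuristic: an upper bound on the tree-width."""
        g = {v: set(nb) for v, nb in und.items()}
        width = 0
        while g:
            v = min(g, key=lambda x: len(g[x]))      # eliminate a min-degree node
            nbs = g[v]
            width = max(width, len(nbs))
            for a in nbs:                            # connect its neighbours
                for b in nbs:
                    if a != b:
                        g[a].add(b)
            for a in nbs:
                g[a].discard(v)
            del g[v]
        return width
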
Note that, putting aside for a moment the factor f(·) of intra-agent planning, the complexity of MA-planning
(1) has no direct exponential dependence on the number of agents, k,
(2) has no direct exponential dependence on the size |Π| of the MA planning problem, nor such a dependence on the length of a joint plan for it (and this in contrast to standard planning techniques), and
(3) has no direct exponential dependence on the length of individual agent plans, in contrast to the recent factored planning techniques we build upon (Amir & Engelhardt 2003; Brafman & Domshlak 2006).

Having read this far, the reader may rightfully comment that planning for each individual agent can already be exponential in the overall size of the problem. Indeed, if some of the domains of individual agents have size comparable to that of the whole multi-agent system, that is, |P_i| = Θ(|P|), the whole discussion of multi-agent planning complexity seems like a waste of time, as some of the individual planning problems are about as hard as the problem of planning for the entire system. In that case, treating the system as a single entity is likely to be more profitable.

More natural and interesting settings correspond to systems in which each agent's domain is not too large, and the complexity of the system stems from the existence of many such interacting agents. In such systems we would expect the number of internal atoms of each agent to be relatively small, that is, constant or O(log |P|). Now, planning for a single agent, even if exponential in log |P|, is still polynomial in |P|. In many MA systems this appears to be the case. For example, in the Rovers domain mentioned before, individual agents are often designed to fulfill certain well-defined roles, and their internal combinatorics can naturally end up being simple. In fact, this is one of the major promises in devising heterogeneous MA systems: "One of the powerful motivations for distributed problem solving is that it is difficult to build artifacts (or train humans) to be competent in every possible task. Moreover, even if it is feasible to build (or train) an omni-capable agent, it is often overkill because, at any given time, most of those capabilities will go to waste. The strategy in human systems, and adopted in many distributed problem-solving systems, is to bring together on demand combinations of specialists in different areas to combine their expertise to solve problems that are beyond their individual capabilities." (Durfee 1999).
A nice example of this approach in the context of planning and scheduling has been proposed in (Wilkins & Myers 1998), where sophisticated systems for planning and scheduling are decomposed into modules, each of which is transformed into an agent, allowing experimentation with different degrees of coupling between the planning and scheduling capabilities.

Finally, let us consider more closely the planning-with-landmarks factor f(·); at least at first view, planning with action landmarks seems to be more complicated than standard STRIPS planning. It is easy to show, however, that
from the worst-case time complexity perspective the overhead of adding landmarks is not significant. (Of course, empirically the situation may be quite different; by this point it should be apparent to the reader that we focus here only on a formal, worst-case analysis of these issues.) This is because any problem Π_L = ⟨P, A, I, G, ⟨a_1, ..., a_δ⟩⟩ with δ action landmarks can be compiled into an equivalent, regular STRIPS problem Π by
(i) adding a single auxiliary multi-valued variable with domain {q_0, q_1, ..., q_δ}, whose initial value is q_0,
(ii) reformulating each action landmark a_i by setting pre(a_i) := pre(a_i) ∪ {q_{i−1}} and add(a_i) := add(a_i) ∪ {q_i}, and
(iii) extending the goal G to G ∪ {q_δ}.

Note that, with this simple compilation, the state space of Π is only δ times larger than the state space of Π_L. Thus, assuming individual planning for each agent is polynomial (in the size of the entire system description), it is easy to verify that STRIPS planning with action landmarks for each such agent remains polynomial-time as well.
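
A minimal sketch of this compilation in the same illustrative Python style as before (it reuses the Action class from the earlier snippet; the q_i atom names and the renaming of the landmark copies are our own choices):

    def compile_landmarks(P, A, I, G, sigma):
        """Compile a STRIPS problem with action landmarks sigma = <a_1,...,a_delta>
        into a regular STRIPS problem, following steps (i)-(iii) above."""
        delta = len(sigma)
        q = [f"q{i}" for i in range(delta + 1)]          # auxiliary "progress" atoms
        A2 = list(A)
        for i, a in enumerate(sigma, start=1):           # step (ii): gated landmark copy
            A2.append(Action(f"{a.name}@landmark{i}",
                             a.pre | {q[i - 1]},
                             a.add | {q[i]},
                             a.dele))
        P2 = set(P) | set(q)
        I2 = set(I) | {q[0]}                             # q_0 holds initially
        G2 = set(G) | {q[delta]}                         # step (iii)
        return P2, A2, I2, G2
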
To extend the algorithm to non-disjoint action sets, we need to distinguish between actions that can be performed by two agents independently and actions that require true coordination at execution. The first case is the simplest: we create two copies of the action with different names and are back to the case of disjoint sets. The second case covers both actions that require joint execution and actions that are "mutually exclusive"; in both cases the agents must execute in coordination. The interaction graph must be modified to include edges between agents that "share" such actions, and the constraints must be modified to ensure that these actions co-occur (or not) within the sequences of public actions of the corresponding agents. Naturally, the interaction graph may be denser because of such actions, and their execution requires the ability to synchronize.

Another point to note is that the MA-planning algorithm has kδ abstract time points at which public actions are taken. These time points are abstract because any number of internal actions can come between any two public actions. In essence, they serve only to constrain the order of the public actions of different agents, and not as real time points. In fact, the algorithm does its utmost to decouple the time points used by each agent. This may be counter-intuitive, as usually we view fully synchronized systems as easier to deal with. However, here additional synchronization would actually be a burden on the planning algorithms, as it would add unnecessary constraints to the system and would actually increase the worst-case time complexity of the algorithms. Moreover, we see that the agents need not communicate their internal plans, nor do they need to synchronize during execution time. All an agent needs to know is that the preconditions for its next public action are satisfied.

Finally, the ability to perform the planning process in a distributed manner is of great interest, and is conceptually simple in our case. The key step in our algorithm is solving an appropriate CSP. This CSP has a natural distributed formulation, and any of the many algorithms for solving distributed CSPs could be used to generate a distributed version of the MA-planning algorithm (Yokoo 2001). The particular choice of the distributed CSP algorithm would affect properties such as communication complexity, and this can be an interesting question for future work.

Reducing the Time Complexity

Considering the worst-case time complexity of the MA-planning algorithm as captured by Eq. 2, and recalling our interest in the time complexity of MA planning mainly as a function of the time complexity of local planning for the agents, a complexity bottleneck appears to be the exponent in the tree-width of the constraint graph CG_Π;δ. In what follows, we show that this bottleneck can be partly eliminated, and sometimes to a very large degree.

Considering the statement of Lemma 1, note that the tree-width of CG_Π;δ can be Θ(k) even if the tree-width of the undirected graph induced by the agent interaction graph is O(1). The reason is that the coordination constraint for the agent ϕ_i glues together the CSP variables corresponding to all possible providers and all possible destroyers of the preconditions of its public actions A_i^pub (cf. the use of the moral graph in Lemma 1). Closely considering the language used to "communicate" commitments within the coordination process imposed by solving CSP_Π;δ, it turns out that sometimes we can do substantially better.

Each sequence of coordination points θ_i = ⟨(a_1^{θ_i}, t_1), ..., (a_δ^{θ_i}, t_δ)⟩ posed by agent ϕ_i corresponds to a set of δ announcements of the form "at time t I will perform action a^{θ_i}". Now, let π_i = max_{a∈A_i^pub} |pre(a) ∩ P_i^pub| be the tight upper bound on the number of public preconditions of an action of ϕ_i. Note that this quantity is expected to be very low; e.g., in most (if not all) standard planning benchmarks we have π_i = O(1) (Helmert 2003). Given that, let us extend the verbosity of each coordination point from (a, t) to (a, t, {(j_1, t_1), ..., (j_{π_i}, t_{π_i})}), having the semantics "at time t I will perform action a, and I require agent ϕ_{j_l} to provide me with the l-th non-private precondition of a at time t_l, respectively." This modification of the language does not affect the internal-planning constraints, but it does affect the coordination constraints, which are now reformulated as follows.
(C3) Extended Coordination Constraint.
An assignment ⟨θ_1, ..., θ_k⟩ to U satisfies C3 iff, for 1 ≤ i ≤ k, (a, t, {(j_1, t_1), ..., (j_{π_i}, t_{π_i})}) ∈ θ_i implies that, for 1 ≤ l ≤ π_i, if p_l ∈ P_i^pub is the l-th public precondition of a, then
• for some action a' ∈ A_{j_l}^pub with p_l ∈ add(a'), we have (a', t_l, {·}) ∈ θ_{j_l}, and
• for no u_j do we have (a'', t'', {·}) ∈ θ_j with p_l ∈ del(a'') and t_l ≤ t'' ≤ t.

Intuitively, what we require here are commitments that do not merely demand that someone supply some condition, but rather explicitly name the supplier and the supply time.
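
To see the effect on the constraint scopes, note that each extended commitment names its suppliers explicitly, so checking the "supply" half of C3 for one agent only touches the variables of the agents it names (its neighbours in IG_Π). A small illustrative sketch in the same Python style (the ExtPoint record and all names are ours, and the Action class is the one from the earlier snippets):

    from typing import NamedTuple, Tuple, Dict, List

    class ExtPoint(NamedTuple):
        action: "Action"
        t: int
        # for the l-th public precondition: (providing agent, promised supply time)
        providers: Tuple[Tuple[str, int], ...]

    def c3_supply_ok(point: ExtPoint, public_pre: List[str],
                     theta_of_provider: Dict[str, List[ExtPoint]]) -> bool:
        """Check only the 'named supplier delivers on time' half of C3 for one
        commitment; its scope is the requesting agent plus the providers it
        explicitly names."""
        for l, p in enumerate(public_pre):
            j_l, t_l = point.providers[l]
            if not any(p in q.action.add and q.t == t_l
                       for q in theta_of_provider.get(j_l, [])):
                return False
        return True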


This may appear a bad idea: we increased the domains of the CSP variables, because there are now many more syntactically different coordination sequences of length δ. However, this constraint also "unglues" the providers and the destroyers of each agent ϕ_i. The providers now need not jointly ensure that some condition is supplied; rather, each provider worries only about the conditions it is explicitly requested to supply. According to Lemma 2, as long as π_i is small, this formulation can buy us a lot.

Lemma 2 For any MA-STRIPS problem Π, and any δ > 0, the constraint graph CG_Π;δ induced by the constraints C2-C3 is independent of δ, and is isomorphic to the undirected graph underlying IG_Π.

The proof of Lemma 2 is similar to that of Lemma 1, except that now there is an edge between ϕ_i and ϕ_j in the constraint graph CG_Π;δ only if ϕ_i has public actions affecting preconditions of a public action of ϕ_j (or vice versa).

Let us now consider more closely the complexity of MA-planning with the reformulated constraint satisfaction problems CSP_Π;δ. The domain D_i of each CSP variable u_i now captures

$$|D_i| = \sum_{d=1}^{\delta} \binom{k\delta}{d} \cdot |A_i^{pub}|^d \cdot (k^2 d)^{\pi_i} = O\big((k\delta\,|A_i^{pub}|)^{\delta+1}\big) \cdot \delta (k^2\delta)^{\pi_i} \qquad (3)$$

possible coordination sequences, where the first two multiplicative terms within the summation are as in Eq. 1, and the third term captures the choice of who (k) supplies when (kδ) each of the π_i public preconditions of the action. In turn, the complexity of enforcing the unary internal-planning constraints C2 remains exactly as before, while checking the coordination constraints C3 can now be done in time O(kD^{ϖ+1}), where ϖ is the tree-width of the (undirected) agent interaction graph IG_Π. The overall complexity of solving CSP_Π;δ is thus of the order of

$$f(I)\cdot k(k\delta\,|A^{pub}|)^{\delta+1} + k(k\delta\,|A^{pub}|)^{\delta\varpi+\epsilon'} \cdot (k^2\delta)^{\pi_i\varpi+\epsilon''}. \qquad (4)$$

Note that, as we already mentioned, the tree-width ϖ can be substantially lower than the tree-width ω induced by C1, possibly up to a reduction from Θ(k) to 1. Hence, the reduction of worst-case time complexity (indirectly) resulting from extending the agents' language of commitments from the messages used in C1 to the more complex messages used in C3 can be exponential in the size of the multi-agent system.

Summary

We identified two parameters that quantify the coupling level of a multi-agent planning problem. One is system-dependent, the tree-width of the agent interaction graph; the other is problem-dependent, the minmax number of coordination points per agent. When these parameters are fixed, the complexity of planning scales only polynomially with the size of the system.

Our results provide novel insights into the area of problem decomposition, and they may also help guide the design of such systems. That is, if we are to allocate actions to agents, we should strive to minimize the tree-width of the resulting agent interaction graph. They also show how a special type of single-agent planning problem is used to solve multi-agent planning problems.

There are a number of natural issues for future work. Of great interest is the design of more practical algorithms guided by the theoretical insights of this paper. If based on CSPs, these would require more efficient encodings of the problem.
Execution monitoring for such systems is also an interesting topic, as the use of abstract time points gives us the flexibility to handle delays as well as to work with asynchronous systems.

References

Amir, E., and Engelhardt, B. 2003. Factored planning. In IJCAI, 929-935.
Brafman, R. I., and Domshlak, C. 2006. Factored planning: How, when, and when not. In AAAI, 809-814.
Bresina, J.; Dearden, R.; Meuleau, N.; Ramakrishnan, S.; Smith, D.; and Washington, R. 2002. Planning under continuous time and resource uncertainty: A challenge for AI. In UAI, 77-84.
Clement, B. J.; Durfee, E. H.; and Barrett, A. C. 2007. Abstract reasoning for planning and coordination. JAIR 28:453-515.
Dechter, R. 2003. Constraint Processing. Morgan Kaufmann.
Durfee, E. H. 1999. Distributed problem solving and planning. In Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence. 121-164.
Erol, K.; Hendler, J.; and Nau, D. S. 1994. HTN planning: Complexity and expressivity. In AAAI, 1123-1128.
Fikes, R. E., and Nilsson, N. 1971. STRIPS: A new approach to the application of theorem proving to problem solving. AIJ 2:189-208.
Helmert, M. 2003. Complexity results for standard benchmark domains in planning. AIJ 146(2):219-262.
Kelareva, E.; Buffet, O.; Huang, J.; and Thiébaux, S. 2007. Factored planning using decomposition trees. In IJCAI, 1942-1947.
Knoblock, C. 1994. Automatically generating abstractions for planning. AIJ 68(2):243-302.
Moses, Y., and Tennenholtz, M. 1995. Multi-entity models. Machine Intelligence 14:63-88.
Seymour, P. D., and Thomas, R. 1993. Graph searching and a min-max theorem for tree-width. Journal of Combinatorial Theory 58:22-33.
Vidal, V., and Geffner, H. 2006. Branching and pruning: An optimal temporal POCL planner based on constraint programming. AIJ 170(3):298-335.
Wilkins, D. E., and Myers, K. 1998. A multiagent planning architecture. In Int. Conf. on AI Planning Systems, 154-162.
Yokoo, M. 2001. Distributed Constraint Satisfaction: Foundations of Cooperation in Multi-agent Systems. Springer.