From One to Many: Planning for Loosely Coupled Multi-Agent Systems

Ronen I. Brafman
Department of Computer Science
Ben-Gurion University
brafman@cs.bgu.ac.il

Carmel Domshlak
Faculty of Industrial Engineering and Management
Technion
dcarmel@ie.technion.ac.il

Abstract

Loosely coupled multi-agent systems are perceived as easier to plan for because they require less coordination between agent sub-plans. In this paper we set out to formalize this intuition. We establish an upper bound on the complexity of multi-agent planning problems that depends exponentially on two parameters quantifying the level of agents' coupling, and on these parameters only. The first parameter is problem-independent, and it measures the inherent level of coupling within the system. The second is problem-specific, and it has to do with the minmax number of action-commitments per agent required to solve the problem. Most importantly, the direct dependence on the number of agents, on the overall size of the problem, and on the length of the agents' plans is only polynomial. This result is obtained using a new algorithmic methodology which we call "planning as CSP+planning". We believe this to be one of the first formal results to both quantify the notion of agents' coupling and to demonstrate a tractable planning algorithm for fixed coupling levels.

Introduction

Suppose that we seek a plan for a system consisting of a cooperative set of agents, each with its own capabilities. To what extent would (centralized) planning for such a multi-agent (MA) system be harder than solving individual planning problems over the domains of each of the agents in isolation? Intuitively, the answer to this question should depend both on the actual problem at hand and on the design of the MA system. Clearly, if the agents are tightly coupled, then planning for a MA system can become exponentially harder than individual, internal planning for each agent, because we must basically treat the system as a single, large entity. At the other extreme, planning for a completely decoupled system of agents merely requires solving a few independent single-agent planning problems, and would thus incur at most a linear factor over planning for the individual agents themselves. But what lies in between?

Intuitively, we would expect planning to become easier the more loosely coupled the system is, and we seek an algorithm that can take advantage of such loose coupling. However, "loose coupling" itself is a loose concept, and to concretize it, we need to identify a set of formal parameters quantifying the "coupling level" of MA systems. Then, we must show either that the worst-case time complexity of planning for such systems can be formulated in terms of these parameters, or that the empirical run-time complexity of planning for such systems correlates with these parameters (or, of course, both).
The former is what we set out to do in this paper.

A discussion of planning problems and their complexity must occur within some formal model. In this work we consider a minimalistic state-transition model expressed via the STRIPS classical planning language (Fikes & Nilsson 1971), slightly extended to associate actions with agents. To capture the level of interaction between agents we define and exploit the agent interaction (di)graph, in which two agents are connected if one agent's action affects the functionality of the other agent. We show that the worst-case time complexity of planning for a MA system can be tied to the tree-width of this graph: the lower it is, the less dependent agents are on one another when they desire to change their state.

However, as we will see, the situation is a bit more complex. Besides a dependence on the structure of the system, there is also a dependence on the properties of the concrete problem the system must solve. Some problems cannot be solved without much coordination between the agents, even if each agent interacts with just a few, or even one, other agents. The latter is a problem-specific parameter that corresponds to the number of actions executed by each agent that influence or are influenced by other agents in the system. Thus, one coupling parameter, denoted by ω, roughly corresponds to a measure of the number of agents that each agent must coordinate with (because they have the potential to influence her), and the other, denoted by δ, corresponds to the number of actions involving other agents that an agent must insert into its plan.

The above two parameters are intuitive, but they are insufficient to formalize the worst-case time complexity of MA planning, because specific problem and domain properties may affect the cost of single-agent planning in each domain. Thus, our result is formulated in terms of the overhead of planning for a multi-agent system as a function of planning for each single agent in isolation. As noted above, this overhead can move from exponential to linear. Our main contribution is to provide an algorithm that can gracefully move between these two extremes as a function of the coupling level of the system.
We provide an algorithm for planning for a MA system that is (worst-case) harder than planning for each of its agents in isolation by a factor exponential only in the tree-width of the agent interaction graph and the maximal number of coordination points an agent must have. Most importantly, the direct dependence on (i) the overall size of the planning problem, (ii) the number of agents, and (iii) the length of the joint, and even individual, plans, is only polynomial. In other words, if the coupling parameters remain fixed while the number of agents increases, the cost of planning will increase only polynomially!

In our work, we build upon, combine, and extend the ideas underlying two recent proposals for factored single-agent planning by Brafman and Domshlak (2006) and Amir and Engelhardt (2003). Our key extension corresponds to a new algorithmic methodology for planning, which we refer to as "planning as CSP+planning". In particular, in contrast to the above works on factored planning, this methodology allows us to handle efficiently MA planning problems that require arbitrarily long individual agent plans, provided the number of coordination points per agent is kept fixed. Overall, this gives us some of the first tractability results for non-hierarchical MA planning, and a formal characterization of coupling level and its effect on the hardness of MA planning. Moreover, although our discussion is in terms of centralized planning, the algorithm we provide is based on solving an inherently distributed CSP. Thus, using any of the many algorithms for distributed constraint satisfaction (Yokoo 2001), one obtains a distributed planning algorithm. And the underlying ideas go well beyond the simple STRIPS action model.

The paper is structured as follows. We start by defining the basic multi-agent planning model used. This definition naturally induces the notions of private/internal vs. public actions of an agent. It also leads to the definition of the agent interaction graph of the MA planning domain. Then, in the main section of this paper, we show how to solve MA planning using an enhancement of planning as CSP, which we call planning as CSP+Planning. In this context, the problem of planning with landmarks arises naturally, and we show how to reduce this problem to a standard planning problem. After describing our planning algorithm, we analyze its complexity. Following this, we re-examine the algorithm and modify it somewhat to obtain improved complexity.

Multi-Agent "Classical Planning" Model

We consider planning for cooperative MA systems in which agents act under complete information and via deterministic actions. Specifically, we consider problems expressible in a minimalistic MA-extension of the STRIPS language (Fikes & Nilsson 1971). In particular, the problems considered here comprise the seminal automata-based multi-entity models (Moses & Tennenholtz 1995).
In what follows, we formalize this extension of STRIPS, as well as some of its useful derivatives that we then employ in the problem-solving part of the paper.

Definition 1 An MA-STRIPS problem for a system of agents Φ = {ϕ_i}_{i=1}^k is given by a quadruple Π = ⟨P, {A_i}_{i=1}^k, I, G⟩, where:
• P is a finite set of atoms (also called propositions), I ⊆ P encodes the initial situation, and G ⊆ P encodes the goal conditions,
• For 1 ≤ i ≤ k, A_i is the set of actions that the agent ϕ_i is capable of performing. Each action a ∈ A = ∪ A_i has the standard STRIPS syntax and semantics, that is, a = ⟨pre(a), add(a), del(a)⟩ is given by its preconditions, add effects, and delete effects.

Clearly, MA-STRIPS reduces to STRIPS exactly when k = 1. For ease of presentation, we assume that the individual action sets of the agents are disjoint, i.e., no two agents share an identical action. This assumption is easy to eliminate, as we explain later in the paper.

To illustrate the MA-STRIPS model, consider the well-known Logistics domain, in which a set of packages should be moved on a (possibly complex) roadmap from their initial to their target locations using a given fleet of vehicles such as trucks, airplanes, etc. The packages can be loaded onto and unloaded off the vehicles, and each vehicle can move along a certain subset of road segments. It is quite natural to model this domain using MA-STRIPS by associating an atom with each package location on the map and in the vehicles, and with each truck location on the map. The action schemas are move, load, and unload, with the suitable parameters (e.g., move(truck, origin, destination) and load(package, truck, at-location)). Associating each truck with an agent, we might assign to this agent all the move, load, and unload actions in which it is involved. (Note that disjointness of the agents' action sets is not problematic here, as load(P, T, L) and load(P, T', L) are two different actions in A.)

We now focus on the dependencies that such a MA-STRIPS problem Π induces on the agents Φ. In what follows, we use eff(a) as a shortcut for add(a) ∪ del(a). Let P_i = ∪_{a∈A_i} pre(a) ∪ eff(a) be the set of all atoms affected by and/or affecting the actions of the agent ϕ_i. By internal atoms and public atoms of agent ϕ_i we refer to the subsets P_i^int = P_i \ ∪_{ϕ_j∈Φ\{ϕ_i}} P_j and P_i^pub = P_i \ P_i^int, respectively. That is, if p ∈ P_i^int, then other agents can neither achieve nor destroy nor even require p. Clearly, the internal atoms of all the agents are pairwise disjoint, and there might be certain atoms that are internal to no agent. In our example, all possible truck locations are atoms internal to the truck agent, while package locations are public if the package can be loaded/unloaded in these locations into/from more than one vehicle.

Using this notion of an agent's internal atoms, we can now define the partition A_i = A_i^int ∪ A_i^pub of agent actions into its internal and public actions, respectively, where

A_i^int = {a | a ∈ A_i, pre(a) ∪ eff(a) ⊆ P_i^int}.

That is, A_i^int is the set of all actions whose description contains only internal atoms of ϕ_i, while all other actions of ϕ_i are public.
In our example, all the move actions are certainly internal to the respective vehicle agents, while load and unload actions are public just if they affect the position of a package in some of its public locations. Given an action a of agent ϕ_i, we use a|_int to denote the projection of a onto its private conditions, that is, a|_int = ⟨pre(a) ∩ P_i^int, add(a) ∩ P_i^int, del(a) ∩ P_i^int⟩. If a ∈ A_i^int, then a = a|_int, but otherwise a|_int may have fewer conditions.
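
For concreteness, the notions just defined can be computed directly from the action sets. The following is a minimal illustrative Python sketch; the Action class and all helper names are ours, not part of the paper.

    from dataclasses import dataclass
    from typing import Dict, FrozenSet, List

    @dataclass(frozen=True)
    class Action:
        name: str
        pre: FrozenSet[str]
        add: FrozenSet[str]
        dele: FrozenSet[str]          # 'del' is a Python keyword

        @property
        def eff(self) -> FrozenSet[str]:
            return self.add | self.dele

    def atoms_of(actions: List[Action]) -> FrozenSet[str]:
        """P_i: all atoms mentioned by some action of the agent."""
        out = set()
        for a in actions:
            out |= a.pre | a.eff
        return frozenset(out)

    def internal_public_split(A: Dict[str, List[Action]]):
        """Per agent: internal atoms, public atoms, internal actions, public actions."""
        P = {i: atoms_of(acts) for i, acts in A.items()}
        result = {}
        for i, acts in A.items():
            others = frozenset().union(*[P[j] for j in A if j != i])
            P_int = P[i] - others
            P_pub = P[i] - P_int
            A_int = [a for a in acts if (a.pre | a.eff) <= P_int]
            A_pub = [a for a in acts if a not in A_int]
            result[i] = (P_int, P_pub, A_int, A_pub)
        return result

    def project_internal(a: Action, P_int: FrozenSet[str]) -> Action:
        """a|_int: restriction of an action to the agent's internal atoms."""
        return Action(a.name + "|int", a.pre & P_int, a.add & P_int, a.dele & P_int)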


Finally, we introduce the notion of the agent interaction digraph IG_Π, which plays a key role in the algorithmic part of the story. The nodes of IG_Π correspond to the system's agents Φ. There is a directed edge from node ϕ_i to node ϕ_j in IG_Π if there exist actions a_i ∈ A_i and a_j ∈ A_j such that eff(a_i) ∩ pre(a_j) ≠ ∅. That is, an edge from ϕ_i to ϕ_j indicates that ϕ_i either supplies or destroys a condition required by ϕ_j. It is possible, of course, that there are edges in both directions between ϕ_i and ϕ_j.

It is worth noting the connection between the agent interaction graph and the well-known causal graph, which plays an important role in the work of Brafman and Domshlak (2006) on factored planning. The nodes of the causal graph correspond to domain variables, and an edge connects node p to q if there exists an action a such that p ∈ pre(a) and q ∈ eff(a). Thus, the causal graph is a special instance of the agent interaction graph in which each agent is associated with a proposition and its set of actions contains all actions that influence the value of this proposition.

Planning as CSP+Planning

We now proceed to consider the algorithmic alternatives for solving a given MA-STRIPS problem Π = ⟨P, {A_i}_{i=1}^k, I, G⟩. Obviously, one can simply compile it into an equivalent "single-agent" STRIPS planning problem ⟨P, A, I, G⟩ and apply some state-of-the-art algorithm for this task. This compilation, however, hides away the original problem decomposition induced by the agents Φ. In particular, the worst-case time complexity of solving Π this way is independent of the structure and of other properties that may naturally be induced by the agent coalition Φ over the planning problem at hand. Specifically, the worst-case time complexity of the leading approaches to STRIPS planning is either unbounded (for local search procedures), or exponential in the size of the problem description (for standard planning-as-CSP approaches), or exponential in the length of the shortest plan (for BFS-style procedures). The exceptions are only some recently proposed algorithms for factored planning (Amir & Engelhardt 2003; Brafman & Domshlak 2006; Kelareva et al. 2007) that we build upon in our work here. The MA-STRIPS solving framework we propose combines technical ideas underlying the two factored planning algorithms of (Brafman & Domshlak 2006) and (Amir & Engelhardt 2003), and extends them to target loose agents' coupling, which we believe to be a natural property of practical MA systems.

Coordination-Centric Planning

Consider some plan ρ for Π and an agent ϕ_i involved in it. Let the individual sub-plan ρ_i of ϕ_i be the order-preserving projection of ρ onto ϕ_i's set of actions A_i. Let a_{i_1}, ..., a_{i_m} be the public actions in ρ_i (in their order of appearance); between each pair of adjacent actions a_{i_j}, a_{i_{j+1}} we have a (possibly empty) sequence of internal actions of ϕ_i.
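
As a reading aid, here is a small illustrative Python sketch (our names, not the paper's implementation) of projecting a joint plan onto one agent and splitting the result at its public actions.

    from typing import List, Tuple

    def sub_plan(rho: List[Tuple[str, str]], agent: str) -> List[str]:
        """Order-preserving projection of the joint plan onto one agent.
        rho is a list of (agent_id, action_name) pairs."""
        return [act for (who, act) in rho if who == agent]

    def split_at_coordination_points(rho_i: List[str], public: set):
        """Return the public actions of rho_i in order and, for each gap
        (before, between, and after them), the run of internal actions."""
        coord, runs, current = [], [], []
        for act in rho_i:
            if act in public:
                runs.append(current)
                coord.append(act)
                current = []
            else:
                current.append(act)
        runs.append(current)          # internal tail after the last public action
        return coord, runs

    # Example: truck T's sub-plan in a Logistics-like problem.
    rho_T = ["move(T,A,B)", "load(P,T,B)", "move(T,B,C)", "move(T,C,D)", "unload(P,T,D)"]
    coord, runs = split_at_coordination_points(rho_T, {"load(P,T,B)", "unload(P,T,D)"})
    # coord == ["load(P,T,B)", "unload(P,T,D)"]
    # runs  == [["move(T,A,B)"], ["move(T,B,C)", "move(T,C,D)"], []]
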
While, in principle, it is possible that the agent has no internal actions, one would expect to encounter many such actions in a system of substantially autonomous agents. Thus, we can view each agent's plan as a sequence of coordination (or commitment) points, i.e., points at which it executes actions that possibly influence or are influenced by other agents directly, and in between them, actions that do not affect other agents directly.

As an example, consider the Logistics domain described earlier, and recall that move actions are internal to the vehicle-operating agents. If the vehicles move on a complex map, requiring many map-point to map-point movements in between load and unload actions, then between every two actions requiring coordination there would be many internal move actions. Another example is the Rovers domain that models NASA's exploration rovers (Bresina et al. 2002). Imagine a set of rovers that explore a particular region. The public actions would be actions that carry out an experiment at a location, such as taking a measurement or a photo. These actions are public because they affect (goal) propositions that can be affected by many other rovers (e.g., some other rovers can also take these measurements or pictures). However, the individual plans of the rovers consist mostly of actions like moving from one location to another, tracking an object, extending the arm, warming up devices, placing instruments, calibrating instruments, etc. All these are internal actions that affect only the rover's internal state, and typically many of them come in between each pair of public actions.

Given this expectation from MA planning problems, a promising idea is to shift emphasis to the coordination points in the search, and let the agents "fill in the details" on their own. In fact, this intuitive principle is already adopted one way or another in many domain-specific multi-agent (and, in particular, multi-robot) systems (Durfee 1999). Likewise, this principle lies at the heart of (both domain-specific and general-purpose) hierarchical planning systems (e.g., (Erol, Hendler, & Nau 1994; Knoblock 1994; Clement, Durfee, & Barrett 2007)). Our objective here is to operationalize this principle in a generic, domain-independent manner in systems that do not necessarily exhibit substantial hierarchy among the agents. We now explain how this works.

First, suppose that we know how many coordination points each agent requires in order to solve the planning problem. In that case, we can
(1) guess what these coordination points look like, that is, what public actions are executed in them and when, and
(2) for each agent, add internal actions between its coordination points to provide their respective internal preconditions, obtaining a legal joint plan for the system.

The latter task requires each agent to plan "in between" its coordination points, adding internal actions that take it from the state following one public action to a state in which the next public action can be executed.
In addition, different agents' individual sub-plans must be consistent: if an agent's sub-plan calls for executing an action that requires some precondition to hold, then (i) either this or another agent must produce this precondition in time, and (ii) no agent is
allowed to destroy this precondition. For example, if the agent of truck T decides to load a package P in location L, then P should either be in L from the beginning, or be somehow brought to L in time, and, in any case, no other vehicle should be allowed to grab this package from L before T.

Of course, we have no way of guessing correctly how many coordination points there are and what their content is. On the other hand, we can try searching over all possible guesses, checking whether a guess can be extended into a complete plan. The time complexity is directly related to this: to find coordination points we perform iterative deepening over the number of coordination points, which requires time exponential in that number. If we are not careful, such a naive iterative deepening is exponential in the total number of coordination points among all agents; that would be very problematic, as this parameter is expected to grow at least as fast as the number of agents in the system. The good news is that, with care, we can reduce the time complexity to be exponential only in the number of coordination points required by a single agent. This number is dominated by the agent that requires the most coordination points, and of course, we will seek to minimize it. Note that this parameter is problem-specific because it depends on both the initial state and the goal of the MA system.

CSP and (Intra-Agent) Planning with Landmarks

We now describe a concrete procedure for extending a choice of coordination points into a globally consistent plan; it corresponds to a certain combination of constraint satisfaction and planning.

In general, a constraint satisfaction problem (Dechter 2003) is defined via a set of variables, U = {u_i}_{i=1}^n, with respective domains {D_i}_{i=1}^n, and a set of constraints {c_i}_{i=1}^m. Each constraint c_i is associated with a subset of variables {u_{i_1}, ..., u_{i_{l(i)}}}, and defines a subset of tuples C_i ⊆ D_{i_1} × ··· × D_{i_{l(i)}} to be the set of allowable joint assignments to these variables. An assignment ⟨θ_1, ..., θ_n⟩ to U is a satisfying assignment if its projection onto the scope of each constraint satisfies that constraint, that is, if ⟨θ_{i_1}, ..., θ_{i_{l(i)}}⟩ ∈ C_i.

Now, assume that we allow each agent at most δ ≥ 0 coordination points. Thus, the total number of coordination points across the system is at most kδ (recall that k is the number of agents). Given this explicit constraint on solving Π, we define a constraint satisfaction problem CSP_Π;δ over k variables U = {u_i}_{i=1}^k, one for each agent ϕ_i. Each such variable, u_i, represents the agent's choice of coordination points. That is, its domain consists of the different choices the agent could make for its coordination points. Each such choice consists of an action to execute and a time at which to execute this action. Thus, it is most convenient to view u_i as a vector of length δ, representing a sequence of δ coordination points. Each entry in this vector is either empty (because the agent may need fewer than δ coordination points), or is assigned a pair of the form (a, t), where a is a public action of ϕ_i, and t ∈ {1, 2, ..., kδ} is an abstract time point at which ϕ_i commits to performing a. (The "abstractness" of these time points is crucial, but it is explained and motivated later; for now, the reader may consider them as regular time points on some discrete scale.)
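
For concreteness, the domain D_i just described can be enumerated as follows. This is an illustrative Python sketch (our names); a practical encoding would of course never materialize this set explicitly.

    from itertools import combinations, product

    def domain_of_u(public_actions, k, delta):
        """Lazily enumerate candidate values of u_i: sequences of at most delta
        coordination points (a, t), with the d chosen abstract times drawn in
        increasing order from {1, ..., k*delta}."""
        times = range(1, k * delta + 1)
        for d in range(1, delta + 1):
            for ts in combinations(times, d):            # choose d time points
                for acts in product(public_actions, repeat=d):
                    yield tuple(zip(acts, ts))           # ((a_1,t_1), ..., (a_d,t_d))
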
Our next step is to pose constraints on the u_i such that any solution to the CSP defined by these variables and constraints can be extended into a legal plan for Π just if there exists a legal plan satisfying the explicit δ-bound on each agent's coordination. To make things simpler and more uniform, we assume the existence of a dummy agent that has a pair of actions producing the initial state at abstract time 0 and consuming the goal state at abstract time kδ + 1.

Our first constraint takes care of verifying the consistency of an agent's commitments with those of other agents affecting its non-private values.

(C1) Coordination Constraint.
An assignment ⟨θ_1, ..., θ_k⟩ to U satisfies C1 iff, for 1 ≤ i ≤ k, (a, t) ∈ θ_i implies that, for each public precondition p ∈ P_i^pub of a:
• for some u_j and some (a', t') ∈ θ_j, we have p ∈ add(a') and t' < t (i.e., "someone supplies p before t"), and
• for no u_l do we have (a'', t'') ∈ θ_l with p ∈ del(a'') and t' ≤ t'' ≤ t (i.e., "no one destroys p between t' and t").

For example, if u_T represents a truck T and a = load(P, T, L), then, if (a, t) appears in the sequence of u_T, either the agent of T or some other agent should make sure that package P gets to location L at some t' < t, and no other agent picks it up from there within the corresponding abstract time interval [t', t].

Our second constraint is posed over the internal part of the coordination-point actions to ensure that the agent is capable of supporting its own commitments. That is, the agent must be able to generate internal actions ensuring that the internal preconditions of the (public) actions it has committed to are achieved, and in the right order. To specify this constraint, we begin by formalizing a special type of single-agent planning problem which we call a STRIPS problem with action landmarks.

Definition 2 A STRIPS problem with action landmarks is given by a tuple Π_L = ⟨P, A, I, G, σ⟩ where
• P, A, I, and G have the standard STRIPS semantics of atoms, actions, initial state, and goal, respectively.
• σ = ⟨a_1, ..., a_|σ|⟩ is a sequence of action instances from A', where A' is defined (similarly to A) in terms of P.

A sequence ρ of actions from A ∪ A' is a plan for Π_L just if (i) ρ is a plan for the regular STRIPS problem ⟨P, A ∪ A', I, G⟩, and (ii) σ is a subsequence of ρ.

Informally, our objective in a STRIPS problem with action landmarks is to solve it in the standard sense while ensuring that the solution contains a certain sequence of actions. This sequence of actions may be disjoint from the regular actions in A, though it does not have to be. In our case, the
actions of σ will be projections of public actions onto their internal preconditions. Note that planning with action landmarks is meaningful even in the absence of a clear end-goal G. In fact, this is exactly our usage of planning with action landmarks in the specification of the internal-planning constraint below.

(C2) Internal-Planning Constraint.
An assignment ⟨θ_1, ..., θ_k⟩ to U satisfies C2 iff, for each θ_i = ⟨(a_1^{θ_i}, t_1), ..., (a_δ^{θ_i}, t_δ)⟩, the STRIPS problem with action landmarks
⟨P_i, A_i^int, I ∩ P_i, ∅, ⟨a_1^{θ_i}|_int, ..., a_δ^{θ_i}|_int⟩⟩
is solvable.

Notice that C2 induces a set of unary constraints over U: it constrains each agent's coordination-point sequence in isolation, and it does not depend on the actions of other agents. However, unlike typical unary constraints, these are procedural unary constraints over each u_i, in the form of a single-agent planning problem of a certain form. We now see clearly how CSP and planning are combined: CSP is used to ensure inter-variable consistency, while planning is used to ensure intra-variable consistency (i.e., legal values for each u_i).

Putting things together, the high-level skeleton of our algorithm for MA planning problems is depicted below.

    procedure MA-planning (Π over agents ϕ_1, ..., ϕ_k)
        δ := 1
        loop
            Construct CSP_Π;δ over u_1, ..., u_k.
            if ( solve-csp(CSP_Π;δ) ) then
                Reconstruct a plan ρ from a solution for CSP_Π;δ.
                return ρ
            else
                δ := δ + 1
        endloop

The MA-planning algorithm performs an infinite loop. In each iteration, it increments the (upper bound on the) length δ of the coordination sequences. Within the loop, the algorithm constructs the constraint satisfaction problem CSP_Π;δ along the constraints C1 and C2, and checks its satisfiability. Flow-wise, this algorithm is similar to the iterative-deepening algorithm for (single-agent) factored planning of Brafman and Domshlak (2006), with the (as shown below, crucial) difference being in the constraint satisfaction problems checked within the loop. Theorems 1 and 2 provide the correctness properties of the algorithm.

Theorem 1 (Soundness) Given a MA-STRIPS problem Π = ⟨P, {A_i}_{i=1}^k, I, G⟩ and an upper bound δ on the number of coordination points per agent, if an assignment ⟨θ_1, ..., θ_k⟩ is a satisfying assignment to CSP_Π;δ, then it can be extended into a legal plan for Π.

Theorem 2 (Completeness) Given a solvable MA-STRIPS problem Π, there exists δ ≥ 0 such that CSP_Π;δ is solvable.

The proof of Theorem 1 requires taking care of numerous technical details, but conceptually it is quite straightforward. Satisfaction of the planning-based constraint C2 implies "conditional" validity of individual agents' plans, while these conditions are verified by the constraint C1. The latter corresponds to the standard partial-order causal-link (POCL) constraints of flaw prevention, while the standard ordering constraints of POP are replaced by associating actions with explicit time points (as is done, e.g., in temporal POCL algorithms such as CPT (Vidal & Geffner 2006)). Finally, goal achievement of the action sequence induced by ⟨θ_1, ..., θ_k⟩ is ensured by our schematic addition of the dummy "goal-achieving" agent. The same line of reasoning underlies the (simpler) proof of Theorem 2.
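
To make C1 concrete, the following illustrative Python check (reusing the Action class from the earlier sketch; all names are ours, and the dummy initial/goal agent is assumed to be already included in the assignment) tests a candidate assignment against the two bullets of C1.

    def satisfies_C1(theta, public_pre_of):
        """theta: dict agent_id -> list of (Action, t) commitments.
        public_pre_of(a): the public preconditions of action a (a set of atoms).
        Returns True iff every public precondition of every committed action is
        supplied before its use and not destroyed in between."""
        all_commitments = [(t, a) for pts in theta.values() for (a, t) in pts]
        for pts in theta.values():
            for (a, t) in pts:
                for p in public_pre_of(a):
                    # "someone supplies p before t": the latest such supplier
                    # suffices, since its window [t', t] is the tightest one.
                    supply_times = [ts for (ts, b) in all_commitments
                                    if p in b.add and ts < t]
                    if not supply_times:
                        return False
                    t_sup = max(supply_times)
                    # "no one destroys p between t' and t"
                    if any(p in b.dele and t_sup <= td <= t
                           for (td, b) in all_commitments):
                        return False
        return True
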
Complexity

We now proceed to consider the time complexity of the MA-planning algorithm. Informally, this complexity corresponds to the number of times we need to verify that a certain choice of coordination-sequence length forms a basis for a solution, times the complexity of the verification process. In other words, the time complexity of MA-planning is captured by the time complexity of solving the CSP+planning problems CSP_Π;δ. CSPs are a well-studied problem, and we have a relatively good understanding of their complexity. The most relevant result for our purpose is that CSPs can be solved in time polynomial in the problem size and exponential in the tree-width of the induced constraint graph (Dechter 2003). The constraint graph is an undirected graph whose nodes correspond to the CSP variables, and there is an edge between u_i and u_j just if both participate in some constraint c. Informally, the tree-width of a graph is a measure of its "cliquishness", or how tightly coupled its nodes are (Seymour & Thomas 1993). For example, the tree-width of a tree is 1, regardless of its size, whereas the tree-width of a complete graph over n nodes is n − 1.

Let δ denote the minimal coordination-sequence length under which a solution exists. Given that, there are at most δ coordination points for each of the k agents, which might all be executed at different time points, and each such coordination point corresponds to a public action of one of the agents. Thus, the domain D_i of each CSP variable u_i of CSP_Π;δ captures

$$|D_i| = \sum_{d=1}^{\delta} \binom{k\delta}{d} \cdot |A_i^{pub}|^d = O\big((k\delta\,|A_i^{pub}|)^{\delta+1}\big) \qquad (1)$$

possible coordination sequences, where the first multiplicative term within the summation captures the choice of d ≤ δ time points, and the second term captures the choice of a public-action sequence of length d.

The complexity of enforcing the unary internal-planning constraints C2 is O(f(I) · Σ_{i=1}^k |D_i|), where I is the maximal complexity of the individual planning for each agent in Φ, and f(·) captures the cost of switching from regular planning to planning with action landmarks. If we let D denote max_{i=1}^k |D_i|, then this can be written as O(f(I)kD), where D as well satisfies D = O((kδ|A_i^pub|)^{δ+1}). Note that the C2 constraints are unary constraints and they could be enforced "offline", resulting in an equivalent CSP with reduced variable domains.
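
As a quick sanity check on Eq. 1, the exact count it bounds can be computed directly; a small helper of our own, purely illustrative:

    from math import comb

    def domain_size(k: int, delta: int, n_pub: int) -> int:
        """Exact number of coordination sequences counted by Eq. 1: choose d of
        the k*delta abstract time points and a public action for each of the d
        chosen points, for every d = 1..delta."""
        return sum(comb(k * delta, d) * n_pub ** d for d in range(1, delta + 1))

    # e.g. 3 agents, at most 2 coordination points each, 4 public actions per agent:
    # domain_size(3, 2, 4) == comb(6,1)*4 + comb(6,2)*16 == 24 + 240 == 264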


In turn, if CG_Π;δ is the constraint graph of CSP_Π;δ, then checking the coordination constraint C1 can be done in time O(kD^{ω+1}), where D = max_{i=1}^k |D_i| and ω is the tree-width of CG_Π;δ (Dechter 2003). Hence, we can conclude:

Theorem 3 The overall complexity of solving CSP_Π;δ is

$$O\big(f(I)\cdot k(k\delta\,|A^{pub}|)^{\delta+1} + k(k\delta\,|A^{pub}|)^{\delta\omega+\epsilon}\big) \qquad (2)$$

The first term of the summation is the cumulative complexity of the single-agent sub-problems, and the second term is the complexity of extending single-agent plans into a joint MA plan, with ε = δ + ω + 1 being the dominated factor in the exponent.

Finally, we would like to establish a concrete connection between the tree-width ω of the constraint graph CG_Π;δ and the topology of the MA system. In Lemma 1 below we do exactly that by connecting the structure of CG_Π;δ with that of the agent interaction graph IG_Π. The implication is that this parameter can already be known to us at system design time and does not depend on the particular planning problem solved.

Lemma 1 For any MA-STRIPS problem Π, and any δ > 0, the constraint graph CG_Π;δ induced by the constraints C1-C2 is independent of δ, and is isomorphic to the moral graph of IG_Π.

The moral graph of a digraph G is obtained by removing the edge directions and adding an edge between each pair of (original) parents of each node of G. Sketching the proof of Lemma 1, note that the edges of the constraint graph CG_Π;δ are only due to the coordination constraints C1. Thus, there is an edge between ϕ_i and ϕ_j either (A) if ϕ_i has public actions affecting preconditions of some public actions of ϕ_j (or vice versa), or (B) if ϕ_i and ϕ_j both have public actions affecting (either positively or negatively) preconditions of (possibly different) public actions of some third agent ϕ_l ∈ Φ. Given that, the bijective node mapping ∀i: u_i ↦ ϕ_i establishes an isomorphism between CG_Π;δ and the moral graph of IG_Π; edges of type (A) and (B) in CG_Π;δ are mapped to the original edges of IG_Π and to the edges connecting the nodes' parents, respectively.

Discussion

Considering the worst-case time complexity of MA-STRIPS planning as a function of the time complexity I of STRIPS planning for each of the system's agents, we have shown that the former can be upper-bounded by

f(I) · exp(δ) + exp(δω),

that is, by the
• factor f(·) induced by requesting each agent to plan while committing to a certain sequence of actions,
• multiplicative factor exponential only in δ, the minmax number of per-agent commitments, and
• additive (!) factor exponential only in δω, where ω is the tree-width of the moral graph of the agent interaction graph.

Here, ω and δ provide quantitative measures of the coupling "levels" of the system in general and of the concrete problem instance, respectively.
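
Lemma 1 makes ω a design-time quantity: it can be read off the moral graph of IG_Π. As a rough illustration (our own helpers, not part of the paper), the following Python sketch moralizes a digraph and computes a min-degree elimination bound, which upper-bounds the tree-width.

    def moralize(digraph):
        """digraph: dict node -> set of successor nodes (IG_Pi).
        Drop edge directions and connect every pair of parents of a common child."""
        nodes = set(digraph) | {v for succ in digraph.values() for v in succ}
        und = {v: set() for v in nodes}
        for u, succs in digraph.items():
            for v in succs:
                und[u].add(v); und[v].add(u)
        parents = {v: {u for u, ss in digraph.items() if v in ss} for v in nodes}
        for v in nodes:
            for p in parents[v]:
                for q in parents[v]:
                    if p != q:
                        und[p].add(q); und[q].add(p)
        for v in nodes:
            und[v].discard(v)
        return und

    def treewidth_upper_bound(und):
        """Min-degree elimination heuristic: an upper bound on the tree-width."""
        g = {v: set(nb) for v, nb in und.items()}
        width = 0
        while g:
            v = min(g, key=lambda x: len(g[x]))      # eliminate a min-degree node
            nbs = g[v]
            width = max(width, len(nbs))
            for a in nbs:                            # connect its neighbours
                for b in nbs:
                    if a != b:
                        g[a].add(b)
            for a in nbs:
                g[a].discard(v)
            del g[v]
        return width
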
Note that, putting aside for a moment the factor f(·) of intra-agent planning, the complexity of MA-planning
(1) has no direct exponential dependence on the number of agents, k,
(2) has no direct exponential dependence on the size |Π| of the MA planning problem, nor such a dependence on the length of a joint plan for it (and this in contrast to standard planning techniques), and
(3) has no direct exponential dependence on the length of individual agent plans, in contrast to the recent factored planning techniques we build upon (Amir & Engelhardt 2003; Brafman & Domshlak 2006).

Having read this far, the reader may rightfully comment that planning for each individual agent can already be exponential in the overall size of the problem. Indeed, if some of the domains of individual agents have size comparable to that of the whole multi-agent system, that is, |P_i| = Θ(|P|), the whole discussion of multi-agent planning complexity seems like a waste of time, as some of the individual planning problems are about as hard as the problem of planning for the entire system. In that case, treating the system as a single entity is likely to be more profitable.

More natural and interesting settings correspond to systems in which each agent's domain is not too large, and the complexity of the system stems from the existence of many such interacting agents. In such systems we would expect the number of internal atoms of each agent to be relatively small, that is, constant or O(log |P|). Now, planning for a single agent, even if exponential in log |P|, is still polynomial in |P|. In many MA systems this appears to be the case. For example, in the Rovers domain mentioned before, individual agents are often designed to fulfill certain well-defined roles, and their internal combinatorics can naturally end up being simple. In fact, this is one of the major promises in devising heterogeneous MA systems: "One of the powerful motivations for distributed problem solving is that it is difficult to build artifacts (or train humans) to be competent in every possible task. Moreover, even if it is feasible to build (or train) an omni-capable agent, it is often overkill because, at any given time, most of those capabilities will go to waste. The strategy in human systems, and adopted in many distributed problem-solving systems, is to bring together on demand combinations of specialists in different areas to combine their expertise to solve problems that are beyond their individual capabilities." (Durfee 1999).
A nice example of this approach in the context of planning and scheduling has been proposed in (Wilkins & Myers 1998), where sophisticated systems for planning and scheduling are decomposed into modules, each of which is transformed into an agent, allowing experimentation with different degrees of coupling between the planning and scheduling capabilities.

Finally, let us consider more closely the planning-with-landmarks factor f(·); at least at first view, planning with action landmarks seems to be more complicated than standard STRIPS planning. It is easy to show, however, that
from the worst-case time complexity perspective the overhead of adding landmarks is not significant. (Of course, empirically the situation may be quite different; by this point it should be apparent to the reader that we focus here only on a formal, worst-case analysis of these issues.) This is because any problem Π_L = ⟨P, A, I, G, ⟨a_1, ..., a_δ⟩⟩ with δ action landmarks can be compiled into an equivalent, regular STRIPS problem Π by
(i) adding a single auxiliary multi-valued variable with domain {q_0, q_1, ..., q_δ}, whose initial value is q_0,
(ii) reformulating each action landmark a_i by setting pre(a_i) := pre(a_i) ∪ {q_{i−1}} and add(a_i) := add(a_i) ∪ {q_i}, and
(iii) extending the goal G to G ∪ {q_δ}.

Note that, with this simple compilation, the state space of Π is only δ times larger than the state space of Π_L. Thus, assuming individual planning for each agent is polynomial (in the size of the entire system description), it is easy to verify that STRIPS planning with action landmarks for each such agent remains polynomial-time as well.
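
A minimal sketch of this compilation in the same illustrative Python style as before (it reuses the Action class from the earlier snippet; the q_i atom names and the renaming of the landmark copies are our own choices):

    def compile_landmarks(P, A, I, G, sigma):
        """Compile a STRIPS problem with action landmarks sigma = <a_1,...,a_delta>
        into a regular STRIPS problem, following steps (i)-(iii) above."""
        delta = len(sigma)
        q = [f"q{i}" for i in range(delta + 1)]          # auxiliary "progress" atoms
        A2 = list(A)
        for i, a in enumerate(sigma, start=1):           # step (ii): gated landmark copy
            A2.append(Action(f"{a.name}@landmark{i}",
                             a.pre | {q[i - 1]},
                             a.add | {q[i]},
                             a.dele))
        P2 = set(P) | set(q)
        I2 = set(I) | {q[0]}                             # q_0 holds initially
        G2 = set(G) | {q[delta]}                         # step (iii)
        return P2, A2, I2, G2
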
To extend the algorithm to non-disjoint action sets, we need to distinguish between actions that can be performed by two agents independently and actions that require true coordination at execution. The first case is the simplest: we create two copies of the action with different names and are back to the case of disjoint sets. The second case covers both actions that require joint execution and actions that are "mutually exclusive"; in both cases the agents must execute in coordination. The interaction graph must be modified to include edges between agents that "share" such actions, and the constraints must be modified to ensure that these actions co-occur (or not) within the sequences of public actions of the corresponding agents. Naturally, the interaction graph may be denser because of such actions, and their execution requires the ability to synchronize.

Another point to note is that the MA-planning algorithm has kδ abstract time points at which public actions are taken. These time points are abstract because any number of internal actions can come between any two public actions. In essence, they serve only to constrain the order of the public actions of different agents, and not as real time points. In fact, the algorithm does its utmost to decouple the time points used by each agent. This may be counter-intuitive, as usually we view fully synchronized systems as easier to deal with. However, here additional synchronization would actually be a burden on the planning algorithms, as it would add unnecessary constraints to the system and would actually increase the worst-case time complexity of the algorithms. Moreover, we see that the agents need not communicate their internal plans, nor do they need to synchronize during execution time. All an agent needs to know is that the preconditions for its next public action are satisfied.

Finally, the ability to perform the planning process in a distributed manner is of great interest, and is conceptually simple in our case. The key step in our algorithm is solving an appropriate CSP. This CSP has a natural distributed formulation, and any of the many algorithms for solving distributed CSPs could be used to generate a distributed version of the MA-planning algorithm (Yokoo 2001). The particular choice of the distributed CSP algorithm would affect properties such as communication complexity, and this can be an interesting question for future work.

Reducing the Time Complexity

Considering the worst-case time complexity of the MA-planning algorithm as captured by Eq. 2, and recalling our interest in the time complexity of MA planning mainly as a function of the time complexity of local planning for the agents, a complexity bottleneck appears to be the exponent in the tree-width of the constraint graph CG_Π;δ. In what follows, we show that this bottleneck can be partly eliminated, and sometimes to a very large degree.

Considering the statement of Lemma 1, note that the tree-width of CG_Π;δ can be Θ(k) even if the tree-width of the undirected graph induced by the agent interaction graph is O(1). The reason is that the coordination constraint for the agent ϕ_i glues together the CSP variables corresponding to all possible providers and all possible destroyers of the preconditions of its public actions A_i^pub (cf. the use of the moral graph in Lemma 1). Closely considering the language used to "communicate" commitments within the coordination process imposed by solving CSP_Π;δ, it turns out that sometimes we can do substantially better.

Each sequence of coordination points θ_i = ⟨(a_1^{θ_i}, t_1), ..., (a_δ^{θ_i}, t_δ)⟩ posed by agent ϕ_i corresponds to a set of δ announcements of the form "at time t I will perform action a^{θ_i}". Now, let π_i = max_{a∈A_i^pub} |pre(a) ∩ P_i^pub| be the tight upper bound on the number of public preconditions of an action of ϕ_i. Note that this quantity is expected to be very low; e.g., in most (if not all) standard planning benchmarks we have π_i = O(1) (Helmert 2003). Given that, let us extend the verbosity of each coordination point from (a, t) to (a, t, {(j_1, t_1), ..., (j_{π_i}, t_{π_i})}), having the semantics "at time t I will perform action a, and I require agent ϕ_{j_l} to provide me with the l-th non-private precondition of a at time t_l, respectively." This modification of the language does not affect the internal-planning constraints, but it does affect the coordination constraints, which are now reformulated as follows.
(C3) Extended Coordination Constraint.
An assignment ⟨θ_1, ..., θ_k⟩ to U satisfies C3 iff, for 1 ≤ i ≤ k, (a, t, {(j_1, t_1), ..., (j_{π_i}, t_{π_i})}) ∈ θ_i implies that, for 1 ≤ l ≤ π_i, if p_l ∈ P_i^pub is the l-th public precondition of a, then
• for some action a' ∈ A_{j_l}^pub with p_l ∈ add(a'), we have (a', t_l, {·}) ∈ θ_{j_l}, and
• for no u_j do we have (a'', t'', {·}) ∈ θ_j with p_l ∈ del(a'') and t_l ≤ t'' ≤ t.

Intuitively, what we require here are commitments that do not merely demand that someone supply some condition, but rather explicitly name the supplier and the supply time.
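
To see the effect on the constraint scopes, note that each extended commitment names its suppliers explicitly, so checking the "supply" half of C3 for one agent only touches the variables of the agents it names (its neighbours in IG_Π). A small illustrative sketch in the same Python style (the ExtPoint record and all names are ours, and the Action class is the one from the earlier snippets):

    from typing import NamedTuple, Tuple, Dict, List

    class ExtPoint(NamedTuple):
        action: "Action"
        t: int
        # for the l-th public precondition: (providing agent, promised supply time)
        providers: Tuple[Tuple[str, int], ...]

    def c3_supply_ok(point: ExtPoint, public_pre: List[str],
                     theta_of_provider: Dict[str, List[ExtPoint]]) -> bool:
        """Check only the 'named supplier delivers on time' half of C3 for one
        commitment; its scope is the requesting agent plus the providers it
        explicitly names."""
        for l, p in enumerate(public_pre):
            j_l, t_l = point.providers[l]
            if not any(p in q.action.add and q.t == t_l
                       for q in theta_of_provider.get(j_l, [])):
                return False
        return True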


This may appear a bad idea: we increased the domains of the CSP variables, because there are now many more syntactically different coordination sequences of length δ. However, this constraint also "unglues" the providers and the destroyers of each agent ϕ_i. The providers now need not jointly ensure that some condition is supplied; rather, each provider worries only about the conditions it is explicitly requested to supply. According to Lemma 2, as long as π_i is small, this formulation can buy us a lot.

Lemma 2 For any MA-STRIPS problem Π, and any δ > 0, the constraint graph CG_Π;δ induced by the constraints C2-C3 is independent of δ, and is isomorphic to the undirected graph underlying IG_Π.

The proof of Lemma 2 is similar to that of Lemma 1, except that now there is an edge between ϕ_i and ϕ_j in the constraint graph CG_Π;δ only if ϕ_i has public actions affecting preconditions of a public action of ϕ_j (or vice versa).

Let us now consider more closely the complexity of MA-planning with the reformulated constraint satisfaction problems CSP_Π;δ. The domain D_i of each CSP variable u_i now captures

$$|D_i| = \sum_{d=1}^{\delta} \binom{k\delta}{d} \cdot |A_i^{pub}|^d \cdot (k^2 d)^{\pi_i} = O\big((k\delta\,|A_i^{pub}|)^{\delta+1}\big) \cdot \delta (k^2\delta)^{\pi_i} \qquad (3)$$

possible coordination sequences, where the first two multiplicative terms within the summation are as in Eq. 1, and the third term captures the choice of who (k) supplies when (kδ) each of the π_i public preconditions of the action. In turn, the complexity of enforcing the unary internal-planning constraints C2 remains exactly as before, while checking the coordination constraints C3 can now be done in time O(kD^{ϖ+1}), where ϖ is the tree-width of the (undirected) agent interaction graph IG_Π. The overall complexity of solving CSP_Π;δ is thus of the order of

$$f(I)\cdot k(k\delta\,|A^{pub}|)^{\delta+1} + k(k\delta\,|A^{pub}|)^{\delta\varpi+\epsilon'} \cdot (k^2\delta)^{\pi_i\varpi+\epsilon''}. \qquad (4)$$

Note that, as we already mentioned, the tree-width ϖ can be substantially lower than the tree-width ω induced by C1, possibly up to a reduction from Θ(k) to 1. Hence, the reduction of worst-case time complexity (indirectly) resulting from extending the agents' language of commitments from the messages used in C1 to the more complex messages used in C3 can be exponential in the size of the multi-agent system.

Summary

We identified two parameters that quantify the coupling level of a multi-agent planning problem. One is system-dependent, the tree-width of the agent interaction graph; the other is problem-dependent, the minmax number of coordination points per agent. When these parameters are fixed, the complexity of planning scales only polynomially with the size of the system.

Our results provide novel insights into the area of problem decomposition, and they may also help guide the design of such systems. That is, if we are to allocate actions to agents, we should strive to minimize the tree-width of the resulting agent interaction graph. They also show how a special type of single-agent planning problem is used to solve multi-agent planning problems.

There are a number of natural issues for future work. Of great interest is the design of more practical algorithms guided by the theoretical insights of this paper. If based on CSPs, these would require more efficient encodings of the problem.
Execution monitoring for such systems is also an interesting topic, as the use of abstract time points gives us the flexibility to handle delays as well as to work with asynchronous systems.

References

Amir, E., and Engelhardt, B. 2003. Factored planning. In IJCAI, 929-935.
Brafman, R. I., and Domshlak, C. 2006. Factored planning: How, when, and when not. In AAAI, 809-814.
Bresina, J.; Dearden, R.; Meuleau, N.; Ramakrishnan, S.; Smith, D.; and Washington, R. 2002. Planning under continuous time and resource uncertainty: A challenge for AI. In UAI, 77-84.
Clement, B. J.; Durfee, E. H.; and Barrett, A. C. 2007. Abstract reasoning for planning and coordination. JAIR 28:453-515.
Dechter, R. 2003. Constraint Processing. Morgan Kaufmann.
Durfee, E. H. 1999. Distributed problem solving and planning. In Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence. 121-164.
Erol, K.; Hendler, J.; and Nau, D. S. 1994. HTN planning: Complexity and expressivity. In AAAI, 1123-1128.
Fikes, R. E., and Nilsson, N. 1971. STRIPS: A new approach to the application of theorem proving to problem solving. AIJ 2:189-208.
Helmert, M. 2003. Complexity results for standard benchmark domains in planning. AIJ 146(2):219-262.
Kelareva, E.; Buffet, O.; Huang, J.; and Thiébaux, S. 2007. Factored planning using decomposition trees. In IJCAI, 1942-1947.
Knoblock, C. 1994. Automatically generating abstractions for planning. AIJ 68(2):243-302.
Moses, Y., and Tennenholtz, M. 1995. Multi-entity models. Machine Intelligence 14:63-88.
Seymour, P. D., and Thomas, R. 1993. Graph searching and a min-max theorem for tree-width. Journal of Combinatorial Theory 58:22-33.
Vidal, V., and Geffner, H. 2006. Branching and pruning: An optimal temporal POCL planner based on constraint programming. AIJ 170(3):298-335.
Wilkins, D. E., and Myers, K. 1998. A multiagent planning architecture. In Int. Conf. on AI Planning Systems, 154-162.
Yokoo, M. 2001. Distributed Constraint Satisfaction: Foundations of Cooperation in Multi-agent Systems. Springer.