
Planning under Uncertainty in Dynamic Domains

Jim Blythe
May 25, 1998
CMU-CS-98-147

Computer Science Department
Carnegie Mellon University
Pittsburgh, PA

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Thesis Committee:
Jaime Carbonell
Manuela Veloso
Reid Simmons
Judea Pearl, U.C.L.A.

© 1997 Jim Blythe

The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of any other parties.




Abstract

Planning, the process of finding a course of action which can be executed to achieve some goal, is an important and well-studied area of AI. One of the central assumptions of classical AI-based planning is that after performing an action the resulting state can be predicted completely and with certainty. This assumption has allowed the development of planning algorithms that provably achieve their goals, but it has also hindered the use of planners in many real-world applications because of their inherent uncertainty.

Recently, several planners have been implemented that reason probabilistically about the outcomes of actions and the initial state of a planning problem. However, their representations and algorithms do not scale well enough to handle large problems with many sources of uncertainty. This thesis introduces a probabilistic planning algorithm that can handle such problems by focussing on a smaller set of relevant sources of uncertainty, maintained as the plan is developed. This is achieved by using the candidate plan to constrain the sources of uncertainty that are considered, incrementally considering more sources as they are shown to be relevant. The algorithm is demonstrated in an implemented planner, called Weaver, that can handle uncertainty about actions taken by external agents, in addition to the kinds of uncertainty handled in previous planners. External agents may cause many simultaneous changes to the world that are not relevant to the success of a plan, making the ability to determine and ignore irrelevant events a crucial requirement for an efficient planner.

Three additional techniques are presented that improve the planner's efficiency in a number of domains. First, the possible external events are analyzed before planning time to produce factored Markov chains which can greatly speed up the probabilistic evaluation of the plan when structural conditions are met. Second, domain-independent heuristics are introduced for choosing an incremental modification to apply to the current plan. These heuristics are based on the observation that the failure of the candidate plan can be used to condition the probability that the modification will be successful. Third, analogical replay is used to share planning effort across branches of the conditional plan. Empirical evidence shows that Weaver can create high-probability plans in a planning domain for managing the clean-up of oil spills at sea.
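The loop the abstract describes — evaluate a candidate plan's probability of success, pick the most damaging remaining source of uncertainty, and handle it with a conditional branch or preventive step — can be caricatured in a few lines. This is an illustrative sketch only, not Weaver's algorithm or code: the event names, failure probabilities, independence assumption, and the "guard" operation are all invented for the example.

```python
def success_probability(handled, events):
    # Toy evaluation: each unhandled uncertain event independently
    # threatens the plan, so success probabilities multiply.
    p = 1.0
    for event in events:
        if event["name"] not in handled:
            p *= 1.0 - event["p_fail"]
    return p

def weaver_sketch(events, threshold=0.9, max_steps=10):
    plan = ["pump-oil", "move-barge"]  # stand-in for the initial candidate plan
    handled = set()
    for _ in range(max_steps):
        if success_probability(handled, events) >= threshold:
            break  # plan is already reliable enough
        unhandled = [e for e in events if e["name"] not in handled]
        if not unhandled:
            break
        # "Plan critic": address the most damaging remaining event,
        # e.g. by a conditional branch or a preventive step.
        worst = max(unhandled, key=lambda e: e["p_fail"])
        plan.append(("guard", worst["name"]))
        handled.add(worst["name"])
    return plan, success_probability(handled, events)

events = [
    {"name": "weather-worsens", "p_fail": 0.3},
    {"name": "oil-spills", "p_fail": 0.05},
]
plan, p = weaver_sketch(events)
# Only the dominant threat gets guarded before the threshold is met,
# mirroring the idea of focusing on relevant sources of uncertainty.
```

The point of the sketch is the control structure: evaluation and improvement alternate, and sources of uncertainty are drawn in incrementally rather than all at once.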


Contents

1 Introduction
  1.1 Planning under uncertainty
  1.2 An example planning problem
  1.3 Contributions
  1.4 Reader's guide to the thesis

2 Related work
  2.1 Overview of "classical" planning
  2.2 Reactive planning and approaches that mix planning with execution
  2.3 Extensions to classical planning systems
  2.4 Approaches based on Markov decision processes
    2.4.1 Overview of Markov Decision Processes
    2.4.2 Planning under uncertainty with MDPs

3 Planning under Uncertainty
  3.1 Overview of prodigy 4.0
    3.1.1 Domains, problems and plans
    3.1.2 Constructing plans
    3.1.3 Organization of the search space
  3.2 A representation for planning under uncertainty
  3.3 A Markov decision process model of uncertainty
  3.4 Putting it together: a complete example

4 The Weaver Algorithm
  4.1 Conditional planning
    4.1.1 Extensions to the prodigy 4.0 node set to support probabilistic planning
    4.1.2 B-Prodigy
  4.2 Calculating the probability of success
  4.3 Incrementally improving a plan with the plan critic
    4.3.1 Analysing the belief net and choosing a flaw
    4.3.2 Adding a conditional branch
    4.3.3 Adding preventive steps
  4.4 Discussion

5 Efficiency improvements in plan evaluation
  5.1 The Markov chain representation for external events
  5.2 Using an event graph to guarantee a correct Markov chain
  5.3 Using the event graph in Weaver

6 Efficiency improvements in planning
  6.1 Domain-independent search heuristics
    6.1.1 Local estimations of probabilities
    6.1.2 The footprint principle
  6.2 Using derivational analogy in a conditional planner
    6.2.1 The need to share search effort within the conditional planner
    6.2.2 Using derivational analogy in Weaver
    6.2.3 Illustrations of performance using synthetic domains

7 Experimental results in the Oil-spill domain
  7.1 The oil-spill clean-up domain
  7.2 Case studies of improving plan reliability
  7.3 Experimental results for improving plan reliability
  7.4 Mutual constraints between the planner, evaluator and critic
    7.4.1 Using the plan to constrain events in the evaluator
    7.4.2 The event graph
  7.5 Domain-independent heuristics in the oil-spill domain
  7.6 Analogical replay
    7.6.1 Case studies
    7.6.2 Experiments in the oil-spill domain

8 Conclusions
  8.1 Contributions of the thesis
  8.2 Future research directions

A Proofs of theorems
  A.1 The plan's belief net provides a lower bound on its probability of success
  A.2 Event graphs

B The Oil-spill domain
  B.1 Domain specification
  B.2 Random problem generator

Bibliography




List of Tables

2.1 The policy iteration algorithm.
2.2 The value iteration algorithm.
3.1 Summary of the prodigy 4.0 planning algorithm. Each decision point is a potential backtracking point in the algorithm.
3.2 An annotated trace of prodigy 4.0 which shows a sequence of search tree nodes created to produce the candidate plan shown in Figure 3.3. Each node except n5 is a child of the node in the line above. The text in italics has been added to the trace output for clarity of presentation.
3.3 The initial state and goal description. Here t and f stand for "true" and "false".
4.1 An annotated trace of Prodigy, showing the use of a protection node (n14) to add a precondition to the step at node n13 which will stop its conditional effect from taking place.
4.2 Algorithm for B-prodigy, based on prodigy 4.0. Steps which only appear in B-prodigy are shown in bold font.
4.3 The flaws uncovered by the plan critic for the first candidate plan for the example problem. For each flaw a set of desired values is computed from the preconditions of the child action node, and the probability mass of this set is shown. The explanations column shows parent node values that can lead to an undesired value.
5.1 Algorithm to construct the belief net to evaluate a plan.
6.1 Overview of the analogical replay procedure combined with B-prodigy.
6.2 Operator schemas in the test domain.
7.1 The objects and goal for the example problem used to demonstrate a conditional plan formed under scarce resources.
7.2 The initial state for the example problem used to demonstrate a conditional plan formed under scarce resources.
7.3 Prodigy's solution to the example problem, ignoring inference rules.
7.4 Weaver's solution to the example problem, ignoring inference rules.
7.5 The objects and goal for the example problem used to demonstrate adding steps to prevent an event.
7.6 The initial state for the example problem used to demonstrate adding steps to prevent an event.
7.7 Proportion of nodes replayed by derivational analogy in randomly generated problems.
B.1 The number of objects in a problem is chosen randomly according to a uniform distribution. This table shows the minimum and maximum number of objects generated for each type.


List of Figures

3.1 An operator that can move an object of type Barge from the location bound to one variable to the location bound to another. Variables have an associated type and can also be constrained with functions.
3.2 Two more operators in the example domain.
3.3 The type hierarchy for the planning domain. Objects in the example planning problem are shown in brackets below their type.
3.4 A candidate plan for the planning problem to dispose of the oil. The variable names have been omitted from the instantiated operators. The head plan contains one step, to move the barge from Richmond to the west coast.
3.5 The Move-Barge action from the oil-spill domain used in Weaver has a duration and a probability distribution of possible outcomes.
3.6 An exogenous event.
3.7 The set of states reachable when the action Move-Barge is executed from the initial state shown, when Weather-Brightens is the only exogenous event in the domain and the action has a duration of one time unit. The probability of reaching each state is shown along its arc.
3.8 The two possible sequences of states in the underlying Markov decision process arising from executing the Move-Barge operator in the state shown on the left in the absence of exogenous events.
3.9 The possible sequences of states in the underlying Markov decision process arising from executing the Move-Barge operator in the state shown on the left, when the exogenous event Weather-Brightens can also take place. Only literals that are changed from a parent state are shown.
3.10 The operators from the simple oil-spill domain used in this chapter. When the duration of an operator isn't mentioned, it defaults to 1.
3.11 The exogenous events from the oil-spill domain.
3.12 Reachability graph of literal states in the planning problem from the initial state, considering only operators. The initial state is shown in the top left-hand corner, and the four goal states are shown in bold in the bottom left-hand corner. Actions corresponding to the plan that is produced for this problem are shown in bold.
3.13 Part of the MDP that would correspond to the example problem if it had no exogenous events, restricted to the states in which (oil-in-tanker) is true. All transitions and intermediate states correspond to a move-barge action. The transitions that correspond to initiating an action are labelled with their probabilities. Intermediate transitions are not labelled.
3.14 The full set of states for the MDP for the example problem. The transitions due to exogenous events are shown with dotted lines, connecting the two planes into which the states are divided. In the upper plane the state variable (fair-weather) is true and in the lower plane (poor-weather) is true.
4.1 Weaver's main loop has three modules: a conditional planner, a plan evaluator and a plan critic.
4.2 The Pump-Oil and Move-Barge operators are modified respectively to require and conditionally delete the operational literal. The new operator Make-Ready can add barge-ready.
4.3 A version of the operator Move-Barge with two possible outcomes.
4.4 An initial tail plan to solve the example problem, showing two alternative courses of action. Step s3 has two different possible outcomes. The steps along the top row are restricted to one outcome of s3, and use barge1 in the situation when it is operational. The steps along the bottom row are restricted to the other outcome of s3 and use barge2.
4.5 A more developed version of the candidate plan. One step has been moved to the head plan and now determines two possible current states, given two context labels.
4.6 A reachability graph of states in the planning problem corresponding to the conditional plan to transfer the oil with one of two barges. Goal states are marked with a thick border, and the probability of arriving at each state is shown above it. Time increases from left to right, and the actions taken are shown with their intervals below the nodes.
4.7 A small example Bayesian belief network.
4.8 The two-stage algorithm to construct a belief net to evaluate a plan. In the first stage, nodes are added to the belief net to represent actions and their preconditions and effects. In the second stage, persistence assumptions are replaced with nodes representing exogenous events and their preconditions and effects when appropriate.
4.9 Belief nets representing the first plan created by the conditional planner for the example problem, before and after event nodes are added to justify the persistence of the (weather) variable. In the first graph, the dotted lines show the unexplored persistence intervals. In the second graph, gray nodes represent possible occurrences of external events during one interval. No exogenous events affect the other intervals.
4.10 The belief net for the first conditional branch of this section's example plan, with marginal probabilities shown.
4.11 The two nets constructed for the two conditional branches of this section's example plan.
4.12 A version of the Move-Barge operator that depends on the barge-ready literal.
4.13 Operators in the synthetic domain to illustrate the utility of the constraints on the planner from the critic. The initial plan found by the conditional planner will have a success probability of 1/2^n. Either one conditional branch or one protection step can increase the probability to 1.
4.14 The nondeterminism from the operators in the previous domain is replaced with a single exogenous event. The operators Final, Alt-Final and Protect are unchanged and so are the solutions.
4.15 Operators in the synthetic domain to illustrate the utility of the constraints on the evaluator and critic from the planner.
4.16 A modified Pump-Oil operator and a new Deploy-Pump operator that make a conditional plan necessary to achieve a probability of success greater than 0.5 in the example problem.
5.1 Belief nets from Chapter 4, in which shaded nodes represent specific occurrences of external events.
5.2 A belief net representing the same plan but with Markov chains used to compute changes in the weather over the required time intervals rather than a series of event nodes.
5.3 The nodes governing weather from the plan's belief net follow the Markov chain on the right.
5.4 The event Oil-Spills affects the location of the oil and is partly determined by the weather.
5.5 The event graph for the example domain, with boxes for event schemas and ovals for literal schemas.
5.6 The Markov model built to model the evolution of the state variable (oil), instantiated by the literal (oil-in-tanker west-coast).
5.7 The belief net created to evaluate the first branch of the example plan with exogenous events affecting both the weather and the oil modelled with a Markov chain.
6.1 Operators in the parameterised domain "Mop(n,)". There are n operators S_i for 1 ≤ i ≤ n.
6.2 The operators for an example domain.
6.3 Operators in the car-starting domain.
6.4 Initial tail plan to solve example problem. The directed arcs are causal links showing that a step establishes a necessary precondition of the step it points to.
6.5 A second stage of the planner. There are now two possible current states for the two contexts.
6.6 A typical plan in the test domain.
6.7 Time in seconds to solve planning problem with and without analogy plotted against the number of branches in the conditional plan. Each point is an average of five planning episodes.
7.1 Part of the type hierarchy, showing the 52 most important of the 151 types in the domain.
7.2 San Francisco Bay area.
7.3 The exogenous event threatened-shore-changes models the uncertainty of the oil's path in the water.
7.4 The exogenous event oil-spills models the uncertain flow of oil from the tanker to the sea.
7.5 The average probability of success for the plans for the randomly generated problems, plotted against the number of iterations of improvement. The upper and lower lines show the 90th and 10th percentiles of the probabilities respectively.
7.6 The number of examples that reach their maximum probability at each iteration.
7.7 The graph on the left shows the average improvement gained from each way to improve a plan across the random set of problems. The graph on the right shows their cumulative contribution to success probability. Although in this domain each conditional branch is generally only 1/4 as effective as adding preventing steps, its overall contribution is a higher proportion because it is used more often.
7.8 Average probabilities of plans for the same example set, plotted against the number of iterations, when Weaver's mechanisms for improving plan probability are selectively disabled.
7.9 Histogram of the percentage of trials whose initial probability of success fell in each of the ten probability ranges 0.1 in width. Roughly half the initial cases have probability between 0 and 0.1.
7.10 The lower line shows the mean probability of success plotted against the iteration of Weaver for the cases whose initial probability was between 0 and 0.1. The upper line shows these values for the other cases and the middle line shows the global mean.
7.11 Evolution of probabilities for the initially low probability cases.
7.12 Evolution of probabilities for the initially high probability cases.
7.13 The event sea-gets-worse.
7.14 The times taken to create the Jensen join tree for the initial plans with simulated duration less than 12 hours. On the left, explicit event nodes were used and on the right, Markov chains built using the event graph. Times using the event graph are on average less than one percent of times taken with explicit event nodes.
7.15 The times taken to create the Jensen join tree using Markov chains for all initial plans with simulated duration less than 1,000 hours.
7.16 With and without greedy control rules.




Acknowledgments

Many, many people have helped me with the ideas and presentation of this thesis. Of course my committee have been great at both generating and refining ideas: Jaime Carbonell, Manuela Veloso, Reid Simmons and Judea Pearl. Jaime and Manuela in particular have been very instrumental in the work. The Prodigy project over the years has always been an excellent group of people to work with. Alicia Perez, Mei Wang, Mike Cox, Scott Reilly, Eugene Fink, Rujith de Silva, Karen Haigh, Peter Stone and Will Uther all gave me valuable feedback in many meetings. Eugene and Mike greatly improved the thesis with their detailed comments. Outside the Prodigy group, Lonnie Chrisman, Sven Koenig, Rich Goodwin, John Mount, Bryan Loyall, Gary Pelton, Mike Williamson, Steve Hanks, Martha Pollack, Steve Chien, Louise Pryor and Robert Goldman have been sources of inspiration and of many useful conversations. The AI group at NASA Ames provided a great environment to home in on a thesis topic over the summer, with help from Steve Minton, Mark Drummond and Peter Cheeseman. I owe the greatest debt of thanks, though, to my wife Cathy, whose love and support allowed me to finish this thesis and laugh about it afterwards.




Chapter 1

Introduction

Planning and problem solving are essential characteristics of intelligent behaviour and have held a central place in AI research since the beginning of the field. The emphasis of research in planning has most often been the design of efficient algorithms to find plans given a simplified representation of a planning task, encoding only its essential features. By assuming a certain syntactic representation for planning domains and problems, general-purpose planning algorithms could be designed and in some cases proofs could be given of their correctness and completeness [Fikes & Nilsson 1971; Tate 1977; Chapman 1987; Minton et al. 1989; McAllester & Rosenblitt 1991].

While these systems represent significant technical achievements, when we want to apply them to problems that occur in the world around us we find that the assumptions they make can be severely limiting. For example, their representations do not address trade-offs between goals with different values in cases where they cannot all be achieved, or the possibility that the world can be changed independently of the actions taken by the planning agent.
Nor do they address the fact that planners often do not know the state of the world or the effects of actions taken in it with certainty.

Uncertainty affects a wide spectrum of planning domains, from Saturn orbit insertion for the Cassini satellite (where the primary rocket may fail, requiring the secondary rocket to be primed before the maneuver) to a plan to have a picnic in a park (where one should take into account the chance of rain or that the park may be crowded). Frequently there are many things that can go wrong with each step of a plan, and a detailed plan created in advance will almost certainly fail. For example, it is not feasible to create a detailed plan for driving to the office that specifies turns, pauses and levels of acceleration before getting in the car. The planner does not know if the traffic lights will be red or green, or what traffic will be encountered. This led some researchers to abandon programs that choose steps in advance in favour of "reactive" programs that choose each step as it is to be made [Agre & Chapman 1987; Schoppers 1989b].

However it is often necessary to make contingency plans ahead of time even if not every situation can be predicted and planned for. For example, the Cassini spacecraft's secondary rockets take several hours to fire up, but must be ready immediately


if the primary rockets should fail during the critical minutes of orbit insertion. The spacecraft cannot wait until this point to think about the problem, and reactive systems that explicitly prime the secondary rocket beg the question of how such behaviour could be designed automatically.

To deal with such problems, a planning system must be able to prioritise the different possible situations that might occur when a candidate plan is executed, and produce a plan that covers the most important ones while leaving others to be completed as more information is available during execution. One way to approach this problem, adopted in this thesis, is to assign probabilities to the various sources of uncertainty in the planning domain and use them to derive probabilities for each of the different possible outcomes when the plan is executed. The outcomes with higher probability are treated as having higher priority. This approach is attractive because a single parameter (a minimum acceptable value for the probability of plan success) can be used to control the planner, and values can be combined using the standard axioms of probability. Computing the probability of success of a plan can be expensive, however.

This thesis describes an implemented algorithm to produce plans that meet a threshold probability of success in uncertain domains.
The thesis concentrates on plan generation and does not address plan execution, plan monitoring or re-planning, although these are important pieces of a complete system for planning in uncertain domains. The central contributions of the thesis are described below, after the key concepts of planning under uncertainty are covered in more detail.

1.1 Planning under uncertainty

Planning, in the absence of uncertainty, is the process of finding a sequence of operators to transform an initial state into one that matches a goal description. The initial state is usually specified by giving a list of facts that hold in it, and the goal description is usually a logical sentence using a subset of the facts as terms. Operators are usually described in terms of preconditions and effects, where the precondition of an operator is a logical sentence describing the states in which the operator can legally be applied, and the effects describe the changes that will be brought about in a state if the operator is applied. Following STRIPS [Fikes & Nilsson 1971], the effects are usually represented as a list of facts to be deleted from the current state and a list of facts to be added.

This thesis builds on the Prodigy planner [Minton et al. 1989; Carbonell et al. 1992; Veloso et al. 1995], a domain-independent planner that was designed as a testbed for learning and problem solving.
Prodigy provides a rich language for specifying search control knowledge and includes a number of independent machine learning modules that can produce control knowledge from experience in a particular domain or use saved cases.
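The precondition/add/delete representation just described can be made concrete with a small sketch. Everything below (the `Operator` class, the `pump-oil` action and its facts) is invented for illustration and is not Prodigy's actual operator syntax.

```python
# A minimal sketch of a STRIPS-style representation: a state is a set of
# facts, and an operator lists preconditions, an add list and a delete list.
# The operator and fact names are invented for illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset  # facts that must hold before applying
    add_list: frozenset       # facts added by the operator
    delete_list: frozenset    # facts removed by the operator

    def applicable(self, state):
        return self.preconditions <= state

    def apply(self, state):
        assert self.applicable(state), f"{self.name} is not applicable"
        return (state - self.delete_list) | self.add_list

# Example: pumping oil from a grounded tanker into a secondary vessel.
pump_oil = Operator(
    name="pump-oil",
    preconditions=frozenset({"oil-in-tanker", "vessel-alongside"}),
    add_list=frozenset({"oil-in-vessel"}),
    delete_list=frozenset({"oil-in-tanker"}),
)

state = frozenset({"oil-in-tanker", "vessel-alongside"})
new_state = pump_oil.apply(state)
```

Applying `pump-oil` removes `oil-in-tanker` and adds `oil-in-vessel`, leaving the untouched facts in place, which is exactly the add/delete semantics described above.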


Uncertainty can enter a planning domain through a number of different types of cause. The initial state may not be completely known, either because it is not completely observable by the agent or because of delays between forming and executing a plan. The outcomes of actions taken in the domain may be non-deterministic, and other agents may be changing the world through exogenous events that are not completely known or predictable. Finally, the goals of the agent may change over time, perhaps unpredictably. For each of these sources, partial information is still useful: for example, a subset of the possible initial states that is guaranteed to contain the actual initial state, or a probability that some new goal will be given over a fixed time scale, can be helpful in building a working plan.

This thesis presents a planner that can handle uncertainty from three types of cause: more than one possible initial state, non-deterministic outcomes of actions and non-deterministic exogenous events. For each of these, the planner requires a probabilistic representation. It then attempts to find a plan that exceeds a minimum desired probability of success.

The presence of exogenous events means that each step in a plan that has an appreciable duration is effectively nondeterministic. Explicitly considering each of these alternative outcomes could lead to a combinatorial explosion in the separate cases that a planner would need to consider.
The planning algorithm presented in the thesis, called Weaver, addresses this problem by lazily including sources of uncertainty in its evaluation of a plan. This is done by creating a plan in two alternating steps, plan creation and plan evaluation. In the plan creation step, Weaver chooses an outcome for a step that is desired in order to achieve a goal and does not explicitly consider the other outcomes, although it does make sure the outcome is reasonably likely. In the plan evaluation step, all sources of uncertainty can potentially be considered, but only those that are shown to have an effect on the plan's chance of success are explicitly represented. After this, the plan creation step is re-entered, with one or more of the sources of uncertainty that are known to be important being explicitly considered.

1.2 An example planning problem

Consider for example a planning domain describing an oil tanker that has run aground near the coast and is beginning to leak oil into the sea. The goal is to stop the oil from polluting the water or reaching any of several areas of coastline along the shore. The available actions include pumping the oil from the tanker into a secondary vessel, surrounding the tanker with booms to contain the oil, using booms and other equipment to prevent the oil from reaching the shore, and cleaning oil up from either the sea or the shore.
Cleaning up oil from the water is successful with 70% probability. The amount of oil spilled from the tanker is a stochastic process represented as an exogenous event. Its path of travel once in the water is also treated as an exogenous event. Many of the pieces of equipment used in the recovery process require good


weather conditions, and change in the weather is also modelled with exogenous events.

Weaver begins by ignoring the sources of uncertainty in the domain. An initial plan might put equipment in place and pump the oil from the tanker into another vessel. When this plan is evaluated, those sources of uncertainty that can affect the plan are considered. For example, in the time taken to move the vessel and the pump, oil may spill from the tanker and the weather around the tanker may change. However, nondeterministic changes to the world, such as weather changes in other parts of the sea, are still ignored. The plan critic will select a potential flaw in the plan to have the planner improve, for example that oil can be spilled. This event is now represented to the planner, which may produce a plan to clean oil from the water if it is spilled, or to move booms around the tanker to prevent the oil being spilled.

In either case oil may still be in the water when the new plan is completed, although with lower probability than in the original plan. On the next improvement, the plan critic might direct the planner to pay attention to the situation where oil might reach some shoreline, and the planner might create a conditional branch in the plan to protect the shore or clean it up in that case. Although an alternative plan might always protect the shore without testing the state, in some cases a higher probability of success can be achieved with a conditional plan than otherwise.
If there are not enough resources, for example, to protect both potentially threatened pieces of shoreline, a branching plan that tests the oil's position before allocating equipment would be more successful than any non-branching plan.

1.3 Contributions

The thesis describes a novel algorithm for producing conditional plans based on the Prodigy planner. The main contributions of the thesis are techniques that allow the planner both to find plans in situations where there can be many sources of uncertainty and to reason efficiently about these plans. These techniques can be divided into those that affect the process of plan creation and those that affect the process of plan evaluation.

Plan creation

Although previous planners for uncertain domains, such as Buridan and C-Buridan [Kushmerick, Hanks, & Weld 1995], represented probabilistic effects of actions, their techniques did not scale well because of the joint expenses of handling all the contingencies the planner may need to consider and evaluating a plan with a number of sources of uncertainty. The problem is more severe when uncertainty through exogenous events is considered, because there can be a number of different resulting states after applying any action, rather than just those that have explicitly marked uncertain outcomes, and the number of such states climbs exponentially with the number of exogenous events.
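The lazy, alternating creation/evaluation loop that the thesis proposes can be sketched schematically. All names below (the function `plan_under_uncertainty`, the toy create and evaluate stand-ins, the uncertainty-source labels) are invented for illustration; this is not Weaver's actual interface.

```python
# Schematic sketch of a lazy create/evaluate loop: plan creation ignores
# uncertainty sources not yet shown to matter, and plan evaluation reports
# which sources affect the plan's success probability. All names are invented.
def plan_under_uncertainty(problem, threshold, create_plan, evaluate_plan):
    considered = set()  # uncertainty sources made explicit so far
    while True:
        plan = create_plan(problem, considered)
        p_success, relevant = evaluate_plan(plan)
        if p_success >= threshold:
            return plan
        new_sources = relevant - considered
        if not new_sources:
            return None  # no further refinement available
        considered |= new_sources  # make the new sources explicit, re-plan

# Toy stand-ins: the evaluator reports two relevant sources, and the success
# probability rises as more of them are modelled explicitly.
def toy_create(problem, considered):
    return ("plan", frozenset(considered))

def toy_evaluate(plan):
    _, considered = plan
    return 0.3 + 0.3 * len(considered), {"weather", "oil-spill"}

result = plan_under_uncertainty("spill-problem", 0.8, toy_create, toy_evaluate)
```

In the toy run, the first candidate ignores both sources and falls below the threshold; the second, with both sources modelled explicitly, exceeds it and is returned.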


A contribution of this thesis is an implemented planner that selectively ignores some of the sources of uncertainty while choosing actions for the plan, and in a separate evaluation stage considers the sources that are known to be relevant to the candidate plan, ignoring others. This process is iterative, with the planner refining its candidate plan and becoming aware of more sources of uncertainty on each iteration. Using this technique, the planner is the first to handle uncertain exogenous events.

Two further contributions of the thesis are aimed at improving the efficiency of plan creation. First, based on a study of the algorithm, a family of domain-independent search heuristics is proposed that relax the assumptions of independence typically made in probabilistic planners, using the observation that the failure of a candidate plan can be used to condition the probability that the modification will be successful. Heuristics from this family are demonstrated. Second, a variant of internal analogy is used to reduce the amount of duplicated work in different conditional branches of the plans produced.

Plan evaluation

A plan's probability of success is computed by automatically creating a Bayesian belief net to represent the plan and evaluating it using a standard join tree method [Pearl 1988]. A contribution of this thesis is an algorithm to produce a belief net that can be efficiently evaluated when there are exogenous events.
The algorithm exploits two properties that are expected to be common in probabilistic planning domains: a representation for exogenous events that uses stationary probabilities, and a high variability in the length of establishment intervals of domain-level properties in plans. The first property makes it possible to use Markov chains to summarise the action of exogenous events over time, and the second property makes it useful to do so. I briefly explain this approach below, and describe it in more detail in Chapter 5.

In the representation for exogenous events developed in this thesis, events have probabilities of occurrence and probability distributions for their possible outcomes. All of these probabilities may depend on the state of the world at the time the event may occur, but do not otherwise depend on the time of occurrence. This stationary property of the probabilities makes it possible to define Markov chains, whose nodes represent abstract world states, that can be used to infer the probability of future abstract world states.

The actions that planners consider can often take widely varying amounts of time to perform. For instance, if a planner considers moving a boat or aircraft to a desired location, the action may take from under an hour to several days depending on the current location and state of readiness of the boat or aircraft. Because of these wide variations, the lengths of time between when the value of a domain variable is established in a plan and when it is required can also vary greatly. We call such an interval of time an establishment interval for the variable.
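The stationary property described above is what allows exogenous events to be summarised as a Markov chain over abstract states. A small self-contained sketch (the two-state weather chain and all its numbers are invented for illustration) shows how the distribution after t time units can then be obtained with O(log t) matrix multiplications by repeated squaring:

```python
# Sketch: with stationary event probabilities, the distribution over abstract
# states after t time units is d0 * P^t, and P^t needs only O(log t) matrix
# multiplications via repeated squaring. The two-state "weather" chain and
# its numbers are invented for illustration.
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(p, t):
    n = len(p)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    while t > 0:
        if t & 1:
            result = mat_mul(result, p)
        p = mat_mul(p, p)
        t >>= 1
    return result

P = [[0.9, 0.1],   # state 0: good weather
     [0.5, 0.5]]   # state 1: bad weather
d0 = [1.0, 0.0]    # start in good weather
Pt = mat_pow(P, 1000)
dt = [sum(d0[i] * Pt[i][j] for i in range(2)) for j in range(2)]
# dt is close to this chain's stationary distribution [5/6, 1/6]
```

Summarising 1,000 time units this way takes about ten matrix multiplications rather than a thousand, which is the point of the logarithmic-time computation: a long establishment interval costs little more to evaluate than a short one.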
We see that if time were divided into units of equal length when a belief net is created to evaluate the plan, some establishment intervals would necessarily cross many time units, and if belief


net nodes at one time point were connected to parents at the previous time point, long paths of nodes in the belief net would be created. The time to evaluate this belief net would be at least linear in the length of these paths and frequently worse, since belief net evaluation is NP-hard in general [Cooper 1990].

One can use the ability to describe exogenous events using Markov chains to escape this difficulty. The probability distribution of states in a Markov chain at some future time t, given the distribution at time 0, can be computed in time logarithmic in t. Methods to define such Markov chains that guarantee correctness are given in Chapter 3. The chains that are computed might have exponential size in the number of exogenous events in the worst case. The improved time performance in evaluating the resulting belief net outweighs the cost of computing the Markov chains when the establishment intervals vary sufficiently and the resulting chains are not too large. I show in Chapter 7 that there are significant benefits in the oil-spill domain, and it is likely that this is the case in a large number of interesting planning domains.

Limitations in scope

Creating robust plans under uncertainty is a hard problem that requires a much broader set of algorithms and approaches than can be dealt with properly in one thesis.
Handling some of the aspects of the problem that have been ignored here would make a very interesting extension to this work. In particular, the planner described here does not address plan monitoring or re-planning. After the initial plan is created, a complete system would begin to check certain key domain features as the plan is executed, updating its beliefs that the various alternative courses of action will be successful, and revising part or all of the plan if necessary. The potential need to revise a plan could also affect the initial plan, since an ideal plan would not make early commitments that would reduce later options for revision. In fact the objective measure of a robust plan might include the potential for re-planning. In this thesis the objective measure views the plan as a static, complete object. While the belief net approach lends itself to plan monitoring and belief updates, this aspect is not addressed.

Also ignored is the important issue of the source of, and confidence in, the planner's knowledge about its uncertainty. For example, single point probabilities are given for each possible action outcome rather than intervals of probability. Again, the approach taken to plan evaluation makes it possible to examine the importance of the assumptions about uncertainty via sensitivity analysis, but this has not been addressed.

1.4 Reader's guide to the thesis

The next chapter surveys some of the work related to this thesis.
Chapter 3 describes the model used to represent planning problems that contain sources of uncertainty.


This model is based on the Markov decision process, although the planner does not explicitly build one. Chapter 4 describes the planning algorithm used in Weaver and provides a simple example. Chapter 5 presents a method to improve the efficiency of evaluating plans, and Chapter 6 presents methods to improve the efficiency of planning when there is uncertainty. Chapter 7 presents an empirical analysis of the planning problem and the methods to improve efficiency in a large planning domain. Finally, Chapter 8 summarises the main contributions of the thesis.




Chapter 2

Related work

Work in automatic planning began early in the field of artificial intelligence, with such programs as GPS [Newell & Simon 1963] and STRIPS [Fikes & Nilsson 1971]. As well as introducing new planning algorithms [Chapman 1987; McAllester & Rosenblitt 1991], later work considered more general conditions, for example where external events could take place during plan execution [Vere 1983], where the domain theory is incomplete or incorrect [Gil 1992; Wang 1996; Gervasio 1996] or where the agent must consider the relative quality of alternative plans [Perez 1995].

There has been a wide variety of work done in the design of systems for planning under uncertainty. In this chapter I provide a survey of some of this work, point out the work most closely related to this thesis and show how the thesis contributes to the body of work in this area.

2.1 Overview of "classical" planning

Broadly, planning systems aim to construct a pattern of behaviour for a performance system to accomplish some given goal in its environment. Most planning systems are given a representation for possible actions that can be taken in the environment, in terms of the conditions required to hold for the action to be taken and the effects on the environment when the action is taken. Systems referred to as "classical" planning systems typically make the following additional assumptions:

- The environment can be represented as a sequence of discrete states, which are represented using a logical language.
- The actions are represented in a language similar to that used by STRIPS [Fikes & Nilsson 1971], in which the action's preconditions form a logical sentence over the language used to define states, and the action's effects are represented by a set of terms to be added to or deleted from the terms used to represent a state.

- The goal is a property of a state, characterised by a logical sentence (rather than a property of a sequence of states).


In addition to these assumptions, classical planners return a plan in the form of a sequence of actions, either totally or partially ordered.

This is a much studied family in AI planning because the assumptions are powerful and seem reasonable. An interesting range of behaviours can be captured with STRIPS-style operators, and they allow a planner to perform backward chaining since, given a goal, one can subgoal on the preconditions required for an action or sequence of actions to achieve the goal. In addition, the logical representation allows a plan to be proved correct. Prodigy, the planning system extended in this thesis, is a classical planner according to this definition [Veloso et al. 1995]. A more detailed description of Prodigy's action representation, which is an example of a STRIPS-like representation, can be found in Section 3.1.

Work in classical planning has typically focussed on improving the efficiency with which a plan is created. For example, partial-order planners, introduced in the late eighties [Chapman 1987; McAllester & Rosenblitt 1991; Penberthy & Weld 1992], attempt to reduce the search space using a "least commitment" principle where steps in a plan are only ordered with respect to one another as required to prove that the plan is successful. Planners can be made faster by solving a series of abstractions of the planning problem, each more detailed, until the problem is solved in full complexity, at each stage using the solution from the previous stage [Sacerdoti 1974; Yang & Tenenberg 1990; Knoblock 1991].
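The backward chaining that STRIPS-style operators enable, mentioned above, can be sketched with a tiny invented domain: to achieve a goal fact, pick an operator whose add list contains it and subgoal on that operator's preconditions. The two operators, their facts, and the depth bound below are all illustrative, not taken from any of the cited systems.

```python
# Minimal sketch of backward chaining over STRIPS-style operators.
# Each operator maps a name to (preconditions, add list, delete list);
# the two-operator domain is invented for illustration.
OPS = {
    "board-vessel": (set(), {"vessel-alongside"}, set()),
    "pump-oil": ({"vessel-alongside"}, {"oil-in-vessel"}, {"oil-in-tanker"}),
}

def apply_op(state, name):
    pre, add, delete = OPS[name]
    return (state - delete) | add

def backward_chain(goals, state, depth=6):
    """Return a list of operator names achieving all goals, or None."""
    if set(goals) <= state:
        return []
    if depth == 0:
        return None  # depth bound keeps the naive search finite
    goal = next(iter(set(goals) - state))  # pick an unsatisfied goal
    for name, (pre, add, delete) in OPS.items():
        if goal in add:
            prefix = backward_chain(pre, state, depth - 1)  # subgoal
            if prefix is None:
                continue
            reached = state
            for step in prefix:
                reached = apply_op(reached, step)
            rest = backward_chain(goals, apply_op(reached, name), depth - 1)
            if rest is not None:
                return prefix + [name] + rest
    return None

plan = backward_chain({"oil-in-vessel"}, frozenset({"oil-in-tanker"}))
# plan == ["board-vessel", "pump-oil"]
```

The search works from the goal `oil-in-vessel` back to `pump-oil`, and from that operator's precondition back to `board-vessel`, which is the subgoaling on preconditions described in the text.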
Work has also been done on controlling search with various heuristics [Blythe & Veloso 1992; Gerevini & Schubert 1996; Pollack, Joslin, & Paolucci 1997] and with explicit control rules [Minton 1988; Minton et al. 1989].

The STRIPS action representation does not support non-deterministic action outcomes, and classical planners do not have a means to represent sources of change in the domain other than the actions taken by the performance agent. These systems therefore cannot be used for planning problems involving uncertainty. However, the ideas and algorithms developed under the assumptions of classical planning form the basis of many approaches to planning under uncertainty, including the ones described in this thesis.

2.2 Reactive planning and approaches that mix planning with execution

When a plan is executed in a stochastic domain in which initial conditions and action outcomes are uncertain, many different scenarios can take place as a result of the execution. An open-loop plan, one that contains no sensing and always executes the same set of actions, will usually be too brittle to achieve high reliability in such a domain.
Instead, planning systems must create branching plans that take into account the intermediate states reached while the plan is being executed and take different actions accordingly.

Some approaches to this problem, motivated in addition by real-time constraints, aim to create strategies for behaviour in which a system repeatedly senses its environment and chooses an action based on this information, similar to a policy on a Markov decision process, described in Section 2.4. These systems have no explicit representation of a global plan and some, such as Brooks's subsumption architecture [Brooks 1986] and Agre and Chapman's Pengi [Agre & Chapman 1987], have little or no internal state. The subsumption architecture arranges reactive subsystems into hierarchies, so that higher-level behaviour is achieved by one system overriding, or subsuming, a more basic system. Neither of these systems, however, shows how a reactive plan could be built from a declarative description of the environment. Schoppers [Schoppers 1989a] develops a similar scheme in his "universal plans" and describes how they can be created automatically. Other reactive planning systems such as PRS [Georgeff & Lansky 1987], RAP [Firby 1987; 1989] and HAP [Loyall & Bates 1991] do include internal state (PRS represents the agent's beliefs, desires and intentions explicitly) and also control structures such as iteration.

These systems can produce appropriate behaviour under different execution conditions, but for each contingency that arises the appropriate response must be programmed by a human in advance. In many domains, however, it is not feasible to specify the correct action in each possible situation in advance. Instead we would like to combine the speed of a reactive planner in familiar situations with the flexibility of a classical planner in situations not previously anticipated.
Systems have been developed for this purpose that combine reactive execution with classical planning, using the latter when no pre-programmed response is available for some contingency. Both the Theo-agent [Blythe & Mitchell 1989] and the "anytime synthetic projection" technique of Drummond and Bresina [Drummond & Bresina 1990] follow reactive rules if they are present, and otherwise fall back on planning and then compile the resulting plan into new reactive rules to be used in future episodes. The two systems differ on the action and goal representation and on the planning technique. The Theo-agent uses a strips-like action representation and backward chaining. In anytime synthetic projection, forward chaining is used with a richer action and goal language that can specify probabilistic outcomes, exogenous events and goals to maintain predicates over time intervals.

A mixed planning and execution strategy can provide further advantages if it is under explicit control. Some goals can be planned for in advance, while others can be deferred until part way through the execution, when there may be more information about the best course of action. By not always giving priority to reactive rules, the mixed-strategy system can avoid pitfalls in some cases.
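The reactive-first control strategy these systems share can be captured in a short sketch. The function and data layout below are illustrative, not taken from either system: rules are (condition, action) pairs, and the planner returns a list of (state, action) steps that are compiled into new rules.

```python
# Illustrative sketch (not either system's actual code): follow a reactive rule
# when one matches the current state; otherwise fall back on planning, then
# compile the resulting plan into new reactive rules for future episodes.
def agent_step(state, rules, planner):
    for condition, action in rules:       # reactive rules: (condition, action) pairs
        if condition(state):
            return action
    plan = planner(state)                 # plan: list of (state, action) pairs
    for plan_state, action in plan:       # cache each step as a new reactive rule
        rules.append((lambda s, t=plan_state: s == t, action))
    return plan[0][1]
```

On a second encounter with a state covered by a compiled rule, the agent acts without invoking the planner at all, which is the source of the claimed speed-up in familiar situations.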
On the other hand, delaying planning for these goals can drastically reduce the number of alternatives to consider. Gervasio shows how to build "completable plans", which are designed to be incomplete and amenable to being further elaborated during execution [Gervasio & DeJong 1994; Gervasio 1996]. Goodwin [Goodwin 1994] considers the question of when to switch from planning to executing in a time-dependent planning problem. Onder and Pollack [Onder & Pollack 1997] describe a probabilistic planner that reasons about which contingencies to plan for before execution and which to defer. Washington makes use of an abstraction hierarchy, planning at some abstract level before execution and picking a concrete instance of the abstract plan during execution [Washington 1994].

2.3 Extensions to classical planning systems

Within the broader context of systems for planning and execution in uncertain environments, some recent work has focussed on creating a plan before execution that accounts for a significant subset of the contingencies that might occur during execution, but not necessarily all of them. For this purpose it is useful to have some way to compare the different contingencies to find the most important. A probabilistic representation of the sources of uncertainty is well suited for this, although some have used a Dempster-Shafer formalism [Mansell 1993], fuzzy reasoning [Bonissone & Dutta 1990] or a minimax approach [Koenig & Simmons 1995].

Work on extensions to planning systems using decision theory began shortly after the initial work on strips [Feldman & Sproull 1977], but this line of work was largely discontinued until the 1990's. Wellman lays the ground-work for developing probabilistic extensions to classical planning systems, pointing out in [Wellman 1990b] that the strips representation of action embodies a Markov assumption: since the success of an operator depends on its preconditions, it is conditionally independent of all other knowledge if the current state is known.
This makes it possible to factor the formula that describes the probability of plan success, using for example the standard techniques for Bayesian belief networks [Pearl 1988]. Wellman also shows [Wellman 1990a] how proofs that one partial plan "dominates" another, in that each of its possible completions will have a higher utility than those of the other, can be used to reduce the search space for high-utility plans. Goldszmidt and Darwiche [Goldszmidt & Darwiche 1995] also discuss using belief nets to evaluate plans, but unlike the work described in this thesis they do not construct belief nets automatically. In another important paper, Peot and Smith describe an extension to the snlp algorithm, called cnlp, for building conditional plans when some of the domain actions can have non-deterministic effects [Peot & Smith 1992].

Among the best-known probabilistic planners are Buridan [Kushmerick, Hanks, & Weld 1995] and C-Buridan [Draper, Hanks, & Weld 1994]. Buridan uses a variant of the snlp algorithm, and a strips-like action representation with probabilistic effects. Each action has a set of triggers, each of which is a conjunction of terms that describe the world state, and each is connected to a probability distribution of effect sets. Each effect set in this distribution is a collection of terms to be added to or deleted from the state, as in a strips action. Buridan's input includes a probability threshold value as well as a logical goal, and it tries to find a plan that succeeds with probability at least equal to the threshold value. Weaver's action representation is similar to Buridan's and is described in detail in Section 3.2.

One way that Buridan extends snlp is by being able to propose more than one action in its plan to achieve any goal or subgoal. This is necessary since each individual action may fail to achieve the goal with some probability. Buridan can provably find a plan that achieves the threshold probability of success, if a non-branching plan exists that does so. C-Buridan is an extension to Buridan that can create branching plans, using a technique similar to that introduced with cnlp [Peot & Smith 1992]. Essentially, one new way that C-Buridan can fix a candidate plan in which two actions might interfere with each other is to add a test and restrict the actions to different conditional branches based on the test. Information about the branch that an action belongs to is then propagated to the action's descendants and ancestors in the goal tree.

Cassandra [Pryor & Collins 1993; 1996] is another planner based on snlp and uses an explicit representation of decision steps. Bagchi et al. describe a planner that uses a spreading activation technique to choose actions with highest expected utility [Bagchi, Biswas, & Kawamura 1994]. Onder and Pollack also extend snlp with explicit reasoning about the contingencies to plan for [Onder & Pollack 1997]. Haddawy's thesis work [Haddawy 1991] introduces a logic with extensions to represent probabilities of terms, intended for representing plans.
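To make the trigger and effect-set representation concrete, here is a minimal sketch of simulating such an action forward; the data layout (frozensets of literals, lists of triggers) is my own and not Buridan's, and the exhaustive enumeration below is only for illustration, not Buridan's plan-assessment algorithm.

```python
# A minimal sketch (data layout assumed, not Buridan's) of a trigger-based
# probabilistic action: each trigger pairs a condition with a distribution over
# effect sets, and each effect set adds and deletes literals as in strips.
def apply_action(state_dist, triggers):
    """Push a distribution over states (frozenset of literals -> prob) through an action."""
    new_dist = {}
    for state, p in state_dist.items():
        matched = next((outs for cond, outs in triggers if cond <= state), None)
        if matched is None:                    # no trigger fires: state unchanged
            new_dist[state] = new_dist.get(state, 0.0) + p
            continue
        for prob, adds, dels in matched:
            s2 = frozenset((state - dels) | adds)
            new_dist[s2] = new_dist.get(s2, 0.0) + p * prob
    return new_dist

def success_probability(initial, plan, goal):
    """Probability that the goal literals hold after executing the plan."""
    dist = {frozenset(initial): 1.0}
    for action in plan:
        dist = apply_action(dist, action)
    return sum(p for s, p in dist.items() if goal <= s)
```

Repeating an action that succeeds with probability 0.9 illustrates why Buridan may propose more than one step for the same goal: two attempts succeed with probability 0.99.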
He and his group have developed hierarchical task-network (htn) planners that use a similar representation [Haddawy & Suwandi 1994; Haddawy, Doan, & Goodwin 1995; Haddawy, Doan, & Kahn 1996].

While this thesis concentrates mainly on extensions to classical planning that support probabilistic planning in uncertain domains, other work develops extensions to enable planners to find high-quality plans according to some specified criteria. Perez [Perez & Carbonell 1994; Perez 1995] describes how to learn control knowledge from interactions with a human expert that allows Prodigy to produce higher-quality plans. Williamson and Hanks [Williamson & Hanks 1994; Williamson 1996] develop a decision-theoretic extension to ucpop.

The planner described in this thesis is unique in its ability to reason explicitly about exogenous events, and to perform a relevance analysis of those events. Vere [Vere 1983] introduced a planner that could reason about exogenous events that were known to occur at some fixed time, but not about uncertain events.

2.4 Approaches based on Markov decision processes

In the past five years, much of the research activity in AI planning under uncertainty has built on standard techniques for solving Markov decision problems (mdps) and partially-observable Markov decision problems (pomdps) [Dean et al. 1995; Littman 1996; Boutilier, Brafman, & Geib 1997].
This formalism has the advantage of a sound mathematical underpinning, and the standard solution techniques aim at an optimal policy while most systems based on classical planning try to meet a lower bound. However these standard techniques are not tractable since they require enumerating all the states of the system. Work to make mdp approaches tractable has attempted to incorporate ideas from classical planning and other areas of AI to exploit any underlying structure in the state space [Dearden & Boutilier 1997] as well as to relax the requirement of optimality [Parr & Russell 1995].

In this section I give a brief overview of Markov decision processes and survey some of the work in this area. This thesis pursues an approach based on classical planning, but uses an mdp representation to describe planning problems and their solutions.

2.4.1 Overview of Markov Decision Processes

This description of Markov decision processes follows [Littman 1996] and [Boutilier, Dean, & Hanks 1995]. A Markov decision process M is a tuple M = <S, A, Φ, R> where:

- S is a finite set of states of the system.
- A is a finite set of actions.
- Φ : A × S → Π(S) is the state transition function, mapping an action and a state to a probability distribution over S for the possible resulting state. The probability of reaching state s' by performing action a in state s is written Φ(a, s, s').
- R : S × A → ℝ is the reward function. R(s, a) is the reward the system receives if it takes action a in state s.

A policy for an mdp is a mapping π : S → A that selects an action for each state. Given a policy π, we can define its finite-horizon value function V_π^n : S → ℝ, where V_π^n(s) is the expected value of applying the policy for n steps starting in state s. This is defined inductively with V_π^0(s) = R(s, π(s)) and

    V_π^m(s) = R(s, π(s)) + Σ_{u∈S} Φ(π(s), s, u) V_π^{m−1}(u)

Over an infinite horizon, a discounted model is frequently used to ensure policies have a bounded expected value. For some γ chosen so that 0 ≤ γ < 1, a reward received n steps in the future is discounted by the factor γ^n. Under this model there is a policy π* with value function V* that is optimal regardless of the starting state [Howard 1960], which satisfies the following equation:

    V*(s) = max_{a∈A} { R(s, a) + γ Σ_{u∈S} Φ(a, s, u) V*(u) }

Two popular methods for solving this equation and finding an optimal policy for an mdp are value iteration and policy iteration [Puterman 1994].

In policy iteration, the current policy is repeatedly improved by finding some action in each state that has a higher value than the action chosen by the current policy for that state. The policy is initially chosen at random, and the process terminates when no improvement can be found. The algorithm is shown in Table 2.1. This process converges to an optimal policy [Puterman 1994].

    Policy-Iteration(S, A, Φ, R, γ):
    1. For each s ∈ S, π(s) = RandomElement(A)
    2. Compute V_π(·)
    3. For each s ∈ S {
    4.   Find some action a such that
           R(s, a) + γ Σ_{u∈S} Φ(a, s, u) V_π(u) > V_π(s)
    5.   Set π'(s) = a if such an a exists,
    6.   otherwise set π'(s) = π(s). }
    7. If π'(s) ≠ π(s) for some s ∈ S, set π = π' and go to 2.
    8. Return π

    Table 2.1: The policy iteration algorithm

In value iteration, optimal policies are produced for successively longer finite horizons, until they converge. It is relatively simple to find an optimal policy over n steps π_n(·), with value function V_n(·), using the recurrence relation:

    π_n(s) = argmax_a { R(s, a) + γ Σ_{u∈S} Φ(a, s, u) V_{n−1}(u) }

with starting condition V_0(s) = 0 ∀s ∈ S, where V_m is derived from the policy π_m as described above. Table 2.2 shows the value iteration algorithm, which takes an mdp, a discount value γ and a parameter ε and produces successive finite-horizon optimal policies, terminating when the maximum change in values between the current and previous value functions is below ε.
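The policy-iteration loop of Table 2.1 can be sketched directly in Python. The dictionary encoding here is assumed, not from the thesis: phi[a][s][u] holds Φ(a, s, u) and R[s][a] the reward, and step 2's linear system is replaced by iterated backups for simplicity.

```python
import random

# A runnable sketch of Table 2.1's policy iteration (dictionary encoding
# assumed): phi[a][s][u] = Phi(a, s, u), R[s][a] = reward for a in s.
def evaluate(S, phi, R, gamma, pi, sweeps=200):
    # step 2: approximate V_pi by repeated backups instead of
    # solving the |S| linear equations exactly
    V = {s: 0.0 for s in S}
    for _ in range(sweeps):
        V = {s: R[s][pi[s]] + gamma * sum(phi[pi[s]][s][u] * V[u] for u in S)
             for s in S}
    return V

def policy_iteration(S, A, phi, R, gamma):
    pi = {s: random.choice(A) for s in S}       # step 1: random initial policy
    while True:
        V = evaluate(S, phi, R, gamma, pi)
        changed = False
        for s in S:                             # steps 3-6: seek a better action
            for a in A:
                if a == pi[s]:
                    continue
                q = R[s][a] + gamma * sum(phi[a][s][u] * V[u] for u in S)
                if q > V[s] + 1e-6:             # tolerance guards the approximation
                    pi[s], changed = a, True
                    break
        if not changed:                         # step 7: stop when no improvement
            return pi
```

On a two-state chain where only one state yields reward, the returned policy moves toward and then stays in the rewarding state, whatever the random initialisation.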
It can also be shown that the algorithm converges to the optimal policy for the discounted infinite case in a number of steps which is polynomial in |S|, |A|, log max_{s,a} |R(s, a)| and 1/(1 − γ).

    Value-Iteration(S, A, Φ, R, γ, ε):
    1. for each s ∈ S, V_0(s) = 0
    2. t = 0
    3. t = t + 1
    4. for each s ∈ S {
    5.   for each a ∈ A
    6.     Q_t(s, a) = R(s, a) + γ Σ_{u∈S} Φ(a, s, u) V_{t−1}(u)
    7.   π_t(s) = argmax_a Q_t(s, a)
    8.   V_t(s) = Q_t(s, π_t(s)) }
    9. if (max_s |V_t(s) − V_{t−1}(s)| ≥ ε) go to 3
    10. return π_t

    Table 2.2: The value iteration algorithm

2.4.2 Planning under uncertainty with MDPs

The algorithms described above can find optimal policies in polynomial time in the size of the state space of the mdp. However, this state space is usually exponentially large in the inputs to a planning problem, which include a set of literals whose cross product describes the state space. Attempts to build on these and other techniques for solving mdps have concentrated on ways to gain leverage from the structure of the planning problem to reduce the computation time required.

Dean et al. used policy iteration in a restricted state space called an envelope [Dean et al. 1993]. A subset of the states is selected, and each transition in the mdp that leaves the subset is replaced with a new transition to a new state OUT with zero reward. No transitions leave the OUT state. They developed an algorithm that alternated between solving the restricted-space mdp with policy iteration and expanding the envelope by including the n most likely elements of the state space to be reached by the optimal policy that were not in the envelope. The algorithm converges to an optimal policy considerably more quickly than standard policy iteration on the whole state space, but as the authors point out [Dean et al. 1995], it makes some assumptions which limit its applicability, including that of a sparse mdp in which each state has only a small number of outward transitions.
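For comparison with Table 2.2, its loop translates almost line for line into Python. The dictionary encoding is the same assumed one as before (phi[a][s][u] = Φ(a, s, u), R[s][a] = reward); this is a sketch, not the thesis's implementation.

```python
# A runnable sketch of the value-iteration algorithm of Table 2.2.
# Returns the final policy and its value function.
def value_iteration(S, A, phi, R, gamma, epsilon):
    V = {s: 0.0 for s in S}                                  # step 1: V_0 = 0
    while True:
        Q = {s: {a: R[s][a] + gamma * sum(phi[a][s][u] * V[u] for u in S)
                 for a in A} for s in S}                     # step 6: Q_t(s, a)
        pi = {s: max(A, key=lambda a: Q[s][a]) for s in S}   # step 7: argmax
        V_new = {s: Q[s][pi[s]] for s in S}                  # step 8: V_t
        if max(abs(V_new[s] - V[s]) for s in S) < epsilon:   # step 9: converged?
            return pi, V_new
        V = V_new
```

On the same two-state chain as before, with γ = 0.9 the value of the rewarding self-loop state converges to 1/(1 − γ) = 10.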
Tash and Russell extend the idea of an envelope with an initial estimate of distance-to-goal for each state and a model that takes the time of computation into account [Tash & Russell 1994].

While the envelope extension method ignores portions of the state space, other techniques have considered abstractions of the state space that try to group together sets of states that behave similarly under the chosen actions of the optimal policy. Boutilier and Dearden [Boutilier & Dearden 1994] assume a representation for actions that is similar to that used in Buridan [Kushmerick, Hanks, & Weld 1994], described in Section 2.3, and a state utility function that is described in terms of domain literals. They then pick a subset of the literals that account for the greatest variation in the state utility and use the action representation to find literals which can directly or indirectly affect the chosen set, using a technique similar to the one developed by Knoblock for building abstraction hierarchies for classical planners [Knoblock 1991]. This subset of literals then forms the basis for an abstract mdp by projection of the original states. Since the state space size is exponential in the set of literals, this reduction can lead to considerable time savings over the original mdp. Boutilier and Dearden prove bounds on the difference in value of the abstract policy compared with an optimal policy in the original mdp.

Lin and Dean further refine this idea by splitting the mdp into subsets and allowing a different abstraction of the states to be considered in each one [Dean & Lin 1995]. This approach can have extra power because typically different literals may be relevant in different parts of the state space. However there is an added cost to re-combining the separate pieces unless they happen to decompose very cleanly. Lin and Dean assume the partition of the state space is given by some external oracle.

Boutilier et al. extend modified policy iteration to propose a technique called structured policy iteration that makes use of a structured action representation in the form of 2-stage Bayesian networks [Boutilier, Dearden, & Goldszmidt 1995]. The representations of the policy and utility functions are also structured in their approach, using decision trees. In standard policy iteration, the value of the candidate policy is computed on each iteration by solving a system of |S| linear equations (step 2 in Table 2.1), which is computationally prohibitive for large real-world planning problems. Modified policy iteration replaces this step with an iterative approximation of the value function V_π by a series of value functions V_π^0, V_π^1, ...
given by

    V_π^i(s) = R(s, π(s)) + γ Σ_{u∈S} Φ(π(s), s, u) V_π^{i−1}(u)

Stopping criteria are given in [Puterman 1994].

In structured policy iteration, the value function is again built in a series of approximations, but in each one it is represented as a decision tree over the domain literals. Similarly the policy is built up as a decision tree. On each iteration, new literals might be added to these trees as a result of examining the literals mentioned in the action specification and utility function R. In this way the algorithm avoids explicitly enumerating the state space.

Similar work has also been done with partially-observable Markov decision processes, or pomdps, in which the assumption of complete observability is relaxed. In a pomdp there is a set of observation labels O and a set of conditional probabilities P(o | a, s), o ∈ O, a ∈ A, s ∈ S, such that if the system makes a transition to state s with action a it receives the observation label o with probability P(o | a, s). Cassandra et al. introduce the witness algorithm for solving pomdps [Cassandra, Kaelbling, & Littman 1994]. A standard technique for finding an optimal policy for a pomdp is to construct the mdp whose states are the belief states of the original pomdp, i.e. each state is a probability distribution over states in the pomdp, with beliefs maintained based on the observation labels using Bayes' rule. A form of value iteration can be performed in this space, making use of the fact that each finite-horizon value function will be convex and piecewise-linear.
The witness algorithm includes an improved technique for updating the basis of the convex value function on each iteration. Parr and Russell use a smooth approximation of the value function that can be updated with gradient descent [Parr & Russell 1995]. Brafman introduces a grid-based method in [Brafman 1997].

Although work on pomdps is promising, it is still preliminary, and the largest completely solved pomdps have about 10 states [Brafman 1997].
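The Bayes'-rule belief maintenance mentioned above can be written as a small sketch. The formulation b'(s') ∝ P(o | a, s') Σ_s Φ(a, s, s') b(s) is the standard one implied by the text's P(o | a, s) notation; the dictionary encoding (phi[a][s][s2], p_obs keyed by (o, a, s2)) is my own assumption.

```python
# A sketch of belief-state maintenance for a pomdp: after taking action a in
# belief b and receiving observation label o, Bayes' rule gives
#     b'(s') proportional to P(o | a, s') * sum_s Phi(a, s, s') * b(s).
# Assumes the observation o actually has nonzero probability under (b, a).
def belief_update(b, a, o, phi, p_obs):
    unnorm = {}
    for s2 in {u for s in b for u in phi[a][s]}:          # reachable next states
        pred = sum(phi[a][s].get(s2, 0.0) * p for s, p in b.items())
        unnorm[s2] = p_obs.get((o, a, s2), 0.0) * pred
    z = sum(unnorm.values())      # normaliser: the probability of observing o
    return {s2: p / z for s2, p in unnorm.items()}
```

With a self-loop "listen" action and an observation that is 85% reliable, a uniform belief over two states sharpens to (0.85, 0.15) after one observation, which is exactly the belief-state transition the derived belief mdp operates on.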


Chapter 3

Planning under Uncertainty

In this chapter I provide an overview of the prodigy 4.0 planning algorithm and discuss extensions made to allow it to create and evaluate plans under conditions of uncertainty. I present extensions to prodigy 4.0's model to represent uncertainty in Section 3.2. I give the meaning of the elements of this language in terms of a Markov decision process in Section 3.3. I illustrate the chapter with an example domain, a plan and the domain's derived mdp in Section 3.4.

3.1 Overview of prodigy 4.0

prodigy 4.0 takes as input a representation of a planning problem and attempts to produce a solution. The problem consists of the allowable object types, the actions that can be taken, a set of specific objects, an initial state and a goal state description. A solution, if found, will be a sequence of operators that can be applied to the initial state to produce some state satisfying the goal description. prodigy 4.0 uses means-ends analysis to search for plans. In this section I give a detailed description of the language prodigy 4.0 uses to represent planning problems, how it constructs a plan and how it searches a space of partial plans to find a solution.

3.1.1 Domains, problems and plans

A planning domain in prodigy 4.0 consists of a type hierarchy T, a set of literals L and a set of operators O,¹ based on the strips domain definition [Fikes & Nilsson 1971]. Each operator is defined by specifying its preconditions and effects.
The preconditions specify conditions about the world that must be true in order for the operator to be applied, and are made of combinations of literal terms, that may include typed variables and may be combined using conjunction, disjunction, negation and existential and universal quantification. The effects specify literals that are either added to (made true in) the state or deleted from (made false in) the state. The literals in the effects may mention the same variables that were mentioned in the preconditions, or new variables, in which case universal quantification is understood. Conditional effects may also be specified, in which the conditional test has the same form as the preconditions of the operator. An example of an operator is shown in Figure 3.1 and is discussed after a brief description of a planning problem. Full details on the syntax of planning domains and problems in prodigy 4.0 can be found in [Carbonell et al. 1992].

¹ A domain also contains inference rules, which are very similar to operators but add inferences to the state rather than changing it. Full details can be found in [Carbonell et al. 1992].

    Move-Barge
    (Operator
      (preconds
        ((<b> Barge)
         (<from> Place)
         (<to> (and Place (diff <to> <from>))))
        (at <b> <from>))
      (effects ()
        ((del (at <b> <from>))
         (add (at <b> <to>)))))

Figure 3.1: An operator that can move an object of type Barge from the location bound to variable <from> to the location bound to variable <to>. Variables have an associated type and can also be constrained with functions.

A planning problem consists of a planning domain, a set of objects, each of which belongs to one of the types in the domain, an initial state which is specified by giving all the literals that are true in the state, and a goal description. The goal description is a combination of literals formed in the same way as the precondition of an operator, except that there may be no free variables: all the variables in the goal expression must be either existentially or universally quantified.
The literals specifying the initial state may contain no variables.

Figure 3.1 shows a simple operator from a domain that will be developed as an example in this chapter as well as later chapters in the thesis. The operator has three variables, <b>, <from> and <to>,² and their types are specified in the preconditions slot: <b> is of type Barge and <from> and <to> are both of type Place. In addition to mentioning a type, the specification of the variable <to> uses a function to further restrict the possible values for the variable. The function call (diff <to> <from>) must also be true: this function is true when the two variables have different values.

The second part of the preconditions slot consists of the precondition statement, which in this case is the single literal (at <b> <from>). An instantiation of this operator consists of the operator along with an object from the planning problem associated with each variable in the operator's preconditions. The object's type must match the variable's type and the set of objects must agree with all the functions used in the variable specifications. The instantiated operator is applicable in a state if the literal made from its preconditions statement is true in that state.

² Angle brackets are used to denote variables, such as <b>, in prodigy 4.0.

If an instantiated operator is applied to a state, a new state is produced corresponding to the result of applying the operator in the original state. The new state is specified as follows. First, the test corresponding to each conditional effect set is evaluated in the state. If it is true, the corresponding conditional effects are treated as regular effects, otherwise they are discarded. Next the literals mentioned in each add or del statement are examined for variables, and any variables mentioned in the preconditions slot are replaced with their object values from the instantiation of the operator. If there are variables that are not mentioned in the preconditions slot, they are treated as universally quantified. For each such variable, a variable specification must be given in the effects slot, mentioning a type and optionally some bindings functions in the same way as in the preconditions slot. Each literal in the effects that mentions one of these variables is replaced with a set of literals, corresponding to every object in the planning problem that matches the variable specification.
After the literals in the effects are all translated into one or more literals that mention no variables, the state transformation is made. First the literals mentioned in del statements are made false in the state. Next the literals mentioned in add statements are made true in the state. Any literal that is not mentioned in the effects of the instantiated operator has the same truth value in the new state as in the old state.

For example, suppose a planning problem uses a domain that includes the Move-Barge operator as shown, and includes the objects barge1 of type Barge and Richmond and Oakland of type Place. If the initial state has the true literal (at barge1 Richmond), then the instantiated operator

    (Move-Barge <b>=barge1 <from>=Richmond <to>=Oakland)

will be applicable in the initial state. Applying it would lead to a new state, in which (at barge1 Richmond) would be false and (at barge1 Oakland) would be true, with all other literals unchanged.

A plan in prodigy 4.0 is a totally-ordered sequence of instantiated operators (O_1, O_2, ... O_n). The plan can be re-formulated in a partial order after creation if desired [Veloso 1992]. I will refer to an instantiated operator in a plan as a step in the plan. A plan is a solution to the planning problem if:

1. O_1 is applicable in the initial state and applying it leads to the state S_1,
2. each step O_i for 2 ≤ i ≤ n is applicable in the state S_{i−1} and applying it leads to the state S_i, and
3. the goal description is satisfied in state S_n.

For example, given the initial state above and the goal (at barge1 Oakland), the plan consisting of the single step (Move-Barge <b>=barge1 <from>=Richmond <to>=Oakland) is a solution for this planning problem.
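The apply-and-check semantics above (deletes made false first, then adds made true, and the three-part solution test) can be sketched compactly for fully ground operators. The field names ("preconds", "dels", "adds") are my own, not prodigy 4.0 syntax, and this sketch omits typing, binding functions and conditional effects.

```python
# A sketch of the semantics just described, with states as frozensets of
# ground literals (assumed encoding, not prodigy 4.0's internal one).
def applicable(state, op):
    return op["preconds"] <= state

def apply_op(state, op):
    # del-effects are made false first, then add-effects are made true
    return frozenset((state - op["dels"]) | op["adds"])

def is_solution(initial, plan, goal):
    # a plan is a solution if every step applies in sequence
    # and the goal description holds in the final state
    state = initial
    for op in plan:
        if not applicable(state, op):
            return False
        state = apply_op(state, op)
    return goal <= state
```

Encoding the Move-Barge example this way, the one-step plan from Richmond to Oakland checks out as a solution, while applying the same step twice fails because its precondition no longer holds after the first move.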


22 Chapter 3. Planning under Uncertainty

3.1.2 Constructing plans

Typically, however, a plan consists of a large number of steps and constructing a plan is not a trivial exercise. prodigy 4.0 constructs plans using a technique called means-ends analysis, in which it uses the difference between the current state and its goals to decide which steps to include in a plan. In order to illustrate how plans are constructed, I introduce two more operators into the barge domain and consider a problem with a slightly harder goal. Figure 3.2 shows the operators Pump-Oil and Unload-Oil, which can be used respectively to pump oil from a vessel at sea into a barge and to unload the oil from the barge when it is at a dock. I also introduce two new types, which are specializations or subtypes of the type Place that already exists in the domain.

Pump-Oil
(operator
  (preconds ((<barge> Barge)
             (<sector> Sea-Sector))
    (and (at <barge> <sector>)
         (oil-in-tanker <sector>)))
  (effects ()
    ((add (oil-in-barge <barge>))
     (del (oil-in-tanker <sector>)))))

Unload-Oil
(operator
  (preconds ((<barge> Barge)
             (<dock> Dock))
    (and (at <barge> <dock>)
         (oil-in-barge <barge>)))
  (effects ()
    ((del (oil-in-barge <barge>))
     (add (disposed-oil)))))

Figure 3.2: Two more operators in the example domain.

The new planning problem includes the objects barge1 of type Barge, Richmond of type Dock and west-coast of type Sea-Sector. The type hierarchy and the objects in the planning problem are shown in Figure 3.3.
The problem has an initial state in which the literals (at barge1 Richmond) and (oil-in-tanker west-coast) are the only true literals, and the goal (disposed-oil). This can be achieved by the following four-step plan:

(Move-Barge =barge1 =Richmond =west-coast)
(Pump-Oil =barge1 =west-coast)
(Move-Barge =barge1 =west-coast =Richmond)
(Unload-Oil =barge1 =Richmond)

prodigy 4.0 searches a space of candidate plans, in which each candidate plan is represented by two structures, a head plan and a tail plan [Fink & Veloso 1995]. The head plan is a totally-ordered sequence of steps that can be applied to the initial state. The tail plan is a partially-ordered set of steps, that is, a directed acyclic graph in which there is an arc from step a to step b if a was added to the tail plan in order to satisfy some precondition of b. If a step was added to satisfy a goal that is part of the planning problem itself, then there is an arc from that step to the special node Root


at the top of the partial order. If the head plan is applied to the initial state, it leads to a new state called the current state. This current state is used to guide planning in two ways. First, no steps are added to the tail plan to achieve subgoals that are already achieved in the current state. Second, steps are preferred if they have fewer preconditions that are not true in the current state. Both of these guidance rules are heuristics, and may delay finding a solution. They typically speed the search process, however. Figure 3.4 shows an example candidate plan that might be generated in the search for a solution to the current problem.

Barge          Place
(barge1)      /     \
           Dock      Sea-Sector
        (Richmond)  (west-coast)

Figure 3.3: The type hierarchy for the planning domain. Objects in the example planning problem are shown in brackets below their type.

Head Plan:  1. Move-Barge barge1 Richmond west-coast

Tail Plan:  Pump-Oil barge1 west-coast ---------------+
                                                      +--> Unload-Oil barge1 Richmond --> Root
            Move-Barge barge1 west-coast Richmond ----+

Figure 3.4: A candidate plan for the planning problem to dispose of the oil. The variable names have been omitted from the instantiated operators. The head plan contains one step, to move the barge from Richmond to the west coast.

prodigy 4.0 begins its search with the null plan, in which the head plan is empty and the tail plan consists of the single node Root. It can expand a candidate plan in one of two ways: it can either add a new step to the tail plan to achieve some open precondition, or it can move a step from the tail plan to the head plan. A step that is moved to the head plan must be applicable in the current state.
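As a minimal stand-in for this search, the fragment below solves the four-step barge problem by plain depth-first forward search over ground steps, pruning repeated states; prodigy 4.0's real search interleaves back-chaining on the tail plan with moving applicable steps to the head plan, and the step encodings here are illustrative.

```python
# Ground steps for the barge domain as (name, preconds, dels, adds) tuples.
def step(name, pre, dels, adds):
    return (name, frozenset(pre), frozenset(dels), frozenset(adds))

STEPS = [
    step("Move-Barge barge1 Richmond west-coast",
         {("at", "barge1", "Richmond")},
         {("at", "barge1", "Richmond")}, {("at", "barge1", "west-coast")}),
    step("Move-Barge barge1 west-coast Richmond",
         {("at", "barge1", "west-coast")},
         {("at", "barge1", "west-coast")}, {("at", "barge1", "Richmond")}),
    step("Pump-Oil barge1 west-coast",
         {("at", "barge1", "west-coast"), ("oil-in-tanker", "west-coast")},
         {("oil-in-tanker", "west-coast")}, {("oil-in-barge", "barge1")}),
    step("Unload-Oil barge1 Richmond",
         {("at", "barge1", "Richmond"), ("oil-in-barge", "barge1")},
         {("oil-in-barge", "barge1")}, {("disposed-oil",)}),
]

def search(state, goal, depth, visited=()):
    if goal <= state:
        return []                       # goal satisfied: empty remaining plan
    if depth == 0 or state in visited:  # depth bound, or a repeated state
        return None
    for name, pre, dels, adds in STEPS:
        if pre <= state:                # step applicable in the current state
            rest = search((state - dels) | adds, goal, depth - 1,
                          visited + (state,))
            if rest is not None:
                return [name] + rest
    return None

s0 = frozenset({("at", "barge1", "Richmond"),
                ("oil-in-tanker", "west-coast")})
plan = search(s0, frozenset({("disposed-oil",)}), 6)
# plan is the four-step sequence given in the text
```

The repeated-state check plays the same role as pruning candidate plans whose head plan revisits a state.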
It is added to the end of the head plan, yielding a new current state. More details on the representation of plans and the search techniques used in prodigy 4.0 can be found in [Fink & Veloso 1995; Veloso et al. 1995; Veloso & Stone 1995].

For example, in Figure 3.4, the step with operator Pump-Oil is applicable and could be moved to the head plan. Moving it would yield a new current state,


in which (oil-in-barge barge1) is true. There is only one other way that the plan in Figure 3.4 can be expanded, which is by moving the step (Move-Barge =barge1 =west-coast =Richmond) to the head plan. However, this candidate plan would include a state loop, the situation where the exact same state is visited twice while the head plan is executed. This plan is therefore pruned from the search space, since if it leads to a solution there must be another more efficient solution that avoids the state loop. The plan in Figure 3.4 cannot be expanded by adding a step to the tail plan because there are no open preconditions in the tail plan.

A summary of the prodigy 4.0 algorithm is shown in Table 3.1 (taken from [Veloso et al. 1995]).

prodigy 4.0(G, I)
1. Current state C := initial state I, Head-Plan := null, Tail-Plan := null.
2. If the goal statement G is satisfied in the current state C, then return Head-Plan.
3. Either
   (A) Back-Chainer adds an operator to the Tail-Plan, or
   (B) Operator-Application moves an operator from Tail-Plan to Head-Plan.
   Decision point: Decide whether to apply an operator or to add an operator to the tail.
4. Goto 2.

Operator-Application
1. Pick an operator op in Tail-Plan such that
   (A) there is no operator in Tail-Plan ordered before op, and
   (B) the preconditions of op are satisfied in the current state C.
   Decision point: Choose an operator to apply.
2. Move op to the end of Head-Plan and update the current state C.

Table 3.1: Summary of the prodigy 4.0 planning algorithm. Each decision point is a potential backtracking point in the algorithm.

3.1.3 Organization of the search space

Table 3.1 gives a nondeterministic algorithm to construct a plan for a given planning problem. prodigy 4.0 maintains an audit trail of the search it performs while constructing the plan, called the search tree. A description of the search tree is useful for explaining some extensions made to prodigy 4.0 to support conditional planning and planning with external events. It is also used in Chapter 4 to describe how Weaver interacts with prodigy 4.0.

The nodes in the search tree represent the choices made at each decision point of the algorithm, and are typically of four types. An applied operator node represents


the choice to move a particular step from the tail plan to the head plan. As an alternative to moving a step to the head plan, a goal node represents the decision to focus on an open condition in the tail plan (also known as a goal). The children of a goal node must all be operator nodes, which represent the decision to use a particular operator schema to achieve the goal represented by the parent node. The children of an operator node must all be bindings nodes, which represent the decision to use a particular set of bindings to instantiate the operator represented by the parent node. The combination of a goal node, an operator node and a bindings node together represents the act of prodigy 4.0 adding a step into the tail plan.

While the tail plan and the head plan together represent one candidate plan, the search tree represents all the candidate plans that have been examined and also all the ways that new candidate plans can be expanded. A path from the root node to any other node in the search tree corresponds to a candidate partial plan, which can be reconstructed by following the decisions encoded in the nodes in the path. For example, the candidate plan in Figure 3.4 is actually generated by prodigy 4.0 as the sequence of search tree nodes shown in Table 3.2. In this sequence, each node is the child of the node in the line above³, so the candidate plan is produced without backtracking over any choices. Other sequences of search nodes could produce the candidate plan just as well: in particular the order in which prodigy 4.0 works on the goals (oil-in-barge barge1) and (at barge1 Richmond) doesn't matter. The choice of this particular sequence was largely due to search heuristics that prodigy 4.0 uses, which will not be discussed here (but see [Blythe & Veloso 1992] and [Carbonell et al. 1992]).

3.2 A representation for planning under uncertainty

In Section 3.1 I provided a brief description of planning domains, planning problems and plans in prodigy 4.0. Here I extend those definitions to the versions used by Weaver, in which planning domains and problems also contain information about uncertainty in the domain, and plans can specify a number of contingencies. In addition to operators, which can have more than one possible outcome, planning domains in Weaver also contain exogenous events, which specify ways that the world can be changed independently of the actions in a plan.

In order to give a precise characterisation of the problems that Weaver can represent and solve, I describe Weaver's representation scheme in terms of Markov decision processes.
Specifically, I describe how to take a planning problem in Weaver's language and construct a Markov decision process M which is equivalent in the sense that if there is a non-looping policy for M with an expected value greater than 0 then there is a plan in the original problem with probability of success equal to that expected value, and conversely if there is a plan then there is such a policy. A description of Markov decision processes was given in Section 2.4.

³ Except n5, which is the child of a special node that prodigy 4.0 uses to add the top-level goals to its list of goals.

n5   Goal      (disposed-oil)           top-level
n6   Operator  unload-oil
n7   Bindings
n8   Goal      (oil-in-barge barge1)    for n7
n9   Operator  pump-oil
n10  Bindings
n11  Goal      (at barge1 west-coast)   for n10
n12  Operator  move-barge
n13  Bindings
n14  Apply     apply n13
n16  Goal      (at barge1 richmond)     for n7
n17  Operator  move-barge
n18  Bindings

Table 3.2: An annotated trace of prodigy 4.0 which shows a sequence of search tree nodes created to produce the candidate plan shown in Figure 3.4. Each node except n5 is a child of the node in the line above. The text in italics has been added to the trace output for clarity of presentation.

Recall that in prodigy 4.0, a planning domain consists of a type hierarchy T, a set of literals L and a set of operators O. A planning problem consists of a planning domain, a set of objects belonging to each type in the hierarchy, an initial state and a goal description. Each literal in the domain has a type signature, specifying the number of arguments and the type of each argument, which is a member of T.
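Given the problem's objects, a type signature determines a literal's ground instances: each argument position accepts objects of the signature's type or any of its subtypes. A sketch, with an illustrative subtype table and helper names:

```python
from itertools import product

# Subtype relation for the oil-spill domain (transitive closure assumed).
SUBTYPES = {"Place": {"Dock", "Sea-Sector"}}

def matches(obj_type, sig_type):
    """An object matches a signature type if its type is that type or a subtype."""
    return obj_type == sig_type or obj_type in SUBTYPES.get(sig_type, ())

def ground_literals(predicate, signature, objects):
    """objects maps each object name to its exact type; returns all ground
    literals obtained by filling the signature with matching objects."""
    candidates = [[o for o, t in objects.items() if matches(t, s)]
                  for s in signature]
    return [(predicate, *args) for args in product(*candidates)]

objs = {"barge1": "Barge", "barge2": "Barge",
        "Richmond": "Dock", "west-coast": "Sea-Sector"}
L = ground_literals("at", ("Barge", "Place"), objs)
# L contains the four at-literals enumerated in the text
```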
Given the objects in the planning problem, the set of all possible ground literals L can be constructed by filling in the arguments of each literal's type signature with all objects that match, which are those objects of the exact type or any subtype of the type in the signature.

Consider the oil-spill planning problem that has been used as an example in the previous sections. The domain includes the literal at with signature (at Barge Place). In the planning problem, there are two objects of type Barge, barge1 and barge2, and two objects belonging to subtypes of the type Place, Richmond of type Dock and west-coast of type Sea-Sector. Therefore there are four ground literals derived from at in the set L: (at barge1 Richmond), (at barge1 west-coast), (at barge2 Richmond) and (at barge2 west-coast).

A state in a planning problem in prodigy 4.0 assigns a value of true or false to each literal in the ground literals L. Therefore there are 2^|L| possible states. Some of these assignments may not correspond to valid states in the planning domain being modelled. For example, (at barge1 Richmond) and (at barge1 west-coast)


cannot both be true at the same time. However, since any state that is reachable by a sequence of actions in the domain from a valid initial state will also be valid, these invalid states are not an issue in practice. This state property of validity could be partially derived by considering the states reachable by applying actions to any physically possible initial state, or it could be enforced by adding domain axioms such as those used in [Knoblock 1991]. In what follows I will ignore this distinction.

In Weaver the planning domain is generalised as follows.

1. The operators in the domain include a duration, which is an integer-valued function of the bindings of the operator (and therefore an integer for an instantiated action or step). This integer may represent any time unit, for example seconds or hours, although the unit must be the same for different operators in the same domain.

2. Operators may specify a discrete, conditional probability distribution of possible outcomes rather than the single possible outcome used in prodigy 4.0. An example of this will be described in more detail below.

3. A planning domain includes a set of exogenous events E as well as the set of operators O.
These are syntactically very similar to operators but are used to specify the way that the world can change independently of the actions taken in a plan, as I describe below. For example, they can be used to model the actions of other agents or natural processes.

4. A total precedence order < is given over the actions and events. This is used to resolve conflicts between their effects if more than one action or event produces changes to a state. An example is given below.

Weaver generalises prodigy 4.0's definition of a planning problem by specifying a probability distribution of possible initial states rather than a single initial state. The problem also includes a threshold probability, a minimum probability of success that a plan must equal or exceed to be considered a solution. The objects and goal statement in the planning problem are unchanged.

In the rest of this section I make the semantics of planning domains and problems in Weaver precise in terms of an underlying Markov decision process M defined by a planning problem. While this definition is needed to prove that Weaver correctly computes probabilities for plans and to discuss its coverage, on a casual reading of the thesis it can be skipped and replaced with the following summary: at each time step, several events may take place simultaneously with one action as a plan is executed. When more than one event or action completes in one time step, their results are applied to the state in parallel. If more than one possible value is specified for some ground literal in the state, the value nominated by the event or action that is highest in the pre-specified precedence order is used. Actions are usually higher than events in the precedence order.
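One way to realize this precedence rule is to apply the simultaneous effect sets in order from lowest to highest precedence, so the highest-precedence action or event decides any contested literal; the names and structures below are illustrative.

```python
def combine_effects(state, effects_by_source, precedence_low_to_high):
    """effects_by_source maps a source name to a (dels, adds) pair.
    Applying sources from lowest to highest precedence means the highest-
    precedence source has the last word on any literal they contest."""
    new_state = set(state)
    for source in precedence_low_to_high:
        if source in effects_by_source:
            dels, adds = effects_by_source[source]
            new_state = (new_state - dels) | adds
    return new_state

# Move-Barge pollutes while Weather-Brightens clears; with Move-Barge
# higher in the precedence order, (poor-weather) wins:
state = {"poor-weather"}
effects = {"Weather-Brightens": ({"poor-weather"}, {"fair-weather"}),
           "Move-Barge":        ({"fair-weather"}, {"poor-weather"})}
result = combine_effects(state, effects, ["Weather-Brightens", "Move-Barge"])
# result == {"poor-weather"}
```

Uncontested literals are unaffected by the ordering: each source's effects still apply to them.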


Actions and exogenous events

Weaver operators are very similar to operators in prodigy 4.0, but include a duration function and generalise from a single outcome to multiple possible outcomes, one of which will take place when an instantiated operator is executed. Thus each operator has an effect distribution function edf() that takes a state as input and returns a probability distribution of effects, that is, a finite set edf(s) = {(p_i, σ_i) | 1 ≤ i ≤ n} for some n, where each σ_i is a list of add and delete statements denoting an effect set as in the operators of prodigy 4.0, each p_i is a positive number and p_1 + ... + p_n = 1.

The effect distribution function is represented as a binary tree whose internal nodes are labelled with literals mentioning variables from the domain and whose leaf nodes are labelled with probability distributions of outcomes. Each internal node has one out-arc labelled false and one labelled true. To find the appropriate effect distribution for a given state, the tree is traversed from the root node, taking the branch labelled true or false respectively depending on whether the literal at the node is true or false in the state. Figure 3.5 shows a version of the Move-Barge operator from the oil-spill domain used in the previous sections that has a simple branching effect distribution function with one internal node labelled with (barge-ready <barge>). In this case the two possible effect distributions have the same set of outcomes with different probabilities, but this need not be true in general.
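The tree traversal and the draw from the resulting distribution can be sketched as follows; an internal node is a (literal, true-branch, false-branch) triple, a leaf is a list of (probability, effect-set) pairs, the effect sets are elided to labels, and the structures are illustrative rather than Weaver's own.

```python
import random

def edf(tree, state):
    """Walk the decision tree to the effect distribution for this state."""
    while isinstance(tree, tuple):          # internal node
        literal, if_true, if_false = tree
        tree = if_true if literal in state else if_false
    return tree                             # leaf: [(probability, effects), ...]

def sample_outcome(distribution, rng=random):
    """Draw one effect set according to the leaf's probabilities."""
    r, acc = rng.random(), 0.0
    for p, effects in distribution:
        acc += p
        if r < acc:
            return effects
    return distribution[-1][1]              # guard against rounding error

# Move-Barge's branching edf from Figure 3.5, keyed on barge-ready:
move_barge_edf = (("barge-ready", "barge1"),
                  [(0.667, "arrive"), (0.333, "arrive-and-break")],
                  [(0.1, "arrive"), (0.9, "arrive-and-break")])

dist = edf(move_barge_edf, {("barge-ready", "barge1")})
# dist == [(0.667, "arrive"), (0.333, "arrive-and-break")]
```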
The probability values are part of the domain definition, assigned externally to Weaver. Deriving them using machine learning or knowledge acquisition techniques would make a feasible extension but is not covered in this thesis.

Informally, when an instantiated operator is applied in some state s in a domain with no exogenous events, the resulting state is determined probabilistically as follows. First the effect distribution function is called to find the effect distribution for s, edf(s). Then an effect set is chosen at random, with the probabilities given in the effect distribution. Finally this effect set is applied to the state in the same way that the single outcome of an instantiated operator in prodigy 4.0 is applied to the state. For example, if the instantiated operator (Move-Barge =barge1 =Richmond =west-coast) is applied in any state in which (at barge1 Richmond) and (operational barge1) are true but (barge-ready barge1) is false, (operational barge1) will be true in the resulting state with probability 0.1 and will be false with probability 0.9, while (at barge1 Richmond) will be false and (at barge1 west-coast) will be true with certainty.

The situation is complicated significantly by the presence of exogenous events in the planning domain. Since events specify ways the world can change independently of an action taken in a plan, the resulting state when an action is taken depends on what events may take effect between the start and end of the action. Since some of these events may have begun before the action is taken, at least some aspects of the history of the world are needed, and in order to model such events it is necessary to introduce a notion of time. A variety of theoretical models in the AI literature can handle exogenous events and are in the same spirit as the situation calculus on


which the prodigy 4.0 representation rests, e.g. [Allen et al. 1991; Kartha 1995; Shanahan 1995; Baral 1995].

Move-Barge
(operator
  (duration (/ (distance <from> <to>) (speed <barge>)))
  (preconds ((<barge> Barge)
             (<from> Place)
             (<to> (and Place (diff <to> <from>))))
    (at <barge> <from>))
  (effects ()
    (branch (barge-ready <barge>)
      ((0.667 ()
              ((add (at <barge> <to>))
               (del (at <barge> <from>))))
       (0.333 ()
              ((add (at <barge> <to>))
               (del (at <barge> <from>))
               (del (operational <barge>)))))
      ((0.1 ()
            ((add (at <barge> <to>))
             (del (at <barge> <from>))))
       (0.9 ()
            ((add (at <barge> <to>))
             (del (at <barge> <from>))
             (del (operational <barge>))))))))

Figure 3.5: The Move-Barge action from the oil-spill domain used in Weaver has a duration and a probability distribution of possible outcomes.

In this thesis I have chosen a simple and restricted language for events that is still adequate for planning with some interesting domains. While some approaches, based on circumscription, are independent of an underlying model of the language, my approach is based on the adoption of a Markov decision process as an underlying model. This approach was chosen partly because the existing techniques did not handle probabilistic reasoning well, although the random-worlds approach is promising [Bacchus et al. 1997], and partly to make it easier to compare the planning system developed in this thesis with other approaches to planning under uncertainty, many of which are based on Markov decision processes [Boutilier & Dearden 1994; Dean & Lin 1995; Littman 1996].

Exogenous events have a very similar representation to operators. Figure 3.6 shows


an example of a simple exogenous event called Weather-Brightens. It can only take effect in a state in which (poor-weather) is true, and its application leads to a state in which (poor-weather) is false and (fair-weather) is true. Unlike operators, the agent that executes plans is not free to apply this external event at will. The event will take place randomly in any state that satisfies its preconditions, with the probability given in the probability slot of the event. Each event can take place any time that its preconditions are satisfied, independently of its previous occurrences and with the same probability.

As an example, Figure 3.7 shows the set of states that can result from the given initial state when the action (Move-Barge =barge1 =Richmond =west-coast) is executed in a domain with Weather-Brightens as the sole external event, and the action has a duration of one time unit. The probability that the event takes place in this state is independent of the action outcome in the state and this fact is used to compute the probabilities shown on each arc in the figure. The probability that (poor-weather) is true on completion of the action, for example, is found by summing the probabilities of the two states on the left of the diagram, giving 0.75. However, if the action were to have a duration of two time units, the probability of (poor-weather) would be reduced to 0.5625, because the event might take place in either of two states while the action is executed.

Weather-Brightens
(event
  (probability 0.25)
  (duration 1)
  (preconds ()
    (poor-weather))
  (effects ()
    ((del (poor-weather))
     (add (fair-weather)))))

Figure 3.6: An exogenous event.

Suppose that the event and action may be in conflict over the value of some literal. For example, suppose that in some of its possible outcomes, Move-Barge deletes (fair-weather) and adds (poor-weather), perhaps because of pollution. In the cases where both the event and action would affect these literals, the precedence order over events and operators is used to decide their values. If Move-Barge comes before Weather-Brightens in the order, (fair-weather) will be false, otherwise it will be true. More sophisticated schemes for reasoning about simultaneous events are possible, such as the one proposed in [Shanahan 1995].
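The 0.75 and 0.5625 figures follow from the event being tested independently in each time unit, so (poor-weather) survives an action of duration d with probability (1 - 0.25)^d; a one-line check (helper name illustrative):

```python
def survives(p_event, duration):
    """Probability that an event with per-time-unit probability p_event
    never fires while an action of the given duration runs."""
    return (1 - p_event) ** duration

survives(0.25, 1)   # 0.75, as in Figure 3.7
survives(0.25, 2)   # 0.5625, the two-time-unit case
```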


Initial state: (at barge1 Richmond), (operational barge1), (poor-weather)
True in all states: (distance Richmond west-coast 1), (ready barge1)

Resulting states and their probabilities:
  0.5    (at barge1 west-coast), (operational barge1), (poor-weather)
  0.25   (at barge1 west-coast), (poor-weather)
  0.167  (at barge1 west-coast), (operational barge1), (fair-weather)
  0.083  (at barge1 west-coast), (fair-weather)

Figure 3.7: The set of states reachable when the action Move-Barge is executed from the initial state shown, when Weather-Brightens is the only exogenous event in the domain and the action has a duration of one time unit. The probability of reaching each state is shown along its arc.

3.3 A Markov decision process model of uncertainty

In the previous section I showed the result when an action is applied and one event with the same duration may take place simultaneously. In general, actions and events may have different durations and events may take effect after an action begins but before it ends. Multiple events may take place at the same time, and they may have effects that alter the same literals in different ways. In order to precisely state the probability distributions of state histories that can result from a sequence of actions in a planning problem, I introduce its underlying Markov decision process (MDP).

Recall from Chapter 2, on related work, that an MDP is defined by its state space, its state transition function and its reward function. A state in the state space of the underlying MDP M contains more information than a state in the planning problem, since it contains a limited history of actions and exogenous events that are in progress. This is because the MDP models the interaction of actions and events that have different lengths and whose time intervals may overlap, and it is useful to choose the state space and transition function such that each successor state occurs a fixed unit of time after the predecessor state. For this reason, a state in M contains a history of the events and actions taking place in the world as well as the current state described in terms of literals from the planning problem. Individual state transitions then correspond to the passage of a unit of time rather than an action or event viewed as an atomic unit.

The action and event history contained in a state in M consists of a set of pairs n:σ, where n is an integer denoting the time before the action or event will take place, and σ is an effect set, a list of add and delete statements that specify the changes to


32 Chapter 3. <strong>Plann<strong>in</strong>g</strong> <strong>under</strong> <strong>Uncerta<strong>in</strong>ty</strong>be made to the state when the action or event will take place. There is no dist<strong>in</strong>ction<strong>in</strong> the history list between actions and events, s<strong>in</strong>ce once they are added to the statethere is no need for a dist<strong>in</strong>ction. These pairs are called pend<strong>in</strong>g eects.Figure 3.8 shows the sequence of states <strong>in</strong> M that is traversed if a determ<strong>in</strong>isticversion of Move-Barge is executed <strong>in</strong> the beg<strong>in</strong>n<strong>in</strong>g state. In each state, literals fromthe state space of the plann<strong>in</strong>g problem are shown above the dotted l<strong>in</strong>e, while belowthe l<strong>in</strong>e is shown a list of pend<strong>in</strong>g eects. The <strong>in</strong>itial state and the two nal states<strong>in</strong> the diagram have no pend<strong>in</strong>g eects, while each <strong>in</strong>termediate state has exactlyone pend<strong>in</strong>g eect. When the action is performed <strong>in</strong> the left-most state, it gives riseto two possible successor states, with dierent pend<strong>in</strong>g eects correspond<strong>in</strong>g to thedierent possible outcomes of the action. S<strong>in</strong>ce the length of the action is the same<strong>in</strong> each outcome, the count-down on the pend<strong>in</strong>g eects <strong>in</strong> each state is the same,3. Whenever the pend<strong>in</strong>g eect has a count-down higher than 1, the successor statesimply decrements the count-down. When the count-down reaches 1, the pend<strong>in</strong>geect is not present <strong>in</strong> the successor state, but the eects themselves are applied to theportion of the MDP state correspond<strong>in</strong>g to the literals <strong>in</strong> the plann<strong>in</strong>g problem. 
Thus in the final states, (at barge1 west-coast) is true and (at barge1 Richmond) is false. In the lower of the two final states, (operational barge1) has become false, while it is still true in the upper final state. The two outcomes correspond to the effect sets

  σ = {(del (at barge1 Richmond)), (add (at barge1 west-coast))}
  υ = {(del (at barge1 Richmond)), (add (at barge1 west-coast)), (del (operational barge1))}

Figure 3.8: The two possible sequences of states in the underlying Markov decision process arising from executing the Move-Barge operator in the state shown on the left, in the absence of exogenous events.

The pending effects in Figure 3.8 were added by choosing the action Move-Barge in the initial state. Exogenous events are also modelled by adding pending effects to the state in the MDP.
For example, Figure 3.9 shows the states arising from taking the same action in an initial state that also has no pending effects but includes the literal (poor-weather), in a domain that also includes the exogenous event Weather-Brightens shown in Figure 3.6, but with a duration of 2 time units. Since the presence of the exogenous event leads to considerably more states, I only show the domain-level state features that differ from those of the parent state. The probability that each state is reached is shown above the state. The probabilities of the state


transitions are calculated from the event and action outcome probabilities, which are conditionally independent given the states in which the corresponding pending effects began. The event's effect set is ε = {(del (poor-weather)), (add (fair-weather))}.

Figure 3.9: The possible sequences of states in the underlying Markov decision process arising from executing the Move-Barge operator in the state shown on the left, when the exogenous event Weather-Brightens can also take place. Only literals that are changed from a parent state are shown.

Formal description of the underlying MDP

Each state s of the underlying MDP M consists of two parts:

1. a truth assignment to the ground literals L, corresponding to a state in the planning domain and referred to as the literal state.

2. a set of pending effects.

Each element of the pending effects is a triple (t, a, ε) where t is an integer denoting a time interval, a is an operator or exogenous event and ε is a set of "effects", a set of add and delete statements over ground literals in L. The pending effects represent the events and actions that are currently taking place in the given state of M. Intuitively, t denotes the time left before the effect "completes" and is applied to change the state.
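The two-part state just defined can be sketched as a small data structure. All names here are illustrative, not taken from the thesis:

```python
# A state of the underlying MDP: a truth assignment to the ground literals
# plus a set of pending effects, each a triple (t, source, effects) where
# t time units remain before `effects` is applied.
state = {
    "literals": {
        "(at barge1 Richmond)": True,
        "(at barge1 west-coast)": False,
        "(operational barge1)": True,
    },
    "pending": {
        (3, "Move-Barge",
         (("del", "(at barge1 Richmond)"),
          ("add", "(at barge1 west-coast)"))),
    },
}
```

The effect set is stored as a tuple of (add/del, literal) pairs so that a pending effect is hashable and can live in a set.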


We begin by defining a "successor function" δ(·) on states of M. This function shows how a state leads to its successor if there are no exogenous events, and is used as the basis for building the transition function for the MDP.

Let s be a state in M and write s = λ ∪ Π, where λ and Π are the literal state and the pending effects as described above. To calculate the successor state δ(s), first find those pending effects whose time-to-complete value is 1:

  I = {π = (1, a, ε) | π ∈ Π}

I can be thought of as the "imminent" pending effects, and is applied to the literal state λ to give a new literal state λ′, the literal state in the successor MDP state δ(s). λ′ is defined by specifying a truth value for each ground literal l:

  λ′(l) = true, when there is some pending effect (1, a, ε) ∈ I such that ε contains (add l) and a is the least operator or event in the total order with a pending effect that mentions l.

  λ′(l) = false, when there is some pending effect (1, a, ε) ∈ I such that ε contains (del l) and a is the least operator or event in the total order with a pending effect that mentions l.

  λ′(l) = λ(l), if no such pending effect exists.

The remaining pending effects, R = Π − I, contribute to the pending effects of the new state δ(s) with their time count reduced by one:

  δ(s) = λ′ ∪ {(t−1, a, ε) | (t, a, ε) ∈ R}

This successor function forms the basis of the MDP M describing the actions and external events in the planning domain P.
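A minimal Python sketch of this successor function follows; the representation and the use of the source name as a stand-in for the total order on operators and events are assumptions, not the thesis's code:

```python
def successor(literals, pending):
    """δ(s): apply the imminent pending effects (t == 1) to the literal
    state and decrement the countdown on the rest. Pending effects are
    (t, source, effects) triples; when two imminent effects mention the
    same literal, the least source in sorted order wins."""
    new_lits = dict(literals)
    claimed = set()
    for _, source, effects in sorted(p for p in pending if p[0] == 1):
        for op, lit in effects:
            if lit not in claimed:      # least source in the order wins
                new_lits[lit] = (op == "add")
                claimed.add(lit)
    new_pending = {(t - 1, src, eff) for t, src, eff in pending if t > 1}
    return new_lits, new_pending

# Two steps of the Move-Barge countdown from Figure 3.8 (outcome σ):
lits = {"(at barge1 Richmond)": True, "(at barge1 west-coast)": False}
pend = {(2, "Move-Barge", (("del", "(at barge1 Richmond)"),
                           ("add", "(at barge1 west-coast)")))}
lits, pend = successor(lits, pend)   # countdown 2 -> 1, literals unchanged
lits, pend = successor(lits, pend)   # countdown reaches 1: effects applied
```

After the second step the pending effect has been consumed and the barge's location literals have been updated, as in the final states of Figure 3.8.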
As described below, changes to the world due to actions or external events are expressed by inserting pending effects into the set for the state that is changed. At each time step, the successor function is also applied. Note that the successor function is deterministic: δ(s) is uniquely defined for each state s ∈ M.

In the special case when there are no external events, the transition function τ(a, s) that maps an action a and a state s in M to a probability distribution over new states is defined as follows:

  τ(a, s) = δ(s ∪ {(d, a, ε_i)}) with probability p_i, for 1 ≤ i ≤ n

where d is the duration of the action in the state s, and the effects ε_i and corresponding probabilities p_i are computed from the action's effect distribution function, edf(s) = {(p_i, ε_i) | 1 ≤ i ≤ n}. Thus if the duration d of action a is 1, the effects take place in the next state after a is performed; otherwise there is a delay corresponding to d before a's effects are realised.
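The no-events case can be sketched directly from the definition; this is an illustrative sketch, not the thesis's implementation, and it returns the states before the δ step is applied:

```python
def transition_no_events(pending, action, duration, edf):
    """τ(a, s) in the absence of exogenous events: outcome i of the action's
    effect distribution edf = [(p_i, effects_i), ...] adds the pending effect
    (duration, action, effects_i) with probability p_i; the deterministic
    successor step δ is then applied to each result (omitted here)."""
    return [(pending | {(duration, action, effects)}, p) for p, effects in edf]

# Move-Barge from Figure 3.8: outcomes σ (probability 2/3) and υ (1/3).
edf = [
    (2/3, (("del", "(at barge1 Richmond)"),
           ("add", "(at barge1 west-coast)"))),
    (1/3, (("del", "(at barge1 Richmond)"),
           ("add", "(at barge1 west-coast)"),
           ("del", "(operational barge1)"))),
]
dist = transition_no_events(frozenset(), "Move-Barge", 4, edf)
```

Each element of `dist` pairs an augmented pending-effect set with its probability, mirroring the two branches of Figure 3.8.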


When exogenous events may take place, they modify the possible transitions from the states in which their preconditions are satisfied. For example, the presence of the event Weather-Brightens leads to four possible transitions from the initial state in Figure 3.9, while the same state has two transitions when there are no exogenous events in Figure 3.8. Essentially, each combination of occurrences of the applicable exogenous events and each possible outcome of the chosen action leads to a possible transition for the state and the action. The probability of the transition is the product of the probabilities corresponding to the events and the action outcome.

Recall that exogenous events have a precondition, duration and effect distribution similar to actions, and also have a probability of occurrence p. Let E be the set of exogenous events and let H(E, s) be the set of external events whose preconditions are satisfied in the state s. Intuitively, any combination of the events in H(E, s) might take place when an action is performed in state s. The probability that event h takes place and has its i-th effect is p_h · p_i.

For each subset H′ ⊆ H(E, s), applying action a in state s can lead to transitions with the following probabilities:

  τ(a, s) = δ( s ∪ {(d, a, ε_i)} ∪ ⋃_{h ∈ H′} {(d_h, h, ε_{i_h})} )
            with probability p_i · ∏_{h ∈ H′} p_h p_{i_h} · ∏_{h′ ∈ H − H′} (1 − p_{h′})

where d_h is the duration of event h and ε_{i_h} its i_h-th possible effect set. This expression combines the probabilities that (1) action a has its i-th possible outcome, that (2) each external event in H′ takes place and has its i_h-th outcome, and that (3) none of the external events in H − H′ takes place. These outcomes are treated as independent.
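The enumeration behind this formula can be sketched as follows; the function and its argument shapes are illustrative assumptions, not the thesis's code:

```python
from itertools import combinations, product

def combined_transition_probs(action_outcomes, events):
    """For each action outcome i and each subset H' of the applicable
    events (with a choice of outcome for every event in H'), the
    probability is p_i * prod_{h in H'} p_h * p_{i_h}
                       * prod_{h' not in H'} (1 - p_{h'}).
    `events` maps an event name to (p_occur, [outcome probabilities])."""
    names = sorted(events)
    transitions = []
    for i, p_i in enumerate(action_outcomes):
        for r in range(len(names) + 1):
            for H1 in combinations(names, r):
                absent = [1 - events[h][0] for h in names if h not in H1]
                # choose one outcome i_h for every occurring event h
                for choice in product(*[range(len(events[h][1])) for h in H1]):
                    p = p_i
                    for q in absent:
                        p *= q
                    for h, k in zip(H1, choice):
                        p *= events[h][0] * events[h][1][k]
                    transitions.append((i, dict(zip(H1, choice)), p))
    return transitions

# Figure 3.9's initial state: Move-Barge (outcomes 2/3 and 1/3) together
# with Weather-Brightens, which occurs with probability 0.25 and has a
# single outcome -- giving the four possible transitions noted above.
trans = combined_transition_probs([2/3, 1/3],
                                  {"Weather-Brightens": (0.25, [1.0])})
```

With one action outcome pair and one single-outcome event, the enumeration yields four transitions whose probabilities sum to 1.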
If R is the maximum number of different outcomes that an event can have and |E| is the number of exogenous events, then the worst-case complexity of computing τ(a, s) is exponential in |E|R, since each combination of the outcomes must be considered. In the next chapter I show how Weaver typically evaluates a plan without computing τ(a, s), using the plan to constrain the events and outcomes considered.

Now that the state space and the state transition function have been defined for the MDP that models a particular planning problem, it remains only to specify the reward function for the MDP. Recall that in an MDP, a reward R(a, s) is given for making a transition to state s using action a. An optimal policy on the MDP is a choice of an action for each state in the MDP that maximises the expected reward. The original planning problem specifies a goal description, a logical sentence over the literals L of the domain. In order for the optimal policies on the MDP to have the effect of reaching a goal state and staying there, I specify a reward of 0 for all transitions between goal states, a reward of 1 for each transition from a non-goal state to a goal state, and a reward of -2 for each transition from a goal state to a non-goal state. A new action, distinguished from the actions of the planning problem and called stop, is also added to the MDP. This action always connects a state to the same state with a reward of 0.
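The reward scheme just described is simple enough to state directly in code; the state and goal representations here are illustrative:

```python
def reward(goal, prev_state, action, state):
    """The reward function defined above: 0 for goal -> goal transitions,
    +1 for non-goal -> goal, -2 for goal -> non-goal, and 0 otherwise
    (including the added `stop` action, which loops with reward 0)."""
    before, after = goal(prev_state), goal(state)
    if before and not after:
        return -2
    if not before and after:
        return 1
    return 0

# Goal test for the oil-spill example: (disposed-oil) holds.
goal = lambda s: s.get("(disposed-oil)", False)
assert reward(goal, {"(disposed-oil)": False}, "unload-oil",
              {"(disposed-oil)": True}) == 1
assert reward(goal, {"(disposed-oil)": True}, "move",
              {"(disposed-oil)": False}) == -2
```

The -2 penalty outweighs any later +1, which is what makes looping at a goal state (via stop) optimal, as the next paragraph argues.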


It is easy to see that any optimal policy will choose the stop action (or an equivalent) at a goal state, since the negative reward for leaving the state dominates the positive reward for subsequently arriving at a goal state. It is also easy to see that the expected reward of any policy is equal to the probability that the policy will reach some goal state. A plan in the original planning problem can easily be mapped to a policy in the MDP in the manner described in the following example. It is then reasonable to define the probability of success of the plan as the conditional expected value of that policy, conditioned on the initial state distribution of the planning problem.

3.4 Putting it together: a complete example

In the previous sections I have developed operators and events from an oil-spill domain to illustrate elements of the representation for planning problems used in this thesis. In this section I extend the domain, show the model for a simple planning problem and evaluate a plan as a policy on the model. I use the planning problem introduced in Section 3.1 with the extensions introduced in Section 4.1. This example is a simplification of the oil-spill clean-up domain described in Chapter 7.

Figure 3.10 shows all the operators from the planning domain, and Figure 3.11 shows all the exogenous events.
The object types in the domain are the top-level types Barge and Place, and the two sub-types of Place, Dock and Sea-Sector. The typed literals in the domain are (at Barge Place), (oil-in-barge Barge), (disposed-oil), (oil-in-tanker Sea-Sector), (operational Barge), (barge-ready Barge), (distance Place Place), (poor-weather) and (fair-weather).

As in the previous sections, the planning problem considered has one object, barge1, of type Barge, one object, Richmond, of type Dock, and one object, west-coast, of type Sea-Sector. The typed literals thus yield 11 ground literals in L, for a state space size of 2^11 = 2048. In fact only 48 of these states are physically possible, but this is not explicitly modelled, as discussed in Section 3.2. The planning initial state and goal are shown in Table 3.3.

In Section 4.1 I show how a conditional planner based on Prodigy can compute a plan for this problem with one conditional branch. In this section I describe the underlying MDP for this planning problem, show the expected reward for this plan interpreted as a policy on the MDP, and explain the relationship between the expected reward and the plan's probability of success.

Figure 3.12 shows a reachability graph of the states reachable from the initial state of the example planning problem through sequences of operators and exogenous events. These are states in the state space of the planning problem, not the underlying MDP, because they do not contain any information about pending effects.
Only 12 states are reachable, because (barge-ready barge1) is true in the initial state and cannot be made false, and because the weather can only be altered by exogenous events. For brevity, the object barge1 is not shown in the literals in which it is an


argument; only true literals are shown, and the literals (barge-ready barge1) and (fair-weather), which are always true, are not shown.

(operator Unload-Oil
  (preconds ((<b> Barge) (<d> Dock))
            (and (at <b> <d>)
                 (oil-in-barge <b>)))
  (effects ()
           ((del (oil-in-barge <b>))
            (add (disposed-oil)))))

(operator Pump-Oil
  (preconds ((<b> Barge) (<s> Sea-Sector))
            (and (at <b> <s>)
                 (operational <b>)
                 (fair-weather)
                 (oil-in-tanker <s>)))
  (effects ()
           ((add (oil-in-barge <b>))
            (del (oil-in-tanker <s>)))))

(operator Make-Ready
  (preconds ((<b> Barge))
            (true))
  (effects ()
           ((add (barge-ready <b>)))))

(operator Move-Barge
  (duration (/ (distance <from> <to>) (speed <b>)))
  (preconds ((<b> Barge)
             (<from> Place)
             (<to> (and Place (diff <to> <from>))))
            (at <b> <from>))
  (effects ()
           (branch (barge-ready <b>)
                   ((0.667 () ((add (at <b> <to>))
                               (del (at <b> <from>))))
                    (0.333 () ((add (at <b> <to>))
                               (del (at <b> <from>))
                               (del (operational <b>)))))
                   ((0.1 () ((add (at <b> <to>))
                             (del (at <b> <from>))))
                    (0.9 () ((add (at <b> <to>))
                             (del (at <b> <from>))
                             (del (operational <b>))))))))

Figure 3.10: The operators from the simple oil-spill domain used in this chapter. When the duration of an operator isn't mentioned, it defaults to 1.

If the domain included no exogenous events, then the underlying MDP for the problem would be similar to the reachability graph, except that each transition corresponding to the action move-barge would lead to a sequence of 3 transitions encoding the count-down of the pending effects. A part of this MDP, restricted to the four states reachable from the initial state in which (oil-in-tanker) is true, is shown in Figure 3.13. In this diagram, literal information is not shown when a state's literals match those of its parents. There are 4 different sets of pending effects, marked with letters.
For example, "a" corresponds to {(del (at barge1 Richmond)), (add (at barge1 west-coast))} and "c" corresponds to the effects in "a" together with (del (operational barge1)). The transitions that correspond to initiating an action are labelled with their probabilities, but intermediate transitions are not labelled.

(event Weather-Brightens
  (probability 0.25)
  (duration 1)
  (preconds () (poor-weather))
  (effects () ((del (poor-weather))
               (add (fair-weather)))))

(event Weather-Darkens
  (probability 0.25)
  (duration 1)
  (preconds () (fair-weather))
  (effects () ((del (fair-weather))
               (add (poor-weather)))))

Figure 3.11: The exogenous events from the oil-spill domain.

(at barge1 Richmond)              t
(at barge1 west-coast)            f
(oil-in-barge barge1)             f
(disposed-oil)                    f
(oil-in-tanker west-coast)        t
(operational barge1)              t
(barge-ready barge1)              t
(distance Richmond west-coast 4)  t
(distance west-coast Richmond 4)  t
(poor-weather)                    f
(fair-weather)                    t

Goal: (disposed-oil)

Table 3.3: The initial state and goal description. Here t and f stand for "true" and "false".
Each layer is therefore similar to the reachability graph shown in Figure 3.12. Including the other actions and the two exogenous events does not increase the number of states, because they all have durations of 1 time unit and so do not give rise to intermediate states. The resulting MDP therefore has 96 states. For each transition within a layer, one of the exogenous events causes another transition, which is 1/3 as likely, with the same starting point but with the equivalent state in the other layer as the ending point. A few of these transitions are illustrated in Figure 3.14 by the dotted lines towards the bottom of the figure.
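The 1/3 ratio follows directly from the events' occurrence probability of 0.25; a quick check of the arithmetic:

```python
# A within-layer transition requires the weather event NOT to fire
# (probability 0.75); the corresponding cross-layer transition requires
# it to fire (probability 0.25). Their ratio is the 1/3 quoted above.
p_event = 0.25
ratio = p_event / (1 - p_event)
assert abs(ratio - 1/3) < 1e-12
```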


Figure 3.12: Reachability graph of literal states in the planning problem from the initial state, considering only operators. The initial state is shown in the top left-hand corner, and the four goal states are shown in bold in the bottom left-hand corner. Actions corresponding to the plan that is produced for this problem are shown in bold.

The plan found by the conditional planner described in Section 4.1 is:

(Move-Barge <b>=barge1 <from>=Richmond <to>=west-coast)
(Pump-Oil <b>=barge1 <s>=west-coast)
(Move-Barge <b>=barge1 <from>=west-coast <to>=Richmond)
(Unload-Oil <b>=barge1 <d>=Richmond)

This plan is illustrated by the bold lines in Figure 3.14. These bold lines also represent a partial policy on the MDP, a choice of action for some of the states in the MDP. The plan can be extended to a full policy by any arbitrary assignment of actions to the remaining states.
I choose the following assignment, which is guaranteed not to increase the expected reward of the policy. A new action called fail is added to the set of actions in the MDP; this action does not appear in the planning domain. The action is defined in every state of the MDP to loop back to the same state with probability 1, and the reward function assigns a reward of 0 for each such transition. The partial policy is extended by choosing this action in every state where the plan does not choose an action.

The plan used for this problem does not branch, consisting of a linear sequence of actions. The partial policy is made from the plan by choosing the first action in each state that has non-zero probability in the initial state distribution, choosing the second action in each state that is reached when the first action completes, and so on. If a state that is reached already has an action choice, in other words the plan has visited this state previously with non-zero probability, the action chosen first remains. If every possible sequence of states through the MDP that arose from following the plan had a loop, this algorithm would not be able to assign every action in the plan. However, Prodigy includes a mechanism for avoiding loops and, in the absence of exogenous events, at least one path through its plan will complete. I discuss the case of planning with exogenous events in the following chapter.

Figure 3.13: Part of the MDP that would correspond to the example problem if it had no exogenous events, restricted to the states in which (oil-in-tanker) is true. All transitions and intermediate states correspond to a move-barge action. The transitions that correspond to initiating an action are labelled with their probabilities; intermediate transitions are not labelled.
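The plan-to-policy construction just described can be sketched as a forward walk over the plan; the function names and the `step` callback are illustrative assumptions, not the thesis's code:

```python
def plan_to_policy(plan, initial_states, step):
    """Extend a linear plan to a policy: walk the plan forward from the
    initial-state distribution, assigning each step's action to the states
    reached at that point. A state keeps the action it was assigned first;
    every unassigned state gets the `fail` action, which loops with reward
    0. `step(state, action)` returns the possible successor states,
    standing in for the MDP transition function."""
    policy = {}
    frontier = set(initial_states)
    for action in plan:
        next_frontier = set()
        for s in frontier:
            if s not in policy:          # the action chosen first remains
                policy[s] = action
            if policy[s] == action:
                next_frontier.update(step(s, action))
        frontier = next_frontier
    return lambda s: policy.get(s, "fail")

# Toy chain: states are integers and every action moves 0 -> 1 -> 2.
pol = plan_to_policy(["move", "pump-oil"], {0}, lambda s, a: {s + 1})
```

On this toy chain, `pol` chooses "move" in state 0, "pump-oil" in state 1, and "fail" everywhere else.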


Figure 3.14: The full set of states for the MDP for the example problem. The transitions due to exogenous events are shown with dotted lines, connecting the two planes into which the states are divided. In the upper plane the state variable (fair-weather) is true and in the lower plane (poor-weather) is true.




Chapter 4

The Weaver Algorithm

In the previous chapter I presented a model of planning under uncertainty, describing the representation of uncertain planning problems and the semantics of the representation in terms of a Markov decision process. In this chapter I provide a broad overview of Weaver's algorithm. Weaver operates in a loop, shown in Figure 4.1. On each iteration it creates a candidate plan, evaluates its probability of success and uses the evaluation to suggest ways to improve the candidate plan. Each of these three activities corresponds to a module in the system and is described below.

Figure 4.1: Weaver's main loop has three modules: a conditional planner, a plan evaluator and a plan critic. The diagram also shows the reduced domain model used by the planner, the belief net built by the evaluator, and the loop's two exits: failure if all plans have been tried, and success if the threshold probability is met.

Given a planning problem, one can always produce its underlying Markov decision process as shown in the last chapter and, in principle, solve it using standard techniques such as policy iteration to find a plan. This approach is computationally intractable in practice, because the model grows rapidly as objects and actions are added to the domain. The idea behind Weaver is to use goal-directed planning to focus attention on a version of the original problem that ignores irrelevant features, in which a plan can be formulated without explicitly reasoning about the full Markov decision process.
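The loop just described can be sketched as follows; the three callables are stand-ins for the modules described below, and the interface is an assumption for illustration:

```python
def weaver_loop(planner, evaluator, critic, threshold, max_iters=100):
    """Weaver's main loop as shown in Figure 4.1: plan in a reduced domain
    model, evaluate the plan's probability of success, and let the critic
    grow the model until the threshold probability is met."""
    model = {}                       # reduced domain model, grown incrementally
    for _ in range(max_iters):
        plan = planner(model)
        if plan is None:
            return None              # failure: all plans tried
        if evaluator(plan) >= threshold:
            return plan              # success: threshold probability met
        model = critic(plan, model)  # expose more sources of uncertainty
    return None

# Toy modules: the evaluator's estimate improves as the critic grows the model.
best = weaver_loop(
    planner=lambda model: ("plan", len(model)),
    evaluator=lambda plan: 0.3 + 0.2 * plan[1],
    critic=lambda plan, model: {**model, len(model): True},
    threshold=0.8,
)
```

In the toy run, the critic enlarges the model three times before the evaluator's estimate clears the threshold.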
After creating such a plan, Weaver creates a parsimonious model of its probability of success, using the "evaluator" module in Figure 4.1. This model is used by the "plan critic" module to increase the subset of features considered to include all the relevant ones, that is, all the features that are used to compute the plan's probability of success. Once the feature set has been increased, a new plan is produced if necessary and the process is repeated, allowing the model of the planning problem to grow incrementally.

The smaller version of the planning problem that ignores many features is referred to in Figure 4.1 as the "reduced domain model". As the figure shows, the conditional planner works in this reduced domain model while the plan evaluator and plan critic do not. However, the latter modules still take a constrained view of the planning problem, because they take the plan as their input and only consider domain features that affect it. The precise way in which each module constrains the next is made clearer as the modules are described below. In the next section I describe the conditional planner, and in Section 4.2 I describe how the plan is evaluated with a probabilistic model that captures only the relevant features. In Section 4.3 I show how the plan critic updates the plan by increasing the number of sources of uncertainty made visible to the planner.

4.1 Conditional planning

Planners that deal with uncertainty must be able to create and evaluate conditional plans, which contain branches. For example, suppose that the Move-Barge operator can have two possible outcomes, in one of which the barge is made inoperable even if it is ready.
If the planning problem contains more than one barge, then a plan to transfer the oil with high probability should be conditional on the state of the barge being used. Several planning systems are capable of solving problems like this, such as cnlp [Peot & Smith 1992], Cassandra [Pryor & Collins 1996] and C-Buridan [Draper, Hanks, & Weld 1994]. In this section I first describe some extensions made to prodigy 4.0's set of search node types in order to support planning in non-deterministic domains, and then describe an algorithm that extends prodigy 4.0 to be able to create conditional plans.

4.1.1 Extensions to the prodigy 4.0 node set to support probabilistic planning

Weaver interacts with Prodigy by analysing a candidate plan, inserting new nodes into the search tree and re-starting Prodigy's search at some node of its choosing. The details of this process are provided in Chapter 4. In order to support this process, two new types of nodes are added to the vocabulary of prodigy 4.0 to represent new decisions that can be made about the plan: context nodes and protection nodes. A context node represents the decision to treat a particular step in the tail plan as having a set of alternative possible outcomes, rather than just one possible


outcome, as is standard. The use of these nodes is described in Section 4.1.2, while here I cover protection nodes.

A protection node represents a decision to augment the preconditions of a step that is already in the tail plan with extra preconditions. Weaver uses protection nodes to specify extra preconditions to ensure either that a possible outcome of the step will not occur or that some exogenous event will not occur simultaneously with the step. Prodigy can make use of protection nodes independently of Weaver to ensure that some chosen conditional effects of a step will not occur. In fact, some mechanism of this kind is required for planners in the Prodigy family to be complete [Blythe & Fink 1997]. As an illustration, suppose that the Move-Barge and Pump-Oil operators are extended so that Pump-Oil requires that the barge be operational, and Move-Barge may conditionally delete that the barge is operational if the literal barge-ready is false. These altered operators are shown in Figure 4.2.

Pump-Oil
(Operator
  (preconds ((<b> Barge) (<s> Sea-Sector))
    (and (at <b> <s>)
         (operational <b>)
         (oil-in-tanker <s>)))
  (effects ()
    ((add (oil-in-barge <b>))
     (del (oil-in-tanker <s>)))))

Make-Ready
(Operator
  (preconds ((<b> Barge)) (true))
  (effects ()
    ((add (barge-ready <b>)))))

Move-Barge
(Operator
  (preconds ((<b> Barge) (<from> Place) (<to> (and Place (diff <to> <from>))))
    (at <b> <from>))
  (effects ()
    ((del (at <b> <from>))
     (add (at <b> <to>))
     (if (~ (barge-ready <b>))
         ((del (operational <b>)))))))

Figure 4.2: The Pump-Oil and Move-Barge operators are modified respectively to require and conditionally delete (operational <b>). The new operator Make-Ready can add barge-ready.

If the initial state is unchanged from that of Section 3.1.2, so that (barge-ready barge1) is false, then prodigy 4.0 as described is not able to solve the problem with the new operators.
This is because it has no way to add the step (Make-Ready =barge1) to the tail plan, since it does not achieve either the top-level goal


or any open precondition in the problem. However, the step is necessary because otherwise the first step of the plan, (Move-Barge =barge1 =Richmond =west-coast), will render the barge inoperable.

The key fact about the original plan is that a literal is made false by a conditional effect, is later required to be true, and cannot be made true. This indicates that a plan might be found by stopping the conditional effect from taking place, which may be done by requiring that the conditions of the conditional effect be false when the step is moved to the head plan. I created a version of Prodigy that is able to notice this condition and to add a protection node to the search tree that replaces the preconditions of this step with the conjunction of its preconditions and the negation of the conditions of the conditional effect, similar in spirit to the confrontation step of ucpop-style planners [McAllester & Rosenblitt 1991].

n5   Goal (disposed-oil) top-level
n6   Operator unload-oil
n7   Bindings
n8   Goal (oil-in-barge barge1) for n7
n9   Operator pump-oil
n10  Bindings
n11  Goal (at barge1 west-coast) for n10
n12  Operator move-barge
n13  Bindings
n14  Protect (operational barge1) from n13 with (barge-ready barge1)
n15  Goal (barge-ready barge1)
n16  Operator make-ready
n17  Bindings
n18  Apply n17
n19  Apply n13
n16  Goal (at barge1 richmond) for n7
n17  Operator move-barge
n18  Bindings

Table 4.1: An annotated trace of Prodigy, showing the use of a protection node (n14) to add a precondition to the step at node n13 which will stop its conditional effect from taking place.
This is described in [Blythe & Fink 1997]. Table 4.1 shows the sequence of nodes in the search tree that corresponds to a solution to the new problem, based on the previous solution sequence in Table 3.2.
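The confrontation-style transformation just described can be sketched as follows. This is only an illustration: the tuple-based condition syntax and the `protect` function are assumptions for the sketch, not Prodigy's actual data structures.

```python
# Illustrative sketch of the protection-node transformation: replace a step's
# preconditions with the conjunction of those preconditions and the negation
# of the conditions of the conditional effect to be prevented. The tuple-based
# condition syntax here is an assumption, not Prodigy's real representation.

def negate(cond):
    """Negate a condition, simplifying double negation."""
    return cond[1] if cond[0] == 'not' else ('not', cond)

def protect(preconds, effect_conditions):
    """Augment preconds so the conditional effect's conditions cannot hold."""
    return ('and', preconds, negate(effect_conditions))

# Move-Barge deletes (operational b) when (not (barge-ready b)) holds, so the
# protection node adds (barge-ready b) to the step's preconditions:
new_pre = protect(('at', 'barge1', 'richmond'), ('not', ('barge-ready', 'barge1')))
assert new_pre == ('and', ('at', 'barge1', 'richmond'), ('barge-ready', 'barge1'))
```

The double-negation simplification mirrors the effect of the protection node in the trace above: requiring (barge-ready barge1) before the Move-Barge step is applied.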


4.1.2 B-Prodigy

Weaver's conditional planner, B-prodigy, is a modified version of prodigy 4.0, described in Section 3.1, that creates branching plans. B-prodigy, which stands for Branching Prodigy, allows steps in the tail plan to have more than one possible outcome, and its planning algorithm has two principal modifications to prodigy 4.0. First, steps in both the head plan and the tail plan are associated with contexts, which represent the branch of the plan under which the step is proposed to be used. A step's context is a first-order logical expression over variable-value pairs for context variables. When a step with more than one possible outcome is introduced into the tail plan, a new context variable is introduced into the plan, with possible values representing the possible outcomes of the step, since each outcome may lead to a separate branch in the plan. The step is termed the producer of the context variable. The context associated with a step may involve several context variables. For each variable the context may include a single value, the negation of a value or a disjunction of several values. The different variables are combined as a conjunction.

The second modification to the prodigy 4.0 algorithm takes effect when a branching step is moved from the tail plan to the head plan, at which time the B-prodigy routine is called recursively for each possible outcome of the step.
On each call, the remaining steps in the tail plan that do not match the context corresponding to the chosen outcome of the branching step are ignored. The several calls together produce a branching head plan. These two alterations are sufficient to create branching plans in uncertain domains. Table 4.2 shows the top-level algorithm. The parts printed in bold are only present in B-prodigy; the others are from the original prodigy 4.0 algorithm.

Figure 4.3 shows a simplified version of the Move-Barge operator from the planning domain used in the previous chapter, in which moving a barge has a 1/3 probability of rendering it inoperable. The effects slot of the operator contains two lists of add and delete changes labelled with their probability of occurrence, instead of the single list used before. When the operator is applied, exactly one of these lists is chosen randomly with the given probabilities and used to specify the effects of the operator. The other operators are unchanged from the domain of Section 3.4.

Figure 4.4 shows a tail plan constructed for a candidate plan for a planning problem in which two barges, barge1 and barge2, are at the dock Richmond, the oil is in a tanker at west-coast as in the previous chapter, and the goal is (oil-disposed). Step s3, (Move-Barge =barge1 =Richmond =west-coast), is a context-producing step that has two possible sets of effects.
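As a concrete illustration of context expressions, the sketch below uses an assumed, simplified representation: a context is a conjunction of per-variable constraints, each either a single value, a negated value, or a disjunction of values, and a step matches an assignment of outcomes to context variables only if every constraint is satisfied.

```python
# Sketch of context expressions (simplified, assumed representation): a
# context maps each context variable to one constraint, and a step matches an
# outcome assignment only if every constraint is satisfied (conjunction).

def matches(context, assignment):
    """context: {var: ('is', value) | ('not', value) | ('or', {values})}
    assignment: {var: chosen_outcome}"""
    for var, (kind, val) in context.items():
        outcome = assignment[var]
        if kind == 'is' and outcome != val:
            return False
        if kind == 'not' and outcome == val:
            return False
        if kind == 'or' and outcome not in val:
            return False
    return True

# A step restricted to outcome alpha of the context variable produced by s3:
assert matches({'s3': ('is', 'alpha')}, {'s3': 'alpha'})
assert not matches({'s3': ('is', 'alpha')}, {'s3': 'beta'})
assert matches({'s3': ('not', 'beta')}, {'s3': 'alpha'})
```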
Marking a step as context-producing is done externally to B-prodigy by Weaver, as described in Section 4.3. The choice to suppress the fact that some steps have multiple outcomes is part of the definition of the reduced domain model. Using a reduced model can greatly reduce the complexity of building a plan by directing the planner's effort to the sources of nondeterminism that impact the plan's probability of success. The effects marked with context s3 = α correspond to the situation where barge1 remains


operational when it is moved, and the effects marked with context s3 = β correspond to the situation where it is no longer operational.

Step s2 is marked as belonging to context s3 = α. This means that the planner chose to add a new link for the goal it achieves, oil-pumped, in the context s3 ≠ α. Another choice open to the planner would be to use step s1 in context β as well as α, and then it would have to achieve the precondition that barge1 be operational in that context. Since no actions are available to make the barge operational, this option was rejected.

B-Prodigy
1. If the goal statement G is satisfied in the current state C, then return Head-Plan.
2. Either
   (A) Back-Chainer adds an operator, with the same context as the step it is linked to [B-prodigy only], to the Tail-Plan, or
   (B) Operator-Application moves an operator from Tail-Plan to Head-Plan.
   Decision point: decide whether to apply an operator or to add an operator to the tail.
3. Goto 1.
If an operator with multiple possible consequences was moved to Head-Plan, call B-Prodigy on each resulting plan, in each case removing from Tail-Plan all operators that do not match the appropriate context. [B-prodigy only]

Operator-Application
1. Pick an operator op in Tail-Plan such that
   (A) there is no operator in Tail-Plan ordered before op, and
   (B) the preconditions of op are satisfied in the current state C.
   Decision point: choose an operator to apply.
2. Move op to the end of Head-Plan and update the current state C.

Table 4.2: Algorithm for B-prodigy, based on prodigy 4.0. Steps that only appear in B-prodigy are marked [B-prodigy only].

In Figure 4.5 the step s3, moving barge1, has been moved to the head plan.
Two recursive calls to B-prodigy will now be made, one for each context, in which B-prodigy will proceed to create a linear plan for the context. In context s3 = α, steps s4, s5 and s6 will be removed from the tail plan, and in context s3 = β, steps s1 and s2 will be removed. Note that no steps are removed from the head plan, whatever their context. Each recursive call produces a valid linear plan, and the result is a valid conditional plan that branches on the context produced by step s3.

The final plan produced after the planner solves the α and β contexts is as follows:

(Move-Barge =barge1 =Richmond =west-coast)
If (operational barge1)
   (Pump-Oil =barge1 =west-coast)
   (Move-Barge =barge1 =west-coast =Richmond)
   (Unload-Oil =barge1 =Richmond)
otherwise
   (Move-Barge =barge2 =Richmond =west-coast)
   (Pump-Oil =barge2 =west-coast)
   (Move-Barge =barge2 =west-coast =Richmond)
   (Unload-Oil =barge2 =Richmond)

Move-Barge
(Operator
  (preconds ((<b> Barge) (<from> Place) (<to> (and Place (diff <to> <from>))))
    (at <b> <from>))
  (effects
    (0.667 () ;; context α
      ((add (at <b> <to>))
       (del (at <b> <from>))))
    (0.333 () ;; context β
      ((add (at <b> <to>))
       (del (at <b> <from>))
       (del (operational <b>))))))

Figure 4.3: A version of the operator Move-Barge with two possible outcomes.

[Figure 4.4 appears here: the initial tail plan. The top row holds steps s1 and s2 (Unload-Oil and Pump-Oil with barge1) and the context-producing step s3 (Move-Barge barge1 Richmond west-coast); the bottom row holds steps s4, s5 and s6 (Move-Barge, Pump-Oil and Unload-Oil with barge2), each restricted to context s3 = β.]

Figure 4.4: An initial tail plan to solve the example problem, showing two alternative courses of action. Step s3 has two different possible outcomes, labelled α and β. The steps along the top row are restricted to outcome α of s3, and use barge1 in the situation when it is operational. The steps along the bottom row are restricted to outcome β of s3 and use barge2.
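The branching recursion that produced this plan can be sketched as follows, with hypothetical data structures: when a step with several outcomes is applied, the planner recurses once per outcome, pruning tail-plan steps whose context excludes that outcome. Here `solve` stands in for the underlying linear planner.

```python
# Hypothetical sketch of B-Prodigy's branching recursion: apply a step with
# several outcomes, then solve each branch with the tail plan pruned to the
# steps whose context matches that outcome.

def apply_branching_step(step, head_plan, tail_plan, solve):
    branches = {}
    for outcome in step['outcomes']:
        pruned_tail = [s for s in tail_plan if s['context'](outcome)]
        branches[outcome] = solve(head_plan + [step['name']], pruned_tail)
    return branches

s3 = {'name': 'move-barge-1', 'outcomes': ['alpha', 'beta']}
tail = [{'name': 'pump-oil-barge1', 'context': lambda o: o == 'alpha'},
        {'name': 'pump-oil-barge2', 'context': lambda o: o == 'beta'}]
solve = lambda head, rest: [s['name'] for s in rest]   # toy linear "planner"
branches = apply_branching_step(s3, [], tail, solve)
assert branches == {'alpha': ['pump-oil-barge1'], 'beta': ['pump-oil-barge2']}
```

The key property mirrored here is that pruning only touches the tail plan: the head plan is shared by every branch, just as in the text.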


[Figure 4.5 appears here: the candidate plan after step s3 (Move-Barge barge1 Richmond west-coast) has been moved to the head plan, with the remaining tail-plan steps labelled by their contexts s3 = α and s3 = β.]

Figure 4.5: A more developed version of the candidate plan. One step has been moved to the head plan and now determines two possible current states, given context labels α and β.

When there is only one, universal context, this algorithm is identical to that of prodigy 4.0. As with prodigy 4.0, it is easy to see that the algorithm is sound, yielding only correct plans. Central to the completeness of the back-chainer is the use of contexts in the tail plan. Contexts allow steps to appear in the head plan that are required for parts of the plan with a specific context before that context is resolved (for example, taking an umbrella because it might rain).

The completeness of B-prodigy rests on the completeness of the underlying classical planner. If this is complete, in the sense that if there is a non-branching plan for a problem in any classical domain then the planner will find a plan, then B-prodigy is complete for the nondeterministic problems that can be represented to it. Here I make an informal argument that this is the case. Consider a branching plan that achieves its goal with certainty. It can be viewed as a merge of several linear plans, one for each leaf of the tree-structured plan, with the steps before some branch point possibly establishing preconditions of operators on more than one branch after the branch point.
If the classical planner is complete, the tail plan for each individual branch of the plan could in principle be constructed. For each branch point in the original plan, a context-producing action can then be designated, allowing the required steps from the component plans to be present in the tail plan for the branching plan, designated with different contexts. Thus, B-prodigy is complete for branching plans if the classical planner on which it is based is complete.

In general it is not possible to guarantee the completeness of Prodigy because it allows arbitrary functions to be used to determine bindings for operators and in the antecedents of control rules. In earlier work, completeness is shown for a version of prodigy 4.0 in a limited case when no lisp functions are used in the bindings of operators and no control rules are used [Blythe & Fink 1997]. Extensions to control


rules and functions known to conform to certain conditions would be possible.

4.2 Calculating the probability of success

To compute the probability of success of a plan produced by B-prodigy in the framework of Section 3.2, we can appeal to the underlying Markov decision process described there. If each barge has an independent probability of 1/3 of becoming inoperable when it is moved, the probability of success of the initial plan drops from 5/8 to 5/12, and the probability of success of the branching plan is approximately 0.535. The improvement from the contingent plan that uses barge2 is reduced by the continuing chances of deterioration in the weather conditions.

When the plan is implemented as a policy in the MDP, it generates a simple Markov process, part of which is illustrated in Figure 4.6. The probability of success can be found by summing the probabilities of each of the goal states in the Markov process, which are shown in the figure with thick borders. In this figure, the thick dotted line separates the state nodes for the two different branches of the contingent plan. The nodes above the dotted line correspond to the case in which barge1 is operational after it is moved, and below the line to the case in which barge1 is not operational. For brevity, Figure 4.6 omits the steps of moving either barge back to the dock and unloading the oil, and the states marked as goals satisfy (not (oil-in-tanker west-coast)).
Since the omitted steps do not fail unless previous steps have failed, the probability of reaching one of these states is the same as the probability of reaching a goal state in the larger MDP corresponding to the original goal. Thus, the plan evaluated in Figure 4.6 is the following:

(Move-Barge =barge1 =Richmond =west-coast)
If (operational barge1)
   (Pump-Oil =barge1 =west-coast)
otherwise
   (Move-Barge =barge2 =Richmond =west-coast)
   (Pump-Oil =barge2 =west-coast)

This plan will be used as an example throughout this section.

In order to perform the probability calculations efficiently, a Bayesian belief net is automatically constructed from the plan. This representation has a number of advantages for reasoning probabilistically about plans. It makes efficient use of the dependency structure between domain features when computing probabilities, and has a sound theoretical basis [Pearl 1988]. It can also efficiently maintain beliefs about unobservable world features based on observations during plan execution.
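The summation over goal states can be illustrated with a toy forward propagation over the induced Markov chain. The chain below is an assumption for the sketch: it ignores the weather events of the full model (which is why the thesis's figure of about 0.535 is lower than the result here) and only models each Move-Barge leaving its barge inoperable with probability 1/3.

```python
from fractions import Fraction

def success_probability(transitions, start, goals):
    """Sum the probability mass reaching any goal state in an acyclic chain.
    transitions: state -> list of (next_state, prob); goal states absorb."""
    prob = {start: Fraction(1)}
    total = Fraction(0)
    frontier = [start]
    while frontier:
        s = frontier.pop()
        p = prob.pop(s, None)
        if p is None:               # mass already propagated earlier
            continue
        if s in goals:
            total += p
            continue
        for nxt, q in transitions.get(s, []):
            prob[nxt] = prob.get(nxt, Fraction(0)) + p * q
            frontier.append(nxt)
    return total

# Simplified model of the branching plan: try barge1, fall back to barge2.
chain = {
    'start':       [('barge1-ok', Fraction(2, 3)), ('barge1-dead', Fraction(1, 3))],
    'barge1-ok':   [('oil-pumped', Fraction(1))],
    'barge1-dead': [('barge2-ok', Fraction(2, 3)), ('failed', Fraction(1, 3))],
    'barge2-ok':   [('oil-pumped', Fraction(1))],
}
assert success_probability(chain, 'start', {'oil-pumped'}) == Fraction(8, 9)
```

Adding the weather process would enlarge the state space exactly as in Figure 4.6, which is what motivates the belief-network evaluation described next.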


[Figure 4.6 appears here: a graph of states m0 through m5-2, each labelled with the values of barge1, barge2, oil, weather, barge1-op and barge2-op and with the probability of reaching it, connected by the transitions "move barge 1", "pump into barge1", "barge1 is (not) operational", "move barge 2" and "pump into barge2" over time points 0 to 5.]

Figure 4.6: A reachability graph of states in the planning problem corresponding to the conditional plan to transfer the oil with one of two barges. Goal states are marked with a thick border, and the probability of arriving at each state is shown above it. Time increases from left to right, and the actions taken are shown with their intervals below the nodes.

Overview of Bayesian belief networks

Before describing in detail the algorithm used to construct a belief network that models the plan's probability of success, I give here a brief overview of Bayesian belief networks, based on [Pearl 1988]. In the rest of the thesis I will refer to Bayesian


belief networks as "belief networks" or "belief nets".

A Bayesian belief network is a directed acyclic graph (or dag) whose nodes represent random variables over which it encodes a probability distribution. A simple example of a belief net is shown in Figure 4.7. Essentially, the arcs in a belief net encode information about local dependencies between variables that can be used to construct the full, global probability distribution. Algorithms for computing the probability distribution and for propagating observations on the variables as evidence to update the posterior distributions of the other variables can make use of the dependency information to improve efficiency [Pearl 1988; Lauritzen & Spiegelhalter 1988]. This problem is, however, NP-hard [Cooper 1990].

[Figure 4.7 appears here: a small belief net over the variables Coin1, Coin2 and Bell.]

Figure 4.7: A small example Bayesian belief network.

Formally, if a belief net models a probability distribution, then dependence relationships among variables in the distribution can be read from the graph using the criterion of d-separation. Let D be a dag that is a belief net, and let X, Y and Z be disjoint subsets of the variables in D. Then Z is said to d-separate X from Y if along every path from a node in X to a node in Y there is a node w satisfying one of the following conditions: (1) w has converging arrows and none of w or its descendants are in Z, or (2) w does not have converging arrows and w is in Z. This is written <X | Z | Y>_D.
In Figure 4.7, where the arcs from coin1 and coin2 converge on bell, the empty set d-separates coin1 from coin2: the only path between them converges at bell, so by condition (1) it is blocked as long as neither bell nor a descendant of it is observed. The set {bell} does not d-separate them, since observing the bell makes the two coins dependent.

Let D be a dag and let I(X, Y, Z)_P stand for the statement "Z is conditionally independent of X given knowledge of Y in the probability distribution P", where X, Y and Z are subsets of the random variables in D. Then D is an I-map of the probability distribution P if every d-separation displayed in D corresponds to a valid conditional independence relationship in P, i.e. for every disjoint sets of vertices X, Y and Z,

   <X | Z | Y>_D implies I(X, Z, Y)_P.

The dag D is a Bayesian belief network of probability distribution P if and only if it is a minimal I-map of P. That is, D is an I-map of P, but removing any arrow from D will make it cease to be an I-map of P.
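The d-separation criterion can also be checked mechanically. The sketch below is a standard active-trail reachability test in the spirit of Shachter's Bayes-ball procedure, not code from the thesis; applied to the assumed collider structure coin1 -> bell <- coin2 of Figure 4.7, it confirms that the empty set blocks the path while observing bell opens it.

```python
from collections import deque

def d_separated(dag, X, Y, Z):
    """True iff Z d-separates X from Y in the DAG (dict: node -> children).
    Standard active-trail reachability (Bayes-ball style), not thesis code."""
    nodes = set(dag)
    for kids in dag.values():
        nodes.update(kids)
    parents = {n: set() for n in nodes}
    for n, kids in dag.items():
        for k in kids:
            parents[k].add(n)
    # Nodes that are in Z or have a descendant in Z (these activate colliders).
    anz, stack = set(), list(Z)
    while stack:
        n = stack.pop()
        if n not in anz:
            anz.add(n)
            stack.extend(parents[n])
    # Traverse trails, tracking the direction in which each node is reached.
    queue = deque(('up', x) for x in X)
    visited = set()
    while queue:
        direction, n = queue.popleft()
        if (direction, n) in visited:
            continue
        visited.add((direction, n))
        if n not in Z and n in Y:
            return False                       # found an active trail to Y
        if direction == 'up' and n not in Z:   # reached from a child
            queue.extend(('up', p) for p in parents[n])
            queue.extend(('down', c) for c in dag.get(n, ()))
        elif direction == 'down':              # reached from a parent
            if n not in Z:                     # chain or fork continues
                queue.extend(('down', c) for c in dag.get(n, ()))
            if n in anz:                       # collider is activated
                queue.extend(('up', p) for p in parents[n])
    return True

coins_bell = {'coin1': ['bell'], 'coin2': ['bell']}
assert d_separated(coins_bell, {'coin1'}, {'coin2'}, set())
assert not d_separated(coins_bell, {'coin1'}, {'coin2'}, {'bell'})
```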


It would be cumbersome to use this definition to verify that some dag constructed for a probability distribution P is indeed a belief net for P. The following theorem due to Verma [1986] makes this considerably easier:

Theorem. Given a probability distribution P(x_1, x_2, ..., x_n) and any ordering d of the variables, the dag created by designating as parents of X_i any minimal set Π_{X_i} of predecessors satisfying

   P(x_i | Π_{X_i}) = P(x_i | x_1, ..., x_{i-1}),   Π_{X_i} ⊆ {X_1, X_2, ..., X_{i-1}},

is a Bayesian network of P.

Constructing a belief net to evaluate a plan

The belief net constructed to evaluate a plan has time-stamped nodes of two types: those representing the belief that some fact about the world is true at a certain time, and those representing the belief that an executed action or exogenous event has a particular outcome at a certain time. The belief net is constructed in two stages. First, nodes are created representing beliefs in the state variables that are relevant to the plan regardless of any external events that may take place. In the second stage, nodes are added that represent external events and the state features that are relevant to them. An example of such a net can be found in Figure 4.9.

The algorithm for creating the topology of the belief net is shown in Figure 4.8. Before nodes representing the state of the world are created, a transformation is made to the state representation so that literals that are functionally related are grouped into one random variable in the belief net.
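This grouping can be sketched as follows. The hard-coded rule for the at predicate is an assumed stand-in for the user-supplied domain axioms described in the next paragraph.

```python
# Sketch of grouping functionally related literals into one random variable.
# The rule for 'at' stands in for a user-supplied domain axiom saying an
# object is at exactly one place at a time.

def group_literals(literals):
    """Map ('at', obj, place) literals to a location(obj) variable whose
    possible values are the places mentioned."""
    variables = {}
    for pred, obj, value in literals:
        if pred == 'at':
            variables.setdefault(('location', obj), set()).add(value)
    return variables

grouped = group_literals([('at', 'barge1', 'richmond'),
                          ('at', 'barge1', 'west-coast')])
assert grouped == {('location', 'barge1'): {'richmond', 'west-coast'}}
```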
For example, since a barge can only be in one place at one time, only one of (at barge1 Richmond) and (at barge1 west-coast) is true in a state. These literals are grouped together to contribute to one state variable, location(barge1), which has the value Richmond when (at barge1 Richmond) is true and the value west-coast when (at barge1 west-coast) is true. This grouping of literals into variables is specified through domain axioms added by the user. For an example see Appendix B.

The first stage in the construction of the belief net adds nodes representing state variables that are relevant to the plan based on the actions. This is done by first adding nodes that represent the actions taken in the plan. The assumption is made that the actions are taken sequentially with no pauses in between, although this is not necessary to the belief net construction algorithm. Thus in the example plan, a node is added representing the action (Move-Barge =barge1 =Richmond =west-coast) at time 0, which takes 2 time units, and a node is added representing (Pump-Oil =barge1 =west-coast) at time 2. Since the planner represents its top-level goals as the preconditions to a final action finish, a node is added to represent that action at time 3. The probability of success of the plan is the probability that this final node has the value true. After actions are added, nodes are added to represent their preconditions and effects. For each variable that corresponds to one or more of the literals in the preconditions of the action, a node


is added representing belief in the variable's value at the time the action is begun, and an arc is added from this node to the action. For each variable that corresponds to one or more literals mentioned in the effect distribution of the action, a node is added representing belief in the variable's value at the time the action completes, and an arc is added from the action to the node.

T = 0
Stage 1:
   For each action A in the plan
      Create a node N_A to represent A at time T.
      For each precondition P of A
         Find or create a node N_P for P at time T.
         Link N_P to N_A.
      T = T + duration(A)
      For each effect E of A
         Find or create a node N_E for E at time T.
         Link N_A to N_E.
Stage 2:
   For each node N in the belief net representing a state feature
      If N is not the effect of an action,
         link it to the most recent previous node of the same type.
      Find the set E_N of events that can affect N.
      For each event in E_N
         Add nodes for the event and its preconditions in the same way as stage 1.
   If new events were added, go back to Stage 2.

Figure 4.8: The two-stage algorithm to construct a belief net to evaluate a plan. In the first stage, nodes are added to the belief net to represent actions and their preconditions and effects. In the second stage, persistence assumptions are replaced with nodes representing exogenous events and their preconditions and effects when appropriate.
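Stage 1 of Figure 4.8 can be sketched as follows, with assumed plan and node representations: one node per action, plus time-stamped fluent nodes for preconditions at the action's start and effects at its end. Keying fluent nodes by (fluent, time) gives the "find or create" behaviour, so later actions reuse earlier effect nodes.

```python
# Sketch of stage 1 of the belief-net construction (hypothetical data
# structures): action nodes in sequence, precondition nodes at the action's
# start time, effect nodes at its end time.

def stage_one(plan):
    """plan: list of (name, duration, preconds, effects) tuples.
    Returns (nodes, arcs); fluent nodes are (fluent, time) pairs."""
    nodes, arcs, t = set(), [], 0
    for name, duration, preconds, effects in plan:
        action = (name, t)
        nodes.add(action)
        for p in preconds:
            nodes.add((p, t))
            arcs.append(((p, t), action))      # precondition -> action
        t += duration
        for e in effects:
            nodes.add((e, t))
            arcs.append((action, (e, t)))      # action -> effect
    return nodes, arcs

nodes, arcs = stage_one([
    ('move-barge', 2, ['location(barge1)'], ['location(barge1)']),
    ('pump-oil', 1, ['location(barge1)', 'operational(barge1)'], ['oil']),
    ('finish', 0, ['oil'], []),
])
assert (('move-barge', 0), ('location(barge1)', 2)) in arcs
assert (('oil', 3), ('finish', 3)) in arcs
```

Stage 2 would then walk the fluent nodes, add persistence links, and recursively splice in exogenous-event nodes, as in the figure.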
A node representing a variable at some time can be both the effect of some action and the precondition of the next action. In what follows I will refer to a node representing a state variable at a particular time as a fluent node.

The first of the two nets shown in Figure 4.9 is the belief net created to evaluate the example plan before any nodes are added for exogenous events. The time stamp of each node is represented by its position along the x-axis. The time value is shown on the x-axis, assuming the first action in the plan begins at time 0. The type of the node is represented by its position along the y-axis. For example, all the nodes in the top row of the diagram represent the location of the oil, which in this plan can be in the tanker or in barge1. The action nodes are all in the same row to keep the


diagram small, and the particular actions are labelled at the node.

Most of the arcs in this belief net represent the preconditions and effects of actions as described above. While all such arcs link a fluent node and an action node, the net also has five arcs that directly link fluent nodes. These links reflect persistence assumptions in the belief net. The algorithm initially makes a persistence assumption for every fluent node with a time value greater than 0. After precondition and effect nodes are created, each fluent node is linked to the latest node before it of its type, or, if there is no such node, a new node of its type is created at time 0 and the fluent node is linked to that. The intention is to encode the belief that each state variable at a certain time has the same value as this parent, which is the most recent state variable of the same type, unless some action changes its value. However, the conditional probabilities are not added to the net until the possible exogenous events that could challenge these persistence assumptions are examined.

[Figure 4.9 appears here: two belief nets over the fluents (oil), (operational barge1), (location barge1) and (weather) and the action nodes move-barge, pump-oil and finish, drawn over time points 0 to 3; the second net adds gray event nodes for (weather darkens) and (weather brightens) at time points 0 and 1.]

Figure 4.9: Belief nets representing the first plan created by the conditional planner for the example problem, before and after event nodes are added to justify the persistence of the (weather) variable. In the first graph, the dotted lines show the unexplored persistence intervals. In the second graph, gray nodes represent possible occurrences of external events during one interval.
No exogenous events aect theother <strong>in</strong>tervals.In the second stage of belief net construction nodes are added to represent exoge-


4.2. Calculat<strong>in</strong>g the probability of success 57nous events that can aect the plan's probability of success. For each l<strong>in</strong>k between twouent nodes, nodes are added represent<strong>in</strong>g the occurrence of any exogenous eventsthat may aect the variable <strong>in</strong>volved. These nodes are added at each time po<strong>in</strong>tthat the persistence <strong>in</strong>terval spans. The second net <strong>in</strong> Figure 4.9 shows the exampleplan after this process. Event nodes, shown <strong>in</strong> gray, have been added for events oftype Weather-Darkens and Weather-Brightens at time po<strong>in</strong>ts 0 and 1, aect<strong>in</strong>g theweather before the Pump-Oil action.As with action nodes, uent nodes are added for the preconditions and eects ofevent nodes. Persistence <strong>in</strong>tervals are created <strong>in</strong> turn for these uent nodes. Newevent nodes are then recursively sought that can aect these persistence <strong>in</strong>tervalsand the process is repeated. This stage term<strong>in</strong>ates because each new <strong>in</strong>terval mustbe shorter than the one for which the previous event nodewas <strong>in</strong>troduced and s<strong>in</strong>cethe plan length was nite the possible <strong>in</strong>tervals over which events can take place willeventually have zero length.At this po<strong>in</strong>t, the conditional probability tables are lled <strong>in</strong>. Any node with noparents is an \<strong>in</strong>itial state" node, a uent node with time stamp 0. These nodes haveprobability distributions determ<strong>in</strong>ed by the <strong>in</strong>itial state distribution of the problem.Any uent node with just a uent node parent isgiven the identity distribution, s<strong>in</strong>ceno exogenous events or planned actions aect the value. An example of this is the(oil) node at time 2. 
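The persistence-linking step of the first construction stage can be sketched as follows. This is an illustrative Python rendering, not Weaver's code; the Node class and its fields are my own assumptions:

```python
from collections import defaultdict

class Node:
    def __init__(self, var, time):
        self.var = var        # state variable, e.g. "(weather)"
        self.time = time      # time stamp (x-axis position)
        self.parents = []     # incoming belief-net arcs

def add_persistence_links(fluent_nodes):
    """Link each fluent node to the latest earlier node of its type,
    creating a time-0 node for the variable if none exists."""
    by_var = defaultdict(list)
    for node in fluent_nodes:
        by_var[node.var].append(node)
    for var, nodes in by_var.items():
        nodes.sort(key=lambda n: n.time)
        if nodes[0].time > 0:          # no earlier node: make one at time 0
            nodes.insert(0, Node(var, 0))
        for earlier, later in zip(nodes, nodes[1:]):
            later.parents.append(earlier)   # the persistence assumption
    return [n for nodes in by_var.values() for n in nodes]
```

As in the text, the conditional probabilities on these persistence arcs are deliberately left unset at this stage.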
An action or event node is given a probability distribution that reflects its preconditions. If it is deterministic, so is the distribution: all the probabilities are 1 or 0. If the action has probabilistic outcomes, each possible outcome is labelled and the conditional probabilities in the belief net are assigned from the effect distribution function. In either case the action's value is false with probability 1 whenever its preconditions aren't satisfied. Finally, if a fluent node has an event or action node parent (or both), then if the action node has a value other than false, the fluent node takes the indicated value with probability 1. Otherwise, if an event takes place, the fluent takes the value indicated by the event. If neither an action nor an event is successfully executed, the fluent takes the same value as its parent fluent node. Note that Weaver cannot handle two or more parent events occurring simultaneously, because of the language restriction that events which affect the same literals must have mutually exclusive preconditions. Figure 4.10 shows the marginal probabilities for the nodes in the belief net for the example plan.

When a conditional plan is evaluated, separate nets are created for each linear set of actions that can be executed in the plan. In principle it is quite possible to represent a branching plan with a single belief net, but in practice the separate nets proved more efficient to evaluate. The test node and value are chosen such that the original plan is known to fail if the value is such that the alternative branch of the plan is chosen.
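The deterministic part of the rule for a fluent node with action and event parents can be sketched as a small decision function. This is my own illustrative rendering, assuming a non-executed action has the value "false"; in Weaver these rules populate conditional probability tables rather than a function:

```python
def fluent_value(action_value, action_effect, event_occurs, event_effect,
                 parent_value):
    """Value of a fluent node given its parents: a successfully executed
    action parent dictates the value; otherwise an occurring event does;
    otherwise the value persists from the parent fluent node."""
    if action_value is not None and action_value != "false":
        return action_effect       # action executed: takes indicated value
    if event_occurs:
        return event_effect        # event took place: takes event's value
    return parent_value            # persistence: copy the parent fluent

# The (weather) fluent after a Weather-Darkens event, with no action parent:
assert fluent_value(None, None, True, "poor", "fair") == "poor"
```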
In the belief net for the alternative branch, any fluent node that has a time stamp after the test node but whose parents are time-stamped before the test node is given the test node as an extra parent. When the test node takes a value such that the branch of the plan is not taken, the fluent node takes on a new value, branch-not-taken, which is guaranteed to make any subsequent action fail.


Figure 4.10: The belief net for the first conditional branch of this section's example plan, with marginal probabilities shown.

This is done so that the two nets succeed under mutually exclusive conditions and the probabilities of the two finish step nodes can be added to get the probability of success of the plan. The nets that are computed for the example plan are shown in Figure 4.11.

This algorithm produces a belief net D which is concise in the sense that only state variables and events known to be relevant to the plan are represented. The benefits of this concise representation are discussed in Section 4.4. The algorithm by which the belief net is constructed ignores exogenous events that can give variables desired values, so the probability distribution encoded by the belief net is not the same as the true probability distribution of the set of variables as found
from the Markov decision process. However, the belief net is guaranteed not to be over-optimistic: the true probability of the plan's success may be greater than the probability predicted by the net, but it will not be less. The reason it may be greater is that some chance events may fortuitously improve the plan's probability of success, and these events are not necessarily represented in the net. A proof that the predicted probability cannot be higher than the true probability is given in detail in Appendix A. Essentially, for each complete assignment to the variables in D for which plan success is true, a set of paths through the underlying Markov decision process is found whose combined probability mass is at least equal to the marginal probability of the assignment in D.

Figure 4.11: The two nets constructed for the two conditional branches of this section's example plan.

4.3 Incrementally improving a plan with the plan critic

Weaver iterates between creating a plan, evaluating it, finding points where the plan can be improved and creating an improved plan, and so on, as shown in Figure 4.1.


The plan critic, discussed in this section, uses the information about the current plan's probability of success, represented in the belief net, to identify points where the current plan can be improved. It then calls the conditional planner to search for a new plan that might address one of these points.

The belief net contains not only the probability of the plan succeeding, but also the probabilities of each possible outcome of each intermediate step in the plan. The plan critic examines each step whose probability of having a desired outcome is less than one. For each such step, possible explanations for the undesired outcomes are extracted from the belief net. These may include external events that affect the preconditions of the chosen step, as well as previous steps having several outcomes, one or more of which causes the plan to fail. The plan critic chooses one such flaw to work on, picking arbitrarily by default. Once a flaw is chosen, the plan critic may select one of two ways to attempt to remove it. It can either cause the conditional planner to add a new branch which will specifically plan for the goals from a situation corresponding to the flaw, or it can cause the planner to attempt to make the event or undesired step outcome impossible. These actions are described in more detail in this section. Weaver can backtrack over all of these choice points.

4.3.1 Analysing the belief net and choosing a flaw

On each iteration, Weaver examines the belief net for its candidate plan and uses it to find a way to improve the plan.
Figure 4.10 shows the belief net that is created to analyse the first plan produced to solve the example problem. This plan has just two steps: move barge1 to the spill site, and pump oil into it from the tanker. The node representing the first action has three possible values, corresponding to the two possible outcomes shown in Figure 4.4 and the possibility that the action is not successfully executed. These values have the probabilities 2/3, 1/3 and 0 respectively. The second action has two possible values, representing whether the action is successfully or unsuccessfully applied; their probabilities are respectively 5/12 and 7/12. Since only one outcome for the first action can lead to the plan succeeding, both actions are investigated by the plan critic for possible flaws.

For each action, each fluent node in the belief net that has an arc to the action node is investigated as a possible flaw. The set of desired values for the fluent node is computed as the subset of its possible values for which the plan will lead to a goal state, assuming that the other nodes in the belief net take on appropriate values. This set is computed as the union across the plan's conditional branches of the intersection, within each conditional branch, of the values for the variable required by each child action node, computed from the action specification. For example, the set of desired values for the fluent node corresponding to weather at time 2 is {fair}, since it forms part of the precondition to the pump-oil action. Similarly the desired value for the node corresponding to (operational barge1) is true. Once the set of desired values is computed for a possible flaw, their combined probability mass is computed
as the sum of their individual probabilities. If this is less than one, the fluent node is labelled a flaw. Table 4.3 shows the three flaws found for the initial plan in the example problem, as well as the set of desired values for each flaw node and the probability that the node has a desired value.

For each flaw node, the plan critic computes a set of explanations. Each explanation specifies either an external event or an action that is a parent of the flaw node in the belief net and can take on a value such that the flaw node may have a value outside its set of desired values. All such explanations are produced for each flaw node. For example, the one explanation for the flaw node corresponding to weather at time 2 is the event node Weather-Darkens at time 1. Each explanation comprises a flaw node, an event or action outcome that can lead to a set of undesired values, the resulting set of undesired values and their probability mass.

Flaw node             Time  Desired values  P      Explanations
(operational barge1)  2     {true}          0.667  Move-Barge = β
(weather)             2     {fair}          0.625  Weather-Darkens = true
(oil)                 3     {barge1}        0.417  none

Table 4.3: The flaws uncovered by the plan critic for the first candidate plan for the example problem. For each flaw a set of desired values is computed from the preconditions of the child action node, and the probability mass of this set is shown. The explanations column shows parent node values that can lead to an undesired value.

Once each explanation is produced for each flaw node in the plan, the plan critic selects one explanation and attempts to find an improvement to the current plan that reduces the probability attached to it. If no such improvement can be found, the plan critic backtracks and selects another flaw explanation.
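The flaw test itself is simple arithmetic over each node's marginal distribution; a sketch using the marginals of Table 4.3 (the function and dictionary layout are mine):

```python
def find_flaws(marginals, desired):
    """Label a fluent node a flaw when the probability mass of its
    desired values is less than one."""
    flaws = {}
    for node, dist in marginals.items():
        mass = sum(p for value, p in dist.items() if value in desired[node])
        if mass < 1.0:
            flaws[node] = mass
    return flaws

# Marginals for the first candidate plan (cf. Table 4.3):
marginals = {
    "(operational barge1)@2": {"true": 0.667, "false": 0.333},
    "(weather)@2":            {"fair": 0.625, "poor": 0.375},
    "(oil)@3":                {"barge1": 0.417, "tanker": 0.583},
}
desired = {
    "(operational barge1)@2": {"true"},
    "(weather)@2":            {"fair"},
    "(oil)@3":                {"barge1"},
}
flaws = find_flaws(marginals, desired)   # all three nodes are flaws
```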
For each explanation, the plan critic may choose to add a conditional branch or to add preventive steps, as described below.

4.3.2 Adding a conditional branch

If the plan critic chooses to address a flaw explanation by creating a new conditional branch, it achieves this by adding a new outcome to some step that is already present in the candidate plan and making a call to the conditional planner. Since the conditional planner searches for a plan that covers every action outcome it is aware of, it will add the new branch as it creates the plan. Although the plan critic reasons about flaws with explanations that can consist of action outcomes or exogenous events, it represents both of these to the conditional planner as a new action outcome. If the explanation names an action outcome, the same action is chosen by the plan critic for this process. The outcome in the explanation was previously
not shown to the conditional planner as a simplification; otherwise it could not have been a flaw explanation, since the candidate plan would have accounted for this value. If the explanation names an exogenous event, the affected step is the earliest one that completes after or at the same time as the event. In this case each outcome of the action is made into two outcomes, one representing the case when the event takes place and the other representing the case when it does not.

For example, if the plan critic works on the first flaw in Table 4.3, (operational barge1), there is only one possible explanation: that action Move-Barge has outcome β. Both possible outcomes of this action are now shown to the conditional planner, which will produce the branching plan described in Section 4.1. If, on the other hand, the plan critic works on the second flaw, (weather), the only possible explanation is that event Weather-Darkens has value true. In this case the plan critic also alters the Move-Barge action, since it completes at the same time as the event. The one outcome that is represented to the conditional planner is now replaced by two, one the same as the previous outcome and one in which the weather changes from fair to poor in addition to the movement of the barge.
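Splitting each visible outcome of the affected step on an exogenous event can be sketched as follows (a hypothetical helper, with effect sets written as Python sets of literals):

```python
def split_outcomes_for_event(outcomes, event_effects):
    """Replace each outcome shown to the conditional planner by two:
    one where the event does not occur, and one where the event's
    effects happen in addition to the outcome's own effects."""
    new_outcomes = []
    for effects in outcomes:
        new_outcomes.append(set(effects))                       # no event
        new_outcomes.append(set(effects) | set(event_effects))  # event occurs
    return new_outcomes

# Move-Barge's one visible outcome, split on Weather-Darkens:
outcomes = [{"(at barge1 spill-site)"}]
split = split_outcomes_for_event(outcomes, {"(weather poor)"})
# split now holds the original outcome and one in which the weather
# also changes to poor
```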
The conditional planner will fail to solve this problem, however, since there is no solution in the new alternative outcome of Move-Barge, so the plan critic will backtrack, try to prevent the flaw as described in the next section and finally try to address another flaw.

Once the outcome to be added to an action in the plan is chosen, the plan critic forces the conditional planner to backtrack to the point where the action is added to the tail plan, and to repeat its earlier planning episode with the new action. A context node is added to the search space, as described in Section 4.1.1, representing the decision to add the extra outcomes to the step. The plan produced by the conditional planner has one or more extra branches at the point where the branching action is added to the head plan. If the extra branch is unsatisfiable, the conditional planner may choose to alter the order in which actions are applied, or may choose to add different actions to the tail plan to achieve goals that are outstanding when the altered action was added to the tail plan, but it may not backtrack to any point in its search before the new action was added. This "ceiling" in its search space ensures that the branching step is included in the plan.

4.3.3 Adding preventive steps

In some cases a flaw explanation can be addressed without adding a conditional branch, by adding steps to the plan that reduce the probability of the flaw explanation producing undesired effects. The plan critic causes the planner to add these steps by choosing an action from the current plan and augmenting its preconditions.
If the flaw explanation is an action outcome, the plan critic selects a probability distribution for the action that can lead to an undesired value, and adds the negation of the distribution's path preconditions to the preconditions of the operator. It then causes the conditional planner to backtrack to the point where the action was added to the tail plan, and to repeat its earlier planning episode with the new action, just as when a
new conditional branch is added. The extra preconditions for the action, if satisfiable, will ensure that the effect probability distribution chosen by the critic cannot take place, and this in turn will reduce the probability of the plan flaw, assuming that the plan critic has made a good choice of effect distribution. The plan critic can backtrack and try a new effect distribution for the chosen action, and can also add the negations of multiple effect distributions.

If the chosen flaw explanation is an external event, the plan critic selects an effect probability distribution for the event that can lead to an undesired value in the chosen plan flaw. The negation of the path preconditions of the effect distribution is added to the preconditions of the action in the plan during which the event begins. The plan critic then causes the conditional planner to backtrack and find a new plan in exactly the same way as for a flaw explanation that is an action outcome.

The planning domain that has been used as an example for this chapter does not provide any opportunities to add preventive steps. Consider, however, the more complicated version of the Move-Barge operator shown in Figure 4.12, in which the undesirable outcome that deletes (operational barge1) can only take place if (barge-ready barge1) is false. Suppose that (barge-ready barge1) is false in the initial state.
Then, after producing the same initial plan and the same flaws, the plan critic could address the flaw that (operational barge1) is false, with the explanation that Move-Barge has outcome β, by adding preconditions to Move-Barge to prevent this outcome from happening, in this case adding (barge-ready barge1) to its preconditions.

(Operator Move-Barge
  (preconds ((<barge> Barge)
             (<from> Place)
             (<to> (and Place (diff <to> <from>))))
    (at <barge> <from>))
  (effects ()
    (branch (barge-ready <barge>)
      ((add (at <barge> <to>))
       (del (at <barge> <from>)))
      ((0.667 () ((add (at <barge> <to>))
                  (del (at <barge> <from>))))
       (0.333 () ((add (at <barge> <to>))
                  (del (at <barge> <from>))
                  (del (operational <barge>))))))))

Figure 4.12: A version of the Move-Barge operator that depends on the literal (barge-ready <barge>).
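Augmenting an operator's preconditions with the negation of an undesired effect distribution's path preconditions can be sketched as below. Formulas are written as nested lists in the spirit of the operator syntax; the helper names and the location literal are my own illustrations:

```python
def negate(formula):
    """Negate a precondition formula, simplifying a double negation."""
    if formula[0] == "not":
        return formula[1]
    return ["not", formula]

def add_preventive_precondition(preconds, path_preconds):
    """Conjoin the negation of the undesired distribution's path
    preconditions onto the action's preconditions, so that the
    undesired outcome cannot fire when the action applies."""
    return ["and", preconds, negate(path_preconds)]

# The bad Move-Barge outcome fires when (barge-ready barge1) is false,
# so its path precondition is (not (barge-ready barge1)); negating it
# adds (barge-ready barge1) to the operator's preconditions:
new_pre = add_preventive_precondition(
    ["at", "barge1", "richmond"],
    ["not", ["barge-ready", "barge1"]])
```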


4.4 Discussion

In this chapter I have presented the basic algorithms used in Weaver to produce a plan that meets a given threshold probability of success. At a high level, Weaver alternates between stages of plan creation and plan evaluation. A plan critic repeatedly calls a conditional planner with an increasingly faithful representation of the domain, although the planner is initially ignorant of all forms of uncertainty. The plan critic therefore reasons about which sources of uncertainty to pay attention to during plan creation, guided by knowledge of which ones have a significant impact on the current plan's probability of success.

This separation of responsibilities allows Weaver to create plans in domains with exogenous events that essentially add many alternative outcomes to every action in the domain. Without the ability to ignore these alternatives until they are shown to be important, the planning task would be intractable. The constraining effect between the modules is important in both directions, both from the critic on the conditional planner and from the planner on the plan evaluator. I illustrate this with two small synthetic domains, in each of which the initial state is empty and the goal is the literal (goal).

In the first domain, shown in Figure 4.13, the literal (sub-0) is true in the initial state and a sequence of at least n + 1 steps is required to solve the goal: Op-1, Op-2 ... Op-n, Final.
Each step of the form Op-i in the plan has a 0.5 probability of adding the literal (bad-luck), so the success probability of this plan is 1/2^n. Either of two improvements to the plan can raise the probability to 1. A conditional branch could be added after the step Op-n, testing the literal (bad-luck) and using the final step Alt-Final instead of Final if it is true. Alternatively, the undesired outcomes of the nondeterministic steps could be prevented by adding the step Protect to the start of the plan.

When Weaver solves this problem, the conditional planner initially ignores the nondeterminism of the Op-i operators and produces the first plan as shown. The critic will then choose one of these operators and choose whether to improve the plan with a conditional branch or a protection step. It is possible for the protection step or conditional branch to be added in the middle of the sequence of Op-i operators, in which case several iterations of improvement may be required. By default, however, it will try to put a protection step as early as possible and a conditional branch as late as possible.

If the planner were aware of every source of nondeterminism in its first call, it would attempt to create a branching plan to account for each of the combinations of outcomes for the nondeterministic operators, producing a tree with 2^n leaves. Other conditional planners can represent branching plans whose branches merge later, and would be able to find a conditional plan accounting for all the outcomes in time linear in n.
However, in the case of causal-link planners such as Buridan or Cassandra, the plan would require n improvements to be made, since each of the alternative outcomes must be linked to the alternative step in the case where a branch is used, or have a
link from the protecting step in the other case.

(Operator Final
  (preconds () (and (sub-n) (not (bad-luck))))
  (effects () ((add (goal)))))

(Operator Alt-Final
  (preconds () (and (sub-n) (bad-luck)))
  (effects () ((add (goal)))))

(Operator Op-i
  (preconds () (sub-i-1))
  (effects ()
    (branch (protected)
      ((add (sub-i)))
      ((0.5 () ((add (sub-i))))
       (0.5 () ((add (sub-i)) (add (bad-luck))))))))

(Operator Protect
  (preconds () (true))
  (effects () ((add (protected)))))

Figure 4.13: Operators in the synthetic domain to illustrate the utility of the constraints on the planner from the critic. The initial plan found by the conditional planner will have a success probability of 1/2^n. Either one conditional branch or one protection step can increase the probability to 1.

The essential details of this example, a sequence of operators that can with some probability affect a literal that is required after the end of the sequence, would arise frequently if exogenous events were handled by "compiling" them into the effects of actions. This is a simple way to represent exogenous events, in which all the potential effects of the events are represented explicitly as possible outcomes of each operator during which they can take place. For example, if the operators of the form Op-i in Figure 4.13 are replaced with deterministic operators and the exogenous event It-Happens as shown in Figure 4.14, and the Op-i operators all have a duration of 1 unit, the compiled operators are the same as before.
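The probabilities claimed in Figure 4.13's caption can be checked by brute-force enumeration of the Op-i outcomes (a quick Python check under my reading of the domain, not code from the thesis):

```python
from itertools import product

def success_probability(n, protected=False, branch=False):
    """Enumerate the 2**n joint outcomes of the n Op-i steps and sum
    the probability of achieving (goal)."""
    total = 0.0
    for adds_bad_luck in product([False, True], repeat=n):
        # With Protect first, no Op-i can add (bad-luck).
        bad_luck = (not protected) and any(adds_bad_luck)
        # Final requires (not (bad-luck)); with a conditional branch,
        # Alt-Final covers the (bad-luck) case as well.
        if branch or not bad_luck:
            total += 0.5 ** n
    return total

assert abs(success_probability(5) - 1 / 2**5) < 1e-12   # 1/2^n
assert success_probability(5, protected=True) == 1.0
assert success_probability(5, branch=True) == 1.0
```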
In this case Weaver can make an extra saving in time using techniques for compressing the belief net described in the next chapter.

The second synthetic domain, shown in Figure 4.15, requires only two steps for an initial plan to achieve (goal), but there are n independent exogenous events, each of which can take place while the first step is taken. Without an abstraction barrier, this would be represented with 2^n different outcomes for the action. However, only m of the events can affect the plan's success, and attending only to those
leads to significantly more efficient planning. Weaver does this by using the belief net construction process to test which events are relevant, so this domain provides an example of how the plan can constrain the size of the probabilistic model built for the domain.

(Operator Final
  (preconds () (and (sub) (not (bad-luck))))
  (effects () ((add (goal)))))

(Operator Alt-Final
  (preconds () (and (sub) (bad-luck)))
  (effects () ((add (goal)))))

(Operator Op
  (preconds () (true))
  (effects () ((add (sub)))))

(Event Event-i
  (preconds () (true))
  (effects () ((add (e-i))
               (add (bad-luck))))  ;; if i ≤ m
  (probability 0.5))

Figure 4.15: Operators in the synthetic domain to illustrate the utility of the constraints on the evaluator and critic from the planner.

The conditional planner builds the initial plan Op, Final. When this is evaluated, the m events Event-i, where 1 ≤ i ≤ m, are identified as potentially affecting the plan and the belief net includes them. One is chosen and the critic adds an alternative outcome to the plan step Op, in which the literal (bad-luck) is added (as well as (e-j) for some j). The planner then produces the same conditional plan as in the last domain, which solves the problem. Only one alternative outcome is examined by the planner.

In contrast, without the plan critic to restrict the outcomes considered by the conditional planner, it would have to examine each of the 2^n possible outcomes for the first step in the plan. This is true regardless of the planning method used, unless it restricts the outcomes considered in some way.
By ignoring the rest of the external events, Weaver forms an abstraction over the outcomes. The two outcomes it considers are aggregates of many indivisible outcomes, each of which specifies the outcome of every exogenous event. Some conditional planners, such as C-Buridan, terminate before planning for each outcome even though they consider each indivisible outcome explicitly. Such planners will not have to consider all 2^n possible outcomes of Op, but in order to find a high-probability plan they will still have to consider a significant proportion of the indivisible outcomes in which the initial plan fails. There are 2^(n−m)(2^m − 1) of these, each with probability 1/2^n.

As with the previous domain, although this is an artificial domain the same situation can occur in real domains. It occurs when there is a large number of possible exogenous events but only a small number affect the outcome of the candidate plan. In the oil-spill domain discussed in Chapter 7, there are many exogenous events governing the weather change in different sectors of the domain, and other events could be added governing, for example, equipment failures. These events must be pruned dynamically, using the plan as it is created, because it is not possible to tell which exogenous events are relevant to the plan before a partial plan is developed.
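The count 2^(n−m)(2^m − 1) of failing indivisible outcomes can be verified by enumeration (a quick check I added, not from the thesis):

```python
from itertools import product

def failing_indivisible_outcomes(n, m):
    """Count joint outcomes of the n events (each occurs or not) in
    which the unimproved plan Op, Final fails: some event with index
    <= m occurs and adds (bad-luck)."""
    return sum(1 for occurs in product([False, True], repeat=n)
               if any(occurs[:m]))

# Each of the 2**n joint outcomes has probability 1/2**n.
n, m = 6, 2
assert failing_indivisible_outcomes(n, m) == 2**(n - m) * (2**m - 1)  # 48
```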


In Chapter 7 I develop measures of the benefit of the mutual constraint between the planner, plan evaluator and critic that consider both the number of steps during which events can take place and the number of exogenous events that are not relevant to the current plan.

In spite of these benefits, the strategy of complete separation of plan evaluation and plan creation does not produce the best behaviour in most planning domains. In particular, faster convergence to a high-probability plan can be obtained if some estimates of the probability of plan completions are used to choose between different candidate actions in a partial plan. In Section 6.1 I present some domain-independent search heuristics that improve the planner by making local probability estimates.

The basic Weaver algorithm also makes some approximations in the intermediate states in branching plans. This is because it represents the branch to the conditional planner by merging a single new outcome with some existing step (Section 4.3.2). This can produce an impossible state, however, if the action outcome or event being added is only realisable as the end of a chain of events or alternative action outcomes. In theory this will not cause the algorithm to make incorrect probabilistic evaluations, since the new plan will be evaluated in a context where all the relevant sources of uncertainty are considered. Nor will it cause a solution to be missed, since if the inconsistency affects the plan's probability of success it will show up as a direct flaw.
Still, it can affect the rate of convergence to a plan that passes the threshold probability, and in cases with hard computational resource constraints this can reduce the best probability of success found.

Weaver has three basic mechanisms for improving a plan's probability of success: standard backtracking, adding preventive steps and adding a conditional branch. The question arises whether these are necessary or adequate for finding plans that pass a probability threshold. Backtracking is obviously needed, and simple examples show that both conditional branches and preventive steps, or something like them, are needed to find good plans in every domain.

The main example used in this chapter solves a problem in which oil is to be removed from a tanker with a conditional plan, moving barge1 and using it if it is operational, and otherwise using barge2. This example does not support the claim that conditional branches are necessary, though, since a possible plan is to move both barges and always attempt to pump oil into each, one after the other. At least one of the Pump-Oil steps will always fail, either because a barge is not operational or because the oil has already been pumped, but the plan has the same probability of success as the original branching plan.

As an example of a problem where a branching plan is needed, suppose that a pump must be deployed in a barge in order to pump oil into it, and that the pump can only be deployed once.
If the problem is otherwise identical to the example problem, then a branching plan is required to achieve a probability of success above 0.5, since the second barge cannot be used after the pump has been deployed in the first barge. The modified operators for this example are shown in Figure 4.16. Although this simple example is contrived, it follows a pattern that often occurs naturally: a conditional plan is necessary because limited resources must be used sparingly.

(operator Pump-Oil
  (preconds ((<b> Barge) (<s> Sea-Sector))
    (and (at <b> <s>)
         (operational <b>)
         (fair-weather)
         (pump-deployed <b>)
         (oil-in-tanker <s>)))
  (effects ()
    ((add (oil-in-barge <s>))
     (del (oil-in-tanker <s>)))))

(operator Deploy-Pump
  (preconds ((<b> Barge))
    (~ (exists ((<b2> Barge)) (pump-deployed <b2>))))
  (effects ()
    ((add (pump-deployed <b>)))))

Figure 4.16: A modified Pump-Oil operator and a new Deploy-Pump operator that make a conditional plan necessary to achieve a probability of success greater than 0.5 in the example problem.

An example of a planning problem where prevention steps are necessary can be found by replacing the Move-Barge operator with one whose effect probabilities differ depending on the literal schema (barge-ready <b>), as in Figure 3.10, and removing (barge-ready barge1) from the initial state. Now the step (Make-Ready barge1) must be added to the plan to achieve the required probability, and this can be found by addressing the flaw that the barge is not ready using prevention.

An informal argument can be made that these mechanisms (adding a conditional branch, adding preventive steps and backtracking) are together adequate for finding plans that can be represented as loop-free policies on the underlying MDP. A version of Prodigy that can add preventive steps to avoid undesired conditional effects in plans can be shown to be complete when no Lisp functions are used in operator bindings or control rules [Blythe & Fink 1997].¹
Thus, supposing that a planning problem admits a loop-free policy on the underlying MDP that reaches the threshold probability of success, a version of Prodigy could find a plan that corresponds to at least one path from an initial state to a goal node in the policy. If this plan does not meet the threshold probability of success, there must be other paths in the policy that are not present in the plan. The points where following the plan is less desirable than the action chosen by the policy will be found as flaws in the plan, and a preferable path can be added with a branching plan. While the mechanisms developed here can lead in principle to a complete planner for plans without loops, I do not argue that Weaver is complete. In real-world planning domains, bindings functions and control rules are frequently used to control the search, and the ability to call arbitrary functions makes the planning problem undecidable.

¹ This version of Prodigy also plans for subgoals that are true in the current state, which is not done in the version used in Weaver.
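The pump example above can be checked numerically. The sketch below is illustrative and assumes, unlike the exact probabilities of the example domain, that each barge is operational independently with probability 0.5; a non-branching plan must commit the single pump to barge1, while the branching plan deploys it in whichever barge is observed to be operational:

```python
import itertools

P_OPERATIONAL = 0.5  # assumed probability that each barge works

def success_prob(branching):
    """Sum the success probability over the four joint outcomes of
    (barge1 operational, barge2 operational)."""
    total = 0.0
    for b1, b2 in itertools.product([True, False], repeat=2):
        p = (P_OPERATIONAL if b1 else 1 - P_OPERATIONAL) * \
            (P_OPERATIONAL if b2 else 1 - P_OPERATIONAL)
        if branching:
            ok = b1 or b2   # deploy the pump in whichever barge works
        else:
            ok = b1         # pump committed to barge1 up front
        if ok:
            total += p
    return total

flat = success_prob(branching=False)
cond = success_prob(branching=True)
```

Under these assumed numbers the non-branching plan succeeds with probability 0.5 and the branching plan with probability 0.75, illustrating why observing the first barge before committing the pump is essential.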


Chapter 5

Efficiency improvements in plan evaluation

In the previous chapter I introduced the methods used for probabilistic evaluation of Weaver's candidate plans. These plans are produced by a conditional planner that is ignorant of many of the sources of uncertainty in its planning domain. The task of the plan evaluator is then to identify the relevant sources of uncertainty and assess their impact on the plan. Its output is a belief net structure that encodes dependencies among steps in the plan and external events, and which when evaluated computes lower bounds on the probabilities of success for each step in the plan.

Although there are methods for evaluating belief nets that can make efficient use of the structure they expose in the probability distribution, evaluating belief nets is NP-hard in the general case [Cooper 1990]. Evaluating the belief net can quickly become the bottleneck in the process described in Chapter 4. In this chapter I describe some techniques that can reduce the computation required to evaluate the plan. I focus on representing the plan with an accurate but smaller belief network, rather than on seeking algorithms to evaluate belief nets more efficiently, an area which has already received considerable attention [Lauritzen & Spiegelhalter 1988; Pearl 1988; Draper & Hanks 1994; Darwiche 1995].

In particular, I focus on removing the nodes in the belief net that correspond to individual occurrences of external events, replacing them with links that reflect aggregates of several possible events.
Consider the belief net shown in Figure 5.1, which captures the probability of success of the plan developed in Chapter 4 to illustrate Weaver. The shaded nodes represent individual occurrences of the external events Weather-Brightens and Weather-Darkens, accounting for possible changes in the weather between time 0 and time 4. These nodes represent information at a finer level of detail than is required to evaluate the plan, since the weather conditions at times 1 and 3 are not important. Only the weather conditions at times 2 and 4 affect the outcome of any actions in the plan.

The belief net shown in Figure 5.2 does not have the redundant nodes, and instead computes the weather change over the intervals of interest using a Markov chain. Although in this example avoiding the nodes corresponding to external events does not significantly reduce the size of the belief net, notice that the number of these nodes increases linearly with the length of the time interval over which the weather can change. The time to compute the probability of success also increases linearly when event nodes are included, but only logarithmically using a Markov chain representation, as shown below.

The rest of this chapter describes how a suitable Markov chain can be constructed, discusses the benefits and pitfalls of the approach, and provides further examples.

Figure 5.1: Belief nets from Chapter 4, in which shaded nodes represent specific occurrences of external events.

5.1 The Markov chain representation for external events

Consider the evolution of the random variable (weather) over time in the graph of Figure 5.1. The conditional probability of the two events affecting the weather is the same at each time step, so the different probability distributions for the variable correspond to four steps of the Markov chain shown on the right in Figure 5.3.

Figure 5.2: A belief net representing the same plan, but with Markov chains used to compute changes in the weather over the required time intervals rather than a series of event nodes.

The Markov chain is described by a transition matrix M whose value in row i and column j is the conditional probability P(w(1) = j | w(0) = i), where w(n) is the value of the weather at time n. This representation can provide a more efficient way to compute the probability distribution for the weather at time N given the distribution at time 0, where N is large, since this is given by applying the transition matrix M^N to the initial distribution, and this matrix can be found in time logarithmic in N. The simple technique of adding event nodes directly to the belief net, described in Section 4.8, takes time at least linear in N, both for constructing and for evaluating the belief net.

In general the evolution of any state variable while an action is being executed can be described by a Markov chain. This is because the planning language itself can be described as a Markov decision process, which reduces to a Markov chain when there are no action choices.
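As an illustration of the logarithmic-time computation, the following sketch raises the transition matrix of the two-state weather chain (the 0.75/0.25 values of Figure 5.3) to the power N by repeated squaring, using O(log N) matrix multiplications rather than N:

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_pow(m, n):
    """Compute m**n with O(log n) multiplications by repeated squaring."""
    size = len(m)
    result = [[float(i == j) for j in range(size)] for i in range(size)]
    while n > 0:
        if n & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        n >>= 1
    return result

# Transition matrix for the weather chain of Figure 5.3:
# rows and columns are ordered (fair, poor); each step the weather
# stays the same with probability 0.75 and flips with probability 0.25.
M = [[0.75, 0.25],
     [0.25, 0.75]]

dist0 = [1.0, 0.0]  # fair weather with probability 1 at time 0
M4 = mat_pow(M, 4)
dist4 = [sum(dist0[i] * M4[i][j] for i in range(2)) for j in range(2)]
```

Starting from certain fair weather, four steps of this chain give a 0.53125 probability of fair weather, and the distribution approaches (0.5, 0.5) as N grows.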
However, it would be too expensive to use the underlying MDP as the Markov chain for computing probability distributions of state variables, because of its size. Planning problems in the oil-spill domain described in Chapter 7 frequently have literal state spaces containing more than 2^1000 states, for example. So in order to take advantage of the Markov chain representation, a small Markov chain must be found that computes the values of interest.

In the case of the state variable describing the weather, the Markov chain whose states are just the values of this variable is adequate to compute the right probabilities, and the transition matrix M has only 4 entries.

Figure 5.3: The nodes governing weather in the plan's belief net follow the two-state Markov chain on the right, in which the weather remains fair or poor with probability 0.75 and changes with probability 0.25 at each step.

In general one must consider the set of state variables whose values need to be known at a particular time in order to see how to build an adequate Markov chain, and this chain may reference state variables other than those of initial interest. To see this, consider adding the event Oil-Spills, shown in Figure 5.4, to the planning domain and problem described in Chapter 4. This event requires the weather to be poor and the oil to be in the tanker. If this is true, with probability 0.1 the oil will be spilled in the sea in the next state, represented with the new predicate (oil-in-sea west-coast), and will no longer be in the tanker.
If this event takes place, it will prevent the Pump-Oil action in either branch of the plan from working, since the action's preconditions include (oil-in-tanker west-coast).

(event oil-spills
  (params <s>)
  (probability 0.1)
  (preconds ((<s> Sea-Sector))
    (and (oil-in-tanker <s>)
         (poor-weather)))
  (effects ()
    ((del (oil-in-tanker <s>))
     (add (oil-in-sea <s>)))))

Figure 5.4: The event Oil-Spills affects the location of the oil and is partly determined by the weather.

Since the event depends on the weather, changes in the weather must be taken into account to compute the probability distribution for the state variable (oil). Otherwise the probability that the oil is spilled into the sea might be wrongly computed as 0, since in the initial state (fair-weather) is true with probability 1. In fact, since the Pump-Oil action depends on the conjunction (fair-weather) ^ (oil-in-tanker west-coast), the two state variables must be computed together


in a single Markov chain. If two separate chains were used, even if the one used to compute the probability distribution for (oil) correctly modelled the weather, the probability of the conjunction would be wrong, since it would effectively model the (oil) and (weather) state variables as evolving independently. The next section shows how to take a conjunction of literals whose probability distribution is needed at time N and automatically build a set of small but adequate Markov chains to compute it.

5.2 Using an event graph to guarantee a correct Markov chain

Given a set of literals of interest, we would like to build a submodel that is small enough to answer queries about these literals efficiently, but includes all the literals necessary to ensure the result is correct. Event graphs enable us to do this by capturing the dependencies between the events in the domain.

The event graph for a set of event schemas E is a directed graph whose nodes are either event schemas or literal schemas in the domain. Event and literal schemas are distinguished from events and literals in that they have variables rather than domain objects as arguments. There is an arc from each event schema e in E to every literal schema in its effects, and from every literal schema in any of the branches of e to e (so the graph may have cycles). Variables are unified when two different schemas mention the same literal with matching variable types. Figure 5.5 shows the event graph for the example domain.
A simple algorithm that adds arcs for each event schema in turn can produce the event graph in time linear in the number of event schemas, the maximum number of branches in an event schema and the maximum number of effects in an effect distribution.

Figure 5.5: The event graph for the example domain, with boxes for the event schemas Oil-Spills, Weather-Brightens and Weather-Darkens, and ovals for the literal schemas (oil-in-tanker <s>), (oil-in-sea <s>), poor-weather and fair-weather.

Given a query that contains a set of literals V, such as oil = tanker ^ weather = fair, a set of Markov chains is created in two steps. First, create the subgraph of the


event graph restricted to the nodes in V and all their ancestors in the graph. Second, create one Markov chain for each component of the resulting graph. The state space for each chain is the cross product of the literals that appear in the corresponding component, and the transitions are formed from the events in the component in the same way as the full-size Markov chain is created from all the events, described above. The probability of the query over literals in V after n stages can then be computed by multiplying together, for each chain, the probability of the projection of V on that chain after n stages. I refer to this set of Markov chains as the submodel of the full model induced by V. For example, the reduced model in Figure 5.6 is the submodel of the full model for the domain induced by {(oil)}.

Figure 5.6: The Markov model built to model the evolution of the state variable (oil), instantiated by the literal (oil-in-tanker west-coast). Its four states combine the values of (oil) and (weather).

Intuitively, no literal that is not contained in the subgraph of the event graph created for V can affect the way any literal in V changes over time: if it could affect any event that could affect any of the literals in V, either directly or indirectly, it would be contained in the subgraph by definition.
Similarly, the literals in different components of the subgraph change independently of each other, since the value of one can never affect the way the value of another changes. This intuition underlies the proof of the following theorem, a sketch of which is given in Appendix A.

Theorem: Let Q be a logical expression mentioning a set of literals V, a subset of the literals L of the domain. Computing the conditional probability of Q after some fixed number of stages n, given an initial state probability distribution over the full state space, will yield the same value in the submodel induced by V as in the full model of the domain.

Using this theorem we can compute the answers to queries about small subsets of the domain variables without consulting the full model of the domain. In fact, the full model is typically never constructed.
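The subgraph construction can be sketched in a few lines. The encoding below is illustrative rather than Weaver's actual data structures: arcs run from each precondition literal to the event and from the event to its effect literals (deleted literals count as effects), and the submodel for a query is found by collecting ancestors:

```python
from collections import defaultdict

# Event schemas of the example domain, each with the literal schemas
# in its preconditions and effects (variables elided for brevity).
EVENTS = {
    "Weather-Brightens": {"pre": ["poor-weather"], "eff": ["fair-weather"]},
    "Weather-Darkens":   {"pre": ["fair-weather"], "eff": ["poor-weather"]},
    "Oil-Spills":        {"pre": ["oil-in-tanker", "poor-weather"],
                          "eff": ["oil-in-tanker", "oil-in-sea"]},
}

def event_graph(events):
    """Build the parent relation: precond literal -> event -> effect literal."""
    parents = defaultdict(set)
    for name, e in events.items():
        for lit in e["pre"]:
            parents[name].add(lit)
        for lit in e["eff"]:
            parents[lit].add(name)
    return parents

def ancestors(parents, query_literals):
    """The query literals plus everything reachable backwards through
    the graph; only these nodes can influence how the query evolves."""
    seen, stack = set(), list(query_literals)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen

g = event_graph(EVENTS)
sub = ancestors(g, ["oil-in-tanker"])
```

For the query over (oil), the spill event pulls in the weather literals and both weather events, reproducing the submodel of Figure 5.6, while a query over the weather alone never touches the oil literals.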


5.3 Using the event graph in Weaver

The event graph is used to create smaller belief nets representing plans in Weaver. The plans themselves provide the queries that are used with the event graph to choose the Markov chains that provide input to the belief net. The lack of specific event nodes in the resulting net means that the flaw set found by the plan critic is slightly different. This section describes the changes to the Weaver plan evaluator and plan critic that are necessary to use belief nets built with the event graph.

First, the algorithm for creating the belief net to evaluate a plan is altered to make use of the Markov chains that are created from the event graph. Stage 2 of the algorithm from Table 4.8, described in Section 4.2, is replaced by stages 2 and 3 as shown in Table 5.1. A new node is created for each group of nodes that share a Markov chain, representing the conjunction of the group of nodes. The persistence interval links to this new node, and it has an arc to each node in the group specifying its value deterministically.

This new conjunction node could be avoided by clustering the nodes in the group directly, but using it preserves the condition that each precondition node for an action represents a single state variable.
Since the Markov chain is built under the assumption that no actions affecting the variables take place during the time interval of the chain, the algorithm splits a persistence interval at each point where an action affects any of the variables involved. The effect of the action is a direct intervention in the natural evolution of the variables represented by the Markov chain, resulting in a model similar to that of Pearl [Pearl 1994].

Figure 5.7 shows the belief net that is created by this algorithm for the first branch of the example plan, along with some of the marginal probabilities. The node shown in gray is added by the algorithm to allow the Markov chain to be expressed as a conditional probability table in the belief net.

Once the belief net has been created, Weaver's plan critic analyses it for possible flaws, which are used to find ways to improve the plan. The basic algorithm to find flaws, described in Section 4.3, examines each fluent node that is a parent of an action node, and if the fluent node has probability less than 1 of having a desired value it looks for either an action or an event node as an explanation for a subset of the undesired values. Since the belief nets created using the event graph do not have event nodes, the set of explanations is modified.

Rather than look for an event node parent, the modified algorithm tests whether the fluent node has a conditional distribution that was created using a Markov chain. If so, the chain is analysed for the exogenous events that are responsible for the undesired values.
For each member of the set of desired values, the transitions in the Markov chain that move from a state with the desired value to a state with an undesired value are checked, and the events responsible for the change are returned as possible explanations for the flaw.

For example, when the fluent node for (weather) at time 2 is analysed in the belief


T = 0
Stage 1: For each action A in the plan
    Create a node N_A to represent A at time T.
    For each precondition P of A
        Find or create a node N_P for P at time T.
        Link N_P to N_A.
    T = T + duration(A)
    For each effect E of A
        Find or create a node N_E for E at time T.
        Link N_A to N_E.
Stage 2: For each fluent node N in the belief net
    If N is not the effect of an action,
        link it to the most recent previous node of the same type.
Stage 3: For each time point T with fluent nodes in the plan
    Let F be the set of fluent nodes with this time point.
    Let S be the subsets of F grouped into the same Markov chains using the event graph.
    For each subset G of S
        Let T' be the time of the latest fluent node before T that matches a type in G.
        Construct nodes for all types in G at time T', making a set G'.
        If |G| > 1, add a new node N_G at time T with an arc to each node in G;
            the values of N_G are the cross product of those of G, and
            the arcs from N_G implement projection.
        Otherwise set N_G to the single node in G.
        Add an arc from each node in G' to N_G, with probability table given by the Markov chain.
    If new nodes were added, go to Stage 3.

Table 5.1: Algorithm to construct the belief net to evaluate a plan.

net in Figure 5.7, the Markov model in Figure 5.6 is checked for events that lead to transitions from states with (weather) = fair to states with (weather) = poor.
The event Weather-Darkens is then considered as a possible explanation for the flaw.

Since the explanation for the flaw does not mention a specific event node, the plan critic chooses an action from the entire interval of the Markov chain to adjust, either to prevent the event or to add a conditional branch. This choice is equivalent to choosing among the different event nodes that span the interval in the belief net created without using Markov chains. In this example, there is only one action that can be chosen to create a conditional branch, the Move-Barge action that was also used in the earlier version. By default, the plan critic adds a conditional branch as late as possible, subject to the condition that backtracking in Prodigy's search space should undo as few other context and protection nodes as possible.
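The modified critic's check can be sketched as follows. The encoding is illustrative: each transition of the combined (oil, weather) chain of Figure 5.6 is labelled by hand with a responsible event, and transitions in which a spill and a weather change occur together are simplified to the single Oil-Spills label:

```python
# States are (oil, weather) pairs; each entry is (source, target, event).
TRANSITIONS = [
    (("tanker", "fair"), ("tanker", "poor"), "Weather-Darkens"),
    (("tanker", "poor"), ("tanker", "fair"), "Weather-Brightens"),
    (("tanker", "poor"), ("sea", "poor"),    "Oil-Spills"),
    (("tanker", "poor"), ("sea", "fair"),    "Oil-Spills"),
]

def explain_flaw(transitions, index, desired):
    """Return the events labelling transitions that move a state whose
    index-th component holds a desired value to a state where it does
    not; these events are candidate explanations for the flaw."""
    return {event for src, dst, event in transitions
            if src[index] in desired and dst[index] not in desired}
```

Querying the chain for the flaw at the (weather) node yields Weather-Darkens, as in the example; querying it for the (oil) node yields Oil-Spills.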


Figure 5.7: The belief net created to evaluate the first branch of the example plan, with exogenous events affecting both the weather and the oil modelled with a Markov chain. Among the marginal probabilities shown, (weather) is fair with probability 0.625 at time 2 and the Finish step succeeds with probability 0.412.




Chapter 6

Efficiency improvements in planning

6.1 Domain-independent search heuristics

In any search-based planning system, heuristics to control search are extremely important if the planner is to be efficient. Prodigy includes a rich language for specifying search control knowledge in the form of explicit, domain-dependent control rules [Minton et al. 1989; Veloso et al. 1995]. Domain-independent search heuristics are also vital in generic planning systems, where complete domain-specific control knowledge is not always available [Stone, Veloso, & Blythe 1994].

In this chapter I introduce some principles for defining domain-independent search heuristics that are specific to the problem of planning under uncertainty. The "footprint" principle leads to a family of heuristics for probabilistic planners, produced by attempting to make subsequent refinements to a plan apply to a set of possible execution traces that is disjoint from the set for which the plan being refined is already successful. Since Weaver's architecture separates the phases of plan creation and plan evaluation, other heuristics can be derived from integrating the two more closely, by doing small amounts of probabilistic reasoning during plan search. The rest of this chapter describes these approaches and illustrates them with synthetic domains that help delineate the conditions under which they work well. Experiments
Experiments<strong>in</strong> a real-world doma<strong>in</strong> are described <strong>in</strong> Chapter 7.6.1.1 Local estimations of probabilitiesOne source of heuristics for probabilistic planners is to perform local estimationsof probabilities, used <strong>in</strong> greedy maximum-gradient heuristics. For example one canchoose the operator for a goal that achieves it with the highest conditional probability,given that its trigger is satised. This heuristic is an extremely cheap estimate of theprobability that the operator will satisfy its goal as part of the plan, requir<strong>in</strong>g just79


inspection of the operator definition. A more expensive heuristic that uses gradient ascent to choose between instantiated operators includes an estimate of the probability that the preconditions can be satisfied, for example by estimating their probability in the set of possible current states. This can be viewed as a probabilistic version of the minimum-unsolved-preconditions heuristic used in Prodigy [Blythe & Veloso 1992], although a more faithful version would minimize the expected number of unsolved preconditions. These heuristics all make estimates using the current state, which arises from applying the operators whose order in the plan prefix is already chosen.

In the oil-spill domain, several control rules can be seen as compiled versions of the heuristic to maximise the probability of the step's preconditions in the current state. These control rules prefer pieces of equipment with higher tolerance to weather conditions. Since the main sources of uncertainty in the oil-spill domain are exogenous events that cause the situation to deteriorate over time, control rules that prefer operators that take less time, for instance by using closer equipment or faster transport, also maximise the estimated probability of success, but this is based on predictions about how the world will change rather than on the current state.
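A greedy maximum-gradient choice of this kind can be sketched in a few lines. The operator names and probabilities below are hypothetical, not taken from the oil-spill domain's actual control rules:

```python
def pick_operator(outcome_prob, precond_prob):
    """Greedy maximum-gradient choice: rank each candidate operator by
    (probability its outcome achieves the goal) times (an estimate of
    the probability that its preconditions hold in the set of possible
    current states), and take the best."""
    return max(outcome_prob,
               key=lambda op: outcome_prob[op] * precond_prob[op])

# Hypothetical numbers: pumping succeeds with probability 0.9 but needs
# fair weather (estimated to hold with probability 0.625), while a less
# effective alternative is always applicable.
outcome_prob = {"pump-oil": 0.9, "skim-oil": 0.6}
precond_prob = {"pump-oil": 0.625, "skim-oil": 1.0}
best = pick_operator(outcome_prob, precond_prob)
```

With these numbers the cheaper operator wins (0.6 x 1.0 beats 0.9 x 0.625), illustrating how the precondition estimate can overturn a choice based on outcome probability alone.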
The effect of these heuristics is explored in Section 7.5.

6.1.2 The footprint principle

Probabilistic planners such as Weaver, Buridan [Kushmerick, Hanks, & Weld 1995] and DRIPS [Haddawy & Suwandi 1994] build plans by incrementally refining some candidate plan to increase its probability of success, stopping when the threshold probability is reached and backtracking if no extensions achieve the desired probability. This incremental refinement might, for example, be the addition of a new conditional branch to the plan, or some extra steps to improve the probability of some subgoal in the plan. The heuristics based on the footprint principle aim to increase the rate of convergence to a good plan by choosing improvements to an existing plan that are more likely to significantly increase the probability of success. This is similar to the maximum expected utility approach of Russell and Wefald [1991]. While the exact heuristics may differ between different planners, the underlying principles are the same.
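The shared refine-until-threshold loop that these planners use can be sketched as follows; the refinement generator and the success-probability evaluator are stand-ins for whatever a particular planner supplies:

```python
def plan_to_threshold(plan, threshold, refinements, p_success, depth=20):
    """Refine `plan` until its success probability reaches `threshold`,
    backtracking (via the recursion) when no extension helps."""
    if p_success(plan) >= threshold:
        return plan
    if depth == 0:
        return None
    for candidate in refinements(plan):
        result = plan_to_threshold(candidate, threshold, refinements,
                                   p_success, depth - 1)
        if result is not None:
            return result
    return None  # no extension achieves the desired probability

# Toy model: each added step independently removes 30% of remaining failures.
p = lambda plan: 1 - 0.7 ** len(plan)
refine = lambda plan: [plan + ["extra-step"]]
print(len(plan_to_threshold([], 0.9, refine, p)))  # 7
```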
I begin with two observations about the process of improving the probability of success of plans in uncertain domains.

Coherence

Firstly, while plans in STRIPS-style domains are created for individual initial states, plans in probabilistic domains are created for a set of states: those with sufficiently high probability in the initial distribution to influence the probability of plan success. I define a trace of execution of a plan in a stochastic domain to be a complete assignment for each of the choice points during the plan's execution: the initial state is specified and so is each action outcome and each potential occurrence of an exogenous event. In Weaver, a trace corresponds to a path through the underlying MDP that can be realised when the plan is executed. The probability of success of a plan is equal to the probability mass of the set of traces of the plan that lead to a goal state.

A simple strategy for a planner may be to try to refine a plan to cover an individual trace that has high probability and is not currently covered by the plan. Picking the most likely initial state, or the most likely action outcome that leads to plan failure, to work on are examples of this strategy.

However, it may be that a number of distinct initial states exist that each have low probability but when combined form a set with higher probability, and that also share features allowing a single plan or plan refinement to solve them all. I refer to such a set as a coherent set of traces with respect to the plan. It can be a valuable strategy for a probabilistic planner to expend computation looking for a coherent set of traces with a high probability mass; I refer to this as the "coherence" strategy.

As an example, consider the family of domains "Mop(n, ε)", parameterised by an integer n and some small number ε, whose operators are shown in Figure 6.1. The initial state distribution in an instance of Mop(n, ε) consists of n possible states, S_i, 1 ≤ i ≤ n, where (S_i) is the sole true fact in each state S_i. State S_i has a probability of 1/2^i if 1 ≤ i ≤ n − 1, and state S_n has a probability of 1/2^(n−1), bringing the total to 1. The goal is always the literal (goal). There are n operators S_i which all add (goal) with certainty if their preconditions are met.
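The distribution can be checked numerically; this sketch (not part of the domain definition) confirms that it sums to 1 and that no single state carries more than half the mass:

```python
from fractions import Fraction

def mop_distribution(n):
    """P(S_i) = 1/2^i for 1 <= i <= n-1, and P(S_n) = 1/2^(n-1)."""
    p = {i: Fraction(1, 2 ** i) for i in range(1, n)}
    p[n] = Fraction(1, 2 ** (n - 1))
    return p

dist = mop_distribution(5)
print(sum(dist.values()))   # 1
print(max(dist.values()))   # 1/2: the most any single S_i can cover
```

Each operator S_i therefore covers at most probability 1/2 of the initial states, while Mop covers the whole set with probability 1 − ε.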
Each S_i has the single precondition (S_i). In addition, the single operator Mop achieves (goal) in any state with probability 1 − ε, and with probability ε has no effect.

    (Operator S_i                   (Operator Mop
      (preconds () (S_i))             (preconds () (true))
      (effects ()                     (effects ()
        ((add (goal)))))                (1 − ε () (add (goal)))
                                        (ε () nil)))

Figure 6.1: Operators in the parameterised domain "Mop(n, ε)". There are n operators S_i, for 1 ≤ i ≤ n.

Consider finding a plan to achieve (goal) with probability above some threshold. If [...]

[Figure 6.2 diagram: the operators O1, O2 and O3 with their outcome probabilities; outcomes are labelled α, β, γ and δ over the literals p, q and g.]

Figure 6.2: The operators for an example domain.

[...] states is coherent with respect to the Mop action, but not with respect to any of the actions S_i. A practical way to estimate this probability is to make use of the belief net created to evaluate plan success. The marginal probabilities of the action preconditions can be accessed and combined under the heuristic assumption of independence. Mop also leads to shorter plans when [...] > 1 − ε, as long as [...]

[Figure 6.3 diagram: the Initial state, Goal, and the operators Turn-ignition and Poke-under-hood, with conditional outcome probabilities (0.8/0.2, 0.9/0.1, 1.0) over the literals engine-ok, engine-wrecked and started.]

Figure 6.3: Operators in the car-starting domain.

[...] fails. In the general case, comparisons of operators based on their "average" ability to achieve goals tell us nothing about their usefulness for refining plans.

This simple observation about independence becomes important when we consider heuristics that estimate the increase in probability of success provided by refining a plan. I define the footprint of a plan as the set of traces in which it succeeds. Note that when we refine a plan by adding a step, each trace of the new plan is an extension of some trace in the old plan, so the traces in the footprint of the new plan can be classified into two categories: those that extend traces in the footprint of the original plan and those that extend other traces.
The new traces that extend traces in the original footprint form the intersection of the refinement's footprint with that of the original plan, and do not improve the probability of success.

Some heuristics estimate the goodness of the refinement without regard to the existing plan, such as noting the maximum or average probability that some step will add a desired subgoal. Such heuristics are likely to over-estimate the efficacy of the refinement when its footprint has a significant intersection, in terms of probability mass, with the footprint of the original plan. This was the case for O_2: the footprint of [O_1; O_2] is completely contained within the footprint of [O_1]. If instead relatively cheap heuristics can be designed that take some of the overlap into account, they can be considerably more powerful than those that assume independence. I describe heuristics built on this principle as belonging to the "footprint" family. Combining the two observations in this section: by discounting the overlap between old and new parts of the plan, these heuristics can be viewed as seeking refinements to the plan that have a footprint with high probability mass that is disjoint from the footprint of the original plan.
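The footprint arithmetic can be made concrete: score a refinement by the probability mass of its footprint that falls outside the original plan's footprint. The trace names and probabilities below are illustrative:

```python
def footprint_gain(refinement_fp, original_fp, p):
    """Probability mass the refinement adds beyond the original footprint."""
    return sum(p[t] for t in refinement_fp - original_fp)

p = {"t1": 0.25, "t2": 0.25, "t3": 0.25, "t4": 0.25}
fp_O1 = {"t1", "t2"}     # traces on which the original plan succeeds

print(footprint_gain({"t1", "t2"}, fp_O1, p))  # 0 (pure overlap, no gain)
print(footprint_gain({"t1", "t3"}, fp_O1, p))  # 0.25 (only t3 counts)
```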
Alternatively, these refinements can be viewed as having a high-probability footprint conditional on the original plan failing.

Case study: Starting a car

Weaver has several choice points in its search, including which of several potential failure pairs to address, whether to plan to avoid the failure or address it after its occurrence, which extra preconditions to add if avoiding the failure, and which state to pick if planning after a failure's occurrence. By default, Weaver deals with failures that occur earlier before those that occur later. It prefers to plan to avoid a failure rather than to branch in case of failure. If it branches, it creates a planning initial state for Prodigy by assuming the undesirable action outcome is the only one to occur.

I illustrate a heuristic based on the footprint principle on the "car-starting domain". The operators for this domain are shown in Figure 6.3, and the goal is to achieve started with probability at least 0.95. In addition, the literal engine-ok is not observable, meaning that a branching plan based on this test is not allowed.

There is one plan of length 4 that achieves the goal with the required probability: turn the ignition twice, and if the car has not started, poke under the hood and then turn the ignition once more. If the ignition key is turned only once before the hood is opened, the problem cannot be solved because with too high a probability the car is wrecked(1), so Weaver's default heuristic will not work. If the agent never pokes under the hood, the threshold probability cannot be reached.

I focus on the choice point of the planning initial state that Weaver passes to Prodigy when creating a branching plan, and derive a heuristic from the footprint principle. Recall that in this case, Weaver has chosen a literal l that has a low probability of taking a value required for the current plan, and an action that can lead to an undesired value for l.
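For the numbers used in this case study, propagating the failure evidence amounts to a Bayes update on engine-ok. A minimal sketch, assuming (per Figure 6.3) that turning the ignition leaves the car unstarted with probability 0.2 when engine-ok holds and with probability 1.0 otherwise, and a prior P(engine-ok) = 0.9:

```python
def posterior_engine_ok(prior, p_fail_ok=0.2, p_fail_bad=1.0):
    """P(engine-ok | the car failed to start), by Bayes' rule."""
    num = prior * p_fail_ok
    return num / (num + (1 - prior) * p_fail_bad)

p = 0.9
p = posterior_engine_ok(p)   # after one failed turn of the key
print(round(p, 2))           # 0.64
p = posterior_engine_ok(p)   # after a second failure
print(round(p, 2))           # 0.26
```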
The default heuristic creates the state that would be produced if the operators in the plan gave the outcome nominated by Prodigy up to the suspect action, which then takes the most likely outcome that produces an undesired value for l.

I take a Bayesian view of the footprint principle described earlier in this section, and replace the default heuristic with one that attempts to create a likely state given that the original plan fails in a way predicted by l and a. The initial state distribution P_I is taken as a prior distribution, and the posterior, given that a produces an undesired value for l, is computed by propagating the undesired outcome as evidence in the belief net. Then one of the most likely states from the posterior distribution is chosen and the plan simulated, each time picking one of the most likely outcomes given the new distribution.

When Weaver is run with the default heuristic, it begins solving the start-car problem with the plan turn-ignition, and then attempts to create a branching plan based on the test started after the action is executed. With the default heuristic, Prodigy plans for a state with engine-ok true, producing the branching plan

    turn-ignition
    If not started
        turn-ignition

Next, Weaver creates a branching plan based on the test started after the second time the ignition is turned.
The default heuristic produces exactly the same state, and Weaver appears caught in an endless loop(2), producing a plan whose probability of success asymptotically approaches 0.9.

With the footprint heuristic for choosing a state, Weaver starts in the same way, and tries to improve the plan [turn-ignition] when started is false. Now the probability of the initial state having engine-ok, given that started is false, drops from 0.9 to roughly 0.64. Since this is still the most likely state it is chosen, and the new plan involves two turns of the ignition key as before. When Weaver creates a state for Prodigy given that the car fails to start twice, however, the probability of the initial state with engine-ok drops from 0.64 to 0.26, and it becomes more likely that the engine is broken. This state is chosen, the current plan is simulated on it, and Prodigy returns the plan poke-under-hood; turn-ignition. The branching plan that results has probability just over 0.95 of success, and is returned.

In general, propagating evidence through the Bayes net to compute the posterior state distribution could take time exponential in the size of the net. In practice, updates are typically fast since the net can often be constructed to avoid large cycles and nodes with many parents. In other cases, a heuristic based on approximating the posterior distribution may still prove valuable.

(1) This is only a slight exaggeration of my skills as a mechanic.
(2) This is the behaviour with depth-first search, and it can be avoided using a scheme based on depth-first iterative deepening.

6.2 Using derivational analogy in a conditional planner

The work presented in this chapter has two purposes. First, I focus on improving the efficiency of the conditional planner described in Chapter 3. In some cases the planner can duplicate search effort unnecessarily when the same goals must be planned for in different branches of a conditional plan. Analogical replay can be used to share the search effort between the different branches.
The approach of derivational analogy, which reconstructs the plan's justification rather than following the plan at a surface level, helps to ensure that the search is shared only when appropriate.

The second purpose of this chapter is to demonstrate that the machine learning techniques developed for Prodigy 4.0, a classical planner that does not handle uncertainty, are still useful and applicable to Weaver, an extension of Prodigy 4.0 as a conditional planner that handles uncertainty. Prodigy's algorithm for derivational analogy was originally developed by Manuela Veloso [Veloso 1994] for NoLimit, a predecessor to Prodigy 4.0, and subsequently ported. Weaver is able to make use of the algorithm with very few changes because it is a conservative extension to Prodigy, itself having as few changes as are needed to perform conditional planning. Given the amount of effort that has been spent on efficient classical planners over the past decade, e.g. [Minton 1988; Etzioni 1990; Knoblock 1991; Gil 1992], it is significant that many of the ideas can be re-used.

I begin this chapter with a discussion of the motivation for using analogical replay in the conditional planner, then discuss the algorithm and illustrate its performance on some synthetic domains. Further experiments are described in Chapter 7, and some comparisons are made with other probabilistic planning systems in Chapter 2. Some of the work described in this section was done jointly with Manuela Veloso [Blythe & Veloso 1997].

6.2.1 The need to share search effort within the conditional planner

Consider a simple planning problem in which a package is to be loaded into a truck. In the initial state, the package is at the depot and the truck is at the warehouse. However, consider also that, in the time it takes to drive the truck to the depot, the package can be misplaced with probability 0.5. When this happens, the package is transferred from the depot to the lost-property department, from where it cannot be lost. The following branching plan solves this problem: drive the truck to the depot and, if the package is still there, load it into the truck; otherwise drive to lost property and load the package into the truck.

Figure 6.4 shows a tail plan with four steps that may be constructed by Weaver's conditional planner to solve the problem described at the beginning of this section. The truck must be made ready, with step 1, "start-truck", before it can be driven, and the final goal requires that the truck is put away, with step 4, "stop-truck". The arc from "drive to depot" to "load package at depot" indicates that the former step is an establisher of the latter. We have omitted the preconditions from the diagram.

Step 2, "drive to depot", is a context-producing step that has two possible sets of effects. Contexts α and β correspond respectively to the situations where the package is still at the depot and where it is in lost property when the truck arrives. A step is marked as context-producing externally to B-Prodigy (in fact by Weaver, as we described).

[Figure 6.4 diagram: the root node linked from "Load package at depot", which is in turn linked from "Drive to depot" (contexts α, β).]

Figure 6.4: Initial tail plan to solve example problem.
The directed arcs are causal links showing that a step establishes a necessary precondition of the step it points to.

When a branching step is introduced into the tail plan, producing new contexts, the other steps are initially assumed to belong to all contexts. This is true of step 3, "load package at depot". However, each step's contexts can be restricted to a subset, and new steps can be added to achieve the same goal in the other contexts. In this example, new operators could be introduced both for the top-level goal and for the goal for the truck to be at the depot. The branching step is always introduced with a commitment that one of its outcomes will be used to achieve its goal, and all the ancestors of the step in the tail plan must always apply to that context. In this example, step 2 was introduced to establish step 3 using context α, so step 3 may not be restricted to context β.

In Figure 6.5, step 3 has been restricted to context α, new steps have been added to achieve the top-level goal in context β, and steps 1 and 2 have been moved to the head plan. Two recursive calls to B-Prodigy will now be made, one for each context, in each of which B-Prodigy will proceed to create a totally-ordered (but possibly nonlinear) plan. In context α, the steps labelled with β, driving to and loading at lost-property, will be removed from the tail plan. In context β, the step labelled with α, "load package at depot", will be removed. Step 4, stop-truck, has not been restricted and remains in the tail plan in both contexts. No steps are removed from the head plan, whatever their context. Each recursive call produces a valid totally-ordered plan, and the result is a valid conditional plan that branches on the context produced by the step "drive to depot".

[Figure 6.5 diagram: the head plan contains "Drive to depot" (contexts α, β); in the tail plan, the root is linked from "Load package at depot" (context α) and from "Drive to lost-property" and "Load package at lost-property" (context β).]

Figure 6.5: A second stage of the planner. There are now two possible current states, for contexts α and β.

The ability to create conditional plans is vital to planners that deal with uncertainty; however, creating them can lead to high computational overheads because of the need for separate planning effort supporting each of the branches of the conditional plan, measured in terms of the computation required to search for and validate the plan. This problem can be alleviated by sharing as much of the planning effort as possible between different branches.

Analogical replay enables sharing the effort to construct plans, while revalidating decisions for each new branch. As the following three examples show, in some cases the planning architecture directly supports sharing this effort and in others it does not.

1. Step 1, start-truck, is shared through the head plan. The step is useful for both branches, but is only planned for in one, since its effect is shared through the current state.
Although in this plan there is only one step, in general an arbitrary amount of planning effort could be shared this way.

2. Step 4, stop-truck, is shared through the tail plan. As shown in Figure 6.5, this step does not have a context label and achieves its goal in either context. Thus, when the branching step is moved to the head plan, it remains in the tail plan in each recursive call made to B-Prodigy, making the planning effort available in each call.

3. Sometimes duplicated planning effort is not architecturally shared as in the last two examples. Suppose that extra set-up steps are required for loading a truck. These would be added to the tail plan in Figure 6.5 in two different places, as establishers of steps 3 and 6, and restricted to different contexts in the two places. Thus, the planning effort cannot be shared by the tail plan unless it is modified from a tree to a DAG structure. But this approach would lead to problems if descendant establishing steps were to depend on the context, for instance on the location of the truck. We show below how the use of analogy can transfer planning effort of this kind between contexts. Analogy provides an elegant way to handle other kinds of shared steps as well.

6.2.2 Using derivational analogy in Weaver

Weaver's conditional planner, B-Prodigy, is integrated with Prodigy/Analogy in such a way that the usual tasks of case retrieval and set-up are made trivial, because the case memory is restricted to the current problem. First, a previously visited branch is selected to guide planning for the new branch. By default, the branch that was solved first is used. Next, B-Prodigy is initialized to plan from the branch point, the point where the new branch diverges from the guiding branch. Then B-Prodigy plans for the new branch, guided by Prodigy/Analogy, proceeding as usual by analogical replay: previous decisions are followed if valid, unnecessary steps are not used, and new steps are added when needed. Prodigy/Analogy successfully guides the new planning process based on the high global similarity between the current and past situations. This is particularly well suited for the analogical replay guidance and typically leads to minor interactions and a major sharing of the past planning effort.
The smoothness of this integration is made possible by the common underlying framework of the Prodigy planning and learning system.

In this integration of conditional planning and analogy, the analogical replay within the context of different branches of the same problem can be viewed as an instance of internal analogy [Hickman, Shell, & Carbonell 1990]. The accumulation of a library of cases is not required, and there is no need to analyze the similarity between a new problem and a potentially large number of cases. The branches of the problem need only be cached in memory, and most of the domain objects do not need to be mapped into new objects, as the context remains the same. While we currently use this policy in our integration, the full analogical reasoning paradigm leaves us with the freedom to reuse branches across different problems in the same domain. We may also need to merge different branches in a new situation.

Table 6.1 presents the analogical reasoning procedure combined with B-Prodigy. We follow a single case, corresponding to the plan for the last branch visited according to the order selected by Weaver.

The adaptation in the replay procedure involves a validation of the steps proposed by the case. There may be a need to diverge from the proposed case step because new goals exist in the current branch (step 9).
Some steps in the old branch can be skipped, as they may be already true in the new branching situation (step 13). Steps 8 and 12 account for the sharing between different branches and for most of the new planning, since typically the state is only slightly different and most of the goals are the same across branches. This selective use of replay controls the combinatorics of conditional planning.
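The control flow of this replay can be approximated in a few lines; the step records, validity tests and fallback planner below are hypothetical stand-ins for B-Prodigy's internal structures:

```python
def replay_branch(case_steps, state, goals, plan_for):
    """Replay a guiding case in a new branch, revalidating each decision."""
    plan = []
    for step in case_steps:
        if step["goal"] in state:            # already true here: skip the step
            continue
        if step["goal"] in goals and step["preconds"] <= state:
            plan.append(step["name"])        # decision still valid: replay it
            state = state | step["adds"]
        else:                                # diverge: plan afresh for this goal
            new_steps, state = plan_for(step["goal"], state)
            plan.extend(new_steps)
    return plan

# Guiding branch loads the package at the depot; in the new branch the
# package has moved to lost property, so only the loading steps diverge.
case = [
    {"name": "drive-to-depot", "goal": "truck-at-depot",
     "preconds": {"truck-ready"}, "adds": {"truck-at-depot"}},
    {"name": "load-at-depot", "goal": "have-package",
     "preconds": {"truck-at-depot", "package-at-depot"}, "adds": {"have-package"}},
]

def plan_for(goal, state):
    return ["drive-to-lost-property", "load-at-lost-property"], state | {goal}

print(replay_branch(case, {"truck-ready"}, {"truck-at-depot", "have-package"}, plan_for))
# ['drive-to-depot', 'drive-to-lost-property', 'load-at-lost-property']
```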
procedure b-prodigy-analogical-replay
1.  Let C be the guiding case, and C_i be the guiding step in the case.
2.  Set the initial case step C_0 based on the branch point.
3.  Let i = 0.
4.  Terminate if the goal state is reached.
5.  Check which type of decision is C_i:
6.  If C_i adds a step O to the head plan,
7.    If O can be added to the current head plan and no tail planning is needed before,
8.    then Replay C_i; Link new step to C_i; goto 14.
9.    else Hold the case and call B-Prodigy, if planning for new goals is needed; goto 5.
10. If C_i adds a step O_g to the tail plan, to achieve goal g,
11.   If the step O_g is valid and g is needed,
12.   then Replay C_i; Link new step to C_i; goto 14.
13.   else Mark unusable all steps dependent on C_i;
14. Advance the case to the next usable step C_j;
15. i ← j; goto 5.

Table 6.1: Overview of the analogical replay procedure combined with B-Prodigy.

6.2.3 Illustrations of performance using synthetic domains

Each segment of a conditional plan has a corresponding planning cost. If a segment is repeated k times and can be shared but is not, the planner incurs a penalty of k − 1 times the cost of the segment. Suppose that a plan contains n binary branches in sequence, all of which share steps. Either the first or the last part of the plan may be created 2^n times, but with step sharing it may only need to be created once.
This exponential cost increase can quickly become a dominant factor in creating plans for uncertain domains.

Chapter 7 describes experiments to determine the performance of analogical replay in the oil-spill domain. However, to isolate features of the domain or problem that are pertinent to performance, a family of synthetic domains was created in order to verify experimentally the effect of analogy in conditional planning. These domains allow precise control over the number of branches in a plan, the amount of planning effort that may be shared between branches, and the amount that belongs only to each branch.
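The arithmetic behind that penalty is simple to state (a toy cost model, not measured data):

```python
def cost_unshared(n_branches, segment_cost):
    """A shareable segment re-planned once per leaf of n binary branches."""
    return (2 ** n_branches) * segment_cost

def cost_shared(n_branches, segment_cost):
    return segment_cost          # planned once, replayed in the other branches

for n in (1, 5, 10):
    print(n, cost_unshared(n, 1), cost_shared(n, 1))
# With k repetitions of a shareable segment, the penalty is (k - 1) times
# the segment's cost; here k = 2^n, so the penalty grows exponentially.
```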
Table 6.2 describes the operators. The top-level goal is always g, achieved by the operator Top. There is a single branching operator, Branch, with N different branches corresponding to N contexts. A plan consists of three main segments. The first segment consists of the steps taken before the branch point. These are all in aid of the goal bb, the only goal initially unsatisfied, which is achieved by the step Branch. All of the branches of Branch achieve bb, but delete cx and sh. Each branch also adds a unique "context" fact, c_i. After the branch point, the second segment is a group of steps unique to each individual branch, in aid of the goal cx. We name each of these segments "C_i". Finally, the third, "shared", segment contains the steps which are the same in every branch, in aid of the goal sh. They must be taken after the branching point, since Branch deletes sh. The planning work done in each segment is controlled by the iterative steps that achieve the predicates iter-bb_0, iter-cx_{i,0} and iter-sh_0. The plan to achieve iter-sh_0, for example, has a length determined by some number z for which iter-sh_z is in the initial state. The planner selects the operators S-sh_k, which succeed, or F-sh_k (k = 0, ..., z), which fail, as the preconditions u-sh_k of the operators A-sh_k are all unachievable. The domain can have N copies of these operators.
So the planning effort to solve iter-sh_0 is up to 2Nz operators added to the tail plan, all but z of which are removed.

Operator      Preconds               Adds             Deletes
Top           bb, cx, sh             g                -
Branch        iter-bb_0              bb, c_i          sh, cx
  (where i refers to each branch i, i = 1,...,B)
Contx_i       iter-cx_{i,0}, c_i     cx               -
Shared        iter-sh_0              sh               -
F-bb_l        a-bb_l                 iter-bb_l        -
A-bb_l        u-bb_l                 a-bb_l           -
S-bb_l        iter-bb_{l+1}          iter-bb_l        -
F-cx_{i,m}    a-cx_{i,m}             iter-cx_{i,m}    -
A-cx_{i,m}    u-cx_{i,m}             a-cx_{i,m}       -
S-cx_{i,m}    iter-cx_{i,m+1}        iter-cx_{i,m}    -
F-sh_k        a-sh_k                 iter-sh_k        -
A-sh_k        u-sh_k                 a-sh_k           -
S-sh_k        iter-sh_{k+1}          iter-sh_k        -
  (where l, k, m capture the lengths of the segments)

Table 6.2: Operator schemas in the test domain.

Figure 6.6 shows an example of a plan generated for a problem with goal g, and initial state cx, sh, iter-bb_x, iter-cx_{1,y}, ..., iter-cx_{B,y}, and iter-sh_z.

We have performed extensive experiments with a variety of setups. As an illustrative performance graph, Figure 6.7 shows the planning time in seconds when the


6.2. Using derivational analogy in a conditional planner 91

S-bb_x ... S-bb_0  Branch  S-cx_{1,y} ... S-cx_{1,0}  Contx_1  S-sh_z ... S-sh_0  Top
                   ...
                           S-cx_{B,y} ... S-cx_{B,0}  Contx_B  S-sh_z ... S-sh_0  Top

Figure 6.6: A typical plan in the test domain.

number of branches is increased from 1 to 10. For this graph, the final plan has one step in each of the "C_i" and the "shared" segments. The domain was chosen so that B-prodigy examined 4 search nodes to create a "C_i" segment and 96 for the "shared" segment. With less extreme proportions of shared planning time to unique planning time, the shape of the graph is roughly the same and analogical replay still produces significant speedups. With 24 search nodes examined for each unique plan segment, and 72 for the shared segment, B-prodigy completes the plan with 10 branches more than twice as quickly with analogical replay as without it.

The improvement in time is similar when the depth and breadth of search are increased for the shared segment and the number of branches is held constant.

[Plot omitted: planning time in seconds (0-20), for normal planning and planning with analogy, against 2-10 branches.]

Figure 6.7: Time in seconds to solve planning problem with and without analogy, plotted against the number of branches in the conditional plan.
Each point is an average of five planning episodes.

The use of analogical replay in B-prodigy is a heuristic based on the assumption that a significant proportion of planning work can be shared between the branches. We tested the limits of this assumption with experiments holding constant both the number of branches and the effort to create the shared segment, while increasing the effort to create each unique segment. Under these conditions, the time taken by B-prodigy grows at the same rate whether or not analogical replay is used, because the overhead of replay is small relative to planning effort, and appears constant. When the planning effort for each C_i was increased from 4 to 100 search nodes while the effort to create a shared branch was held constant at 24 search nodes, B-prodigy took about 1 second longer with analogy than without it, an overhead of less than 10 per cent when each C_i took 100 search nodes.
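The trade-off reported above can be summarized with a back-of-the-envelope cost model (an illustration, not Weaver's actual accounting; the 5% replay overhead is an assumed figure): without replay every branch re-derives the shared segment, while with replay later branches pay only a small per-node bookkeeping cost.

```python
def nodes_without_replay(branches, unique, shared):
    # Each branch re-derives both its unique and its shared planning work.
    return branches * (unique + shared)

def nodes_with_replay(branches, unique, shared, overhead=0.05):
    # The shared segment is searched once; the other branches replay it,
    # paying only a small bookkeeping cost per replayed node (assumed 5%).
    replayed = (branches - 1) * shared * overhead
    return branches * unique + shared + replayed
```

With 24 unique and 72 shared nodes per branch and 10 branches, the model gives 960 nodes without replay versus roughly 344 with it, consistent with the better-than-twofold speedup reported above; as the unique effort comes to dominate, the two totals converge, matching the small constant overhead observed.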




Chapter 7

Experimental results in the Oil-spill domain

Throughout this thesis, most of the techniques introduced have been described using synthetic domains, and these have also been used to demonstrate the potential benefits of the techniques. In this chapter I present an experimental demonstration of these techniques in the domain of oil-spill clean-up. This is a real-world domain, consisting of 62 operator and inference rule schemata, that was originally developed by SRI for the US Coast Guard [Desimone & Agosta 1994].

In the next section I provide an overview of the oil-spill domain. Sections 7.2 and 7.3 discuss how Weaver improves plans in this domain. In Section 7.4 I demonstrate the improvement in the time to evaluate the belief net that comes from using the event graph technique described in Chapter 5. In Section 7.5 I investigate the effect of domain-independent heuristics, and in Section 7.6 I investigate the effect of derivational analogy in this domain.

7.1 The oil-spill clean-up domain

The domain models the problem of cleaning up oil spills from vessels, which comprises three kinds of activity: (1) stopping the flow of oil from a tanker, (2) cleaning up oil from the surface of the water and (3) protecting and/or cleaning sensitive areas of coastline. The domain uses 151 object types, of which the 52 most important are shown in Figure 7.1.
The 62 operators and inference rules of the domain can be roughly grouped as follows:

- 8 are used exclusively for stopping the flow of oil from a vessel, by pumping oil from the vessel, towing the vessel to port or using booms to contain the oil around the vessel and skimming it from the surface.

- 18 are used for cleaning oil from the sea, by skimming it into a barge, using chemical dispersants or controlled burns.


- 11 are used for protecting sensitive areas of coastline, either using booms to keep oil from the shore, or using booms or berms and dams to divert oil and vacuum trucks to clean it from the shore.

- 9 are used for shared subgoals of the above three high-level activities. They include placing booms and making inferences about the state of the sea.

- 16 deal with moving equipment, teams and vessels.

The domain is shown in detail in Appendix B.

The main sources of uncertainty in the domain come from the movement of the oil and changes in the weather conditions while the plan is being executed. Equipment is stored at various locations along the coast and may have to be moved large distances to be employed in a clean-up operation. Therefore plans may include long lead times before crucial steps, such as positioning a boom or pumping oil into a barge, can take place. Over these time periods exogenous events model the weather and the amount of oil spilled in the sea as stochastic processes. Certain steps require a level of calmness of the sea in order to be performed, and their probability of success depends on these exogenous events.

The uncertainty in the oil spread over time can also affect the allotment of resources for shore protection.
For example, in some cases spilled oil may have some probability of reaching one or more of a number of sensitive areas of coastline, but resources may not be sufficient to protect all of them. In these cases a conditional plan that moves resources once the direction of the oil is known can have a higher probability of success than any non-conditional plan. An example of this is shown in Section 7.2.

The planning problems used for this domain model the geography and equipment available in the San Francisco Bay area (Figure 7.2). These problems are modelled with 183 objects belonging to subtypes of place, 55 belonging to subtypes of equipment, 30 belonging to subtypes of vessel and 119 other objects. Details can be found in [Desimone & Agosta 1994].

In the remainder of this chapter I demonstrate Weaver and show experimental results in the oil-spill domain about Weaver's performance and about the impact of the techniques described in this thesis.

7.2 Case studies of improving plan reliability

This section provides demonstrations of how Weaver improves the probability of plan success. The next section provides experimental results and analysis of the improvement obtained for a set of randomly generated examples. Subsequent sections that demonstrate techniques for improving Weaver's speed will focus on the rate of improvement of probability. When Prodigy is used to create plans in the oil-spill domain, it ignores the domain's sources of uncertainty. Consequently Weaver can improve on plans found by Prodigy in three qualitatively different ways:


[Figure content omitted: a tree of object types, including vessel (boat, tug, utility-boat, fishing-boat, tanker, barge and its subtypes), sector (sea-sector, land-sector, air-sector), location (airfield, seaport, urban, sensitive-area, inland-water, river, lake), response-equipment (dispersant, sorbent, skimmer, boom, bladder, pump, fire-chem-equipment, vacuum-truck), support-equipment, heavy-equipment (backhoe, crane, tractor), team, spilled-oil, and weather-related types such as wind-velocity and sea-state.]

Figure 7.1: Part of the type hierarchy, showing the 52 most important of the 151 types in the domain.

1. Better plans can be found by backtracking from one choice of operator or bindings to another that avoids the source of uncertainty.

2. Conditionally branching plans can be created that in some cases have a higher probability of success than is possible without including conditional branches.


Figure 7.2: San Francisco Bay area

3. Better plans can be found by including steps whose purpose is to prevent some undesired action outcome or exogenous event.

The mechanisms by which Weaver can use these techniques are described in Chapter 4. Here I provide a brief example of the last two of these effects in the oil-spill domain.

Creating a conditional plan

When oil is spilled, it can move unpredictably on the surface of the water, directed by the wind and currents, so that the stretches of coastline that will be hit by oil are uncertain. Consequently the number of sensitive areas potentially under threat from the oil may be much larger than the number that the oil may reach in any single eventuality. Frequently the available resources for cleaning up the shore may be adequate for only a small number of areas, and a more reliable plan can be made by delaying the decision on where to deploy the resources as late as possible, so that the path of the oil is known with greater certainty. The approach relies on building a conditional plan that tests for the sensitive areas under greatest threat.

In the version of the oil-spill domain under test, the fact that a particular sensitive area of coastline is likely to be hit by oil is modelled with the predicate (threatened-shoreline <oil> <sensitive-area>), and the stochastic motion of the oil in the water is modelled by an exogenous event threatened-shore-changes, shown in Figure 7.3. This is a primitive model in which the threat from the oil shifts


from one area to another without regard to geographic proximity, although the constraint function potentially-threatened ensures that the areas are both adjacent to the sea-sector containing the oil.

(event threatened-shore-changes
  (params <oil> <old-area> <new-area>)
  (probability 0.1)
  (preconds
    ((<oil> Spilled-oil)
     (<sector> Sea-Sector)
     (<amount> (and Numerical
                    (gfp (amount-spilled <oil> <sector> <amount>))
                    (> <amount> 0)))
     (<old-area> (and Sensitive-Area
                      (potentially-threatened <old-area> <sector>)))
     (<new-area> (and Sensitive-Area
                      (potentially-threatened <new-area> <sector>)
                      (diff <old-area> <new-area>))))
    (threatened-shoreline <oil> <old-area>))
  (effects ()
    ((del (threatened-shoreline <oil> <old-area>))
     (add (threatened-shoreline <oil> <new-area>)))))

Figure 7.3: The exogenous event threatened-shore-changes models the uncertainty of the oil's path in the water (variable names reconstructed).

Table 7.1 shows the objects and the goal for an example problem, and Table 7.2 shows the initial state for the problem. The tanker ss-weany has spilled oil in the golden-gate sea sector, whose adjacent shoreline, west-marin, contains two sensitive areas, rodeo-lagoon and pt-bonita-cove. Initially, only pt-bonita-cove is threatened. The goal has been simplified by removing the requirement to clean oil up from the surface of the water, leaving the requirements to stop the discharge of oil from the tanker and protect the threatened sensitive area of coastline.
There is a tank barge and pump available to pump oil out of the tanker to stop its discharge. There is a tractor and a vacuum truck which can be used to build berms and dams to protect one area of coastline.

The geographic regions and the specific properties of the response equipment in this problem correspond exactly to objects in the scenarios designed by SRI in conjunction with the Coast Guard [Desimone & Agosta 1994]. The example has been made short by modelling a small oil spill that endangers only a small number of locations. As an uncertain planning problem it has also been simplified by modelling only the movement of the oil, not the change in the sea state or oil spilling.

Prodigy's solution for this problem, ignoring inference rules, is shown in Table 7.3. The first three steps stop the discharge of oil from the tanker and the next three protect pt-bonita-cove. Weaver constructs a probabilistic model of this plan and finds it has a probability of success of roughly 1/2, because in the time taken to stop


Objects:
  (spill SPILLED-OIL)
  (pt-bonita-cove rodeo-lagoon SENSITIVE-AREA)
  (golden-gate SEA-SECTOR)
  (west-marin LAND-SECTOR)
  (richmond-port SEAPORT)
  (oakland URBAN)
  (ss-weany TANKER)
  (tank-barge2 TANK-BARGE)
  (utility-boat-2 UTILITY-BOAT)
  (cargo-pump2 CARGO-TRANSFER-PUMP)
  (vac-truck2 VACUUM-TRUCK)
  (tractor-2 TRACTOR)

Goal: (and (no-discharge ss-weany spill golden-gate)
           (~ (unprotected-sensitive-area spill)))

Table 7.1: The objects and goal for the example problem used to demonstrate a conditional plan formed under scarce resources.

(unprotected-sensitive-area spill)
(threatened-shoreline spill pt-bonita-cove)
(amount-spilled spill golden-gate 10)
(discharge-size ss-weany spill 1000)
(discharge-rate ss-weany spill 600)
(sea-state golden-gate 2)
(located-within rodeo-lagoon west-marin)
(located-within pt-bonita-cove west-marin)
(adjacent golden-gate west-marin)
(distance richmond-port golden-gate 25)
(distance oakland rodeo-lagoon 20)
(distance oakland pt-bonita-cove 20)
(distance richmond-port rodeo-lagoon 20)
(distance richmond-port pt-bonita-cove 20)
(shore-personnel-required pt-bonita-cove 10)
(shore-personnel-required rodeo-lagoon 10)
(located ss-weany golden-gate)
(located tank-barge2 richmond-port)
(located cargo-pump2 richmond-port)
(located vac-truck2 richmond-port)
(located tractor-2 oakland)
(max-speed tank-barge2 2)
(barge-capacity-bbl tank-barge2 4000)
(max-sea-state tank-barge2 4)
(max-speed utility-boat-2 5)
(capacity-bbl vac-truck2 1000)
(capacity-bbl-per-hr cargo-pump2 480)

Table 7.2: The initial state for the example problem used to demonstrate a conditional plan formed under scarce resources.

the discharge of oil, the sensitive area that is threatened by oil can change a number of times and the probability distribution of the threatened
area has approached its steady-state value.

(move-response-equipment-by-sea1b cargo-pump2 golden-gate richmond-port utility-boat-2)
(move-to-sea-sector-from-port tank-barge2 richmond-port golden-gate)
(cargo-transfer-oil-to-stabilize ss-weany spill golden-gate tank-barge2 cargo-pump2 600 120)
(move-heavy-equip-by-ground2 tractor-2 oakland pt-bonita-cove)
(move-heavy-equip-by-ground2 vac-truck2 richmond-port pt-bonita-cove)
(build-berms-and-dams spill pt-bonita-cove)

Table 7.3: Prodigy's solution to the example problem, ignoring inference rules.

(move-response-equipment-by-sea1b cargo-pump2 golden-gate richmond-port utility-boat-2)
(move-to-sea-sector-from-port tank-barge2 richmond-port golden-gate)
(cargo-transfer-oil-to-stabilize ss-weany spill golden-gate tank-barge2 cargo-pump2 600 120)
IF (threatened-shoreline (spill)) is in (pt-bonita-cove)
  (move-heavy-equip-by-ground2 tractor-2 oakland pt-bonita-cove)
  (move-heavy-equip-by-ground2 vac-truck2 richmond-port pt-bonita-cove)
  (build-berms-and-dams spill pt-bonita-cove)
ELSE
  (move-heavy-equip-by-ground2 tractor-2 oakland rodeo-lagoon)
  (move-heavy-equip-by-ground2 vac-truck2 richmond-port rodeo-lagoon)
  (build-berms-and-dams spill rodeo-lagoon)

Table 7.4: Weaver's solution to the example problem, ignoring inference rules.

Weaver's plan critic forces B-Prodigy to find a branching plan, splitting on the test (threatened-shoreline spill pt-bonita-cove). The exogenous event is modelled in B-Prodigy as taking place due to some action that takes non-zero time. By default the latest possible action is chosen, to push the branch as late as possible. In this case that would be the action of moving vac-truck2, but since neither it nor tractor-2 can be moved more than once, no branching plan is possible at this point and Weaver backtracks to put the branch after the step cargo-transfer-oil-to-stabilize. B-Prodigy returns the plan shown in Table 7.4, which Weaver determines has a probability of success of 0.8.
This is less than 1 because of the chance that the threatened shoreline will change after the tractor has begun to move to its location, but is higher than any non-branching plan could achieve.

Adding preventive steps

In some cases the probability of success can be improved by inserting steps into the plan that reduce the probability of some undesired event or action outcome from taking place, as described in Chapter 4. An example of this in the oil-spill domain can occur when the actions taken to stabilize the discharge of oil from a tanker take a significant amount of time to set up. During this time, oil may spill from the tanker and spread through the water, and methods for cleaning it up may be unreliable and expensive. However, the flow of oil from the tanker can be contained by surrounding it with a length of boom. If this boom can be put in place relatively quickly compared with the time taken to stabilize the discharge, then much of the oil can be prevented from spilling.

Figure 7.4 shows the exogenous event oil-spills. The amount-spilled predicate models the amount of oil in the water, and is the only predicate changed by the event. The event will not take place if either there is a sufficient amount of boom surrounding the vessel, modelled by the predicate boom-level>=, or the tanker has been stabilized to stop the discharge, modelled with the predicate no-discharge. Stopping the discharge is usually a top-level goal as well as a way to prevent the oil-spills event from happening.

(event oil-spills
  (params <oil> <vessel> <sector>)
  (probability 0.2)
  (preconds
    ((<oil> Spilled-oil)
     (<vessel> Vessel)
     (<boom-length> (and Numerical
                         (gfp (vessel-boom-length <vessel> <boom-length>))))
     (<sector> Sea-sector)
     (<sea-state> (and Numerical (gfp (sea-state <sector> <sea-state>))))
     (<rate> (and Numerical (gfp (discharge-rate <vessel> <oil> <rate>))))
     (<size> (and Numerical (gfp (discharge-size <vessel> <oil> <size>))))
     (<old-amount> (and Numerical (numberp <old-amount>) (< <old-amount> <size>)))
     (<new-amount> (and Numerical
                        ;; can't spill more than there is.
                        (add-with-ceiling <old-amount> <rate> <size> <new-amount>))))
    (~ (boom-level>= <vessel> <boom-length>))
    (~ (no-discharge <vessel> <oil> <sector>))
    (amount-spilled <oil> <sector> <old-amount>))
  (effects ()
    ((del (amount-spilled <oil> <sector> <old-amount>))
     (add (amount-spilled <oil> <sector> <new-amount>)))))

Figure 7.4: The exogenous event oil-spills models the uncertain flow of oil from the tanker to the sea (variable names reconstructed).

Table 7.5 shows the objects and goal for an example problem, and Table 7.6 shows the initial state for the problem. The tanker ss-catastrophe is discharging oil in the SF-north-coast sea sector, although in the initial state no oil has been spilled. A skimmer is located at Martinez-port, which is 28 miles from the area of the spill. A boom that is long enough to contain the spill around the tanker is located 8 miles away at Richmond-port. The goal is to stop the discharge of oil and have no oil spilled in the sea sector.

Prodigy's solution to the problem, ignoring inference rules, is the following two-step plan:


Objects:
  (sf-bay-spill SPILLED-OIL)
  (sf-north-coast SEA-SECTOR)
  (martinez-port richmond-port SEAPORT)
  (ss-catastrophe TANKER)
  (weir-skimmer2 PORTABLE-SKIMMER)
  (boom6 BOOM)
  (utility-boat-1 utility-boat-2 UTILITY-BOAT)

Goal: (and (no-discharge ss-catastrophe sf-bay-spill sf-north-coast)
           (amount-spilled sf-bay-spill sf-north-coast 0))

Table 7.5: The objects and goal for the example problem used to demonstrate adding steps to prevent an event.

(amount-spilled sf-bay-spill sf-north-coast 0)
(discharge-size ss-catastrophe sf-bay-spill 12000)
(discharge-rate ss-catastrophe sf-bay-spill 2000)
(sea-state sf-north-coast 2)
(distance martinez-port sf-north-coast 28)
(distance richmond-port sf-north-coast 8)
(located boom6 richmond-port)
(located weir-skimmer2 martinez-port)
(located ss-catastrophe sf-north-coast)
(located utility-boat-1 martinez-port)
(located utility-boat-2 richmond-port)
(vessel-boom-length ss-catastrophe 1000)
(max-sea-state boom6 4)
(length-boom-ft boom6 1600)
(max-sea-state weir-skimmer2 6)
(skim-rate-bbl-per-hr weir-skimmer2 2000)
(max-speed utility-boat-1 5)
(draft utility-boat-1 4)
(max-speed utility-boat-2 5)
(draft utility-boat-2 4)

Table 7.6: The initial state for the example problem used to demonstrate adding steps to prevent an event.
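The transit times and success probability reported below can be roughly reproduced from the initial state in Table 7.6, under a simplifying assumption of mine (not stated in the thesis): the oil-spills event gets one chance to fire, with probability 0.2, for each hour the discharge remains uncontained.

```python
def travel_hours(distance_miles, max_speed):
    # Transit time for equipment towed at a utility boat's max speed.
    return distance_miles / max_speed

skimmer_eta = travel_hours(28, 5)   # weir-skimmer2 from Martinez-port
boom_eta = travel_hours(8, 5)       # boom6 from Richmond-port

# Chance that no oil-spills event fires while the skimmer is in transit.
p_no_spill = (1 - 0.2) ** round(skimmer_eta)
```

This gives 5.6 hours for the skimmer (the "nearly 6 hours" mentioned below) against 1.6 hours for the boom, and (1 - 0.2)^6 ≈ 0.26, which matches the reported success probability of the plan that does not place the boom first.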


(move-response-equipment-by-sea1b weir-skimmer2 sf-north-coast martinez-port utility-boat-1)
(get-skimmer-to-skim-near-vessel ss-catastrophe weir-skimmer2 sf-north-coast 2 2000)

This achieves the goal of stopping the discharge, but it takes nearly 6 hours to move the skimmer to the spill site, and the plan's probability of success is only around 0.26 because in this time oil is likely to spill. One way to improve the probability of success is to negate the event oil-spills as quickly as possible, by moving the boom from Richmond-port, which can be done in under 2 hours. The plan critic tries this approach, adds the appropriate predicate using boom-level>= to the preconditions of moving the skimmer, and calls B-Prodigy again. The resulting four-step plan has a probability of success of 0.8:

(move-response-equipment-by-sea1b boom6 sf-north-coast richmond-port utility-boat-2)
(use-boom-to-contain-vessel ss-catastrophe boom6 1000 sf-north-coast)
(move-response-equipment-by-sea1b weir-skimmer2 sf-north-coast martinez-port utility-boat-1)
(get-skimmer-to-skim-near-vessel ss-catastrophe weir-skimmer2 sf-north-coast 2 2000)

7.3 Experimental results for improving plan reliability

In order to experimentally verify the amount of improvement in probability of success that Weaver can produce, Weaver was run with 75 randomly generated problems, each solved in 4 different ways, making random choices at the planner and plan critic choice points, yielding a total of 300 trials.

The algorithm to generate the random problems fixes the geography of the area, as defined by the set of objects of types sea-sector, land-sector,
urban, seaport and sensitive-area and by the literals for the predicates adjacent, located-within, distance and berth-size. The geographic information models the San Francisco Bay area as defined in the version of the domain written at SRI. Pseudo-code for the algorithm is given at the end of Appendix B. Within this geographic framework, one vessel is placed in a random sea-sector that contains at least one sensitive-area, with a random amount of oil, spill rate and length, chosen within fixed parameters. A random number of equipment objects, for example of type boom, tractor and tug, are distributed among the seaports at random. Finally, the amount of work required to protect a sensitive area is varied, as are the initial weather conditions. This generation algorithm produces a set of problems that, along with random choices made in the problem solver, exercise all possible combinations of the strategies available to the planner while ensuring that the generated problems are physically plausible.
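A sketch of this generation scheme (the geography table, equipment types and parameter ranges here are illustrative stand-ins; the real generator uses the SRI scenario geography and is given in Appendix B):

```python
import random

# Fixed geography: sea sector -> sensitive areas on its adjacent shoreline.
SEA_SECTORS = {"golden-gate": ["pt-bonita-cove", "rodeo-lagoon"],
               "sf-north-coast": []}
SEAPORTS = ["richmond-port", "martinez-port"]
EQUIPMENT_TYPES = ["boom", "tractor", "tug", "skimmer", "tank-barge"]

def random_problem(rng):
    # The vessel goes in a random sector with at least one sensitive area.
    sector = rng.choice(sorted(s for s, areas in SEA_SECTORS.items() if areas))
    problem = {
        "vessel-location": sector,
        "amount-spilled": rng.randint(10, 500),    # fixed parameter ranges
        "discharge-rate": rng.randint(100, 2000),
        "sea-state": rng.randint(1, 4),            # initial weather conditions
        # Vary the work needed to protect each potentially threatened area.
        "shore-work": {a: rng.choice([5, 10, 20]) for a in SEA_SECTORS[sector]},
        "equipment": {},
    }
    # Distribute a random number of equipment objects among the seaports.
    for i in range(rng.randint(3, 8)):
        kind = rng.choice(EQUIPMENT_TYPES)
        problem["equipment"][f"{kind}-{i}"] = rng.choice(SEAPORTS)
    return problem
```

Because the geography is fixed and only quantities, locations and weather vary, every generated problem stays physically plausible while still exercising the planner's different strategies.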


The problems vary in difficulty for the planner in the number of steps required, in the maximum achievable probability of success and in the number of improvement steps needed to achieve a high probability of success. One of the major factors affecting these aspects of the problems is the amount of equipment available in the initial state as compared with the size of the oil spill. With adequate equipment available for a number of clean-up techniques the planner's task is relatively easy, but as equipment becomes more limited the planner may have to produce unusual combinations of equipment or use equipment that is more likely to fail. A second factor is the location of the equipment. If critical equipment is located far from the spill, the plan may take a long time to execute and therefore allow more time for exogenous events that can reduce its probability of success. A third factor is the initial state of the sea. Rough seas can make some clean-up equipment, particularly booms and skimmers, unusable.

Figure 7.5 shows a graph of the average probability of the plans found as the number of iterations through the Weaver algorithm is increased, along with the 10th percentile and 90th percentile probabilities. This graph uses a random sample of 250 examples. The rate of increase in average probability slows as the number of iterations increases. There is only slight improvement after the first two iterations, partly because a significant number of plans succeed with probability 0.9 or greater after two iterations.
Figure 7.6 shows how many of the trials reach their maximum value at each iteration.

[Plot omitted: probability of success (0-1) against 0-8 iterations, showing the mean, 10th percentile and 90th percentile.]

Figure 7.5: The average probability of success for the plans for the randomly generated problems, plotted against the number of iterations of improvement. The upper and lower lines show the 90th and 10th percentiles of the probabilities respectively.

Figure 7.7 shows the average improvement per iteration for the first four iterations on the random set of problems, for the two different types of improvement that Weaver can use: adding a conditional branch and adding preventive steps. The immediate improvement from backtracking is not shown. Its average value in the oil-spill domain is negative; that is, the plan's


probability of success typically decreased when the planner backtracked. However, backtracking allows the planner to explore other parts of the space of plans and is frequently used before Weaver finds its best plan. Although the average improvement from adding a conditional branch is approximately one quarter that of adding preventive steps in this domain, the overall increase in probability due to adding conditional branches is higher because this method is used more often. The percentage of the total improvement in probability over the random problem set that is due to each improvement type is shown in the graph on the right of the figure.

[Plot omitted: number of trials (0-80) against 0-8 iterations.]

Figure 7.6: The number of examples that reach their maximum probability at each iteration.

[Bar charts omitted: on the left, average improvement (0-0.5) for the Prevention and Branch improvement types; on the right, their cumulative contribution as a percentage (0-90).]

Figure 7.7: The graph on the left shows the average improvement gained from each way to improve a plan across the random set of problems. The graph on the right shows their cumulative contribution to success probability. Although in this domain each conditional branch is generally only 1/4 as effective as adding preventive steps, its overall contribution is a higher proportion because it is used more often.
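The per-iteration numbers above are measurements of an improvement loop that can be sketched as follows (placeholder interfaces, not Weaver's actual API): on each iteration the plan critic proposes one refinement — a conditional branch, preventive steps, or backtracking — and the most reliable plan seen so far is retained even when a refinement such as backtracking temporarily lowers the success probability.

```python
def improve_plan(problem, planner, critic, evaluate, iterations=8, target=0.9):
    plan = planner(problem)           # initial plan, ignoring uncertainty
    best, best_p = plan, evaluate(plan)
    for _ in range(iterations):
        if best_p >= target:
            break
        plan = critic(plan)           # one refinement: branch, prevention,
        p = evaluate(plan)            # or backtracking
        if p > best_p:                # keep the most reliable plan seen
            best, best_p = plan, p
    return best, best_p
```

Tracking the best plan separately from the current one is what lets backtracking, whose immediate effect on probability is negative on average, still contribute to the final result.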


7.3. Experimental results for improving plan reliability

Ablation study

Figure 7.7 shows how much improvement in probability was actually derived during problem-solving instances from each of the different ways to improve probability. However, it is frequently the case that while one kind of improvement is used by the system, another could have been used, so a more complete picture of the relative utility of the types of improvement can be gained by selectively disabling each of them and running the same set of examples without it. The results of such a study are shown in Figure 7.8. When Weaver cannot create conditional branches, it reaches an average probability of success of roughly 0.6 after 3 iterations and stays constant on further iterations. When it cannot add steps to prevent an error from taking place, it reaches an average probability just below 0.7. Using both techniques it reaches an average of 0.77 in four iterations and 0.79 in eight iterations.

Figure 7.8: Average probabilities of plans for the same example set, plotted against the number of iterations, when Weaver's mechanisms for improving plan probability are selectively disabled.

Easy and hard cases for Weaver

Figure 7.9 shows a histogram of the probabilities of the initial plans found by Weaver in the example test sets. In approximately half the examples, Weaver's initial plan has probability below 0.1, and the rest are clustered around a probability of 0.25 with a separate peak at 0.9-1.0.
To examine whether the cases with low initial probability are rectified quickly or represent examples that have low-probability final


plans, Figure 7.10 shows the probabilities plotted against iterations in Weaver for these two cases, with the mean for the whole set plotted as a reference. Although they converge, the initial low probability is not recovered over four iterations of Weaver.

Figure 7.9: Histogram of the percentage of trials whose initial probability of success fell in each of the ten probability ranges 0.1 in width. Roughly half the initial cases have probability between 0 and 0.1.

Figure 7.10: The lower line shows the mean probability of success plotted against the iteration of Weaver for the cases whose initial probability was between 0 and 0.1. The upper line shows these values for the other cases and the middle line shows the global mean.


Figure 7.11: Evolution of probabilities for the initially low probability cases.

7.4 Mutual constraints between the planner, evaluator and critic

In Section 4.4 I mentioned some of the advantages of using the emerging plan to constrain the probabilistic model of the plan, and of using the model to incrementally increase the amount of domain uncertainty that the planner is aware of. One advantage, demonstrated with an artificial domain, is that the plan can be used to constrain the set of exogenous events that need to be considered to evaluate the plan. This can greatly affect the cost of evaluating the plan, since the size of the joint probability distribution calculated can be exponential in the number of exogenous events. In addition, if the number of sources of uncertainty considered by the planner is not constrained, the number of separate cases to be considered by the planner can also grow exponentially.
In Section 7.4.1 I show that while it is important to constrain the events considered in the oil-spill domain, this can effectively be done with a domain-dependent algorithm that considers the problem and not the candidate plan. A second advantage of the mutual constraints between the planner, the plan evaluator and the critic discussed in Section 4.4 is that the effects of events do not need to be modelled as alternative outcomes for every operator during which they can occur, but only where they affect the plan. This observation is used in the event graph technique developed in Chapter 5 to compress the belief net that is built to evaluate the plan. In Section 7.4.2 I show that this technique is useful in the oil-spill domain.


Figure 7.12: Evolution of probabilities for the initially high probability cases.

7.4.1 Using the plan to constrain events in the evaluator

I focus on the exogenous events of the form sea-gets-worse and sea-gets-better, which have a numerical sea state and a sea sector as parameters and alter the sea state of the given sector by one. The event sea-gets-worse is shown in Figure 7.13, and both can be found in the domain specification in Appendix B.

The random problems used in this chapter contain twenty-five sea sectors, each of which takes on one of six sea states independently of the other sea sectors. Since they are independent, they do not automatically lead to an exponential increase in the expense of evaluating the plan. However, a planner explicitly considering each outcome of some action due to the full set of events would need to consider 6^25, or about 3 x 10^19, states due to the sea-change events alone.

The number of events considered can be considerably reduced by inspecting a particular planning problem.
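The count quoted above can be checked in a few lines (a sketch; the 25 sectors and 6 sea states come from the problem description, and the two-sector reduction from the argument that follows):

```python
# 25 independent sea sectors, each in one of 6 sea states.
sectors, states = 25, 6

# States a planner would face if it considered every sector explicitly.
full_joint = states ** sectors
print(f"{full_joint:.2e}")   # ~2.84e+19, i.e. about 3 x 10^19

# With the problem-based reduction, at most two sectors matter: the spill
# sector and the sector containing the threatened sensitive areas.
reduced = states ** 2
print(reduced)               # 36
```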
In the oil-spill domain, only activities concerned with skimming or pumping oil and laying booms depend on the sea state. This means that the only sectors of interest are those where the spill takes place and those containing the sensitive areas potentially threatened by the spilled oil. Although there are sixty-nine sensitive areas in the domain, at most three are reachable by oil from any one spill site, and those three are always in the same sea sector. This means that at most two sea sectors need to be considered to evaluate any plan for any of the test problems. While this method of reducing the sectors considered is very effective, it should be noted that it is domain-dependent, whereas using the plan to constrain the events is domain-independent. Even in this domain, the domain-dependent method would be weaker than the plan-based method if more sensitive areas of coastline were reachable, and there is no guarantee that equivalent methods can be found for other domains of interest.

(event sea-gets-worse
    (params <sector> <state> <new-state>)
    (duration 1)
    (probability 0.1)
    (preconds
        ((<sector> Sea-Sector)
         (<state> Sea-State)
         (<new-state> (and Sea-State
                           (add1 <state>)
                           ;; stop an infinite markov model..
                           (< <new-state> 7))))
        (sea-state <sector> <state>))
    (effects ()
        ((del (sea-state <sector> <state>))
         (add (sea-state <sector> <new-state>)))))

Figure 7.13: The event sea-gets-worse.

7.4.2 The event graph

To test the improvements in the speed of evaluating a plan that come from using the event graph, the initial plans from the same set of random examples were evaluated both with Markov chains built using the event graph technique described in Chapter 5, and with individual event occurrences represented directly. The complexity of evaluating the belief net depends mainly on the number of steps in the plan and the number of precondition nodes required, but it also depends on the number and length of the persistence intervals in the plan. These are the time intervals between the achievement of a precondition literal and its use, during which exogenous events may take place that affect its value.

The length in time units of the simulated plan is an indication of the length of the persistence intervals within it.
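The compression the event graph buys can be sketched as follows. Taking both sea-change events to fire with probability 0.1 per time unit (0.1 is the probability given for sea-gets-worse in Figure 7.13; assuming the same value for sea-gets-better, and reflecting behaviour at the two end states), a sector's sea state is a small Markov chain, and a persistence interval of t hours costs one transition-matrix power rather than t explicit event nodes in the belief net. This is a sketch of the idea, not Weaver's code:

```python
import numpy as np

P_UP, P_DOWN, N = 0.1, 0.1, 6   # sea-gets-worse, sea-gets-better, 6 sea states

# One-time-unit transition matrix over states 0..5, reflecting at the ends.
T = np.zeros((N, N))
for s in range(N):
    up = P_UP if s < N - 1 else 0.0
    down = P_DOWN if s > 0 else 0.0
    if s < N - 1:
        T[s, s + 1] = up
    if s > 0:
        T[s, s - 1] = down
    T[s, s] = 1.0 - up - down     # stay put with the remaining mass

def after(hours, start_state):
    """Distribution over sea states after a persistence interval."""
    dist = np.zeros(N)
    dist[start_state] = 1.0
    return dist @ np.linalg.matrix_power(T, hours)

# One summary link for a 10-hour interval, instead of 10 event nodes:
print(after(10, start_state=2).round(3))
```

A single `matrix_power` call summarizes the whole interval, which is why the join-tree times with Markov chains barely grow with plan duration in the experiments below.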
For each step in the plan a duration can be calculated from its bindings, and this is used when steps are placed on a timeline to create the belief net as described in Section 4.2. The sum of these durations is the plan duration, since Weaver does not apply any of the actions in parallel. Plans with high duration in the oil-spill domain are typically caused by a small number of steps that move equipment over long distances. Therefore plans with high duration may not have more steps than shorter plans, but will include many more exogenous events. If the events are modelled explicitly, the corresponding nodes will quickly dominate the time to create and evaluate the belief net. If the events are modelled with a Markov chain, the plan


duration must be much higher for the event links to dominate.

Figure 7.14 shows the times to create the Jensen join tree for the 67 plans, from the set of problems used in the experiments in the previous section, that have a plan duration of 12 hours or less. The graph on the left uses explicit nodes for event occurrences while the graph on the right uses Markov chains. There is a clear relationship between the length of the plan and the time required to create the join tree when event nodes are modelled explicitly. There is no clear relationship when Markov chains are used. Note also that the maximum time using Markov chains is 3 seconds, compared with 500 seconds for the original method.

Figure 7.14: The times taken to create the Jensen join tree for the initial plans with simulated duration less than 12 hours. On the left, explicit event nodes were used and on the right, Markov chains built using the event graph. Times using the event graph are on average less than one percent of the times taken with explicit event nodes.

As Figure 7.15 shows, there is a mild increase in the times to create the Jensen join tree for a plan using Markov chains, which becomes apparent when all the plans whose simulated length is below 1,000 hours are considered.
Four outliers are removed from the figure: two take approximately 300 seconds, while the other two have simulated lengths above 1,000 hours but take less than 5 seconds.

7.5 Domain-independent heuristics in the oil-spill domain

In Chapter 6 I mention some heuristics that can be used in the oil-spill domain, based on local approximations, to maximise the probability of success. They are implemented by means of a number of control rules, which are preference control rules that lead the planner to try preferred alternatives before the others, but do not prune the search space. Therefore the planner can find the same solutions with or without these control rules; they only affect the speed with which the planner reaches a good solution.


Figure 7.15: The times taken to create the Jensen join tree using Markov chains for all initial plans with simulated duration less than 1,000 hours.

The control rules can be grouped into two categories: those that prefer closer things and those that prefer bigger things. Four control rules select operator bindings that choose closer objects: closer equipment, a closer transport boat, or a closer port if the leaking vessel is to be towed to safety. These rules typically lead to plans that do not necessarily have fewer steps, but which have a shorter duration in terms of the sum of the durations of their steps. Four control rules choose larger pieces of equipment: a unit of boom which is long enough to surround the vessel or protect the sensitive area, and a skimmer or pump that has enough capacity to negate the flow. One control rule is in both categories, maximising the ratio of boom size to its distance from the target area.

As Figure 7.16 shows, the control rules lead to a significant improvement in the average probabilities of plans found in eight iterations of improvement. The figure shows the mean probabilities for 50 random problems, a subset of the problems used in Section 7.3. Using the control rules, the average probability reaches 0.79; without them it reaches 0.61.

The fact that both probability curves appear to have become flat indicates the importance of the decisions made early by the control rules.
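The flavour of these preference rules can be sketched as orderings over candidate bindings. The object names and attributes below are made up for illustration; only the ranking criteria (prefer closer, prefer bigger, and the combined size-to-distance ratio) come from the text:

```python
# Hypothetical candidate boom bindings: (name, length_m, distance_km).
candidates = [
    ("boom-a", 300, 40.0),
    ("boom-b", 500, 25.0),
    ("boom-c", 200,  5.0),
]

# Preference rules reorder alternatives; nothing is pruned, so the planner
# can still reach any solution -- the rules only change the order of search.
def prefer_closer(cands):
    return sorted(cands, key=lambda c: c[2])

def prefer_bigger(cands):
    return sorted(cands, key=lambda c: -c[1])

def prefer_ratio(cands):  # the rule that is in both categories
    return sorted(cands, key=lambda c: -(c[1] / c[2]))

print([c[0] for c in prefer_ratio(candidates)])  # ['boom-c', 'boom-b', 'boom-a']
```

Because each rule returns a complete ordering rather than a filtered set, exhaustive backtracking over the remaining candidates is still possible, matching the "preference, not pruning" behaviour described above.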
Since the planner has the same search space with and without the control rules, they will ultimately converge to the same values. However, the trials without the control rules show no indication, in eight iterations, of backtracking over the choices of equipment that lead to the initially poorer plans. Improving Weaver's ability to backtrack over these


choices is an interesting area for further research.

Figure 7.16: With and without greedy control rules.

7.6 Analogical replay

In Chapter 6 I described the use of analogical replay to share planning effort between branches of a conditional plan. This section demonstrates the effectiveness of analogical replay in the oil-spill domain. As in the previous sections, I begin with some case studies that illustrate the technique in this domain, and then describe average performance improvements over a sample population of problems drawn from the domain.

7.6.1 Case studies

Analogical replay allows Weaver to share the computational resources required to create a plan between different branches of a conditional plan. The technique is useful whenever a segment of a larger plan is repeated in different conditional branches but the segment was not created before the branching action was applied, as shown in Chapter 6.


Replay within one clean-up operation

Repeating segments of a plan after a conditional branch can frequently occur in the clean-up operations for a single spill. The first example used in this chapter can illustrate this if the top-level goals are treated by the planner in the reverse order and a conditional branch is again made on the predicate threatened-shoreline. In the original example, shown in Table 7.2, some work is initially done to achieve the goal (no-discharge ss-weany spill golden-gate) and after it is completed it is uncertain which of the two shorelines pt-bonita-cove or rodeo-lagoon is threatened, because of the external event threatened-shore-changes shown in Figure 7.3. In that case the resulting plan after one conditional branch is added (Table 7.4) is:

(move-response-equipment-by-sea1b cargo-pump2 golden-gate richmond-port
    utility-boat-2)
(move-to-sea-sector-from-port tank-barge2 richmond-port golden-gate)
(cargo-transfer-oil-to-stabilize ss-weany spill golden-gate
    tank-barge2 cargo-pump2 600 120)
IF (threatened-shoreline (spill)) is in (pt-bonita-cove)
    (move-heavy-equip-by-ground2 tractor-2 oakland pt-bonita-cove)
    (move-heavy-equip-by-ground2 vac-truck2 richmond-port pt-bonita-cove)
    (build-berms-and-dams spill pt-bonita-cove)
ELSE
    (move-heavy-equip-by-ground2 tractor-2 oakland rodeo-lagoon)
    (move-heavy-equip-by-ground2 vac-truck2 richmond-port rodeo-lagoon)
    (build-berms-and-dams spill rodeo-lagoon)

This plan has no opportunities for analogical replay because each branch contains unique steps.
If the planner works on the goals in the opposite order, its first plan looks like this:

(move-heavy-equip-by-ground2 tractor-2 oakland pt-bonita-cove)
(move-heavy-equip-by-ground2 vac-truck2 richmond-port pt-bonita-cove)
(build-berms-and-dams spill pt-bonita-cove)
(move-response-equipment-by-sea1b cargo-pump2 golden-gate richmond-port
    utility-boat-2)
(move-to-sea-sector-from-port tank-barge2 richmond-port golden-gate)
(cargo-transfer-oil-to-stabilize ss-weany spill golden-gate
    tank-barge2 cargo-pump2 600 120)

This plan can still be defeated by the event threatened-shore-changes, which can take place during the two applications of the step move-heavy-equip-by-ground2, and after Weaver adds a conditional step to account for this case its plan has the following form:

(move-heavy-equip-by-ground2 tractor-2 oakland pt-bonita-cove)
(move-heavy-equip-by-ground2 vac-truck2 richmond-port pt-bonita-cove)
(build-berms-and-dams spill pt-bonita-cove)
IF (threatened-shoreline (spill)) is in (pt-bonita-cove)
    (move-response-equipment-by-sea1b cargo-pump2 golden-gate richmond-port
        utility-boat-2)
    (move-to-sea-sector-from-port tank-barge2 richmond-port golden-gate)
    (cargo-transfer-oil-to-stabilize ss-weany spill golden-gate
        tank-barge2 cargo-pump2 600 120)
ELSE
    (move-heavy-equip-by-ground2 tractor-2 pt-bonita-cove rodeo-lagoon)
    (move-heavy-equip-by-ground2 vac-truck2-richmond-1000
        pt-bonita-cove rodeo-lagoon)
    (build-berms-and-dams spill rodeo-lagoon)
    (move-response-equipment-by-sea1b cargo-pump2 golden-gate richmond-port
        utility-boat-2)
    (move-to-sea-sector-from-port tank-barge2 richmond-port golden-gate)
    (cargo-transfer-oil-to-stabilize ss-weany spill golden-gate
        tank-barge2 cargo-pump2 600 120)

In this plan, the steps taken to achieve (no-discharge ss-weany spill golden-gate) appear in both branches, and are found independently unless analogical replay is used. In this instance, of the 169 search nodes that are present in the final plan, 71 are derived from analogical replay, or about 42%. The total node number is high because a large number of inference rules are used in the plan, which have not been shown here.

7.6.2 Experiments in the oil-spill domain

On 20 randomly generated examples, the average proportion of nodes replayed was 0.53. Seven of the trials formed a conditional branch; these had an average replay proportion of 0.36. Thirteen of the trials formed a protection, and these had an average replay proportion of 0.64. This set of trials is shown in Table 7.7.
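Two of these averages can be recomputed directly from the Proportion column of Table 7.7 (a quick check using the table's own rounded values):

```python
# (proportion, type) pairs transcribed from Table 7.7.
trials = [
    (0.66, "protection"), (0.69, "protection"), (0.67, "protection"),
    (0.67, "protection"), (0.69, "protection"), (0.27, "context"),
    (0.27, "context"),    (0.31, "context"),    (0.22, "context"),
    (0.43, "context"),    (0.66, "protection"), (0.42, "context"),
    (0.40, "context"),    (0.66, "protection"), (0.62, "protection"),
    (0.60, "protection"), (0.60, "protection"), (0.63, "protection"),
    (0.59, "protection"), (0.60, "protection"),
]

overall = sum(p for p, _ in trials) / len(trials)
protection = [p for p, t in trials if t == "protection"]

print(round(overall, 2))                            # 0.53
print(round(sum(protection) / len(protection), 2))  # 0.64
```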


Problem  Nodes  Replayed  Proportion  Type
1        53     35        0.66        protection
2        62     43        0.69        protection
3        48     32        0.67        protection
4        52     35        0.67        protection
5        51     35        0.69        protection
6        40     11        0.27        context
7        40     11        0.27        context
8        35     11        0.31        context
9        23      5        0.22        context
10       75     32        0.43        context
11       44     29        0.66        protection
12       76     32        0.42        context
13       72     29        0.40        context
14       50     33        0.66        protection
15       42     26        0.62        protection
16       48     29        0.60        protection
17       48     29        0.60        protection
18       46     29        0.63        protection
19       42     25        0.59        protection
20       48     29        0.60        protection

Table 7.7: Proportion of nodes replayed by derivational analogy in randomly generated problems.




Chapter 8

Conclusions

This thesis presents a novel planner that can build plans to meet a threshold probability of success, given a representation of uncertainty in the form of probabilistic action outcomes, a probability distribution over possible initial states, and probabilistic information about possible exogenous events that can affect the planning domain. The representation of uncertainty was defined in terms of a Markov decision process model.

The planning algorithm harnesses a novel conditional planner that extends Prodigy 4.0, and as such is able to make use of control rules, machine learning techniques and a graphical user interface [Veloso et al. 1995]. The conditional planner attempts to create a plan that is certain to succeed given a set of non-deterministic actions. A plan critic controls the planner by removing many of the sources of uncertainty from the planner's version of the problem domain, and incrementally adding back sources to improve the probability of success of the final plan. This system has been tested in some synthetic domains and also in a large planning domain for cleaning up oil spilled from a tanker near the coast of California.

8.1 Contributions of the thesis

The main contributions of the thesis are:

- A representation and approach for planning under uncertainty with exogenous events. This is a novel problem area for an AI planner.
- A novel, hybrid representation for computing the plan's probability using belief nets and Markov chains: a way to automatically decompose the plan into component Markov chains and recombine them using the net.

- Domain-independent heuristics for planning under uncertainty, in order to apply techniques for planning under uncertainty to large, realistic planning problems.


- The application of derivational analogy to planning under uncertainty, as a demonstration of how existing machine learning techniques can be used to improve the performance of a probabilistic planner.

- Scaling a system for planning under uncertainty to a real-world domain. The oil-spill domain has more than 50 operators, a large state space and many sources of uncertainty. Typical plans range in length from 30 to 50 steps and may have 3 or more conditional branches.

8.2 Future research directions

The thesis has provided a solid foundation for an exciting and important topic for planning systems. While it gives a promising demonstration of planning under uncertainty in large domains, it raises a number of questions and leaves room for further work in many areas. Some of these are listed below.

- Tighter integration of plan creation and evaluation, and an evaluation of the spectrum of approaches from complete to very loose integration. Some of the power of the system can be claimed to come from judicious use of plan evaluation. However, the delayed evaluation sometimes causes the planner to spend precious computational resources in parts of the search space which might quickly be seen to yield only low-probability solutions if an evaluation was made earlier in the process.

- Work on meta-reasoning in the context of planning under uncertainty.
An explicit consideration of the tradeoffs between time spent planning and time spent evaluating the plan will allow exploration of ways to reason about how to allot the available computation most effectively between these processes. For example, an upper bound for the contribution of a conditional branch to the total probability of success is the probability that the branch is reached. If this is determined to be small, the computation of the exact probability could be passed over in favour of more planning work to improve the probability of success on more likely branches. A decision-theoretic framework for making such decisions, such as those developed in [Russell & Wefald 1991] and [Zilberstein 1993], could be used.

- More research is needed in machine learning and planning, using derivational analogy and other forms of machine learning such as learning search control rules from experience. Some preliminary work has shown the potential to compile experience gained from the probabilistic evaluation of plans into search control for the planner, although it largely avoids probabilistic evaluations [Blythe & Veloso 1996]. This approach may be one way to improve the integration of the plan creation and plan evaluation modules.


- Although this thesis concentrated on a planner based on Prodigy 4.0, it is in principle quite possible to apply the same ideas to other planning algorithms. Despite promising work in planning under uncertainty with partial-order and hierarchical task network planners [Draper, Hanks, & Weld 1994; Haddawy, Doan, & Goodwin 1995], exogenous events have not yet been addressed in these systems. Experience with Weaver suggests that a planner based on iterative repair principles, such as [Ambite & Knoblock 1997; Kautz & Selman 1996], could be very appropriate for probabilistic planning.

- Weaver computes a lower bound on the probability of success of a plan, improving its speed by (1) ignoring fortuitous exogenous events that are not explicitly used by the conditional planner and (2) assuming that a plan fails if any component step fails to execute. However, it still computes an exact probability for the belief net that it generates. Since the planning task is to pass a threshold probability, and since only comparative values are needed for Weaver to choose between alternative plan improvements, algorithms that compute ranges of probabilities from the belief net can lead to significant performance improvements.




Appendix A

Proofs of theorems

A.1 The plan's belief net provides a lower bound on its probability of success

Theorem: Let π be a solution to a planning problem for which a belief net B is constructed by Weaver. If B predicts a probability of success p, then the true probability of success, as given by the interpretation of π as a policy on the underlying Markov decision process, is at least p.

I show the result for non-branching plans π, since the predicted probability of success for a branching plan is the sum over such belief nets.

I will show that for each complete assignment to the variables of the belief net B that assigns the value true to every action node in the plan, there is a set of paths through the MDP that agree with the assignment to B on the values of all the fluent, action and event nodes, and whose combined conditional probability mass, conditioned on the initial state distribution, is at least that of the assignment to B. This is sufficient to prove the result, since the predicted probability of success from B is the sum over such complete assignments.

Let σ be such an assignment, i.e. a mapping from the nodes of B to values such that σ(a) = true for each action node a in B. Note that since the finish action node has the value true, this assignment contributes to the belief in plan success. For the assignment to have probability greater than zero, the preconditions of all actions, and of all events made true by the assignment, must be satisfied in the assignment. I will show that there is a set of paths in the MDP that match the assignment and have the required conditional probability mass by induction over the length of the plan.
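In symbols, the theorem asserts a one-sided bound (the symbols π for the plan and μ0 for the initial state distribution are introduced here for readability; the original notation may differ):

```latex
\Pr_{\mathrm{MDP}}\bigl(\text{success} \mid \pi,\, \mu_0\bigr)
\;\ge\;
\Pr_{B}\bigl(\mathit{finish} = \mathit{true}\bigr) \;=\; p .
```

The inequality is one-sided because the belief net ignores fortuitous exogenous events and treats any step failure as plan failure, so B can only underestimate the true success probability.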
Let F(i) denote the set of fluent nodes with time stamp i in B, and let M(i) denote the set of paths through the MDP going through states that are consistent with the values of the fluents in F(j) for all j ≤ i, as well as with the actions and the events e in B that are assigned the value true and whose time stamps are less than (but not equal to) i. The inductive strategy is to show that at each time point i, the probability mass


of the paths in M(i) is at least that of the restriction of the assignment to nodes with time stamps before or equal to i, which I shall denote σ|i.

Base case: M(0) consists of states in the MDP consistent with the assignment to F(0). By the construction of the belief net B, the probability of the assignment to F(0), σ|0, is equal to the probability of F(0) under the initial state distribution. This is trivially equal to the conditional probability of the matching states in the MDP given the initial state distribution.

Inductive step: Assuming that the conditional probability mass of M(j) is at least the probability that B gives to the assignment σ|j for all j ≤ i, we need to show the same for M(i+1) and σ|i+1.

Consider any event or action that takes place at time i and, according to the assignment, has some outcome o. Its preconditions appear in F(i) and so are satisfied in the final state of any path in M(i). Therefore the probability of a transition from any such state that adds pending effects corresponding to the correct outcome is the probability of the event taking place multiplied by the probability of the outcome. This is by construction the same as the conditional probability that the event or action node has the outcome o in the belief net B given the values in F(i).
Since the events and actions that take place at time i do so with independent probabilities given F(i), the probability mass of all paths that satisfy all these events and actions is above the required value by the inductive hypothesis.

Finally, if a path extends a path from M(i) to match the required events and actions from σ|i+1, then it will also match the fluent nodes F(i+1). This is because the fluent node values are determined by the pending effects from these events and actions, or earlier ones in M(i). If some other action or event could alter one of these values, it would have corresponding nodes in the belief net.

This argument establishes the required inequality. It does not guarantee equality because the belief net construction algorithm does not search for events that may cause fluent nodes to have their required values for the plan to succeed, only those that could cause other values. Therefore there may be paths in the MDP that reach a goal state even though not all the steps in the plan succeed.

A.2 Event graphs

Theorem: All queries about the probability of a logical expression over a set of literals V ⊆ L, after some fixed number of transitions n, given an initial state probability distribution over the full state space, will have the same value in the submodel induced by V as in the full model of the domain.

It is sufficient to show that the probabilities of all conjunctions involving each literal in V or its negation will be the same in each model, since any expression can be expressed as the disjunction of such conjunctions, whose probabilities can be summed since they are mutually exclusive.
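Before turning to the proof, the theorem can be sanity-checked numerically on a toy model. The sketch below is illustrative Python, not part of Weaver, and the transition probabilities are invented: literal a's event depends only on a (so a's ancestor set in the event graph is just {a}), while b's event depends on both a and b. A query about a after n transitions then has the same value in the two-state submodel over a as in the full four-state model.

```python
# Event-graph reduction (Theorem A.2) on a toy two-literal domain.
# a's dynamics depend only on a, so the submodel induced by {a} is
# the 2-state chain over a alone; queries about a agree in both models.

def step(dist, trans):
    """One Markov step: dist is {state: prob}, trans maps state -> {state: prob}."""
    out = {}
    for s, p in dist.items():
        for t, q in trans[s].items():
            out[t] = out.get(t, 0.0) + p * q
    return out

# Reduced chain for literal a.
p_a = {True: {True: 0.7, False: 0.3}, False: {True: 0.2, False: 0.8}}

def p_b(a, b):  # b's dynamics depend on a as well as b
    stay = 0.9 if a else 0.5
    return {b: stay, (not b): 1.0 - stay}

# Full model over joint states (a, b): product of the two effects.
full_trans = {
    (a, b): {(a2, b2): p_a[a][a2] * p_b(a, b)[b2]
             for a2 in (True, False) for b2 in (True, False)}
    for a in (True, False) for b in (True, False)
}

full = {(True, True): 0.5, (False, False): 0.5}   # initial distribution
reduced = {True: 0.5, False: 0.5}                 # its marginal on a

for _ in range(4):                                # n = 4 transitions
    full = step(full, full_trans)
    reduced = step(reduced, p_a)

marginal_a = sum(p for (a, _), p in full.items() if a)
print(abs(marginal_a - reduced[True]) < 1e-9)     # True
```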
I first show that a chain M_r built from the subgraph of the event graph containing the variables in V and their ancestors in the event graph will


compute the correct value, and then show that the same value can be computed by creating independent chains for the separate components of the subgraph.

Each full conjunctive expression corresponds to a unique state of M_r. For each such state i, let F(i) denote the set of states in the full model M_f that agree with i on the literals in V. Let p^n_{r,(i,j)} be the probability that the reduced chain M_r is in state j after n steps given that it began in state i, and let p^n_{f,(x,y)} be the probability that the full model M_f is in state y after n steps given that it began in state x.

A situation in which M_r is in state i may correspond to any probability distribution P(·) over states in F(i) for M_f, where Σ_{x∈F(i)} P(x) = 1. Thus, given states i and j in M_r, we need to show that, for all n and for all P(·),

    p^n_{r,(i,j)} = Σ_{x∈F(i)} P(x) Σ_{y∈F(j)} p^n_{f,(x,y)}        (A.1)

By choosing different probability distributions it can be seen that this is equivalent to

    p^n_{r,(i,j)} = Σ_{y∈F(j)} p^n_{f,(x,y)}   for all x ∈ F(i)     (A.2)

since we can choose P(x) = 1 for any x ∈ F(i). Conversely, if equation (A.2) is true, any linear combination given by a probability distribution will also satisfy the equality, so equation (A.1) is true also.

We use induction on n. The base case to prove is for n = 1. Let E_r be the set of events in the subgraph of the event graph used to build M_r, let p(e, i, j) be the probability of the effect of e that changes the literals affected by e to match j, if such an effect exists, and let p(e, i, j) = 0 otherwise.
By definition, p^1_{r,(i,j)} = Π_{e∈E_r} p(e, i, j). Similarly, for any x ∈ F(i) and for any y ∈ M_f, p^1_{f,(x,y)} = Π_{e∈E} p(e, x, y). We can arrange this product in terms of the events in E_r and the rest:

    p^1_{f,(x,y)} = Π_{e∈E_r} p(e, x, y) · Π_{e'∈E−E_r} p(e', x, y)

Now the events in E_r only affect the literals in M_r and are completely determined by them, by the construction of the event subgraph. So if y ∈ F(j), then p(e, x, y) = p(e, i, j) for each e ∈ E_r, and the first term is just p^1_{r,(i,j)}. So we have

    Σ_{y∈F(j)} p^1_{f,(x,y)} = p^1_{r,(i,j)} Σ_{y∈F(j)} Π_{e'∈E−E_r} p(e', x, y)   for all x ∈ F(i)

Also by the construction of the event subgraph, the events in E − E_r do not alter any of the literals in M_r. So for x ∈ F(i), if we fix the outcomes of the events in E_r to lead to j, all transitions with non-zero probability in M_f must lead to a state in F(j). Then under these circumstances Σ_{y∈F(j)} Π_{e'∈E−E_r} p(e', x, y) = 1, and the induction base case is proved.


Now suppose the result is true for n ≤ m − 1, so that for any x ∈ F(i) and z' ∈ F(k),

    p^m_{r,(i,j)} = Σ_{k∈M_r} p^{m−1}_{r,(i,k)} p^1_{r,(k,j)}
                  = Σ_{k∈M_r} ( Σ_{z∈F(k)} p^{m−1}_{f,(x,z)} ) ( Σ_{y∈F(j)} p^1_{f,(z',y)} )

We can rearrange the summands on the right-hand side, and choose z' = z separately for each z ∈ F(k), to get

    p^m_{r,(i,j)} = Σ_{k∈M_r} Σ_{z∈F(k)} Σ_{y∈F(j)} p^{m−1}_{f,(x,z)} p^1_{f,(z,y)}

Since each state in M_f is in F(k) for some k ∈ M_r,

    p^m_{r,(i,j)} = Σ_{z∈M_f} Σ_{y∈F(j)} p^{m−1}_{f,(x,z)} p^1_{f,(z,y)}
                  = Σ_{y∈F(j)} Σ_{z∈M_f} p^{m−1}_{f,(x,z)} p^1_{f,(z,y)}
                  = Σ_{y∈F(j)} p^m_{f,(x,y)}

which is the desired result.

For the second stage of the proof we need to show that separate Markov chains derived from the components of the event graph's subgraph and treated independently will yield the same result as a calculation in M_r. The proof uses the same technique as in the first stage, noting that the events in each component affect different literals, and are determined by different literals, from all the other components, so the transition probabilities can be split in the same way.
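The second stage of the proof can likewise be checked on a toy case. This sketch is illustrative Python with invented transition numbers: two independent component chains are run separately, and a joint query computed from the combined chain factors into the product of the component answers.

```python
# Second stage of the A.2 proof on a toy case: if the event subgraph
# splits into independent components, separate Markov chains for the
# components give the same joint query values as the combined chain.

def step(dist, trans):
    """One Markov step: dist is {state: prob}, trans maps state -> {state: prob}."""
    out = {}
    for s, p in dist.items():
        for t, q in trans[s].items():
            out[t] = out.get(t, 0.0) + p * q
    return out

# Component chains for two independent boolean literals.
t_a = {True: {True: 0.6, False: 0.4}, False: {True: 0.1, False: 0.9}}
t_b = {True: {True: 0.8, False: 0.2}, False: {True: 0.3, False: 0.7}}

# Combined chain over (a, b): product transition probabilities.
t_ab = {(a, b): {(a2, b2): t_a[a][a2] * t_b[b][b2]
                 for a2 in (True, False) for b2 in (True, False)}
        for a in (True, False) for b in (True, False)}

da, db = {True: 1.0, False: 0.0}, {True: 0.0, False: 1.0}
dab = {(a, b): da[a] * db[b] for a in (True, False) for b in (True, False)}

for _ in range(3):   # n = 3 transitions
    da, db, dab = step(da, t_a), step(db, t_b), step(dab, t_ab)

# The joint query P(a AND b) from the combined chain equals the
# product of the two independently-run component chains' answers.
print(abs(dab[(True, True)] - da[True] * db[True]) < 1e-9)   # True
```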


Appendix B

The Oil-spill domain

This appendix contains the Weaver encoding of the oil-spill domain and a specification of the algorithm used to generate random examples for Chapter 7. The domain encoding includes operators, inference rules, external events, control rules and bindings functions. The original encoding of this domain in SIPE can be found in [Desimone & Agosta 1994]. In order to keep the specification in this document manageable, several parts have been omitted or contracted. Many simple operators have no preconditions; these have been listed simply by name and effects, along with a comment. The specification of the domain problem scenario provided by SRI and used as the basis of the random problem generator, containing over 1500 predicates, has been omitted. For full details, contact the author by email at jblythe@cs.cmu.edu.

B.1 Domain specification

(infinite-type numerical #'numberp)

;;; All the names are read in as strings in the object fields
(infinite-type name #'stringp)

;;; Level 1
(inference-rule respond-to-spill-coastal
 (params )
 (preconds
  (( Spilled-oil)
   ( Sea-sector)
   ( Vessel))
  (and (no-discharge )
       (amount-spilled 0)
       (~ (unprotected-sensitive-area ))))
 (effects ()


  ((add (respond-to-spill )))))

(inference-rule clean-up-spill
 (params )
 (preconds
  (( Sea-Sector)
   ( Spilled-oil)
   ( (and Numerical (remove-oil-steps ))))
  (remove-oil-steps ))
 (effects ()
  ((add (amount-spilled 0)))))

(inference-rule protect-sensitive-areas
 (params )
 (preconds
  (( Spilled-oil))
  (forall (( Sensitive-area))
   (or (~ (threatened-shoreline ))
       (protect-shore-steps 1))))
 (effects ()
  ((del (unprotected-sensitive-area )))))

;;; Level 2
;;; Stop when the discharge-rate is below an acceptable minimum.
(inference-rule end-stabilize
 (params )
 (preconds
  (( Vessel)
   ( Spilled-oil)
   ( sea-sector)
   ( (and Numerical (gfp (discharge-rate )))))
  (stabilize-discharge-rate ))
 (effects ()
  ((add (no-discharge )))))

(inference-rule begin-stabilize ; no preconditions
 (preconds (( (and Numerical (acceptable-discharge-rate )))))
 ((add (stabilize-discharge-rate )))))

(operator cargo-transfer-oil-to-stabilize
 (params )
 (preconds
  (( Vessel)
   ( Spilled-oil)
   ( Sea-sector)
   ( (and Numerical (> 0)))


B.1. Doma<strong>in</strong> specication 127( (and Cargo-transfer-pump(not (gfp (<strong>in</strong>-use )))))( Tank-barge)((and Numerical(gfp (capacity-bbl-per-hr ))))( (and Numerical(gfp (barge-capacity-bbl ))))( (and Numerical(gfp (max-sea-state ))))( (and Numerical (sub )));; Although not used, will stop the operator be<strong>in</strong>g considered if;; the literal is not present.((and Numerical(gfp (discharge-size )))))(and (~ (<strong>in</strong>-use ))(located )(located )(sea-state-calmer )(stabilize-discharge-rate )))(effects ()((add (oil-pumped ))(add (did-cargo-transfer-oil ))(add (<strong>in</strong>-use ))(add (stabilize-discharge-rate )))))(operator stabilize-discharge-by-conta<strong>in</strong>-skim(params )(preconds(( Vessel)( Spilled-oil)( Sea-sector);; the call to numberp forces to be bound.( (and Numerical (numberp ) (> 0)))( (and Sea-state (gfp (sea-state ))))( (and Numerical(gfp (vessel-boom-length )))))(and (boom-level>= )(portable-skim-level>= )))(effects ()((add (boom-assembled ))(add (stabilize-discharge-rate )))))


128 Appendix B. The Oil-spill doma<strong>in</strong>(operator stabilize-discharge-by-tow<strong>in</strong>g-to-port(params )(duration(let ((distance (or (known-unique '(distance ))100))(speed (or (known-unique '(tow-speed-knots )) 2)))(/ distance speed)))(preconds(( Vessel)( Spilled-oil)( Sea-sector)( (and Numerical (> 0)))( (and Numerical(gfp (displacement ))))( (and Seaport (berth-greater )))( (and Tug (diff ))))(and (~ (<strong>in</strong>-use ))(located )))(effects ()((add (stabilize-discharge-rate ))(add (tanker-towed )))));;; OIL CONTAINMENT, COUNTERMEASURES, AND RECOVERY OPERATORS(<strong>in</strong>ference-rule beg<strong>in</strong>-remove-oil ;; no preconditions(effects () ((add (remove-oil-steps 0)))))(<strong>in</strong>ference-rule open-water-recovery(params )(preconds(( Spilled-oil)( (and Sea-sector(get-covered-sector )))( (and Numerical (> 0)))( (and Numerical (sub1 ))))(and (remove-oil-steps )(recovery-open-water )))(effects () ((del (remove-oil-steps ))(add (remove-oil-steps )))))(operator perform-open-water-recovery(params )(preconds(( Spilled-oil)((and Sea-sector (get-covered-sector )))( Vessel)(


B.1. Doma<strong>in</strong> specication 129(and Numerical(gfp (discharge-rate ))))((and Numerical(gfp (amount-spilled )))))(and (mobile-skim-level>= )(store-level>= )))(effects ()((add (recovery-open-water ))))(duration (/ )))(<strong>in</strong>ference-rule shallow-water-recovery(params )(preconds(( Spilled-oil)((and Sea-sector (get-covered-sector )))((and Sea-state (gfp (sea-state ))))( (and Numerical (> 0)))( (and Numerical (sub1 ))))(and (remove-oil-steps )(recovery-shallow-water )))(effects ()((del (remove-oil-steps ))(add (remove-oil-steps )))))(operator perform-shallow-water-recovery(params )(preconds(( Spilled-oil)((and Sea-sector (get-covered-sector )))( (and Sea-state (gfp (sea-state ))))((and Numerical(gfp (boom-length ))))( vessel)((and Numerical(gfp (discharge-rate ))))((and Numerical(gfp (amount-spilled )))))(and (boom-level>= )(portable-skim-level>= )(store-level>= )))(effects ()((add (boom-assembled ))(add (recovery-shallow-water )))))(<strong>in</strong>ference-rule apply-chemical-dispersant


130 Appendix B. The Oil-spill doma<strong>in</strong>(params )(preconds(( spilled-oil)((and sea-sector (get-covered-sector )))( (and Numerical (> 0)))( (and Numerical (sub1 ))))(and (remove-oil-steps )(apply-dispersant )))(effects()((del (remove-oil-steps ))(add (remove-oil-steps )))))(operator get-chem-dispersant(params )(duration 1)(preconds(( spilled-oil)( sea-sector)( Dispersant))(and (~ (use-prohibited ))(located )))(effects()((add (apply-dispersant )))))(<strong>in</strong>ference-rule <strong>in</strong>-situ-oil-burn<strong>in</strong>g(params )(preconds(( spilled-oil)((and sea-sector (get-covered-sector )))( (and Numerical (> 0)))( (and Numerical (sub1 ))))(and (remove-oil-steps )(burn-oil-<strong>in</strong>-situ )))(effects()((del (remove-oil-steps ))(add (remove-oil-steps )))))(operator perform-<strong>in</strong>-situ-burn<strong>in</strong>g ;; no preconditions(duration 1)(effects () ((add (burn-oil-<strong>in</strong>-situ )))));;; SHORE PROTECTION AND CLEAN UP OPERATORS(<strong>in</strong>ference-rule start-protect<strong>in</strong>g-shore;; no preconditions(effects ((add (protect-shore-steps 0)))))


B.1. Doma<strong>in</strong> specication 131(operator shore-exclusion-boom<strong>in</strong>g(params )(preconds(( spilled-oil)( sensitive-area)((and Land-sector (mygfp 'located-with<strong>in</strong> )))((and Sea-sector (adjacent )))((and Numerical(gfp (exclusion-boom-required ))))((and Numerical(gfp (boom-personnel-required ))))( (and Numerical (> 0)))( (and Numerical (sub1 ))))(and (protect-shore-steps )(boom-level>= )))(effects ()((del (protect-shore-steps ))(add (protect-shore-steps ))(add (boom-personnel-deployed ))(add (boom-assembled ))(add (barrier-provided )))))(operator shore-diversion-boom<strong>in</strong>g(params )(preconds(( spilled-oil)( sensitive-area)((and Land-sector (mygfp 'located-with<strong>in</strong> )))((and Sea-sector (adjacent )))((and Numerical (gfp (amount-spilled ))))((and Numerical(gfp (diversion-boom-required ))))((and Numerical(gfp (boom-personnel-required ))))( (and Numerical (> 0)))( (and Numerical (sub1 ))))(and (protect-shore-steps )(boom-level>= )(vacuum-level>= )))(effects ()((del (protect-shore-steps ))(add (protect-shore-steps ))


132 Appendix B. The Oil-spill doma<strong>in</strong>(add (boom-personnel-deployed ))(add (boom-assembled ))(add (barrier-provided )))))(operator berms-and-dams(params )(preconds(( spilled-oil)( sensitive-area)( (and Numerical (> 0)))( (and Numerical (sub1 )))((and Numerical (gfp (shore-personnel-required ))))( (and Land-sector(mygfp 'located-with<strong>in</strong> )))( (and Sea-sector (adjacent )))((and Numerical (gfp (amount-spilled )))))(and (protect-shore-steps )(barrier-provided )(vacuum-level>= )))(effects ()((del (protect-shore-steps ))(add (protect-shore-steps ))(add (shore-personnel-deployed )))))(operator cleanup-shore(params )(duration 1)(preconds(( spilled-oil)( sensitive-area)( (and Numerical (> 0)))( (and Numerical (sub1 )))((and Numerical (gfp (shore-personnel-required ))))( (and Land-sector(mygfp 'located-with<strong>in</strong> )))( (and Sea-sector(adjacent )))((and Numerical (gfp (amount-spilled ))))((and Numerical (gfp (boom-personnel-required ))))((and Numerical (gfp (conta<strong>in</strong>ment-boom-required)))))


B.1. Doma<strong>in</strong> specication 133(and (protect-shore-steps )(boom-level>= )(vacuum-level>= )))(effects ()((del (protect-shore-steps ))(add (protect-shore-steps ))(add (boom-personnel-deployed ))(add (boom-assembled ))(add (barrier-provided ))(add (shore-personnel-deployed )))));;; RECONNAISSANCE OPERATORS(operator recon-polluted-sectors-by-sea ;; no preconditions(effects () ((add (recon-performed )))))(operator recon-polluted-sectors-by-air ;; no preconditions(effects () ((add (recon-performed )))));;; LEVEL 3:(<strong>in</strong>ference-rule enough-boom1;; no preconditions(effects () ((add (boom-level>= )))))(<strong>in</strong>ference-rule enough-boom2;; no preconditions(effects () ((add (boom-level>= )))));;; get-boom-to-conta<strong>in</strong>-vessel(operator get-boom-to-conta<strong>in</strong>-vessel(params )(preconds(( Vessel)( (and Numerical (numberp )(> 0)))( sea-sector)( Sea-state)( (and Boom (gfp (<strong>in</strong>-service ))(sea-state-greater )))( (and Numerical(gfp (max-sea-state ))))( (and Numerical (gfp (length-boom-ft ))))((and Numerical (sub ))))(and (~ (boom-deployed ))(located )(sea-state-calmer )(boom-level>= )))(effects ()((del (boom-level>= ))(add (boom-level>= ))(add (boom-assembled ))


134 Appendix B. The Oil-spill doma<strong>in</strong>(add (boom-deployed )))));;; get-boom-to-sea-sector(operator get-boom-to-sea-sector(params )(preconds(( sea-sector)( sea-state)( (and numerical (> 0)))( (and Boom (gfp (<strong>in</strong>-service ))(sea-state-greater )))((and Numerical (gfp (max-sea-state ))))( (and Numerical (gfp (length-boom-ft ))))((and Numerical (sub ))))(and (sea-state-calmer )(located )(boom-level>= )))(effects ()((del (boom-level>= ))(add (boom-level>= ))(add (boom-deployed )))));;; get-boom-to-sensitive-area(operator get-boom-to-sensitive-area(params )(preconds(( sensitive-area)( (and Numerical (> 0)))( sea-sector)( (and Numerical (gfp (sea-state ))))( (and Boom (<strong>in</strong>-service )(sea-state-greater )))((and Numerical (gfp (max-sea-state ))))( (and Numerical (gfp (length-boom-ft ))))((and Numerical (sub ))))(and (sea-state-calmer )(located )(boom-level>= )))(effects ()((del (boom-level>= ))(add (boom-level>= ))(add (boom-deployed )))))(<strong>in</strong>ference-rule beg<strong>in</strong>-skim ;; no preconditions(preconds(( (and Numerical (>= 0 )))))


B.1. Doma<strong>in</strong> specication 135(effects ()((add (portable-skim-level>= )))))(<strong>in</strong>ference-rule beg<strong>in</strong>-skim2 ;; no preconditions(preconds(( (and Numerical (>= 0 )))))(effects ()((add (portable-skim-level>= )))))(<strong>in</strong>ference-rule beg<strong>in</strong>-skim3 ;; no preconditions(preconds(( (and Numerical (>= 0 )))))(effects () ((add (mobile-skim-level>= )))));;; get-skimmer-to-skim-near-vessel(operator get-skimmer-to-skim-near-vessel(params )(duration 1)(preconds(( vessel)( sea-sector)( sea-state)( (and Numerical (> 0)))( (and Portable-skimmer(skimmer-sea-state-greater )(not (gfp (<strong>in</strong>-use )))))( (and Numerical(gfp (max-sea-state ))))((and Numerical (gfp (skim-rate-bbl-per-hr ))))((and Numerical (sub )))((and Numerical (gfp (sweep-product )))))(and(~ (<strong>in</strong>-use ))(~ (skimmer-employed )) ; sorry about the redundancy(located )(boom-assembled )(sea-state-calmer ) ; Jim 4/96(portable-skim-level>= )))(effects ()((del (portable-skim-level>= ))(add (portable-skim-level>= ));;(del (free ))(add (skimmer-employed )))))


136 Appendix B. The Oil-spill doma<strong>in</strong>;;; get-portable-skimmer-to-skim-sea-sector(operator get-portable-skimmer-to-skim-sea-sector(params )(duration 1)(preconds(( sea-sector)( sea-state)( (and Numerical (> 0)))( (and Portable-skimmer(skimmer-sea-state-greater )))((and Numerical (gfp (max-sea-state ))))((and Numerical (gfp (skim-rate-bbl-per-hr ))))((and Numerical (sub )))((and Numerical (gfp (boom-assembled )))))(and (~ (<strong>in</strong>-use ))(located )(sea-state-calmer )(portable-skim-level>= )))(effects ()((del (portable-skim-level>= ))(add (portable-skim-level>= ))(add (<strong>in</strong>-use ))(add (skimmer-employed )))))(operator get-mobile-skimmer(params )(duration 1)(preconds(( sea-sector)( (and Numerical (> 0)));; use to be generated by skimmer-sea-state-greater( self-mobile-skimmer)( (and sea-state (gfp (max-sea-state ))))((and Numerical (gfp (skim-rate-bbl-per-hr ))))((and Numerical (sub ))))(and (located )(sea-state-calmer )(~ (<strong>in</strong>-use ))(mobile-skim-level>= )))(effects ()((del (mobile-skim-level>= ))(add (mobile-skim-level>= ))


B.1. Doma<strong>in</strong> specication 137(add (skimmer-employed )))));; (store-level>= )(<strong>in</strong>ference-rule beg<strong>in</strong>-stor<strong>in</strong>g ;; no preconditions(preconds(( (and numerical (>= 0 )))))(effects () ((add (store-level>= )))))(operator get-tank-barge-to-sea-sector(params )(duration 1)(preconds(( sea-sector)( (and Numerical (> 0)))( tank-barge)( (and sea-state (gfp (max-sea-state ))))( (and Numerical (gfp (barge-capacity-bbl ))))( (and Numerical (sub ))))(and (located )(sea-state-calmer )(store-level>= )))(effects()((del (store-level>= ))(add (store-level>= ))(add (oil-pumped )))))(operator get-dracon-barge-to-sea-sector(params )(duration 1)(preconds(( sea-sector)( (and Numerical (> 0)))( dracon-barge)( (and sea-state(mygfp 'max-sea-state )))( (and Numerical (gfp (barge-capacity-bbl ))))( (and Numerical (sub ))))(and (sea-state-calmer )(located )(store-level>= )))(effects ()((del (store-level>= ))(add (store-level>= ))(add (oil-pumped )))))(operator get-tractor-to-sensitive-area


138 Appendix B. The Oil-spill doma<strong>in</strong>(params )(duration 1)(preconds(( sensitive-area)( tractor))(and (free )(located )))(effects ()((del (free ))(add (barrier-provided )))))(<strong>in</strong>ference-rule free-if-available(params )(preconds(( Env-resource)( (and Numerical (mygfp 'available ))))(and (available )(~ (freed-once ))))(effects ()((add (free ))(add (freed-once )))));;; (vacuum-level>= )(<strong>in</strong>ference-rule beg<strong>in</strong>-vacuum ;; no preconditions(preconds(( (and Numerical (>= 0 )))))(effects () ((add (vacuum-level>= )))))(operator get-vacuum-truck(params )(duration 1)(preconds(( sensitive-area)( (and Numerical (> 0)))( Vacuum-truck)( (and Numerical (gfp (capacity-bbl ))))( (and Numerical (sub ))))(and (located )(barrier-provided )(vacuum-level>= )))(effects ()((del (vacuum-level>= ))(add (vacuum-level>= ))(add (oil-removed )))));;; DEPLOYMENT OPERATIONS(operator move-self-mobile-skimmer-by-sea ;; no preconditions(preconds


B.1. Doma<strong>in</strong> specication 139(( (and seaport (mygfp 'located )))))(effects ()((del (located ))(add (located )))))(operator move-response-equipment-by-sea1a ;; no preconditions(duration(let ((dist (known-unique `(distance ))))(if dist(/ dist 15) ; assume 15 knots.10)))(preconds(( (and platform-workboat (mygfp 'located-with<strong>in</strong> )))( (and seaport (mygfp 'located )))))(effects ()((del (located ))(add (located ))(add (located )))))(operator move-response-equipment-by-sea1b ;; no preconditions(duration(let ((dist (known-unique `(distance ))))(if dist(/ dist 15) ; assume 15 knots.10)))(preconds(( (and seaport (mygfp 'located )))))(effects ()((del (located ))(add (located )))))(operator move-response-equipment-by-sea2a ;; no preconditions(params )(duration 10)(preconds(( (and platform-workboat (mygfp 'located-with<strong>in</strong> )))( (and seaport (mygfp 'located )))))(effects ()((del (located ))(add (located ))(add (located )))))(operator move-response-equipment-by-sea2b ;; no preconditions(params )(preconds(( (and seaport (mygfp 'located )))))(effects ()((del (located ))(add (located )))))(operator move-response-equipment-by-sea3a;; no preconditions


140 Appendix B. The Oil-spill doma<strong>in</strong>(preconds(( (and platform-workboat (mygfp 'located-with<strong>in</strong> )))( (and seaport (mygfp 'located )))))(effects ()((del (located ))(add (located ))(add (located )))))(operator move-response-equipment-by-sea3b ;; no preconditions(preconds(( (and seaport (mygfp 'located )))))(effects () ((add (located )))))(operator move-response-equipment-by-sea4 ;; no preconditions(preconds(( (and sea-sector(mygfp 'located )))( (and sea-sector (diff )))))(effects () ((add (located )))))(operator move-response-equipment-by-ground(params )(duration 1)(preconds(( response-equipment)( (and location (mygfp 'located )))( (and location (diff ))))(located ))(effects ()((add (located )))))(operator move-response-equipment-by-air(params )(duration 1)(preconds(( response-equipment)( (and location (mygfp 'located )))( airfield)( (and airfield (diff ))))(located ))(effects ()((del (located ))(add (located )))))(operator move-heavy-equip-by-ground1 ;; no preconditions(params )(duration 1)(preconds(( heavy-equipment)( (and location (mygfp 'located )))


B.1. Doma<strong>in</strong> specication 141( land-sector)))(effects ()((del (located ))(add (located )))))(operator move-heavy-equip-by-ground2 ;; no preconditions(params )(duration 1)(preconds(( heavy-equipment)( (and location (mygfp 'located )))))(effects ()((del (located ))(add (located )))))(operator move-team-by-ground;; no preconditions(effects () ((add (located )))))(operator move-team-by-sea ;; no preconditions(preconds(( (and seaport (mygfp 'located )))))(effects ()((del (located ))(add (located )))))(operator move-team-by-air ;; no preconditions(preconds(( (and airfield (mygfp 'located )))))(effects ()((del (located ))(add (located )))))(operator move-to-sea-sector-from-port ;; no preconditions(duration(let ((distance (or (known-unique `(distance ))100))(max-speed (or (known-unique `(max-speed )) 10)))(/ distance max-speed))) ; a little optimistic..(preconds(( (and seaport (mygfp 'located )))))(effects ()((del (located ))(add (located )))));;;============================================================;;; Weather;;;============================================================(<strong>in</strong>ference-rule calm-enough(params )(preconds(( Sea-sector)


142 Appendix B. The Oil-spill doma<strong>in</strong>( sea-state)( (and sea-state(gfp (sea-state ))(


B.1. Doma<strong>in</strong> specication 143((amount-spilled )((amount-spilled ) ))))(setf *lits-to-ignore* nil);;; A range can be handled directly <strong>in</strong> the Bayes net, which is much;;; more efficient than the other methods. Note that the l<strong>in</strong>k <strong>in</strong> the;;; BN is still done through the <strong>in</strong>ference rule (calm-enough),;;; but the table is filled <strong>in</strong> us<strong>in</strong>g the fact that it is a range.(setf *lit-ranges*'((calm-enough(sea-state-calmer )(sea-state ) 0))))(sea-state ))(effects ()((del (sea-state ))(add (sea-state )))))(event oil-spills(params )(probability 0.2)(preconds(( Spilled-oil)( Vessel)(;; not gfp because I want this to have a default value (0).(and Numerical (vessel-boom-length )))


144 Appendix B. The Oil-spill doma<strong>in</strong>( ;;(and Sea-sector (gfp (located )))Sea-sector);; uh oh. (if left <strong>in</strong> this will collapse the MC)( (and Numerical (gfp (sea-state ))))( (and Numerical(gfp (discharge-rate ))))( (and Numerical(gfp (discharge-size ))))( ;; the numberp call forces it to be bound.(and Numerical (numberp ) (< )))( (and Numerical (add-with-ceil<strong>in</strong>g );; can't spill more than there is.(= ));; sorry for the double negative <strong>in</strong> the next predicate, will fix.(~ (no-discharge ))(amount-spilled );;(~ (covered-sector ))));; taken eveyth<strong>in</strong>g out so as not to have spurious nodes <strong>in</strong> the markov;; cha<strong>in</strong>.(effects ()(;;(add (some-spilled ));; this is a trigger for the eager <strong>in</strong>ference rule below.;;(add (new-spill ))(del (amount-spilled ))(add (amount-spilled ));;(add (boom-length 100)) ; entirely arbitrary.;;(add (covered-sector )))));;; used to change each place from not threatened to threatened;;; <strong>in</strong>dependently. Now "moves" the threat from one shore to another.;;; This event highlights a problem with the syntax. I wanted to have;;; only one such event fire, and use multiple outcomes to decide;;; which new shorel<strong>in</strong>e is chosen. But s<strong>in</strong>ce there is a variable;;; number of shorel<strong>in</strong>es I can't represent them. So I will allow;;; several such events to potentially fire although they are mutually;;; exclusive because their results conflict. This means their;;; probabilities have to be figured as mutually exclusive events;;; although they're represented here as the <strong>in</strong>dependent;;; probabilities. For example, two of these th<strong>in</strong>gs at 0.1 probability;;; means p(event 1) = 0.095 p(event 2) = 0.095 p(noth<strong>in</strong>g) =(event threatened-shore-changes(params )(probability 0.1)(preconds(( Spilled-oil)


B.1. Doma<strong>in</strong> specication 145( Sea-Sector);; Strictly, these b<strong>in</strong>d<strong>in</strong>gs should make this event depend on oil-spills.;; Syntactically they don't because this requires the predicate to;; be present with any amount, and so does oil-spills.( (and Numerical(gfp (amount-spilled ))(> 0)))( (and Sensitive-Area(potentially-threatened )))( (and Sensitive-Area(potentially-threatened )(diff ))))(~ (threatened-shorel<strong>in</strong>e )))|#;; the above is better when this is not a functional literal, below;; is better when it is.(threatened-shorel<strong>in</strong>e ))(effects ()((del (threatened-shorel<strong>in</strong>e ))(add (threatened-shorel<strong>in</strong>e )))))(setf *events-not-to-negate*'(sea-gets-better ; K<strong>in</strong>g Canute might have someth<strong>in</strong>gsea-gets-worse ; to say about thisthreatened-shore-changes));;;============================================================;;; Control rules;;;============================================================;;; This is a general one for conditional plann<strong>in</strong>g, that prefers goals;;; without contexts over goals with them. This might not w<strong>in</strong> <strong>in</strong>;;; general, but it does seem to <strong>in</strong> this doma<strong>in</strong> (so far).(control-rule prefer-no-context(if (and (candidate-goal )(goal-has-context )(~ (goal-has-context ))))(then prefer goal ))(defun goal-has-context (goal)(if (p4::lit-context goal) t nil));;; On the other hand if a goal becomes unsolvable <strong>in</strong> some context, I;;; want to quickly work on the parent goal.;;; The specialization makes the planner <strong>in</strong>complete, because one;;; option is to wait for the sea to calm, but I want to quickly shut;;; plann<strong>in</strong>g down on that path because it will have low probability.;;; Note: this rule must have priority over the rule below that;;; selects the sea-state-calmer rule only.


(control-rule promote-parent-goal-in-context
  (if (and (candidate-goal (sea-state-calmer ))
           ;; May as well select it anyway since order won't matter.
           (known (sea-state ))
           (> )
           ;;(sea-state-goal-has-unselected-context-parent
           ;; ;;);;)
           (context-affected-goals-are-parents-and-none-is-expanded )))
  (then select goal ))

;;; If we're in some context that has introduced this unsatisfiable
;;; sea-state goal and if it has parents that are context-dependent
;;; and none of them has already been re-introduced, then re-introduce
;;; one.
;;; This function is called after we know the current state is too
;;; high. So this must be on a branch where the sea-state goal is a
;;; child of the affected goals. Just test if one is expanded and the
;;; sea-state goal is not a child of that expanded goal.
(defun context-affected-goals-are-parents-and-none-is-expanded
       (sector state goal-var)
  (unless (and (p4::get-context *current-node*)
               (some #'(lambda (cnode) (not (member-if #'zerop (cdr cnode))))
                     (p4::get-context *current-node*)))
    (return-from context-affected-goals-are-parents-and-none-is-expanded
      nil))
  (let* ((context (p4::get-context *current-node*))
         (affected (p4::context-node-affected (car p4::*context-nodes*))))
    (if (and affected
             (not (some #'(lambda (goal)
                            (expanded-below-context
                             goal (car p4::*context-nodes*)))
                        affected)))
        (mapcan
         #'(lambda (aff)
             (let ((goal (p4::goal-node-goal
                          (p4::nexus-parent
                           (p4::nexus-parent aff)))))
               (if (member goal p4::*pending-goals*)
                   (list (list (cons goal-var goal))))))
         affected))))

(defun expanded-below-context (goal cnode)
  (do ((node *current-node* (p4::nexus-parent node)))
      ((or (null node) (eq node cnode)
           (and (p4::goal-node-p node)
                (eq (p4::goal-node-goal node) goal)))
       (and (p4::goal-node-p node)
            (eq (p4::goal-node-goal node) goal)))))


;;; Otherwise, select the sea-state goal, either to get it out of the
;;; way or to fail quickly.
(control-rule do-sea-state-early
  (if (and (candidate-goal (sea-state-calmer ))
           (known (sea-state ))
           (or (= ) begin-storing (< 0))
           ( (portable-skim-level>= ) begin-skim2 (< 0))))


(mapc #'(lambda (pair)
          (let ((goal (first pair))
                (op (second pair))
                (test (or (third pair) '(> 0))))
            (eval `(control-rule ,(intern (format nil "rej-~S" op))
                     (if (and (current-goal ,goal) ,test))
                     (then reject operator ,op)))))
      '(( (remove-oil-steps ) begin-remove-oil)
        ( (protect-shore-steps ) start-protecting-shore)
        ( (vacuum-level>= ) begin-vacuum)
        ( (stabilize-discharge-rate )
          begin-stabilize (~ (acceptable-discharge-rate )))))

(control-rule dont-protect-unnecessarily
  (if (and (current-goal (ok-shoreline ))
           (~ (known (threatened-shoreline )))))
  (then reject operator shoreline-protected))

(control-rule cannot-stop-threat
  (if (and (current-goal (ok-shoreline ))
           (known (threatened-shoreline ))))
  (then reject operator shoreline-not-threatened))

;;; The order in which these inference rules are expanded won't
;;; matter.
;;; As soon as unprotected-sensitive-area is expanded, do these.
(control-rule pick-a-shoreline
  (if (lexically-first-candidate-shoreline ))
  (then select goal (ok-shoreline )))

;;; Look at all the shorelines that could be worked on, force the one
;;; that's first in the yellow pages.
(defun lexically-first-candidate-shoreline (area-var oil-var)
  (let ((shorelines (candidate-goal '(ok-shoreline ))))
    (if shorelines
        (sublis (mapcar #'cons '( ) (list area-var oil-var))
                (list (car
                       (sort shorelines
                             #'(lambda (b1 b2)
                                 (string< (oname (cdr (assoc ' b1)))
                                          (oname (cdr (assoc ' b2))))))))))))

;;; Similarly work on barrier-provided first, since it won't mess with
;;; other goals and we just want to fail quickly if there's no free
;;; tractor.
(control-rule provide-barrier-quickly
  (if (candidate-goal (barrier-provided )))
  (then select goal (barrier-provided )))

;;; Implement primary effects
(control-rule provide-barrier-simply1


  (if (current-goal (barrier-provided )))
  (then reject operator shore-exclusion-booming))

(control-rule provide-barrier-simply2
  (if (current-goal (barrier-provided )))
  (then reject operator shore-diversion-booming))

(control-rule provide-barrier-simply3
  (if (current-goal (barrier-provided )))
  (then reject operator cleanup-shore))

;;; This one forces it to use different equipment when subgoaling to
;;; pick up the slack in stabilizing
(control-rule stabilize-differently
  (if (and (current-ops (cargo-transfer-oil-to-stabilize))
           (expanded-operator
            (cargo-transfer-oil-to-stabilize ))))
  (then reject bindings (( . ))))

(control-rule different-mobile-skimmer
  (if (and (current-ops (use-mobile-skimmer))
           (expanded-operator
            (use-mobile-skimmer ))))
  (then reject bindings (( . ))))

(control-rule different-portable-skimmer
  (if (and (current-ops (get-skimmer-to-skim-near-vessel
                         get-portable-skimmer-to-skim-sea-sector))
           (or (expanded-operator
                (get-skimmer-to-skim-near-vessel ))
               (expanded-operator
                (get-portable-skimmer-to-skim-sea-sector )))))
  (then reject bindings (( . ))))

(control-rule different-boom
  (if (and (current-ops (use-boom-in-sensitive-area))
           (expanded-operator
            (use-boom-in-sensitive-area ))))
  (then reject bindings (( . ))))

;;; These are purely speed-up search control, without which it takes
;;; FOREVER. They remove an operator from the search space if it
;;; requires a resource that has been switched off.
;;; This is a good place to start thinking about iterative
;;; sophistication search.

;;; If a barge has already moved to a sea-sector, it can't be moved to


;;; another sea-sector - there's no operator to do it.
(control-rule cannot-move-barge-from-sea-sector
  (if (and (current-ops (cargo-transfer-oil-to-stabilize))
           (current-goal (stabilize-discharge-rate ))
           (known (located ))
           (ptype sea-sector)
           (~ (eq ))))
  (then reject bindings
        (( . )
         ( . ))))

(control-rule cannot-move-barge-from-sea-sector
  (if (and (current-ops (use-tank-barge-as-storage))
           (current-goal
            (store-level>= ))
           (known (located ))
           (ptype sea-sector)
           (~ (eq ))))
  (then reject bindings
        (( . )
         ( . ))))

(defun ptype (object typename)
  (eq (p4::type-name (p4::prodigy-object-type object)) typename))

(mapc #'(lambda (pair)
          (let ((object (first pair))
                (operator (second pair))
                (goal (third pair))
                (predicate (or (fourth pair) 'in-use)))
            (eval `(control-rule ,(intern (format nil "need-a-free-~S"
                                                  object))
                     (if (and (current-goal ,goal)
                              (all-objects ,object ,predicate)))
                     (then reject operator ,operator)))))
      '( (self-mobile-skimmer open-water-recovery
          (remove-oil-steps ))
         (portable-skimmer shallow-water-recovery
          (remove-oil-steps ))
         (dispersant apply-chemical-dispersant
          (remove-oil-steps ) use-prohibited)
         (cargo-transfer-pump cargo-transfer-oil-to-stabilize
          (stabilize-discharge-rate ))
         (portable-skimmer stabilize-discharge-by-contain-skim
          (stabilize-discharge-rate ))))

;;; known with negated predicates really doesn't seem to work..
(defun all-objects (type pred)


  (every #'(lambda (object) (known (list pred (oname object))))
         (p4::type-real-instances
          (p4::type-name-to-type type *current-problem-space*))))

;;; Some quick hacks to cut out useless search.
(control-rule no-seaworthy-barges
  (if (and (current-goal (stabilize-discharge-rate ))
           (known (sea-state ))
           (no-objects-above-sea-state tank-barge )))
  (then reject operator cargo-transfer-oil-to-stabilize))

(control-rule no-seaworthy-portable-skimmers-or-booms
  (if (and (current-goal (stabilize-discharge-rate ))
           (known (sea-state ))
           (or (no-objects-above-sea-state boom )
               (no-objects-above-sea-state portable-skimmer ))))
  (then reject operator stabilize-discharge-by-contain-skim))

(control-rule no-seaworthy-mobile-skimmers-or-booms
  (if (and (current-goal (recovery-open-water ))
           (known (sea-state ))
           (or (no-objects-above-sea-state self-mobile-skimmer )
               (and (no-objects-above-sea-state tank-barge )
                    (no-objects-above-sea-state dracon-barge )))))
  (then reject operator perform-open-water-recovery))

(control-rule no-seaworthy-boom-or-portable-skimmer-or-barge
  (if (and (current-goal (recovery-shallow-water ))
           (known (sea-state ))
           (or (no-objects-above-sea-state boom )
               (no-objects-above-sea-state portable-skimmer )
               (and (no-objects-above-sea-state tank-barge )
                    (no-objects-above-sea-state dracon-barge )))))
  (then reject operator perform-shallow-water-recovery))

;;; Should be generalised to whether there is enough boom for all of them.
;;; But "enough" is governed by a different predicate in each case.
(control-rule no-seaworthy-boom-for-shore-ops
  (if (and (current-goal (protect-shore-steps ))
           (known (located-within ))
           (adjacent )
           (known (sea-state ))
           (no-objects-above-sea-state boom )
           (trying-operator (cleanup-shore shore-diversion-booming
                             shore-exclusion-booming))))
  (then reject operator ))

(control-rule not-enough-good-boom-for-exclusion-booming
  (if
   (and (current-goal (protect-shore-steps ))
        (known (located-within ))
        (adjacent )
        (known (sea-state ))


        (known (exclusion-boom-required ))
        (less-boom-above-state )))
  (then reject operator shore-exclusion-booming))

(control-rule no-seaworthy-boom-to-clean-up-shore
  (if (and (current-goal (protect-shore-steps ))
           (known (located-within ))
           (adjacent )
           (known (sea-state ))
           (no-objects-above-sea-state boom )))
  (then reject operator cleanup-shore))

(defun no-objects-above-sea-state (type state)
  (every #'(lambda (object)
             (let ((max-state (known-unique
                               (list 'max-sea-state (oname object) '))))
               (< max-state state)))
         (p4::type-real-instances
          (p4::type-name-to-type type *current-problem-space*))))

(defun less-boom-above-state (state amount)
  (let ((ttl 0))
    (dolist (boom (p4::type-instances
                   (p4::type-name-to-type 'boom *current-problem-space*)))
      (let ((max (known-unique `(max-sea-state ,(oname boom) ))))
        (if (> max state)
            (incf ttl (known-unique `(length-boom-ft ,(oname boom) ))))))
    (< ttl amount)))

(defun trying-operator (var list)
  ;; If the var is a variable, return every possible binding.
  (if (varp var)
      (mapcar
       #'(lambda (elt)
           (list
            (cons var (p4::rule-name-to-rule elt *current-problem-space*))))
       list)
      ;; Otherwise check if the value is in the list
      (member var list)))

;;; In p1 there are two sources of chemical dispersant, one 6 miles
;;; away and one 345 miles away. Without this rule prodigy chooses the
;;; further one.
(control-rule prefer-closer-dispersant
  (if (and ;;(current-goal (apply-dispersant


           (closer-obj )))
  (then prefer bindings
        (( . )
         ( . )
         ( . )
         ( . ))))

(control-rule prefer-closer-boat
  (if (and (current-ops (move-response-equipment-by-sea1b))
           (candidate-bindings (( . )
                                ( . )
                                ( . )
                                ( . )
                                ( . )))
           (inst-val )
           (closer-obj )))
  (then prefer bindings (( . )
                         ( . )
                         ( . )
                         ( . )
                         ( . ))))

;;; I'd rather not have to write so many similar rules.. on the other
;;; hand this form may be easier to build a tree out of.
(control-rule prefer-closer-skimmer
  (if (and (current-ops (use-mobile-skimmer))
           (candidate-bindings )
           (elt-val 3 )
           (inst-val )  ; is bound from RHS
           (inst-val )  ; assumes they are the same.
           (closer-obj )
           (list-to-bindings use-mobile-skimmer )))
  (then prefer bindings ))

(control-rule prefer-closer-port
  (if (and (current-ops (stabilize-discharge-by-towing-to-port))
           (candidate-bindings )
           (elt-val 5 )
           (inst-val )
           (inst-val )
           (closer-port-to-sea )
           (list-to-bindings stabilize-discharge-by-towing-to-port )))
  (then prefer bindings ))

;;; Prefer to use a long enough boom if there is one. I guess two
;;; short ones that can be transported together and are much closer
;;; would be better, but.. Ok, so greedily prefer the boom that
;;; maximises the ratio of proportion of boom added to distance.
(control-rule best-boom-length-per-hour


  (if (and (current-ops (use-boom-to-contain-vessel))
           (candidate-bindings )
           (elt-val 1 )
           (elt-val 4 )
           (obj-to-name )
           (elt-val 6 )
           (inst-val )
           (inst-val )
           (inst-val )
           (greater-used-length-per-hr )
           (list-to-bindings use-boom-to-contain-vessel )))
  (then prefer bindings ))

;;; Figure out how much of the needed length is delivered "per hour"
;;; assuming the same speed of delivery for each boom.
(defun greater-used-length-per-hr (boom1 length1 boom2 length2
                                   needed-length sector)
  (let ((delivered1 (min length1 needed-length))
        (delivered2 (min length2 needed-length))
        (distance1 (obj-distance sector boom1))
        (distance2 (obj-distance sector boom2)))
    (cond ((zerop distance2) nil)
          ((zerop distance1) t)
          (t
           (> (/ delivered1 distance1) (/ delivered2 distance2))))))

(defun obj-to-name (object namevar)
  (if (p4::prodigy-object-p object)
      (list (list (cons namevar (p4::prodigy-object-name object))))))

(control-rule prefer-long-enough-boom-in-sensitive-area
  (if (and (current-ops (use-boom-in-sensitive-area))
           (candidate-bindings )
           (elt-val 7 )
           (inst-val )
           (< 0)  ; there's enough boom in one bindings set
           (> 0)  ; there isn't in the other
           (list-to-bindings use-boom-in-sensitive-area )))
  (then prefer bindings ))

(control-rule prefer-big-enough-skimmer
  (if (and (current-ops (get-skimmer-to-skim-near-vessel))
           (candidate-bindings )
           (elt-val 7 )
           (inst-val )
           ( 0)
           (list-to-bindings get-skimmer-to-skim-near-vessel )))
  (then prefer
bindings ))


;;; prefer to use a bigger pump.
(control-rule prefer-big-enough-pump
  (if (and (current-ops (cargo-transfer-oil-to-stabilize))
           (candidate-bindings )
           ;; bind to the pump capacity in bindings
           (op-val cargo-transfer-oil-to-stabilize )
           (inst-val )
           (> )
           (list-to-bindings cargo-transfer-oil-to-stabilize )))
  (then prefer bindings ))

B.2 Random problem generator

In Chapter 7 I report on experiments with a number of randomly-generated problems from the oil-spill domain. Here, I describe the problem generator in detail.

The algorithm to generate the random problems fixes the geography of the area, as defined by the set of objects of types sea-sector, land-sector, urban, seaport and sensitive-area and by the literals for the predicates adjacent, located-within, distance and berth-size. The geographic information models the San Francisco Bay area as defined in the version of the domain written at SRI [Desimone & Agosta 1994]. This fixed base of information includes 130 objects and 320 predicates. The reader who is interested in this domain and generator is encouraged to send electronic mail to jblythe@cs.cmu.edu for a copy.

Within this geographic framework, one vessel is placed in a random sea-sector that contains at least one sensitive-area, of which there are three. The vessel is given a random displacement, distributed uniformly between 30000 and 80000 at 10000 intervals. The spillable oil has size either 12000 or 24000, chosen uniformly at random.
The discharge-rate is fixed at 1000 and the boom length required to surround the vessel is also fixed at 1000. With probability 2/3 there is no oil spilled in the initial state and with probability 1/3 some is spilled, the amount chosen according to a uniform distribution between 100 and 4000. With probability 1/3 there is no sensitive area initially threatened by the oil, and with probability 2/3 an area is threatened, chosen uniformly at random from the areas that can be threatened by oil in the chosen sea-sector.

Next, the initial sea-states and some features of the sensitive areas are randomly assigned. The sea-state of a sea-sector can vary from one to six, with one meaning calm and six meaning very rough. Each sea-sector in the domain is assigned a value from one to six randomly according to the uniform distribution. With probability 1/5, each reachable sensitive area is made unthreatenable by the oil. Each reachable sensitive area is given the value 10 for shore-personnel-required, and also assigned three random values according to the uniform distribution: the diversion-boom-required is between 2000 and 10000, the exclusion-boom-required is also between 2000 and 10000


and the boom-personnel-required takes one of the values that occurs naturally in the domain: 4, 15, 18, 42 and 74.

A random number of equipment objects, for example, of types boom, tractor and tug, are distributed among the seaports at random. The number of objects of each type is generated according to a uniform distribution, with the minimum and maximum values shown in Table B.1. In addition the properties of these objects are assigned at random, for example the max-speed and the tow-speed of each tug.

Object type           Min  Max
tug                     0    4
tractor                 0    4
vacuum-truck            0    8
cargo-transfer-pump     0   10
dispersant              0    4
weir-skimmer            0   10
self-mobile-skimmer     0    8
dracon-barge            0   10
dispersant-barge        0    2
tank-barge              0    4
utility-boat            1    3
boom                   10   30

Table B.1: The number of objects in a problem is chosen randomly according to a uniform distribution. This table shows the minimum and maximum number of objects generated for each type.

This generation algorithm produces a set of problems that, along with random choices made in the problem solver, exercise all possible combinations of the strategies available to the planner while ensuring that the generated problems are physically plausible.
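The sampling scheme described above can be sketched compactly. This is an illustration in Python rather than the generator's Lisp; the function and field names are invented for this example, and the fixed Bay-area geography and predicate output are omitted.

```python
import random

# (min, max) equipment counts from Table B.1.
EQUIPMENT_RANGES = {
    "tug": (0, 4), "tractor": (0, 4), "vacuum-truck": (0, 8),
    "cargo-transfer-pump": (0, 10), "dispersant": (0, 4),
    "weir-skimmer": (0, 10), "self-mobile-skimmer": (0, 8),
    "dracon-barge": (0, 10), "dispersant-barge": (0, 2),
    "tank-barge": (0, 4), "utility-boat": (1, 3), "boom": (10, 30),
}

def random_problem(rng):
    """Illustrative sketch of the random choices in the text; the
    sensitive-area assignments and geography are elided."""
    return {
        # displacement: uniform over 30000..80000 at 10000 intervals
        "displacement": rng.choice(range(30000, 80001, 10000)),
        "oil-size": rng.choice([12000, 24000]),
        "discharge-rate": 1000,        # fixed
        "boom-length-required": 1000,  # fixed
        # with probability 1/3 some oil is spilled initially
        "amount-spilled": (rng.uniform(100, 4000)
                           if rng.random() < 1 / 3 else 0),
        # each sea-sector gets a sea-state from 1 (calm) to 6 (rough)
        "sea-state": rng.randint(1, 6),
        # equipment counts, uniform within the Table B.1 ranges
        "equipment": {t: rng.randint(lo, hi)
                      for t, (lo, hi) in EQUIPMENT_RANGES.items()},
    }

p = random_problem(random.Random(0))
```

Seeding the generator, as here, would make an experimental problem set reproducible while still exercising the full range of each distribution.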


Bibliography

[Agre & Chapman 1987] Agre, P. E., and Chapman, D. 1987. Pengi: An implementation of a theory of activity. In National Conference on Artificial Intelligence.

[Allen et al. 1991] Allen, J. F.; Kautz, H. A.; Pelavin, R. N.; and Tenenberg, J. D. 1991. Reasoning About Plans. 2929 Campus Drive, Suite 260, San Mateo, CA 94403: Morgan Kaufmann.

[Ambite & Knoblock 1997] Ambite, J. L., and Knoblock, C. A. 1997. Planning by rewriting: Efficiently generating high-quality plans. In Proc. Ord National Conference on Artificial Intelligence. AAAI Press.

[Bacchus et al. 1997] Bacchus, F.; Grove, A. J.; Halpern, J. Y.; and Koller, D. 1997. From statistical knowledge bases to degrees of belief. Artificial Intelligence 87:75–143.

[Bagchi, Biswas, & Kawamura 1994] Bagchi, S.; Biswas, G.; and Kawamura, K. 1994. Generating plans to succeed in uncertain environments. In Hammond, K., ed., Proc. Second International Conference on Artificial Intelligence Planning Systems, 1–6. University of Chicago, Illinois: AAAI Press.

[Baral 1995] Baral, C. 1995. Reasoning about actions: Non-deterministic effects, constraints, and qualification. In Proc. 14th International Joint Conference on Artificial Intelligence, 2017–2024. Montreal, Quebec: Morgan Kaufmann.

[Blythe & Fink 1997] Blythe, J., and Fink, E. 1997. Rasputin: A complete bidirectional-chaining planner. Technical Report forthcoming, Computer Science Department, Carnegie Mellon University.

[Blythe & Mitchell 1989] Blythe, J., and Mitchell, T. 1989.
On becoming reactive. In International Conference on Machine Learning.

[Blythe & Veloso 1992] Blythe, J., and Veloso, M. 1992. An analysis of search techniques for a totally-ordered nonlinear planner. In Hendler, J., ed., Proc. First International Conference on Artificial Intelligence Planning Systems. Available as http://www.cs.cmu.edu/~jblythe/papers/search.ps.


[Blythe & Veloso 1996] Blythe, J., and Veloso, M. 1996. Learning to improve uncertainty handling in a hybrid planning system. In Kasif, S., ed., AAAI Fall Symposium on Learning Complex Behaviors in Intelligent Adaptive Systems.

[Blythe & Veloso 1997] Blythe, J., and Veloso, M. 1997. Using analogy in conditional planners. In Proc. Ord National Conference on Artificial Intelligence. AAAI Press.

[Bonissone & Dutta 1990] Bonissone, P., and Dutta, S. 1990. Merging strategic and tactical planning in dynamic, uncertain environments. In DARPA Workshop on Innovative Approaches to Planning, Scheduling and Control, 379–389.

[Boutilier & Dearden 1994] Boutilier, C., and Dearden, R. 1994. Using abstractions for decision-theoretic planning with time constraints. In Proc. Ord National Conference on Artificial Intelligence, 1016–1022. AAAI Press.

[Boutilier, Brafman, & Geib 1997] Boutilier, C.; Brafman, R. I.; and Geib, C. 1997. Prioritized goal decomposition of Markov decision processes: Toward a synthesis of classical and decision theoretic planning. In Proc. 15th International Joint Conference on Artificial Intelligence. Nagoya, Japan: Morgan Kaufmann.

[Boutilier, Dean, & Hanks 1995] Boutilier, C.; Dean, T.; and Hanks, S. 1995. Planning under uncertainty: structural assumptions and computational leverage. In Ghallab, M., and Milani, A., eds., New Directions in AI Planning, 157–172. Assisi, Italy: IOS Press.

[Boutilier, Dearden, & Goldszmidt 1995] Boutilier, C.; Dearden, R.; and Goldszmidt, M. 1995.
Exploiting structure in policy construction. In Proc. 14th International Joint Conference on Artificial Intelligence, 1104–1111. Montreal, Quebec: Morgan Kaufmann.

[Brafman 1997] Brafman, R. 1997. A heuristic variable grid solution method for POMDPs. In Proc. Ord National Conference on Artificial Intelligence, 727–733. AAAI Press.

[Brooks 1986] Brooks, R. 1986. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation RA-2:14–23.

[Carbonell et al. 1992] Carbonell, J. G.; Blythe, J.; Etzioni, O.; Gil, Y.; Joseph, R.; Kahn, D.; Knoblock, C.; Minton, S.; Perez, A.; Reilly, S.; Veloso, M.; and Wang, M. 1992. PRODIGY4.0: The manual and tutorial. Technical Report CMU-CS-92-150, School of Computer Science, Carnegie Mellon University.

[Cassandra, Kaelbling, & Littman 1994] Cassandra, A. R.; Kaelbling, L. P.; and Littman, M. L. 1994. Acting optimally in partially observable stochastic domains. In Proc. Ord National Conference on Artificial Intelligence, 1023–1028. AAAI Press.


[Chapman 1987] Chapman, D. 1987. Planning for conjunctive goals. Artificial Intelligence 32:333–378.

[Cooper 1990] Cooper, G. F. 1990. The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence 42:393–405.

[Darwiche 1995] Darwiche, A. 1995. Conditioning methods for exact and approximate inference in causal networks. In Besnard, P., and Hanks, S., eds., Proc. Eleventh Conference on Uncertainty in Artificial Intelligence, 99–107. Montreal, Quebec: Morgan Kaufmann.

[Dean & Lin 1995] Dean, T., and Lin, S.-H. 1995. Decomposition techniques for planning in stochastic domains. In Proc. 14th International Joint Conference on Artificial Intelligence, 1121–1127. Montreal, Quebec: Morgan Kaufmann.

[Dean & Wellman 1991] Dean, T. L., and Wellman, M. P. 1991. Planning and Control. Morgan Kaufmann.

[Dean et al. 1993] Dean, T.; Kaelbling, L. P.; Kirman, J.; and Nicholson, A. 1993. Planning with deadlines in stochastic domains. In National Conference on Artificial Intelligence.

[Dean et al. 1995] Dean, T.; Kaelbling, L.; Kirman, J.; and Nicholson, A. 1995. Planning under time constraints in stochastic domains. Artificial Intelligence 76(1-2):35–74.

[Dean 1994] Dean, T. 1994. Decision-theoretic planning and Markov decision processes. To be submitted to AI Magazine.

[Dearden & Boutilier 1997] Dearden, R., and Boutilier, C. 1997.
Abstraction and approximate decision theoretic planning. Artificial Intelligence. To appear.

[Desimone & Agosta 1994] Desimone, R. V., and Agosta, J. M. 1994. Spill response system configuration study – final report. Technical Report ITAD-4368-FR-94-236, SRI International.

[Draper & Hanks 1994] Draper, D., and Hanks, S. 1994. Localized partial evaluation of belief networks. In de Mantaras, R. L., and Poole, D., eds., Proc. Tenth Conference on Uncertainty in Artificial Intelligence, 170–177. Seattle, WA: Morgan Kaufmann.

[Draper, Hanks, & Weld 1994] Draper, D.; Hanks, S.; and Weld, D. 1994. Probabilistic planning with information gathering and contingent execution. In Hammond, K., ed., Proc. Second International Conference on Artificial Intelligence Planning Systems, 31–37. University of Chicago, Illinois: AAAI Press.


[Drummond & Bresina 1990] Drummond, M., and Bresina, J. 1990. Anytime synthetic projection: Maximizing the probability of goal satisfaction. In Proc. Ord National Conference on Artificial Intelligence, 138–144. AAAI Press.

[Etzioni 1990] Etzioni, O. 1990. A Structural Theory of Explanation-Based Learning. Ph.D. Dissertation, School of Computer Science, Carnegie Mellon University. Available as technical report CMU-CS-90-185.

[Feldman & Sproull 1977] Feldman, J. A., and Sproull, R. F. 1977. Decision theory and artificial intelligence II: The hungry monkey. Cognitive Science 1:158–192.

[Fikes & Nilsson 1971] Fikes, R. E., and Nilsson, N. J. 1971. STRIPS: A new approach to the application of theorem proving to problem solving. Artificial Intelligence 2:189–208.

[Fink & Veloso 1995] Fink, E., and Veloso, M. 1995. Formalizing the PRODIGY planning algorithm. In Ghallab, M., and Milani, A., eds., New Directions in AI Planning, 261–272. Assisi, Italy: IOS Press.

[Firby 1987] Firby, J. R. 1987. An investigation into reactive planning in complex domains. In Proc. Ord National Conference on Artificial Intelligence, 202–206. AAAI Press.

[Firby 1989] Firby, J. R. 1989. Adaptive Execution in Complex Dynamic Worlds. Ph.D. Dissertation, Department of Computer Science, Yale University.

[Georgeff & Lansky 1987] Georgeff, M. P., and Lansky, A. L. 1987. Reactive reasoning and planning. In Proc. Ord National Conference on Artificial Intelligence, 677–681. AAAI Press.

[Gerevini & Schubert 1996] Gerevini, A., and Schubert, L. 1996.
Accelerating partial-order planners: Some techniques for effective search control and pruning. Journal of Artificial Intelligence Research 5:95–137.

[Gervasio & DeJong 1994] Gervasio, M. T., and DeJong, G. F. 1994. An incremental learning approach for completable planning. In International Conference on Machine Learning.

[Gervasio 1996] Gervasio, M. T. 1996. An Incremental Curative Learning Approach for Planning with Incorrect Domain Theories. Ph.D. Dissertation, University of Illinois at Urbana-Champaign.

[Gil 1992] Gil, Y. 1992. Acquiring Domain Knowledge for Planning by Experimentation. Ph.D. Dissertation, School of Computer Science, Carnegie Mellon University. Available as technical report CMU-CS-92-175.


[Goldman & Boddy 1994] Goldman, R. P., and Boddy, M. S. 1994. Epsilon-safe planning. In de Mantaras, R. L., and Poole, D., eds., Proc. Tenth Conference on Uncertainty in Artificial Intelligence, 253–261. Seattle, WA: Morgan Kaufmann.

[Goldszmidt & Darwiche 1995] Goldszmidt, M., and Darwiche, A. 1995. Plan simulation using Bayesian networks. In IEEE-CAIA.

[Goodwin 1994] Goodwin, R. 1994. Reasoning about when to start acting. In Hammond, K., ed., Proc. Second International Conference on Artificial Intelligence Planning Systems, 86–91. University of Chicago, Illinois: AAAI Press.

[Haddawy & Suwandi 1994] Haddawy, P., and Suwandi, M. 1994. Decision-theoretic refinement planning using inheritance abstraction. In Hammond, K., ed., Proc. Second International Conference on Artificial Intelligence Planning Systems. University of Chicago, Illinois: AAAI Press.

[Haddawy, Doan, & Goodwin 1995] Haddawy, P.; Doan, A.; and Goodwin, R. 1995. Efficient decision-theoretic planning: Techniques and empirical analysis. In Besnard, P., and Hanks, S., eds., Proc. Eleventh Conference on Uncertainty in Artificial Intelligence, 229–326. Montreal, Quebec: Morgan Kaufmann.

[Haddawy, Doan, & Kahn 1996] Haddawy, P.; Doan, A.; and Kahn, C. E. 1996. Decision-theoretic refinement planning in medical decision making: Management of acute deep venous thrombosis. Medical Decision Making.

[Haddawy 1991] Haddawy, P. 1991. Representing Plans Under Uncertainty: a Logic of Time, Chance and Action. Ph.D.
Dissertation, University of Ill<strong>in</strong>ois at Urbana-Champaign.[Hickman, Shell, & Carbonell 1990] Hickman, A. K.; Shell, P.; and Carbonell, J. G.1990. Internal analogy: Reduc<strong>in</strong>g search dur<strong>in</strong>g problem solv<strong>in</strong>g. In Copetas,C., ed., The Computer Science Research Review 1990. The School of ComputerScience, <strong>Carnegie</strong> <strong>Mellon</strong> University.[Howard 1960] Howard, R. A. 1960. <strong>Dynamic</strong> Programm<strong>in</strong>g and Markov Processes.MIT Press.[Kartha 1995] Kartha, G. N. 1995. A Mathematical Investigation of Reason<strong>in</strong>g AboutActions. Ph.D. Dissertation, University ofTexas at Aust<strong>in</strong>.[Kautz & Selman 1996] Kautz, H. A., and Selman, B. 1996. Push<strong>in</strong>g the envelope:<strong>Plann<strong>in</strong>g</strong>, propositional logic, and stochastic search. In Proc. Ord National ConferenceonArticial Intelligence. AAAI Press.[Knoblock 1991] Knoblock, C. A. 1991. Automatically Generat<strong>in</strong>g Abstractions forProblem Solv<strong>in</strong>g. Ph.D. Dissertation, <strong>Carnegie</strong> <strong>Mellon</strong> University.


[Koenig & Simmons 1995] Koenig, S., and Simmons, R. 1995. Real-time search in nondeterministic domains. In Proc. 14th International Joint Conference on Artificial Intelligence, 1660–1667. Montreal, Quebec: Morgan Kaufmann.

[Kushmerick, Hanks, & Weld 1994] Kushmerick, N.; Hanks, S.; and Weld, D. 1994. An algorithm for probabilistic least-commitment planning. In Proc. Twelfth National Conference on Artificial Intelligence, 1073–1078. AAAI Press.

[Kushmerick, Hanks, & Weld 1995] Kushmerick, N.; Hanks, S.; and Weld, D. 1995. An algorithm for probabilistic planning. Artificial Intelligence 76:239–286.

[Lauritzen & Spiegelhalter 1988] Lauritzen, S., and Spiegelhalter, D. 1988. Local computation with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society B 50(2):154–227.

[Littman 1996] Littman, M. L. 1996. Algorithms for Sequential Decision Making. Ph.D. Dissertation, Department of Computer Science, Brown University.

[Loyall & Bates 1991] Loyall, A. B., and Bates, J. 1991. Hap: A reactive, adaptive architecture for agents. Technical Report CMU-CS-91-147, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA.

[Mansell 1993] Mansell, T. M. 1993. A method for planning given uncertain and incomplete information. In Heckerman, D., and Mamdani, A., eds., Uncertainty in Artificial Intelligence, volume 9, 350–358. Washington, D.C.: Morgan Kaufmann.

[McAllester & Rosenblitt 1991] McAllester, D., and Rosenblitt, D. 1991. Systematic nonlinear planning. In Proc. Ninth National Conference on Artificial Intelligence, 634–639. AAAI Press.

[Minton et al. 1989] Minton, S.; Carbonell, J. G.; Knoblock, C. A.; Kuokka, D. R.; Etzioni, O.; and Gil, Y. 1989. Explanation-based learning: A problem solving perspective. Artificial Intelligence 40:63–118.

[Minton 1988] Minton, S. 1988. Learning Effective Search Control Knowledge: An Explanation-Based Approach. Boston, MA: Kluwer.

[Newell & Simon 1963] Newell, A., and Simon, H. A. 1963. GPS: A program that simulates human thought. In Feigenbaum, E. A., and Feldman, J., eds., Computers and Thought, 279–293. New York: McGraw-Hill.

[Onder & Pollack 1997] Onder, N., and Pollack, M. 1997. Contingency selection in plan generation. In European Conference on Planning.

[Parr & Russell 1995] Parr, R., and Russell, S. 1995. Approximating optimal policies for partially observable stochastic domains. In Proc. 14th International Joint Conference on Artificial Intelligence. Montreal, Quebec: Morgan Kaufmann.


[Pearl 1988] Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann.

[Pearl 1994] Pearl, J. 1994. A probabilistic calculus of actions. In de Mantaras, R. L., and Poole, D., eds., Proc. Tenth Conference on Uncertainty in Artificial Intelligence, 454–462. Seattle, WA: Morgan Kaufmann.

[Penberthy & Weld 1992] Penberthy, J. S., and Weld, D. S. 1992. UCPOP: A sound, complete, partial order planner for ADL. In Third International Conference on Principles of Knowledge Representation and Reasoning, 103–114.

[Peot & Smith 1992] Peot, M. A., and Smith, D. E. 1992. Conditional nonlinear planning. In Hendler, J., ed., Proc. First International Conference on Artificial Intelligence Planning Systems, 189–197. College Park, Maryland: Morgan Kaufmann.

[Perez & Carbonell 1994] Perez, M. A., and Carbonell, J. 1994. Control knowledge to improve plan quality. In Hammond, K., ed., Proc. Second International Conference on Artificial Intelligence Planning Systems, 323–328. University of Chicago, Illinois: AAAI Press.

[Perez 1995] Perez, M. A. 1995. Learning Search Control Knowledge to Improve Plan Quality. Ph.D. Dissertation, School of Computer Science, Carnegie Mellon University.

[Pollack, Joslin, & Paolucci 1997] Pollack, M. E.; Joslin, D.; and Paolucci, M. 1997. Flaw selection strategies for partial-order planning. Journal of Artificial Intelligence Research 6:223–262.

[Pryor & Collins 1993] Pryor, L., and Collins, G. 1993. Cassandra: Planning for contingencies. Technical Report 41, The Institute for the Learning Sciences.

[Pryor & Collins 1996] Pryor, L., and Collins, G. 1996. Planning for contingencies: A decision-based approach. Journal of Artificial Intelligence Research 4:287–339.

[Puterman 1994] Puterman, M. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons.

[Russell & Wefald 1991] Russell, S., and Wefald, E. 1991. Do the Right Thing. MIT Press.

[Sacerdoti 1974] Sacerdoti, E. 1974. Planning in a hierarchy of abstraction spaces. Artificial Intelligence 5:115–135.

[Schoppers 1989a] Schoppers, M. J. 1989a. In defense of reaction plans as caches. AI Magazine 10(4):51–60.


[Schoppers 1989b] Schoppers, M. J. 1989b. Representation and Automatic Synthesis of Reaction Plans. Ph.D. Dissertation, University of Illinois at Urbana-Champaign. Available as technical report UIUCDCS-R-89-1546.

[Shanahan 1995] Shanahan, M. 1995. A circumscriptive calculus of events. Artificial Intelligence 75(2).

[Stone, Veloso, & Blythe 1994] Stone, P.; Veloso, M.; and Blythe, J. 1994. The need for different domain-independent heuristics. In Hammond, K., ed., Proc. Second International Conference on Artificial Intelligence Planning Systems, 164–169. University of Chicago, Illinois: AAAI Press.

[Tash & Russell 1994] Tash, J. K., and Russell, S. 1994. Control strategies for a stochastic planner. In Proc. Twelfth National Conference on Artificial Intelligence, 1079–1085. AAAI Press.

[Tate 1977] Tate, A. 1977. Generating project networks. In International Joint Conference on Artificial Intelligence.

[Veloso & Stone 1995] Veloso, M., and Stone, P. 1995. FLECS: Planning with a flexible commitment strategy. Journal of Artificial Intelligence Research 3:25–52.

[Veloso et al. 1995] Veloso, M.; Carbonell, J.; Perez, A.; Borrajo, D.; Fink, E.; and Blythe, J. 1995. Integrating planning and learning: The PRODIGY architecture. Journal of Experimental and Theoretical AI 7:81–120.

[Veloso 1992] Veloso, M. M. 1992. Learning by Analogical Reasoning in General Problem Solving. Ph.D. Dissertation, Carnegie Mellon University.

[Veloso 1994] Veloso, M. M. 1994. Planning and Learning by Analogical Reasoning. Springer Verlag.

[Vere 1983] Vere, S. 1983. Planning in time: Windows and durations for activities and goals. IEEE Trans. Pattern Analysis and Machine Intelligence 5(3):246–267.

[Verma 1986] Verma, T. S. 1986. Causal networks: Semantics and expressiveness. Technical Report R-65, UCLA Cognitive Systems Laboratory.

[Wang 1996] Wang, X. 1996. Learning Planning Operators by Observation and Practice. Ph.D. Dissertation, Carnegie Mellon University.

[Washington 1994] Washington, R. 1994. Abstraction planning in real time. Ph.D. Dissertation, Stanford University.

[Wellman 1990a] Wellman, M. P. 1990a. Formulation of Tradeoffs in Planning Under Uncertainty. Pitman.


[Wellman 1990b] Wellman, M. P. 1990b. The STRIPS assumption for planning under uncertainty. In Proc. Eighth National Conference on Artificial Intelligence, 198–203. AAAI Press.

[Williamson & Hanks 1994] Williamson, M., and Hanks, S. 1994. Optimal planning with a goal-directed utility model. In Hammond, K., ed., Proc. Second International Conference on Artificial Intelligence Planning Systems, 176–181. University of Chicago, Illinois: AAAI Press.

[Williamson 1996] Williamson, M. 1996. A Value-directed Approach to Planning. Ph.D. Dissertation, University of Washington.

[Yang & Tenenberg 1990] Yang, Q., and Tenenberg, J. D. 1990. Abtweak: Abstracting a nonlinear, least commitment planner. In Proc. Eighth National Conference on Artificial Intelligence, 204–209. AAAI Press.

[Zilberstein 1993] Zilberstein, S. 1993. Operational Rationality through Compilation of Anytime Algorithms. Ph.D. Dissertation, University of California at Berkeley.


