
Prof. Dr. Jürgen Dix · Department of Informatics, TUC <strong>Multiagent</strong> <strong>Systems</strong>, WS 06/07 1/731<br />

<strong>Multiagent</strong> <strong>Systems</strong><br />

Prof. Dr. Jürgen Dix<br />

Department of Informatics<br />

Clausthal University of Technology<br />

WS 06/07


Time and place: Monday and Tuesday 10–12 in<br />

R 210 (Am Regenbogen)<br />

Exercises: Every other Tuesday (approx.).<br />

Website<br />

http://cig.in.tu-clausthal.de/<br />

Visit regularly!<br />

There you will find important information about<br />

the lecture, documents, exercises et cetera.<br />

Organization: Lecture: Prof. Dix,<br />

Exercise: Dr. Jamroga<br />

Exam: Oral exams on demand.<br />


About this lecture<br />

This course is based, among other papers listed in the following chapters, mainly on the following books and articles. I am very much indebted to Yoav Shoham (for providing me with a draft of his forthcoming book), to Mike Wooldridge (for providing me with many of the figures and graphs of his two books), to Viviana Mascardi (for providing me with the TeX sources of her nice article and the slides of a course), to Michael Fisher (for allowing me to make use of parts of his forthcoming book on temporal logics) and to Tuomas Sandholm (for allowing me to use some of his graphs and experimental results on multiagent TSPs and contract nets).<br />

Thanks to Tristan Behrens, who helped to translate my slides into beamer TeX (and made many improvements).



Literature I<br />

[Bordini et al. 2005] Bordini, R., Dastani, M., Dix, J., and El Fallah Seghrouchni, A. (2005).<br />
Programming Multi-Agent <strong>Systems</strong>: Languages, Platforms and Applications.<br />
Springer.



Literature II<br />

[Mascardi et al. 2004] Mascardi, V., M. Martelli, and L. Sterling (2004).<br />
Logic-based specification languages for intelligent software agents.<br />
Theory and Practice of Logic Programming 4(4), 429–494.



Literature III<br />

[Shoham and Leyton-Brown 2007] Shoham, Y. and K. Leyton-Brown (2007).<br />
Multi Agent <strong>Systems</strong>.<br />
MIT Press.<br />

[Subrahmanian et al. 2000] Subrahmanian, V., P. Bonatti, J. Dix, T. Eiter, S. Kraus, F. Özcan, and R. Ross (2000).<br />
Heterogeneous Active Agents.<br />
MIT Press.



Literature IV<br />

[Weiss 1999] Weiss, G. (Ed.) (1999).<br />
Multi-Agent <strong>Systems</strong>.<br />
MIT Press.<br />

[Wooldridge 2000] Wooldridge, M. (2000).<br />
Reasoning about Rational Agents.<br />
MIT Press.<br />

[Wooldridge 2002] Wooldridge, M. (2002).<br />
An Introduction to MultiAgent <strong>Systems</strong>.<br />
John Wiley & Sons.



Chapter 1. Introduction<br />

1.1 Motivation<br />

1.2 Intelligent Agents<br />

1.3 Formal Description<br />

1.4 References



1. Introduction<br />

We are setting the stage for a precise discussion<br />

of agency. From informal concepts to (more or<br />

less) mathematical definitions.<br />

1. MAS versus Distributed AI,<br />
2. Environment of agents,<br />
3. Agents and other frameworks,<br />
4. Runs as characteristic behaviour,<br />
5. State-based versus standard agents.



1. Introduction 1. Motivation<br />

1.1 Motivation



1. Introduction 1. Motivation<br />

Three Important Questions<br />

(Q1) What is a (software) agent?<br />
⇒ [Franklin and Graesser 1997, Wooldridge and Jennings 1995] (and references therein)<br />

(Q2) If some program P is not an agent, how can it be transformed into an agent?<br />

(Q3) If (Q1) is clear, what kind of software infrastructure is needed for the interaction of agents? What services are necessary?



1. Introduction 1. Motivation<br />

Definition 1.1 (Distributed Artificial Intelligence (DAI))<br />

The area investigating systems in which several autonomously acting entities work together to reach a given goal.<br />

The entities are called agents, and the area <strong>Multiagent</strong> <strong>Systems</strong>.<br />

AAMAS: several conferences joined in 2002 to form the main annual event: Bologna (2002), Melbourne (2003), New York (2004), Utrecht (2005), Hakodate (2006), Hawaii (2007):<br />

http://www.fun.ac.jp/aamas2006/main.html



1. Introduction 1. Motivation<br />

Example 1.2 (RoboCup)<br />

2D simulation, 3D simulation, small-size, mid-size, four-legged, humanoid and rescue leagues.<br />

Example 1.3 (Grand Challenge)<br />

Grand Challenge: organised by DARPA (8th October). Stanley, a VW Touareg coached by Stanford University (Sebastian Thrun), won in 2005.<br />

Example 1.4 (CLIMA Contest)<br />

A simple grid where agents are supposed to collect gold. Different roles of agents: scouts, collectors.<br />

http://clima.deis.unibo.it/contest.html<br />
http://cig.in.tu-clausthal.de/index.php?id=clima7contest



1. Introduction 1. Motivation<br />

Why do we need them?<br />

Information systems are distributed, open, and heterogeneous.<br />

We therefore need intelligent, interactive agents that act autonomously.



1. Introduction 1. Motivation<br />

(Software) Agent: programs that are implemented on a platform and have “sensors” and “effectors” to read from and make changes to the environment, respectively.<br />

Intelligent: performance measures to evaluate success. Rational vs. omniscient, decision making.<br />

Interactive: with other agents (software or humans) by observing the environment.<br />

Coordination: cooperation vs. competition



1. Introduction 1. Motivation<br />

MAS versus Classical DAI<br />

MAS: several agents coordinate their knowledge and actions (the semantics describes this).<br />

DAI: a particular problem is divided into smaller problems (nodes). These nodes have common knowledge. The solution method is given.<br />

Attention:<br />

Today, DAI is often used synonymously with MAS.



1. Introduction 1. Motivation<br />

AI: a single agent. Intelligence: property of a single agent. Focus: cognitive processes of a single agent.<br />

DAI: multiple agents. Intelligence: property of several agents. Focus: social processes of several agents.



1. Introduction 1. Motivation<br />

10 Desiderata<br />

1. Agents are for everyone! We need a method<br />

to agentise given programs.<br />

2. Take into account that data is stored in a wide<br />

variety of data structures, and data is<br />

manipulated by an existing corpus of<br />

algorithms.<br />

3. A theory of agents must not depend upon<br />

the set of actions that the agent performs.<br />

Rather, the set of actions that the agent<br />

performs must be a parameter that is taken<br />

into account in the semantics.



1. Introduction 1. Motivation<br />

10 Desiderata<br />

4. Every (software) agent should execute<br />

actions based on some clearly articulated<br />

decision policy. A declarative framework for<br />

articulating decision policies of agents is<br />

imperative.<br />

5. Any agent construction framework must<br />

allow agents to reason:<br />

Reasoning about its beliefs about other agents.<br />

Reasoning about uncertainty in its beliefs about the world and<br />

about its beliefs about other agents.<br />

Reasoning about time.<br />

These capabilities should be viewed as<br />

extensions to a core agent action language.



1. Introduction 1. Motivation<br />

10 Desiderata<br />

6. Any infrastructure to support multiagent<br />

interactions must provide security.<br />

7. While the efficiency of the code underlying a<br />

software agent cannot be guaranteed (as it<br />

will vary from one application to another),<br />

guarantees are needed that provide<br />

information on the performance of an agent<br />

relative to an oracle that supports calls to<br />

underlying software code.



1. Introduction 1. Motivation<br />

10 Desiderata<br />

8. We must identify efficiently computable<br />

fragments of the general hierarchy of<br />

languages alluded to above, and our<br />

implementations must take advantage of the<br />

specific structure of such language<br />

fragments.<br />

9. A critical point is reliability—there is no point<br />

in a highly efficient implementation, if all<br />

agents deployed in the implementation come<br />

to a grinding halt when the agent<br />

“infrastructure” crashes.



1. Introduction 1. Motivation<br />

10 Desiderata<br />

10. The only way of testing the applicability of<br />

any theory is to build a software system<br />

based on the theory, to deploy a set of<br />

applications based on the theory, and to<br />

report on experiments based on those<br />

applications.



1. Introduction 2. Intelligent Agents<br />

1.2 Intelligent Agents



1. Introduction 2. Intelligent Agents<br />

Definition 1.5 (Agent)<br />

An agent is a computer system that acts in its environment and executes autonomous actions to reach certain goals.<br />

Keywords: learning, intelligence. The environment is non-deterministic.<br />

[Figure: an agent perceives its environment through sensors (percepts) and acts on it through effectors (actions).]



1. Introduction 2. Intelligent Agents<br />

Definition 1.6 (Rational, Omniscient Agent)<br />

Rational agents are those that always do the right thing.<br />
(A performance measure is needed.)<br />

Omniscient agents are those that know the results of their actions in advance.<br />

Rational agents are in general not omniscient!



1. Introduction 2. Intelligent Agents<br />

Question<br />

How is the right thing defined and what does it depend on?<br />

1. The performance measure (as objective as possible),<br />
2. the percept sequence: what has been observed,<br />
3. the agent’s knowledge about the environment,<br />
4. the ways the agent can act.



1. Introduction 2. Intelligent Agents<br />

Definition 1.7 (Ideal Rational Agent)<br />

An ideal rational agent chooses for each percept<br />

sequence exactly the action which maximises its<br />

performance measure (given knowledge about<br />

the environment).



1. Introduction 2. Intelligent Agents<br />

Agents can be described mathematically by a function<br />

set of percept sequences → set of actions.<br />

The internal structure of an agent can be described as<br />

Agent = Architecture + Program



1. Introduction 2. Intelligent Agents<br />

Agents and their PAGE description:<br />

Agent Type: Medical diagnosis system<br />
Percepts: symptoms, findings, patient’s answers<br />
Actions: questions, tests, treatments<br />
Goals: healthy patient, minimize costs<br />
Environment: patient, hospital<br />

Agent Type: Satellite image analysis system<br />
Percepts: pixels of varying intensity, color<br />
Actions: print a categorization of the scene<br />
Goals: correct categorization<br />
Environment: images from an orbiting satellite<br />

Agent Type: Part-picking robot<br />
Percepts: pixels of varying intensity<br />
Actions: pick up parts and sort them into bins<br />
Goals: place parts in the correct bins<br />
Environment: conveyor belt with parts<br />

Agent Type: Refinery controller<br />
Percepts: temperature, pressure readings<br />
Actions: open/close valves; adjust temperature<br />
Goals: maximize purity, yield, safety<br />
Environment: refinery<br />

Agent Type: Interactive English tutor<br />
Percepts: typed words<br />
Actions: print exercises, suggestions, corrections<br />
Goals: maximize student’s score on tests<br />
Environment: set of students



1. Introduction 2. Intelligent Agents<br />

Question:<br />

How do environment properties influence agent design?<br />

Definition 1.8 (Properties of the Environment)<br />

Accessible/Inaccessible: if not completely accessible, one needs internal states.<br />

Deterministic/Indeterministic: an inaccessible environment might seem indeterministic, even if it is not.<br />

Episodic/Nonepisodic: percept-action sequences are independent of each other. Closed episodes.<br />

Static/Dynamic: while the agent is thinking, the world stays the same/changes. Semi-dynamic: the world does not change, but the performance measure does.<br />

Discrete/Continuous: density of observations and actions. Relevant: the level of granularity.



1. Introduction 2. Intelligent Agents<br />

Environment Accessible Deterministic Episodic Static Discrete<br />

Chess with a clock Yes Yes No Semi Yes<br />

Chess without a clock Yes Yes No Yes Yes<br />

Poker No No No Yes Yes<br />

Backgammon Yes No No Yes Yes<br />

Taxi driving No No No No No<br />

Medical diagnosis system No No No No No<br />

Image-analysis system Yes Yes Yes Semi No<br />

Part-picking robot No No Yes No No<br />

Refinery controller No No No No No<br />

Interactive English tutor No No No No Yes


1. Introduction 2. Intelligent Agents<br />

xbiff and software daemons are agents (but not intelligent ones).<br />

Definition 1.9 (Intelligent Agent)<br />

An intelligent agent is an agent with the following properties:<br />

1. Autonomous: operates without direct intervention of others; has some kind of control over its actions and internal state.<br />
2. Reactive: reacts to changes in the environment at certain times to reach its goals.<br />
3. Pro-active: takes the initiative; is goal-directed.<br />
4. Social: interacts with others to reach its goals.


1. Introduction 2. Intelligent Agents<br />

Pro-activeness alone is not sufficient (think of C programs): the environment can change during execution.<br />

Socialisation: coordination, communication, (negotiation) skills.<br />

Difficulty: finding the right balance between pro-active and reactive!



1. Introduction 2. Intelligent Agents<br />

Agents vs. Object Orientation<br />

Objects have<br />

1. a state (encapsulated): control over their internal state,<br />
2. message-passing capabilities.<br />

Java: private and public methods.<br />

Objects have control over their state, but not over their behaviour: an object cannot prevent others from using its public methods.



1. Introduction 2. Intelligent Agents<br />

Agents call other agents and request them to execute actions.<br />

Objects do it for free; agents do it for money.<br />

There are no analogues of reactive, pro-active, or social behaviour in OO.<br />

MAS are multi-threaded: each agent has its own control thread. (In OO, only the system as a whole possesses one.)



1. Introduction 2. Intelligent Agents<br />

Agents vs. Expert <strong>Systems</strong><br />

Expert <strong>Systems</strong>: a buzzword in the 1980s; problem solving in knowledge-rich domains: MYCIN.<br />

More an assistant than a stand-alone system.<br />

Interaction only with the user: disembodied.<br />

No cooperation with others.



1. Introduction 2. Intelligent Agents<br />

Agents as Intentional <strong>Systems</strong><br />

Intentions: Agents are endowed with mental<br />

states.<br />

Matthias took his umbrella because he believed<br />

it was going to rain.<br />

Kjeld attended the MAS course because he<br />

wanted to learn about agents.<br />

An intentional system describes entities whose<br />

behaviour can be predicted by the method of<br />

attributing beliefs, desires and rational acumen.



1. Introduction 2. Intelligent Agents<br />

1st-order intentional systems: beliefs and desires, but no beliefs and desires about beliefs and desires (no nesting).<br />

intention vs. intension: databases: extensional/intensional predicates.<br />

Can any automaton be described as an agent?



1. Introduction 3. Formal Description<br />

1.3 Formal Description



1. Introduction 3. Formal Description<br />

A first mathematical description<br />

At first, we want to keep everything as simple as possible.<br />

Agents and environments<br />

An agent is situated in an environment and can perform actions<br />

A := {a_1, ..., a_n} (set of actions)<br />

and change the state of the environment<br />

S := {s_1, s_2, ..., s_n} (set of states).



1. Introduction 3. Formal Description<br />

How does the environment (the state s) develop when an action a is executed?<br />

We describe this with a function<br />

env : S × A → 2^S.<br />

This includes non-deterministic environments.
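To make the definition concrete, here is a minimal Python sketch of a non-deterministic transition function of this type; the particular states, actions and the `successors` helper are illustrative assumptions, not part of the lecture.

```python
# Sketch (illustrative names): env : S x A -> 2^S maps a state and an
# action to the SET of possible successor states, so non-determinism
# is modelled by sets with more than one element.

S = {"s1", "s2", "s3"}           # states
A = {"a1", "a2"}                 # actions

env = {
    ("s1", "a1"): {"s2", "s3"},  # non-deterministic outcome
    ("s1", "a2"): {"s1"},
    ("s2", "a1"): {"s3"},
    ("s3", "a2"): {"s1"},
}

def successors(state, action):
    """All states the environment may evolve to (empty set if undefined)."""
    return env.get((state, action), set())
```

A deterministic environment is the special case in which every value of `env` is a singleton set.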


1. Introduction 3. Formal Description<br />

How do we describe agents?<br />

We could take a function action : S → A.<br />

[Figure: a simple reflex agent; sensors tell it what the world is like now, condition-action rules select what action to do now, and effectors act on the environment.]



1. Introduction 3. Formal Description<br />

Question:<br />

How can we describe an agent, now?<br />

Definition 1.10 (Purely Reactive Agent)<br />

An agent is called purely reactive if its function is given by<br />

action : S → A.



1. Introduction 3. Formal Description<br />

This is too weak!<br />

Take the whole history (of the environment) into account: s_0 --a_0--> s_1 --a_1--> ... s_n --a_n--> ...<br />

The same should be done for env!



1. Introduction 3. Formal Description<br />

This leads to agents that take the whole sequence of states into account, i.e.<br />

action : S* → A.<br />

We also want to consider the actions performed by an agent. This requires the notion of a run (next slide).



1. Introduction 3. Formal Description<br />

We define the run of an agent in an environment as a sequence of interleaved states and actions:<br />

Definition 1.11 (Run r, R = R_act ∪ R_state)<br />

A run r over A and S is a finite sequence<br />

r : s_0 --a_0--> s_1 --a_1--> ... s_n --a_n--> ...<br />

Such a sequence may end with a state s_n or with an action a_n: we denote by R_act the set of runs ending with an action and by R_state the set of runs ending with a state.



1. Introduction 3. Formal Description<br />

Definition 1.12 (Environment, 2nd version)<br />

An environment Env is a triple ⟨S, s_0, τ⟩ consisting of<br />

1. the set S of states,<br />
2. the initial state s_0 ∈ S,<br />
3. a function τ : R_act → 2^S, which describes how the environment changes when an action is performed (given the whole history).



1. Introduction 3. Formal Description<br />

Definition 1.13 (Agent a)<br />

An agent a is determined by a function<br />

action : R_state → A,<br />

describing which action the agent performs, given its current history.<br />

Important:<br />

An agent system is then a pair a = ⟨action, Env⟩ consisting of an agent and an environment.<br />

We denote by R(a, Env) the set of runs of agent a in environment Env.



1. Introduction 3. Formal Description<br />

Definition 1.14 (Characteristic Behaviour)<br />

The characteristic behaviour of an agent a in an environment Env is the set R of all possible runs r : s_0 --a_0--> s_1 --a_1--> ... s_n --a_n--> ... with:<br />

1. for all n: a_n = action(⟨s_0, a_0, ..., a_{n-1}, s_n⟩),<br />
2. for all n > 0: s_n ∈ τ(s_0, a_0, s_1, a_1, ..., s_{n-1}, a_{n-1}).<br />

For deterministic τ, the relation “∈” can be replaced by “=”.
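A small Python sketch can enumerate this set up to a fixed number of actions. The encoding of histories as tuples and the toy `action`/`tau` functions below are illustrative assumptions, not the lecture's formalism.

```python
# Sketch: enumerate the characteristic behaviour up to a fixed depth.
# A history is a tuple (s0, a0, s1, ...); 'action' is applied to
# histories ending in a state, 'tau' to histories ending in an action.

def runs(s0, action, tau, depth):
    """All runs s0 -a0-> s1 ... containing exactly 'depth' actions."""
    results = []

    def extend(history):
        if len(history) // 2 >= depth:   # depth actions performed
            results.append(history)
            return
        a = action(history)              # a_n = action(<s0, ..., sn>)
        for s in tau(history + (a,)):    # s_{n+1} in tau(<..., a_n>)
            extend(history + (a, s))

    extend((s0,))
    return results

# Toy instance: states 0/1; the agent always plays "toggle"; from state
# 0 the environment may stay or switch, from state 1 it always switches.
action = lambda h: "toggle"
tau = lambda h: {0, 1} if h[-2] == 0 else {1 - h[-2]}

all_runs = runs(0, action, tau, depth=2)
```

Because τ returns sets, one agent function can yield several runs; here the toy instance produces three runs of length two.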



1. Introduction 3. Formal Description<br />

Important:<br />

The formalization of the characteristic behaviour depends on the concrete agent type. Later we will introduce further behaviours (and corresponding agent designs).



1. Introduction 3. Formal Description<br />

Equivalence<br />

Two agents a, b are called behaviourally equivalent wrt. an environment Env if R(a, Env) = R(b, Env).<br />

Two agents a, b are called behaviourally equivalent if they are behaviourally equivalent wrt. all possible environments Env.



1. Introduction 3. Formal Description<br />

So far so good, but...<br />

What is the problem with all these agents and this framework in general?<br />

Problem<br />

All agents have perfect information about the environment!<br />

(Of course, this can also be seen as a feature!)



1. Introduction 3. Formal Description<br />

We need more realistic agents!<br />

Note<br />

In general, agents do not have complete information about the environment!<br />

We extend our framework by the perceptions of an agent.<br />

Definition 1.15 (Actions A, Percepts P, States S)<br />

A := {a_1, a_2, ..., a_n} is the set of actions.<br />
P := {p_1, p_2, ..., p_m} is the set of percepts.<br />
S := {s_1, s_2, ..., s_l} is the set of states.


1. Introduction 3. Formal Description<br />

Sensors don’t need to provide perfect<br />

information!<br />

[Figure: a simple reflex agent; sensors tell it what the world is like now, condition-action rules select what action to do now, and effectors act on the environment.]

1. Introduction 3. Formal Description<br />

Question:<br />

How can agent programs be designed?<br />

There are four types of agent programs:<br />

Simple reflex agents<br />

Agents that keep track of the world<br />

Goal-based agents<br />

Utility-based agents



1. Introduction 3. Formal Description<br />

A first realistic agent<br />

We consider a purely reactive agent and just replace states by percepts.<br />

Definition 1.16 (Simple Reflex Agent)<br />

An agent is called a simple reflex agent if its function is given by<br />

action : P → A.



1. Introduction 3. Formal Description<br />

function SIMPLE-REFLEX-AGENT(percept) returns action<br />
static: rules, a set of condition-action rules<br />
state ← INTERPRET-INPUT(percept)<br />
rule ← RULE-MATCH(state, rules)<br />
action ← RULE-ACTION[rule]<br />
return action<br />

function REFLEX-AGENT-WITH-STATE(percept) returns action<br />
static: state, a description of the current world state<br />
rules, a set of condition-action rules<br />
state ← UPDATE-STATE(state, percept)<br />
rule ← RULE-MATCH(state, rules)<br />
action ← RULE-ACTION[rule]<br />
state ← UPDATE-STATE(state, action)<br />
return action
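The two procedures above can be sketched in Python roughly as follows; the representation of rules as (condition, action) pairs and all helper names are assumptions, since the pseudocode leaves them abstract.

```python
# Hedged sketch of the two procedures above. Rules are modelled as
# (condition predicate, action) pairs; the first matching rule fires.

def interpret_input(percept):
    return percept                       # trivial interpretation

def rule_match(state, rules):
    for condition, act in rules:
        if condition(state):
            return act
    return None                          # no rule applies

def simple_reflex_agent(percept, rules):
    state = interpret_input(percept)
    return rule_match(state, rules)

def make_reflex_agent_with_state(rules, update_state):
    """Build an agent that keeps track of the world between calls."""
    state = None
    def agent(percept):
        nonlocal state
        state = update_state(state, percept)
        action = rule_match(state, rules)
        state = update_state(state, action)
        return action
    return agent

# Illustrative vacuum-world flavour rules:
rules = [(lambda s: s == "dirty", "suck"),
         (lambda s: True, "noop")]
```

The second agent mirrors the pseudocode's double UPDATE-STATE: the internal state is updated first with the percept and then with the chosen action.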



1. Introduction 3. Formal Description<br />

As before, let us now consider sequences of percepts:<br />

Definition 1.17 (Standard Agent a)<br />

A standard agent a is given by a function<br />

action : P* → A<br />

together with<br />

see : S → P.<br />

An agent is thus a pair ⟨see, action⟩.



1. Introduction 3. Formal Description<br />

Definition 1.18 (Indistinguishable)<br />

Two different states s, s′ are indistinguishable for an agent a if see(s) = see(s′).<br />

The relation “indistinguishable” on S × S is an equivalence relation.<br />

What does |∼| = |S| mean? And what about |∼| = 1?<br />

As mentioned before, the characteristic behaviour has to match the agent design!
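The equivalence classes induced by see can be computed directly; below is a sketch in which the parity-based see function is an illustrative assumption.

```python
# Sketch of the indistinguishability relation: states s, s' are
# indistinguishable iff see(s) == see(s'). Group states by percept.

from collections import defaultdict

def indistinguishability_classes(states, see):
    """Partition of S into classes of mutually indistinguishable states."""
    classes = defaultdict(set)
    for s in states:
        classes[see(s)].add(s)
    return list(classes.values())

S = {0, 1, 2, 3}
see = lambda s: s % 2            # the agent only perceives parity

classes = indistinguishability_classes(S, see)
```

With see the identity the partition has |S| classes (the agent distinguishes every state); with a constant see it has a single class (the agent's percepts carry no information).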



1. Introduction 3. Formal Description<br />

Definition 1.19 (Characteristic Behaviour)<br />

The characteristic behaviour of a standard agent ⟨see, action⟩ in an environment Env is the set of all finite sequences<br />

p_0 --a_0--> p_1 --a_1--> ... p_n --a_n--> ...<br />

where<br />

p_0 = see(s_0),<br />
a_i = action(⟨p_0, ..., p_i⟩),<br />
p_i = see(s_i), where s_i ∈ τ(s_0, a_0, s_1, a_1, ..., s_{i-1}, a_{i-1}).<br />

Such a sequence, even if deterministic from the agent’s viewpoint, may cover different environmental behaviours (runs):<br />

s_0 --a_0--> s_1 --a_1--> ... s_n --a_n--> ...



1. Introduction 3. Formal Description<br />

Instead of using the whole history, i.e. P^∗, one<br />

can also use internal states I := {i_1, i_2, …, i_n}.<br />

Definition 1.20 (State-based Agent a_state)<br />

A state-based agent a_state is given by a function<br />

action : I −→ A together with<br />

see : S −→ P,<br />

and next : I × P −→ I.<br />

Here next(i, p) is the successor state of i if p is<br />

observed.
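A hedged sketch of such a state-based agent, with an invented counting example (next accumulates a summary of the percepts instead of storing the whole history):

```python
# Hypothetical state-based agent (cf. Definition 1.20): the internal state i
# replaces the full percept history; next updates it on each percept.

def see(state):
    return state % 2          # see : S -> P (perceive parity only)

def next_state(i, p):
    return i + p              # next : I x P -> I (count odd states seen)

def action(i):
    return "wait" if i < 2 else "act"   # action : I -> A

i = 0                          # initial internal state i0
chosen = []
for s in [1, 2, 3]:            # environment states s0, s1, s2
    i = next_state(i, see(s))  # i_{n+1} = next(i_n, p_n)
    chosen.append(action(i))
print(chosen)                  # -> ['wait', 'wait', 'act']
```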



[Figure: schematic of a state-based agent (after Russell and Norvig): sensors deliver what the world is like now; the internal state is updated using knowledge of how the world evolves and what the agent's actions do; condition−action rules then determine what action to do now, which is passed to the effectors acting on the environment.]<br />


Definition 1.21 (Characteristic Behaviour)<br />

The characteristic behaviour of a state-based agent<br />

a_state in an environment Env is the set of all finite<br />

sequences<br />

(i_0, p_0) →^{a_0} (i_1, p_1) →^{a_1} … (i_n, p_n) →^{a_n} …<br />

where<br />

1 for all n: a_n = action(i_n),<br />

2 for all n: next(i_n, p_n) = i_{n+1}.<br />

Such a sequence covers the runs r : s_0 →^{a_0} s_1 →^{a_1} … where<br />

a_j = action(i_j),<br />

s_j ∈ τ(s_0, a_0, s_1, a_1, …, s_{j−1}, a_{j−1}),<br />

p_j = see(s_j).<br />


Are state-based agents more expressive than<br />

standard agents? How to measure?<br />

Definition 1.22 (Environmental behaviour of a_state)<br />

The environmental behaviour of an agent a_state<br />

is the set of possible runs covered by the<br />

characteristic behaviour of the agent.



Theorem 1.23 (Equivalence)<br />

Standard agents and state-based agents are<br />

equivalent with respect to their environmental<br />

behaviour.<br />

More precisely: for each state-based agent a_state<br />

with storage function next there exists a standard<br />

agent a which has the same environmental<br />

behaviour, and vice versa.
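One direction of the proof idea can be sketched in code: a standard agent simulates a state-based one by recomputing the internal state from the full percept sequence. All names (i0, next_state, action) are illustrative placeholders:

```python
# One direction of Theorem 1.23, as a sketch: a state-based agent
# (i0, next, action) is simulated by a standard agent that replays the
# complete percept sequence through next before choosing an action.

def as_standard_agent(i0, next_state, action):
    def standard_action(percepts):
        i = i0
        for p in percepts:
            i = next_state(i, p)   # recompute the internal state from P*
        return action(i)           # same choice as the state-based agent
    return standard_action

# Tiny check with an invented counting agent:
std = as_standard_agent(0, lambda i, p: i + p,
                        lambda i: "act" if i >= 2 else "wait")
print(std([1, 0, 1]))  # -> act
```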



Goal-based agents<br />

[Figure: schematic of a goal-based agent (after Russell and Norvig): the internal state tracks what the world is like now and what it would be like after doing action A; the goals then determine what action to do now, which is passed to the effectors acting on the environment.]<br />


Intention:<br />

We want to tell our agent what to do, without<br />

telling it how to do it!<br />

We need a utility measure.



How can we define a utility function?<br />

Take for example<br />

u : S −→ ℝ<br />

Question:<br />

What is the problem with this definition?<br />

Better idea:<br />

u : R −→ ℝ<br />

Take the set of runs R and not the set of states S.<br />

Food for thought: How to calculate the utility if<br />

the agent has incomplete information?



Agents with a utility measure<br />

[Figure: schematic of a utility-based agent (after Russell and Norvig): the agent predicts what the world would be like after doing action A, evaluates how happy it would be in such a state using the utility measure, and chooses what action to do now; sensors and effectors connect it to the environment.]<br />

The utility measure can often be simulated<br />

through a set of goals.<br />


Example: Tileworld<br />

[Figure: three Tileworld configurations (a)–(c): grids with holes (H) and tiles (T); the agent pushes tiles into holes.]



How can we define an optimal agent?<br />

Runs occur with some probability. So we have a<br />

probability distribution and can try to maximize<br />

the expectation:<br />

arg max_{a∈Agents} ∑_{r∈R(a,Env)} u(r) · Prob(r | 〈a, Env〉)<br />

Maximize expected utility.<br />

But how to implement it (if at all possible)?
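As a toy illustration of the maximization (all agents, utilities and probabilities are invented numbers), the computation itself is straightforward once the runs and their probabilities are enumerable:

```python
# Each candidate agent induces (utility, probability) pairs over its runs;
# pick the agent maximizing  sum_r u(r) * Prob(r | <a, Env>).

def expected_utility(runs):
    return sum(u * p for u, p in runs)

agents = {
    "cautious": [(1.0, 0.9), (0.0, 0.1)],   # (u(r), Prob(r)) per run
    "risky":    [(2.0, 0.4), (0.0, 0.6)],
}
best = max(agents, key=lambda a: expected_utility(agents[a]))
print(best, expected_utility(agents[best]))  # -> cautious 0.9
```

The hard part, as the slide notes, is that in realistic settings the set of runs and the distribution over them are not available explicitly.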



We can restrict the set of agents, taking<br />

resources into account.<br />

Performance measures or utilities are often<br />

difficult to come up with.



The framework of Achievement/Maintenance<br />

Tasks.<br />

Achievement Task: Achieve state of affairs φ.<br />

Set of goal states. Blocksworld. Goal<br />

described by predicates φ: states<br />

satisfying φ.<br />

Maintenance Task: Maintain state of affairs ψ.<br />

Avoid certain bad states. A file<br />

should never be open for both reading<br />

and writing: failure states are specified.



Definition 1.24 (Task environments)<br />

A task environment is a pair 〈Env, Ψ〉 where<br />

Ψ : R(a, Env) −→ {0, 1} is a predicate over runs.<br />

Let<br />

R_Ψ(a, Env) = {r ∈ R(a, Env) | Ψ(r)}.<br />

An agent a succeeds in 〈Env, Ψ〉, if<br />

R_Ψ(a, Env) = R(a, Env).



An achievement task is given by a task<br />

environment 〈Env, Ψ〉, if there is some G ⊆ S<br />

such that<br />

for all r ∈ R(a, Env) : Ψ(r) iff ∃ s ∈ G : s ∈ r.<br />

A maintenance task is given by a task<br />

environment 〈Env, Ψ〉, if there is some B ⊆ S<br />

such that<br />

for all r ∈ R(a, Env) : Ψ(r) iff ∀ s ∈ B : s ∉ r.
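Both predicates are easy to sketch over runs simplified to the list of states they visit (the goal set G and bad set B below are made-up examples):

```python
# Sketch of the two task predicates over simplified runs (state lists).

def achievement(run, G):
    # Psi(r) holds iff some goal state of G occurs in the run.
    return any(s in G for s in run)

def maintenance(run, B):
    # Psi(r) holds iff no bad state of B ever occurs in the run.
    return all(s not in B for s in run)

print(achievement([0, 1, 2], G={2}))   # -> True
print(maintenance([0, 1, 2], B={9}))   # -> True
```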



procedure RUN-ENVIRONMENT(state, UPDATE-FN, agents, termination)<br />

inputs: state, the initial state of the environment<br />

UPDATE-FN, function to modify the environment<br />

agents, a set of agents<br />

termination, a predicate to test when we are done<br />

repeat<br />

for each agent in agents do<br />

PERCEPT[agent] ← GET-PERCEPT(agent, state)<br />

end<br />

for each agent in agents do<br />

ACTION[agent] ← PROGRAM[agent](PERCEPT[agent])<br />

end<br />

state ← UPDATE-FN(actions, agents, state)<br />

until termination(state)



function RUN-EVAL-ENVIRONMENT(state, UPDATE-FN, agents,<br />

termination, PERFORMANCE-FN) returns scores<br />

local variables: scores, a vector the same size as agents, all 0<br />

repeat<br />

for each agent in agents do<br />

PERCEPT[agent] ← GET-PERCEPT(agent, state)<br />

end<br />

for each agent in agents do<br />

ACTION[agent] ← PROGRAM[agent](PERCEPT[agent])<br />

end<br />

state ← UPDATE-FN(actions, agents, state)<br />

scores ← PERFORMANCE-FN(scores, agents, state)<br />

until termination(state)<br />

return scores
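A minimal Python rendering of the RUN-EVAL-ENVIRONMENT pseudocode above; the countdown environment and the single stand-in agent are my own toy examples:

```python
# Direct translation of RUN-EVAL-ENVIRONMENT: loop until termination,
# gathering percepts, choosing actions, updating the state and the scores.

def run_eval_environment(state, update_fn, agents, termination, performance_fn):
    scores = [0] * len(agents)
    while not termination(state):
        percepts = [ag["get_percept"](state) for ag in agents]
        actions = [ag["program"](p) for ag, p in zip(agents, percepts)]
        state = update_fn(actions, agents, state)
        scores = performance_fn(scores, agents, state)
    return scores

agents = [{"get_percept": lambda s: s, "program": lambda p: -1}]
scores = run_eval_environment(
    state=3,                                              # countdown from 3
    update_fn=lambda actions, agents, s: s + actions[0],  # apply the action
    agents=agents,
    termination=lambda s: s == 0,
    performance_fn=lambda scores, agents, s: [scores[0] + 1],  # 1 point/step
)
print(scores)  # -> [3]
```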



[Figure: a grid environment with squares numbered 1–8 and two agents A and B.]



1. Introduction 4. References<br />

1.4 References



[Bozzano et al.1999] Bozzano, M., G. Delzanno, M. Martelli,<br />

V. Mascardi, and F. Zini (1999).<br />

Multi-agent systems development as a software engineering<br />

enterprise.<br />

Springer LNCS 1551, 46–60.<br />

[Franklin and Graesser1997] Franklin, S. and A. Graesser (1997).<br />

Is it an Agent, or Just a Program?<br />

In J. P. Müller, M. Wooldridge, and N. R. Jennings (Eds.),<br />

Intelligent Agents III.<br />

Springer LNAI 1193.



[Iglesias et al.1999] Iglesias, C., M. Garrijo, and J. Gonzalez (1999).<br />

A survey of agent-oriented methodologies.<br />

In J. Müller, M. P. Singh, and A. S. Rao (Eds.), Proceedings of the<br />

5th International Workshop on Intelligent Agents V : Agent<br />

Theories, Architectures, and Languages (ATAL-98).<br />

Springer LNCS 1555, 317–330.<br />

[Russell and Norvig1995] Russell, S. and P. Norvig (1995).<br />

Artificial Intelligence — A Modern Approach.<br />

Prentice Hall.<br />

[Russell and Norvig2004] Russell, S. J. and P. Norvig (2004).<br />

Künstliche Intelligenz: Ein moderner Ansatz.<br />

Pearson.



[Subrahmanian et al.2000] Subrahmanian, V., P. Bonatti, J. Dix,<br />

T. Eiter, S. Kraus, F. Özcan, and R. Ross (2000).<br />

Heterogenous Active Agents.<br />

MIT Press.<br />

[Weiss1999] Weiss, G. (1999).<br />

Multi-Agent <strong>Systems</strong>.<br />

MIT Press.<br />

[Wooldridge2000] Wooldridge, M. (2000).<br />

Reasoning about Rational Agents.<br />

MIT Press.<br />

[Wooldridge2002] Wooldridge, M. (2002).<br />

An Introduction to MultiAgent <strong>Systems</strong>.<br />

John Wiley & Sons.



[Wooldridge and Jennings1995] Wooldridge, M. J. and N. R. Jennings<br />

(1995).<br />

Agent Theories, Architectures and Languages: A survey.<br />

In M. J. Wooldridge and N. R. Jennings (Eds.), Intelligent Agents.<br />

Springer, LNAI 890, 1–39.



2. Basic Architectures<br />

Chapter 2. Basic Architectures<br />

2.1 Reactive Agents<br />

2.2 BDI-Architecture<br />

2.3 AOP<br />

2.4 Layered Architectures<br />

2.5 References



We are presenting, at an abstract level, the three<br />

main architectures:<br />

subsumption-,<br />

BDI/Agent oriented programming-,<br />

layered<br />

architectures. In the next chapter, we come<br />

back to the BDI approach and consider it in<br />

greater detail using a logic-based framework.



2. Basic Architectures 1. Reactive Agents<br />

2.1 Reactive Agents



Idea:<br />

Intelligent behaviour arises from the interaction<br />

of the agents with their environment.<br />

It emerges by decomposition into simpler<br />

interactions.



Subsumption-Architectures:<br />

Decision making is realized through<br />

goal-directed behaviours: each behaviour<br />

is an individual action.<br />

nonsymbolic implementation.<br />

Many behaviours can be applied<br />

concurrently. How to select between them?<br />

Implementation through<br />

Subsumption-Hierarchies, Layers.<br />

Upper layers represent abstract behaviour.



Formal Model:<br />

see: as up to now, but close relation between<br />

observation and action: no transformation of the<br />

input.<br />

action: a set of behaviours and an inhibition relation.<br />

Beh := {〈c, a〉 : c ⊆ P, a ∈ A}.<br />

〈c, a〉 “fires” in state s if<br />

see(s) ∈ c (c stands for “condition”).<br />

≺ ⊆ Ag_rules × Ag_rules<br />

is called the inhibition relation, Ag_rules ⊆ Beh.<br />

We require ≺ to be a total ordering.<br />

b_1 ≺ b_2 means: b_1 inhibits b_2,<br />

b_1 has priority over b_2.



Example 2.1 (Exploring a Planet)<br />

A distant planet (asteroid) is assumed to contain<br />

gold. Samples should be brought to a spaceship<br />

landed on the planet. It is not known where the<br />

gold is. Several autonomous vehicles are<br />

available. Due to the topography of the planet<br />

there is no connection between the vehicles.<br />

The spaceship sends off radio signals: gradient<br />

field.



Low Level Behaviour:<br />

(1) If detect an obstacle then change direction.<br />

2. Layer:<br />

(2) If Samples on board and at base then drop<br />

off.<br />

(3) If Samples on board and not at base then<br />

follow gradient field.<br />

3. Layer:<br />

(4) If Samples found then pick them up.<br />

4. Layer:<br />

(5) If true then take a random walk.<br />

With the following ordering<br />

(1) ≺ (2) ≺ (3) ≺ (4) ≺ (5).<br />

Under which assumptions (on the distribution of the gold)<br />

does this work perfectly?
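Action selection under the inhibition ordering can be sketched as follows; the percept flags and behaviour names paraphrase rules (1)–(5) above and are simplifications:

```python
# Subsumption-style selection: behaviours <c, a> are tried in the
# inhibition order (1) < (2) < ... < (5); the first whose condition
# fires determines the action.

BEHAVIOURS = [
    (lambda p: p["obstacle"],                      "change direction"),  # (1)
    (lambda p: p["carrying"] and p["at_base"],     "drop off"),          # (2)
    (lambda p: p["carrying"] and not p["at_base"], "follow gradient"),   # (3)
    (lambda p: p["sample_here"],                   "pick up"),           # (4)
    (lambda p: True,                               "random walk"),       # (5)
]

def select_action(percept):
    for condition, act in BEHAVIOURS:   # highest-priority behaviour first
        if condition(percept):
            return act

p = {"obstacle": False, "carrying": True, "at_base": False, "sample_here": True}
print(select_action(p))  # -> follow gradient  (carrying inhibits picking up)
```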



Vehicles can communicate indirectly with<br />

each other:<br />

they put down, and<br />

pick up,<br />

radioactive samples that can be sensed.



Low Level Behaviour:<br />

(1) If detect an obstacle then change direction.<br />

2. Layer:<br />

(2) If Samples on board and at base then drop<br />

off.<br />

(3) If Samples on board and not at base then<br />

drop off two radioactive crumbs and follow<br />

gradient field.<br />

3. Layer:<br />

(4) If Samples found then pick them up.<br />

(5) If radioactive crumbs found then take one<br />

and follow the gradient field (away from the<br />

spaceship).<br />

4. Layer:<br />

(6) If true then take a random walk.<br />

With the ordering (1) ≺ (2) ≺ (3) ≺ (4) ≺ (5) ≺ (6).



Pro: simple, economic, efficient, robust, elegant.<br />

Contra:<br />

Agents without a model of the<br />

environment must have sufficient<br />

information in their own local environment.<br />

Decisions are based only on local information.<br />

How about bringing in learning?<br />

The relation between agents, environment and<br />

behaviours is not clear.<br />

Agents with ≤ 10 behaviours are doable.<br />

But the more layers, the more complicated it is<br />

to understand what is going on.



2. Basic Architectures 2. BDI-Architecture<br />

2.2 BDI-Architecture



Belief, Desire, Intention.



Three main questions:<br />

Deliberation: How to deliberate?<br />

Planning: Once committed to something, how<br />

to reach the goal?<br />

Replanning: What if during execution of the<br />

plan, things are running out of control<br />

and the original plan fails?



Belief 1: Making money is important.<br />

Belief 2: I like computing.<br />

Desire 1: Graduate in Computer Science.<br />

Desire 2 (Int.): Pass the Vordiplom.<br />

Desire 3: Graduate in time, marks are unimportant.<br />

New Belief: Money is not so important after all.<br />

New Belief: Working scientifically is fun.<br />

Desire 4: Pursue an academic career.<br />

Desire 5 (Int.): Make sure to graduate with honours.<br />

Desire 6 (Int.): Study abroad.



Intentions are the most important thing.<br />

Beliefs and intentions generate desires.<br />

Desires can be inconsistent with each other.<br />

Intentions are recomputed based on the<br />

current intentions, desires and beliefs.<br />

Intentions should persist, normally.<br />

Beliefs are constantly updated and thus<br />

generate new desires.<br />

From time to time intentions need to be<br />

re-examined.



Deliberation has been split into two<br />

components:<br />

1 Generate options (desires).<br />

2 Filter the right intentions.<br />

(B, D, I) where B ⊆ Bel, D ⊆ Des, I ⊆ Int<br />

I can be represented as a stack (priorities are<br />

available).
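The two deliberation steps can be sketched as follows; the concrete options and filter functions are placeholders of my own, not a published BDI algorithm:

```python
# Hedged sketch of two-step deliberation: 1. generate options (desires)
# from beliefs and current intentions, 2. filter the intentions.

def options(beliefs, intentions):
    # Desires: everything believed achievable that is not yet intended.
    return set(beliefs.get("achievable", [])) - intentions

def filter_intentions(beliefs, desires, intentions):
    # Keep existing intentions; adopt one new desire (here: lexicographic max,
    # standing in for a real priority scheme).
    return intentions | ({max(desires)} if desires else set())

def deliberate(beliefs, intentions):
    desires = options(beliefs, intentions)                  # 1. generate
    return filter_intentions(beliefs, desires, intentions)  # 2. filter

print(deliberate({"achievable": ["study", "travel"]}, set()))  # -> {'travel'}
```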



An agent has commitments both to<br />

ends: the states of affairs it wishes to bring about,<br />

means: the mechanisms used to achieve a certain<br />

state of affairs.<br />

Means-end reasoning.<br />

What is wrong with our current control loop?<br />

It is overcommitted to both means and ends.<br />

No way to replan if something goes wrong.



[Figure: a planner takes a goal/intention/task, the state of the environment and the possible actions as input, and produces a plan to achieve the goal.]



What is a plan, planning algorithm?<br />

Definition 2.2 (Plan)<br />

A plan π is a list of primitive actions. Applied<br />

successively, they lead from the initial<br />

state to the goal state.<br />

Still overcommitted to intentions!



Still limited in the way the agent can reconsider<br />

its intentions.



But reconsidering intentions is costly.<br />

Pro-active vs. reactive<br />

Extreme: stubborn agents, unsure agents.<br />

What is better? Depends on the environment.<br />

Let γ be the rate of world change.<br />

1 γ small: stubbornness pays off.<br />

2 γ big: unsureness pays off.<br />

What to do?<br />

Meta-level control


2. Basic Architectures 3. AOP<br />

2.3 AOP



Shoham suggests the following three<br />

components of an Agent Oriented<br />

Programming system:<br />

A formal language with clear syntax for<br />

describing the mental state.<br />

A programming language for defining<br />

agents.<br />

A method of transforming legacy code into<br />

an agent.



We have defined an agent as a mapping from its<br />

beliefs and state of the world into actions.<br />

Shoham: an agent has commitments. One such<br />

commitment can be to perform an action.<br />

No need for action selection. But the<br />

language specification must have a<br />

mechanism for adopting commitments or<br />

not.<br />

All rules must be created at compile time:<br />

huge amount of possible scenarios.



Beliefs: Sentences in a point-based temporal<br />

framework: turn(I, left)^t<br />

(instantaneous actions).<br />

B^now_I go_swimming(I, CLZ)^t : I believe<br />

now that I am going to swim in CLZ at<br />

time t.<br />

B^now_I ontable(block_a)^t,<br />

B^now_I ontable(block_b)^t.<br />

B^3_a B^4_b is_friend(a, b).<br />

No distinction between actions and<br />

facts.



Obligations: (also called commitments) are the<br />

beliefs that one agent will create the<br />

truth of a statement (for another<br />

agent):<br />

OBL^t_{I,you} go_swimming(I, CLZ)^{t+1}<br />

(at time t, I am obligated to you to go<br />

swimming in CLZ at time t + 1).<br />

Does not need to be an action:<br />

OBL^t_{I,you} in_lecture(you)^{t+1}.<br />

A decision is an obligation to oneself:<br />

DEC^t_a φ := OBL^t_{a,a} φ.



Capabilities: An agent is said to be capable of a<br />

statement if it has the ability to see that<br />

that statement holds at the specified<br />

time:<br />

CAN^now_I in_lecture(you)^t<br />

(I am capable of seeing to it that you<br />

are in a lecture at time t).<br />

CAN^5_robot open(door)^t. But capabilities<br />

may change.



The following properties are indispensable:<br />

Internal Consistency: Beliefs and desires should<br />

be consistent.<br />

Good Faith: An agent should not commit to a<br />

statement unless it believes it is<br />

capable of the statement.<br />

Introspection: Agents must be aware of their<br />

obligations:<br />

OBL^t_{I,a} statement −→ B^t_I OBL^t_{I,a} statement<br />

¬(OBL^t_{I,a} statement) −→ B^t_I ¬(OBL^t_{I,a} statement)<br />

Persistence of mental state: beliefs and<br />

obligations should persist by default.



AGENT-0 is based on a quantified multi-modal<br />

logic, with direct reference to time. Three<br />

modalities: Beliefs, Commitments, Capabilities.<br />

Definition 2.3 (Cycle)<br />

AGENT-0 follows the following simple control<br />

loop when executing a program:<br />

At each time step<br />

1 gather incoming messages and update the<br />

mental state accordingly,<br />

2 execute commitments (using capabilities).
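A minimal sketch of this two-step cycle, with simplified message and mental-state structures of my own invention (capabilities and conditional commitments are omitted):

```python
# AGENT-0-style tick (cf. Definition 2.3): 1. update the mental state from
# incoming messages, 2. execute the commitments that are now due.

def agent0_step(mental_state, inbox, now):
    # 1. Gather incoming messages and update the mental state.
    for kind, content in inbox:
        if kind == "INFORM":
            mental_state["beliefs"].add(content)
        elif kind == "REQUEST":
            mental_state["commitments"].append((now + 1, content))
    # 2. Execute due commitments and keep the rest for later ticks.
    due = [a for t, a in mental_state["commitments"] if t <= now]
    mental_state["commitments"] = [(t, a) for t, a in mental_state["commitments"] if t > now]
    return due

ms = {"beliefs": set(), "commitments": []}
agent0_step(ms, [("REQUEST", "open door")], now=0)   # commitment for time 1
print(agent0_step(ms, [], now=1))                    # -> ['open door']
```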



[Figure: the AGENT-0 interpreter loop: after initialization, incoming messages update the beliefs; beliefs and abilities update the commitments; executing commitments produces outgoing messages and internal actions.]



Definition 2.4 (Actions in AGENT-0)<br />

AGENT-0 distinguishes two types of actions:<br />

private and communicative actions. These<br />

actions can also be conditional actions.<br />

Private actions are not handled by the AOP<br />

interpreter. Only the user is notified that the<br />

action is to be executed. The syntax is:<br />

(DO p-action).



Communicative actions are used to deal with other<br />

agents. There are four classes:<br />

(INFORM t a fact)<br />

(REQUEST t a action)<br />

(UNREQUEST t a action)<br />

(REFRAIN a action)<br />

t specifies the time the message is to take place, a is the<br />

agent that receives the message, and action is any action<br />

statement. An INFORM action sends the fact to agent a, a<br />

REQUEST notifies agent a that the requester would like<br />

the action to be realised. An UNREQUEST is the inverse of<br />

a REQUEST. A REFRAIN message asks that an action not<br />

be committed to by the receiving agent.



Conditional actions are of the form<br />

IF mntlcond action.



Definition 2.5 (AGENT-0 Programs)<br />

An agent in AGENT-0 consists of (1) a set of<br />

initial beliefs, (2) a set of capabilities, (3) a set of<br />

initial commitments, and (4) a set of<br />

commitment rules of the form<br />

(COMMIT msgcond mntlcond (agent, t, action)),<br />

“Commit to perform action for agent at time t,<br />

if msgcond holds of the new incoming<br />

messages, if mntlcond holds in the mental state,<br />

and if the agent is currently capable of doing<br />

action.”
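Checking one such commitment rule can be sketched as follows; representing a rule as a triple of predicates and an action is my simplification, not AGENT-0's actual syntax:

```python
# Commit iff the message condition matches, the mental condition holds,
# and the action lies within the agent's capabilities.

def try_commit(rule, message, mental_state, capabilities):
    msgcond, mntlcond, action = rule
    if msgcond(message) and mntlcond(mental_state) and action in capabilities:
        return ("COMMIT", action)
    return None

rule = (
    lambda m: m[0] == "REQUEST",          # msgcond: matches any request
    lambda ms: "idle" in ms["beliefs"],   # mntlcond: agent believes it is idle
    "open door",
)
ms = {"beliefs": {"idle"}}
print(try_commit(rule, ("REQUEST", "open door"), ms, {"open door"}))
# -> ('COMMIT', 'open door')
```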



agent: this is the name of an agent;<br />

action: private and communicative actions as<br />

defined above. But only primitive<br />

actions are allowed. An action like Go<br />

to find the Gold and bring it home! is,<br />

most likely, not primitive. It involves<br />

finding a plan (sequence of primitive<br />

actions).



msgcond: a message condition is of the form<br />

(From Type Content) where From is an<br />

agent name, Type is a message type<br />

(REQUEST, etc), and Content matches<br />

the actual message ((x REQUEST y)<br />

matches every request from every<br />

agent).<br />

mntlcond: this describes a condition on the<br />

agent’s own mental state; the precise<br />

syntax is described below;



Definition 2.6 (Mental conditions mntlcond)<br />

Statements are built from simple atomic sentences:<br />

(t (rock west 65)) (“it is true at time t that there is a rock 65<br />

meters west”), (NOT (t (empty gasTank))) (“it is not true<br />

at time t that the gas tank is empty”).<br />

Mental conditions take the form (B a fact) (“the agent a<br />

believes the fact”), ((OBL I,a) action) (“the agent I is<br />

committed to agent a to perform action”), or<br />

(CAN^now_I action^t) (“the agent I has the capability to<br />

perform action at time t”).<br />

They can be connected conjunctively or disjunctively:<br />

(B x) ∧ (B y) ∧ (NOT ((OBL I,a z) ∧ (OBL I,b z)))



Existentially/universally quantified variables (∃x),<br />
(∀x) are allowed.



Capabilities are stored in a database as pairs<br />

(private-action mntlcond)<br />

Why the additional mntlcond?<br />

What makes an agent do anything?



PLACA<br />

Shortcomings of AGENT-0:<br />

only commitments to primitive actions,<br />

agents must know the primitive actions of<br />

the intended recipient,<br />

requesting agent must develop the plan to<br />

achieve its goal.



PLACA adds two important things:<br />
1 a consistent list of intentions,<br />
2 a list of consistent plans.



New syntactic structures added by PLACA:<br />

(INTEND x): Intend to make sentence x true<br />

(add to list).<br />

(ADOPT x): Add intention/plan x to the<br />
intention/plan list.<br />

(DROP x): Drop the intention/plan x.<br />

(CAN-DO x), (CAN-ACHIEVE x),<br />

(PLAN-DO x), (PLAN-ACHIEVE x),<br />

(PLAN-NOT-DO x): Truth statements used in<br />

mental conditions. Plans are created by an<br />

external plan generator. The system can<br />
dynamically alter non-succeeding plans.



The Agent-K Extension<br />

Idea: Standardisation of message passing<br />

(KQML). Making it more practical and realistic.<br />

KQML (Knowledge Query & Manipulation<br />

Language): standard for specifying queries of<br />

expert system databases.<br />

KQML supports over 20 messaging types,<br />

AGENT-0 just 3.



Two major changes:<br />

Outgoing messages: Replace INFORM,<br />

REQUEST, and UNREQUEST message actions<br />

with one KQML command that takes as<br />

arguments the message, the time, the KQML<br />

type.<br />

Incoming messages: converted into<br />

AGENT-0 types (basic AGENT-0 types suffice<br />

for encapsulation of all KQML types).<br />

Several commitment rules may match a<br />
single message (in AGENT-0, only the<br />
first rule matching a message is selected).



An example application of AGENT-0<br />

Agents: P a passenger, C an airline clerk (program), S<br />

C’s supervisor.<br />

Airline Reservation: By confirming a reservation, the airline<br />

commits to issue a boarding pass to the<br />

passenger in due time.<br />

January 2004:<br />

P to C: Please let me know about flights from<br />

Munich to Cologne on February 22nd.<br />

C to P: Flight # 123 departs at 8:15, flight # 233<br />

at 10:00, flight # 455 at noon, ...<br />

P to C: Please book me on # 123.<br />

C to P: Sorry, sold out.



P to C: Then please try # 233.<br />

C to P: Confirmed, your reservation number is<br />

34544567.<br />

P to C: Please do also book me on # 455.<br />

C to P: This is in conflict with # 233: no double<br />

bookings are allowed.<br />

P to C: Please get permission for it, I am a VIP.<br />

C to S: I request permission for the following ...<br />

S to C: Permission denied.<br />

C to P: Sorry but I can’t get approval.<br />

February 2004:<br />

P to C: Hi, I am P, I have a reservation for # 233.<br />

C to P: Here is your boarding pass.



Initial beliefs:<br />

1 Flight schedule: Database contains tuples of the form<br />

(time (flight(from, to, number)))<br />

2 Available seats:<br />

(time (rem_seats(time1, flightnum, seats)))<br />




Capabilities:<br />
1. Issue boarding pass (bp) exactly one hour before departure.<br />
IF (B (time−1h present(pass))<br />
    ∧ B (time flight(?from, ?to, flightnum)))<br />
DO (time−1h physical_issue_bp(pass, flightnum, time))<br />
We define this statement as a macro<br />
issue_bp(pass, flightnum, time).<br />
2. Keep track of the remaining seats.<br />
DO (?time update_rem_seats(?time1, ?flightnum, ?add_seats))<br />
IF (B (?time rem_seats(?time1, ?flightnum, ?curr_seats))<br />
    ∧ ?curr_seats ≥ ?add_seats)



Commitment rules:<br />
1. The airline commits to answering inquiries about flight<br />
schedules.<br />
COMMIT(<br />
  (?pass REQUEST now (IF (B ?p) (INFORM ?time ?pass ?p))),<br />
  true,<br />
  (?pass, now+1, (IF (B ?p) (INFORM ?time ?pass ?p))) )



2. The airline commits to issue a boarding pass upon request<br />
from the customer.<br />
COMMIT(<br />
  (?cust REQUEST now issue_bp(?pass, ?flight, ?time)),<br />
  (B (?time rem_seats(?flight, ?n))<br />
   ∧ (?n > 0)<br />
   ∧ ¬ (OBL ?anyone issue_bp(?pass, ?anyflight, ?time))),<br />
  (myself, now+1, (DO update_rem_seats(?time, ?flight, −1))) )



Two more macros:<br />
1. Requests from customers need to be answered.<br />
(REQUEST ?time askee<br />
(IF (B q) (INFORM time+1 asker q)))<br />
If q is a variable, then we get all answers to the query q.<br />
This is the macro query_which(?time, asker, askee, q).



2. Here we get a confirmation or disconfirmation of a fact q.<br />
(REQUEST ?time askee<br />
(IF (B q) (INFORM time+1 asker q)))<br />
(REQUEST ?time askee<br />
(IF (B ¬q) (INFORM time+1 asker ¬q)))<br />
This is the macro query_whether(?time, asker, askee, q).



And here is the formalisation of the conversation above.<br />
agent: action<br />
smith: query_which(20jan/1:00, smith, ba,<br />
        22feb/?!time flight(Mün, Köln, ?!flightnum))<br />
ba: INFORM 20jan/2:00 smith<br />
        22feb/8:30 flight(Mün, Köln, #123)<br />
ba: INFORM 20jan/3:00 smith<br />
        22feb/10:00 flight(Mün, Köln, #233)<br />
ba: INFORM 20jan/4:00 smith<br />
        22feb/...<br />
smith: REQUEST 20jan/5:00 ba<br />
        issue_bp(smith, #123, 22feb/8:30)



agent: action<br />
smith: query_whether(20jan/6:00, smith, ba,<br />
        COMMIT smith issue_bp(smith, #123, 22feb/8:30))<br />
ba: INFORM 20jan/7:00 smith<br />
        ¬ COMMIT smith issue_bp(smith, #123, 22feb/8:30)<br />
smith: REQUEST 20jan/8:00 ba<br />
        issue_bp(smith, #233, 22feb/10:00)<br />
smith: query_whether(20jan/9:00, smith, ba,<br />
        COMMIT smith issue_bp(smith, #233, 22feb/10:00))<br />
ba: INFORM 20jan/10:00 smith<br />
        COMMIT smith issue_bp(smith, #233, 22feb/10:00)<br />
· · · · · ·



agent: action<br />
smith: INFORM 22feb/9:00 ba present(smith)<br />
ba: DO 22feb/9:00<br />
        issue_bp(smith, #233, 22feb/10:00)



PRS: Procedural Reasoning System<br />

[Figure: PRS architecture. Data input from sensors flows from the<br />
ENVIRONMENT into the AGENT; the Interpreter mediates between<br />
BELIEFS, PLANS, DESIRES, and INTENTIONS and produces the<br />
action output.]



No planning from first principles; plans are<br />
stored in a library (precompiled).<br />
Plans are constructed in advance by the<br />
agent programmer.<br />
Plans consist of<br />
a goal (postcondition),<br />
a context (precondition),<br />
a body (actions to be carried out).



The body is not just a sequence of primitive<br />

actions. Also allowed are goals, disjunctions of<br />

goals, loops.<br />

Beliefs are represented as in Prolog.<br />

There is a top goal at the beginning, which is<br />
pushed on the intention stack.<br />
The agent searches through its plan library.<br />

Options (in the BDI framework): set of plans<br />

that achieve the goal and have their<br />

preconditions satisfied.



How to deliberate (select between the potential<br />
plans)?<br />
1 Meta-level plans: change the agent’s intentions<br />
at runtime.<br />
2 Utilities for plans: executing the best plan<br />
results in further goals being pushed on the stack,<br />
which in turn results in more plans, etc.
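The control cycle described above (top goal pushed on the intention stack, plan library searched, applicable options filtered by precondition, plan body executed, possibly pushing further subgoals) can be sketched roughly as follows; the plan representation is an illustrative simplification, not the actual PRS data structures:

```python
# Rough sketch of a PRS-style interpreter. Plans are precompiled
# (goal, context, body) triples; "!" marks a subgoal in a plan body.

plan_library = [
    {"goal": "tea", "context": lambda b: "water" in b,
     "body": ["!hot_water", "brew"]},
    {"goal": "hot_water", "context": lambda b: True,
     "body": ["boil"]},
]

def run(top_goal, beliefs):
    stack = ["!" + top_goal]          # the intention stack
    trace = []                        # primitive actions executed
    while stack:
        item = stack.pop()
        if item.startswith("!"):      # a goal: look for applicable plans
            goal = item[1:]
            options = [p for p in plan_library
                       if p["goal"] == goal and p["context"](beliefs)]
            if not options:
                return None           # no applicable plan: failure
            # deliberation collapsed here to "take the first option"
            stack.extend(reversed(options[0]["body"]))
        else:
            trace.append(item)        # primitive action: execute it
    return trace

print(run("tea", {"water"}))  # ['boil', 'brew']
```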





http://www.marcush.net/IRS/irs_downloads.html



2. Basic Architectures 4. Layered Architectures<br />

2.4 Layered Architectures



At least 2 layers: reactive (event-driven),<br />
pro-active (goal-directed).<br />
[Figure: (a) Horizontal layering: every layer receives the perceptual<br />
input and produces action output in parallel. (b) Vertical layering,<br />
one-pass control: perceptual input enters the bottom layer, control<br />
passes upward, and the top layer produces the action output.<br />
(c) Vertical layering, two-pass control: control flows up through<br />
the layers and back down again to the action output.]



Horizontal:<br />
simple (n behaviours, n layers),<br />
overall behaviour might be inconsistent,<br />
interactions between layers: m^n (m = # actions per layer),<br />
a control system is needed.<br />
Vertical:<br />
only m^2 (n − 1) interactions between layers,<br />
not fault tolerant: if one layer fails, everything breaks down.
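Reading the interaction counts above as m^n for horizontal layering (a central control system must consider every combination of layer actions) versus m^2(n − 1) pairwise interfaces for vertical layering, the gap grows quickly; a quick sanity check:

```python
# Interactions to consider: m^n (horizontal, central control)
# vs. m^2 * (n - 1) (vertical, pairwise interfaces only).

def horizontal(m, n):
    return m ** n

def vertical(m, n):
    return m * m * (n - 1)

m, n = 3, 5   # 3 actions per layer, 5 layers
print(horizontal(m, n))  # 243
print(vertical(m, n))    # 36
```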



Touring Machine<br />

Autonomous vehicle, based on a horizontally<br />
layered architecture.<br />
[Figure: sensor input enters the Perception subsystem; the<br />
Modelling layer, Planning layer, and Reactive layer run in parallel,<br />
mediated by the Control subsystem; the Action subsystem produces<br />
the action output.]



Reactive layer: e.g. obstacle avoidance.<br />
Rule 1: Avoid curb<br />
if is_in_front(curb, observer) and<br />
   speed(observer) > 0 and<br />
   separation(curb, observer) < curb_threshold<br />
then change_orientation(curb_avoidance_angle)<br />
Planning layer: pro-active behaviour.



Modelling layer: updates the world model and<br />
beliefs, predicts conflicts between<br />
agents, changes planning goals.<br />
Control subsystem: decides which layer is active.<br />
Certain observations should never<br />
reach certain layers: control rules.



Reactive rules only make reference to the<br />

current state (no reasoning).<br />

Right hand side contains actions, not just<br />

predicates.<br />

Planning layer does not do planning from<br />

first principles. It consults a library of plan<br />

skeletons, called schemas.<br />

Modelling layer considers other agents,<br />

predicts conflicts, generates new goals (for<br />

resolving conflicts). New goals are handed<br />

over to the planning layer.



Control rules can either<br />
suppress sensor information (between the<br />
sensors and the control layers), or<br />
censor action outputs from control layers.<br />
Censor Rule 1:<br />
if entity(obstacle6) in perception buffer<br />
then remove_sensory_record(layer_R, entity(obstacle6))<br />
The reactive layer never comes to know about<br />
obstacle6.
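The effect of such a censor rule is simply that the filtered percept never reaches the censored layer. A small sketch with illustrative rule and layer names:

```python
# Sketch: censor rules decide which percepts reach which layer.
# Rule and layer names are illustrative.

censor_rules = [
    ("layer_R", "entity(obstacle6)"),   # hide obstacle6 from the reactive layer
]

def deliver(perception_buffer, layer):
    """Return the percepts that `layer` is allowed to see."""
    blocked = {p for (l, p) in censor_rules if l == layer}
    return [p for p in perception_buffer if p not in blocked]

buffer = ["entity(obstacle6)", "entity(curb1)"]
print(deliver(buffer, "layer_R"))  # ['entity(curb1)']
print(deliver(buffer, "layer_P"))  # ['entity(obstacle6)', 'entity(curb1)']
```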



InteRRaP<br />
Generic MAS, based on a two-pass vertically<br />
layered architecture.<br />
[Figure: cooperation layer (social knowledge) on top of plan layer<br />
(planning knowledge) on top of behaviour layer (world model),<br />
all above a world interface that handles perceptual input and<br />
action output.]



Similar to the Touring Machine (layers).<br />
Each layer has associated KBs.<br />

Layers interact with each other: bottom-up<br />

activation and top-down execution.<br />

Each layer implements two functions:<br />

situation recognition and goal activation<br />

(mapping a KB and goals to a set of new goals),<br />

planning and scheduling (selection of plans to<br />

execute).



Layered architectures do not have a clear<br />

semantics and the horizontal interaction is<br />

difficult.



BDI dates back to [Bratman et al.1988].<br />

PRS (procedural reasoning system,<br />

[Georgeff and Lansky1987]) uses BDI.<br />

Applications: Space Shuttle (Diagnosis),<br />

Sydney Airport (air traffic control).<br />

BDI-Logics: [Rao and Georgeff1991,<br />

Rao and Georgeff1995a, Rao1995].



2. Basic Architectures 5. References<br />

2.5 References



[Bozzano et al.1999] Bozzano, M., G. Delzanno,<br />

M. Martelli, V. Mascardi, and F. Zini (1999).<br />

Multi-agent systems development as a software<br />

engineering enterprise.<br />

Lecture Notes in Computer Science 1551, 46–60.<br />

[Bratman et al.1988] Bratman, M., D. Israel, and<br />

M. Pollack (1988).<br />

Plans and Resource-Bounded Practical Reasoning.<br />

Computational Intelligence 4(4), 349–355.



[Brooks1986] Brooks, R. A. (1986).<br />

A robust layered control system for a mobile robot.<br />

IEEE Journal of Robotics and Automation RA-2, 14–23.<br />

[Brooks1991] Brooks, R. A. (1991).<br />
Integrated systems based on behaviors.<br />
SIGART Bulletin 2(4), 46–50.<br />

[Georgeff and Lansky1987] Georgeff, M. and A. Lansky<br />

(1987).<br />

Reactive Reasoning and Planning.<br />

In AAAI, Seattle, WA, pp. 677–682.



[Georgeff et al.1999] Georgeff, M., B. Pell, M. Pollack,<br />

M. Tambe, and M. Wooldridge (1999).<br />

The belief-desire-intention model of agency.<br />

In J. Müller, M. P. Singh, and A. S. Rao (Eds.),<br />

Proceedings of the 5th International Workshop on<br />

Intelligent Agents V : Agent Theories, Architectures, and<br />

Languages (ATAL-98), Volume 1555, pp. 1–10.<br />

Springer-Verlag: Heidelberg, Germany.



[Hindriks et al.1998] Hindriks, K. V., F. S. de Boer, W. van der Hoek,<br />
and J.-J. C. Meyer (1998).<br />

Formal semantics of the core of AGENT-0.<br />

In ECAI’98 Workshop on Practical Reasoning and<br />

Rationality, pp. 20–29.<br />

[Iglesias et al.1999] Iglesias, C., M. Garrijo, and<br />

J. Gonzalez (1999).<br />

A survey of agent-oriented methodologies.<br />

In J. Müller, M. P. Singh, and A. S. Rao (Eds.),<br />

Proceedings of the 5th International Workshop on<br />

Intelligent Agents V : Agent Theories, Architectures, and



Languages (ATAL-98), Volume 1555, pp. 317–330.<br />

Springer-Verlag: Heidelberg, Germany.<br />

[Mascardi et al.2004] Mascardi, V., M. Martelli, and<br />

S. Sterling (2004).<br />

Logic-based specification languages for intelligent<br />

software agents.<br />

Theory and Practice of Logic Programming<br />
4(4), 429–494.



[Rao1995] Rao, A. S. (1995).<br />

Decision Procedures for Propositional Linear-Time<br />

Belief-Desire-Intention Logics.<br />

In M. Wooldridge, J. Müller, and M. Tambe (Eds.),<br />

Intelligent Agents II – Proceedings of the 1995 Workshop<br />

on Agent Theories, Architectures and Languages (ATAL-95),<br />

Volume 890 of LNAI, pp. 1–39. Berlin, Germany:<br />

Springer-Verlag.



[Rao and Georgeff1991] Rao, A. S. and M. Georgeff<br />

(1991).<br />

Modeling Rational Agents within a BDI-Architecture.<br />

In J. F. Allen, R. Fikes, and E. Sandewall (Eds.),<br />

Proceedings of the International Conference on Knowledge<br />

Representation and Reasoning, Cambridge, MA, pp.<br />

473–484. Morgan Kaufmann.



[Rao and Georgeff1995a] Rao, A. S. and M. Georgeff<br />

(1995a, June).<br />

Formal models and decision procedures for multi-agent<br />

systems.<br />

Technical Report 61, Australian Artificial Intelligence<br />

Institute, Melbourne.<br />

[Rao and Georgeff1995b] Rao, A. S. and M. P. Georgeff<br />

(1995b).<br />

BDI-agents: from theory to practice.<br />

In Proceedings of the First Intl. Conference on Multiagent<br />
Systems, San Francisco.



[Shoham1993] Shoham, Y. (1993, March).<br />

Agent-oriented programming.<br />

Artificial Intelligence 60, 51–92.<br />

[Ue et al.1999] Ue, A., P. Edwards, and W. H. E. Davies<br />

(1999, October 23).<br />

Agent-K: An integration of AOP and KQML.



3. Decision Making (1)<br />

Chapter 3. Decision Making (1)<br />

Decision Making (1)<br />

3.1 Evaluation Criteria<br />

3.2 Non-coop Games<br />

3.3 (Im-) Perfect Information<br />

3.4 Repeated Games<br />

3.5 References



Outline (1)<br />
We start by emphasising the difference between<br />
classical AI and agent systems and present several<br />
evaluation criteria for comparing protocols.<br />
We also<br />
introduce non-cooperative games and<br />
state the minimax theorem and Nash's<br />
theorem.



Outline (2)<br />

We end this chapter with<br />

imperfect information games, where agents<br />

do not have complete knowledge,<br />

repeated games, like bargaining<br />

mechanisms, which lead to interesting<br />

protocols for infinite games.



Classical DAI: the system designer fixes an<br />
interaction protocol which is uniform<br />
for all agents. The designer also fixes<br />
a strategy for each agent.<br />
What is the outcome, assuming that the<br />
protocol is followed and the agents follow the<br />
strategies?



MAI: Interaction-Protocol is given. Each<br />

agent determines its own strategy<br />

(maximising its own good, via a utility<br />

function, without looking at the global<br />

task).<br />

Global optimum<br />

What is the outcome, given a protocol that<br />

guarantees that each agent’s desired local<br />

strategy is the best one (and is therefore chosen<br />

by the agent)?



3. Decision Making (1) 1. Evaluation Criteria<br />

3.1 Evaluation Criteria



We need to compare protocols. Each such protocol leads<br />
to a solution. So we determine how good these solutions<br />
are.<br />
Social welfare: sum of all utilities.<br />
Pareto efficiency: a solution x is Pareto-optimal if<br />
there is no solution x′ with:<br />
(1) ∃ agent ag : ut_ag(x′) > ut_ag(x),<br />
(2) ∀ agents ag′ : ut_ag′(x′) ≥ ut_ag′(x).<br />
Individual rationality: the payoff from participating<br />
is higher than from not participating at all.
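The Pareto condition can be checked mechanically on utility vectors: x′ dominates x if some agent is strictly better off and no agent is worse off. A minimal sketch (the outcome vectors below are illustrative, in the style of a prisoner's-dilemma game):

```python
# Pareto dominance and Pareto optimality over utility vectors.

def dominates(x_prime, x):
    """x' Pareto-dominates x: someone strictly better, nobody worse."""
    return (all(a >= b for a, b in zip(x_prime, x))
            and any(a > b for a, b in zip(x_prime, x)))

def pareto_optimal(x, solutions):
    return not any(dominates(y, x) for y in solutions)

outcomes = [(3, 3), (0, 5), (5, 0), (1, 1)]
print([pareto_optimal(o, outcomes) for o in outcomes])
# [True, True, True, False]  -- only (1,1) is dominated (by (3,3))
```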



Stability:<br />
Case 1: the strategy of an agent<br />
depends on the others.<br />
The profile S_A* = 〈S_1*, S_2*, . . . , S_|A|*〉 is<br />
called a Nash equilibrium iff for all i: S_i*<br />
is the best strategy for agent i if<br />
all the others choose<br />
〈S_1*, S_2*, . . . , S_{i−1}*, S_{i+1}*, . . . , S_|A|*〉.<br />
Case 2: the strategy of an agent does<br />
not depend on the others.<br />
Such strategies are called dominant.



Example 3.1 (Prisoners Dilemma, Type 1)<br />

Two prisoners are suspected of a crime (which they both<br />

committed). They can choose to (1) cooperate with each<br />

other (not confessing to the crime) or (2) defect (giving<br />

evidence that the other was involved). Both cooperating<br />

(not confessing) gives them a shorter prison term than<br />

both defecting. But if only one of them defects (the<br />

betrayer), the other gets maximal prison term. The<br />

betrayer then has maximal payoff.<br />

             Prisoner 2<br />
             cooperate  defect<br />
Prisoner 1<br />
cooperate    (3,3)      (0,5)<br />
defect       (5,0)      (1,1)



Social Welfare: Both cooperate,<br />

Pareto-Efficiency: All are Pareto optimal,<br />

except when both defect.<br />

Dominant Strategy: Both defect.<br />

Nash Equilibrium: Both defect.
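All four claims can be verified mechanically from the payoff matrix of Example 3.1; a small sketch (player 0 = Prisoner 1, player 1 = Prisoner 2):

```python
# Checking the prisoner's dilemma claims mechanically.
# C = cooperate, D = defect; payoff[(row move, column move)].

payoff = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
moves = ["C", "D"]

def best_responses(player, other_move):
    """Moves maximising `player`'s payoff against a fixed opponent move."""
    def u(m):
        profile = (m, other_move) if player == 0 else (other_move, m)
        return payoff[profile][player]
    best = max(u(m) for m in moves)
    return [m for m in moves if u(m) == best]

# D is the unique best response to everything, hence dominant:
print([best_responses(0, m) for m in moves])  # [['D'], ['D']]

# Nash equilibria: profiles where both moves are mutual best responses.
nash = [(a, b) for a in moves for b in moves
        if a in best_responses(0, b) and b in best_responses(1, a)]
print(nash)  # [('D', 'D')]

# Social welfare (sum of utilities) is maximised by mutual cooperation:
print(max(payoff, key=lambda p: sum(payoff[p])))  # ('C', 'C')
```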



3. Decision Making (1) 2. Non-coop Games<br />

3.2 Non-coop Games



Prisoner's dilemma revisited, with generic payoffs<br />
(where b > a > d > c, as in Example 3.1):<br />
             Prisoner 2<br />
             cooperate  defect<br />
Prisoner 1<br />
cooperate    (a,a)      (c,b)<br />
defect       (b,c)      (d,d)



Example 3.2 (Trivial mixed-motive game, Type 0)<br />
           Player 2<br />
           C        D<br />
Player 1<br />
C          (4,4)    (3,2)<br />
D          (2,3)    (1,1)



Example 3.3 (Battle of the Bismarck Sea)<br />

In 1943 the northern half of New Guinea was controlled by<br />

the Japanese, the southern half by the allies. The Japanese<br />

wanted to reinforce their troops. This could happen using<br />

two different routes: (1) north (rain and bad visibility) or<br />

(2) south (weather ok). Trip should take 3 days.<br />

The allies want to bomb the convoy as long as possible. If<br />

they search north, they can bomb 2 days (independently<br />

of the route taken by the Japanese). If they go south, they<br />

can bomb 3 days if the Japanese go south too, and only 1<br />

day, if the Japanese go north.



                Japanese<br />
                Sail North   Sail South<br />
Allies<br />
Search North    2 days       2 days<br />
Search South    1 day        3 days<br />
Allies: what is the largest of all row minima?<br />
Japanese: what is the smallest of the column maxima?<br />
Battle of the Bismarck Sea:<br />
largest row minimum = smallest column maximum.<br />
This is called a saddle point.
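The saddle-point check is a direct computation on the payoff matrix (entries are days of bombing; the Allies maximise, the Japanese minimise):

```python
# Rows: Allies (Search North, Search South); columns: Japanese
# (Sail North, Sail South); entries: days of bombing.

M = [[2, 2],
     [1, 3]]

maximin = max(min(row) for row in M)  # the Allies' guaranteed value
minimax = min(max(M[i][j] for i in range(2)) for j in range(2))
print(maximin, minimax)  # 2 2 -> equal, so (Search North, Sail North)
                         # is a saddle point
```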



Example 3.4 (Rochambeau Game)<br />

Also known as paper, rock and scissors: paper<br />

covers rock, rock smashes scissors, scissors cut<br />

paper.<br />

           Min<br />
           P     S     R<br />
Max<br />
P          0     -1    1<br />
S          1     0     -1<br />
R          -1    1     0



Definition 3.5 (n-Person Normal Form Game)<br />
A finite n-person normal form game is a tuple<br />
〈N, A, O, ϱ, µ〉, where<br />
N = {1, . . . , i, . . . , n} is a finite set of players.<br />
A = 〈A_1, . . . , A_i, . . . , A_n〉, where A_i is the set of actions<br />
available to player i. A vector a ∈ A is called an action<br />
profile. The elements of A_i are also called pure<br />
strategies.<br />
O is the set of outcomes.<br />
ϱ : A −→ O assigns each action profile an outcome.<br />
µ = 〈µ_1, . . . , µ_i, . . . , µ_n〉, where µ_i : O −→ R is a<br />
real-valued utility (payoff) function for player i.



Games can be represented graphically using an<br />
n-dimensional payoff matrix. Here is a generic picture<br />
for 2-player, 2-strategy games (a_i^j is the j-th action of player i):<br />
        Player 2<br />
        a_2^1                                a_2^2<br />
Player 1<br />
a_1^1   (µ_1(a_1^1, a_2^1), µ_2(a_1^1, a_2^1))   (µ_1(a_1^1, a_2^2), µ_2(a_1^1, a_2^2))<br />
a_1^2   (µ_1(a_1^2, a_2^1), µ_2(a_1^2, a_2^1))   (µ_1(a_1^2, a_2^2), µ_2(a_1^2, a_2^2))<br />
We often forget about ϱ (thus we make no<br />
distinction between actions and outcomes). Thus we<br />
simply write µ_1(a_1^1, a_2^1) instead of the more precise<br />
µ_1(ϱ(〈a_1^1, a_2^1〉)).



Definition 3.6 (Common Payoff Game)<br />

A common payoff game (team game) is a game<br />

in which for all action profiles a ∈ A 1 × . . . × A n<br />

and any two agents i, j the following holds:<br />

µ i (a) = µ j (a).<br />

In such games agents have no conflicting<br />

interests. Their graphical depiction is simpler<br />

than above (the second component is not<br />

needed).



The opposite of a team game is a constant sum game.<br />

Definition 3.7 (Constant Sum Game)<br />

A 2-player n-strategy normal form game is called a<br />

constant sum game if there exists a constant c<br />

such that for each strategy profile a ∈ A 1 × A 2 :<br />

µ 1 (a) + µ 2 (a) = c.<br />

We usually set, w.l.o.g., c = 0 (zero sum games).<br />

What we are really after is strategies. A pure<br />

strategy is one where one action is chosen and<br />

played. But does this always make sense?



Example 3.8 (Battle of the Sexes, Type 2)<br />

A married couple looks for evening entertainment.<br />

They prefer to go out together, but have<br />

different views about what to do (say, going to<br />

the theatre or eating in a gourmet restaurant).<br />

                    Wife<br />

                    Theatre   Restaurant<br />

Husband Theatre     (4,3)     (2,2)<br />

        Restaurant  (1,1)     (3,4)



Example 3.9 (Leader Game, Type 3)<br />

Two drivers attempt to enter a busy stream of<br />

traffic. When the cross traffic clears, each one<br />

has to decide whether to concede the right of<br />

way to the other (C) or drive into the gap (D). If<br />

both decide for C, they are delayed. If both<br />

decide for D there may be a collision.<br />

              Driver 2<br />

              C        D<br />

Driver 1  C   (2,2)    (3,4)<br />

          D   (4,3)    (1,1)



Example 3.10 (Fighters and Bombers)<br />

Consider fighter pilots in WW II. A good strategy to attack<br />

bombers is to swoop down from the sun: the Hun-in-the-sun<br />

strategy. But the bomber pilots can put on their<br />

sunglasses and stare into the sun to watch the fighters. So<br />

another strategy is to attack them from below: the Ezak-Imak<br />

strategy. If the fighters are not spotted, it is fine; if they are, it is<br />

fatal for them (they are much slower when climbing). The<br />

table contains the survival probabilities of the fighter<br />

pilot.<br />

                                 Bomber Crew<br />

                                 Look Up   Look Down<br />

Fighter Pilots  Hun-in-the-Sun   0.95      1<br />

                Ezak-Imak        1         0



Example 3.11 (Matching Pennies Game)<br />

Two players display one side of a penny (head or<br />

tails). Player 1 wins the penny if they display the<br />

same, player 2 wins otherwise.<br />

                 Player 2<br />

                 Head      Tails<br />

Player 1 Head    (1, -1)   (-1, 1)<br />

         Tails   (-1, 1)   (1, -1)



Definition 3.12 (Mixed Strategy for NF Games)<br />

Let 〈N, A, O, ϱ, µ〉 be a normal form game. For a set X let<br />

Π(X) be the set of all probability distributions over X.<br />

The set of mixed strategies for player i is the set<br />

S i = Π(A i ). The set of mixed strategy profiles is<br />

S 1 × . . . × S n .<br />

The support of a mixed strategy is the set of actions that<br />

are assigned non-zero probabilities.<br />

What is the payoff of such strategies? We have to take into<br />

account the probability with which an action is chosen.<br />

This leads to the expected utility µ expected .



Definition 3.13 (Expected Utility for player i)<br />

The expected utility for player i of the mixed<br />

strategy profile (s 1 , . . . , s n ) is defined as<br />

µ i expected (s 1 , . . . , s n ) = ∑ a∈A µ i (ϱ(a)) · ∏ j=1..n s j (a j ).

What is the optimal strategy (maximising the<br />

expected payoff) for an agent in a 2-agent<br />

setting?
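Definition 3.13 can be turned into a few lines of code. A minimal Python sketch (actions are identified with outcomes, i.e. ϱ is omitted, as the lecture does):

```python
from itertools import product

def expected_utility(payoff, strategies):
    """Expected utility for every player: sum over all action profiles a
    of µ_i(a) weighted by the product of the probabilities s_j(a_j)."""
    n = len(strategies)
    result = [0.0] * n
    for profile in product(*(range(len(s)) for s in strategies)):
        prob = 1.0
        for j, a_j in enumerate(profile):
            prob *= strategies[j][a_j]
        for i in range(n):
            result[i] += prob * payoff[profile][i]
    return result

# Matching Pennies (Example 3.11): payoff[(row, col)] = (µ_1, µ_2)
pennies = {(0, 0): (1, -1), (0, 1): (-1, 1),
           (1, 0): (-1, 1), (1, 1): (1, -1)}
print(expected_utility(pennies, [[0.5, 0.5], [0.5, 0.5]]))  # → [0.0, 0.0]
```

With both players mixing uniformly, every profile is equally likely and the payoffs cancel out.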



Definition 3.14 (Maxmin strategy)<br />

Given a game 〈{1, 2}, {A 1 , A 2 }, {µ 1 , µ 2 }〉, the maxmin<br />

strategy of player i is a mixed strategy that maximises the<br />

guaranteed payoff of player i, no matter what the other<br />

player −i does:<br />

argmax s i min s −i µ i expected (s 1 , s 2 ).<br />

The maxmin value for player i is<br />

max s i min s −i µ i expected (s 1 , s 2 ).<br />

The minmax strategy for player i is<br />

argmin s i max s −i µ −i expected (s 1 , s 2 ),<br />

and its minmax value is min s i max s −i µ −i expected (s 1 , s 2 ).



Lemma 3.15<br />

In each finite normal form 2-person game (not<br />

necessarily constant sum), the maxmin value of<br />

one player is never strictly greater than the<br />

minmax value for the other.



We illustrate the maxmin strategy using a<br />

2-person 3-strategy constant sum game:<br />

               Player B<br />

               B-I    B-II    B-III<br />

Player A A-I   0      5/6     1/2<br />

         A-II  1      1/2     3/4<br />

We assume Player A’s optimal strategy is to play<br />

A-I with probability x and<br />

A-II with probability 1 − x.<br />

In the following we want to determine x.<br />



Thus Player A’s expected utility is as follows:<br />

1 when playing against B-I: 0x + 1(1 − x) = 1 − x,<br />

2 when playing against B-II: (5/6)x + (1/2)(1 − x) = 1/2 + (1/3)x,<br />

3 when playing against B-III: (1/2)x + (3/4)(1 − x) = 3/4 − (1/4)x.<br />

This can be illustrated with the following picture (see<br />

blackboard). Thus B-III does not play any role.<br />

Thus the maxmin point is determined by setting<br />

1 − x = 1/2 + (1/3)x,<br />

which gives x = 3/8. The value of the game is 5/8.<br />

The strategy for Player B is to choose B-I with probability 1/4<br />

and B-II with probability 3/4.
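The worked example can be checked numerically. A small sketch: Player A plays A-I with probability x, the guaranteed payoff is the minimum over B's pure strategies, and the maxmin point maximises that guarantee over a grid of x values.

```python
# Payoff matrix for Player A from the example above.
payoff_A = [[0, 5/6, 1/2],   # row A-I
            [1, 1/2, 3/4]]   # row A-II

def guaranteed(x):
    # expected payoff of the mix (x, 1 - x) against each of B's columns
    against = [x * payoff_A[0][j] + (1 - x) * payoff_A[1][j] for j in range(3)]
    return min(against)

best_x = max((i / 1000 for i in range(1001)), key=guaranteed)
print(best_x)  # → 0.375 (= 3/8); guaranteed(best_x) is 0.625 (= 5/8)
```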



More in accordance with the minmax strategy, let us<br />

compute<br />

argmax s i min s −i µ i expected (s 1 , s 2 ).<br />

We assume Player A plays (as above) A-I with probability x<br />

and A-II with probability 1 − x (strategy s 1 ). Similarly, Player<br />

B plays B-I with probability y and B-II with probability 1 − y<br />

(strategy s 2 ).<br />

We compute µ 1 expected (s 1 , s 2 ), thus<br />

0 · xy + (5/6)x(1 − y) + 1 · (1 − x)y + (1/2)(1 − x)(1 − y),<br />

µ 1 expected (s 1 , s 2 ) = y(−(4/3)x + 1/2) + (1/3)x + 1/2



According to the minmax strategy, we have to choose x<br />

such that the minimal values of the above term are<br />

maximal. For each value of x the above is a straight line<br />

with some gradient. Thus we get the maximum when the<br />

line does not slope at all!<br />

Thus x = 3/8. A similar reasoning gives y = 1/4.



Theorem 3.16 (von Neumann (1928))<br />

In any finite 2-person constant-sum game the following<br />

holds:<br />

1 The maxmin value for one player is equal to the minmax<br />

value for the other. The maxmin of player 1 is usually<br />

called value of the game.<br />

2 For each player, the set of maxmin strategies coincides<br />

with the set of minmax strategies.<br />

3 The maxmin strategies are optimal: if one player does<br />

not play a maxmin strategy, then its payoff (expected<br />

utility) goes down.<br />

What is the optimal strategy (maximising the expected<br />

payoff) for an agent in an n-agent setting?



Figure 1: A saddle.



Definition 3.17 (Best Response to a Strategy Profile)<br />

Given a strategy profile<br />

〈s 1 , s 2 , . . . , s i−1 , s i+1 , . . . , s n 〉,<br />

the best response of player i to it is any mixed<br />

strategy s ∗ i ∈ S i such that<br />

µ i (s ∗ i , 〈s 1 , s 2 , . . . , s i−1 , s i+1 , . . . , s n 〉) ≧<br />

µ i (s i , 〈s 1 , s 2 , . . . , s i−1 , s i+1 , . . . , s n 〉)<br />

for all strategies s i ∈ S i .<br />

Is the best response unique?



Example 3.18 (Best response sets for Rochambeau)<br />

What does the set of best responses look like if<br />

1 Player 2 plays the pure strategy P?<br />

2 Player 2 plays P with probability .5 and S with<br />

probability .5?<br />

3 Player 2 plays P, S and R each with probability 1/3?<br />

If a non-pure strategy is in the best response set<br />

(say a strategy mixing two actions with probabilities 〈p, 1 − p〉,<br />

p ≠ 0), then so are all other mixed strategies with<br />

probabilities 〈p ′ , 1 − p ′ 〉 where p ≠ p ′ ≠ 0.



Definition 3.19 (Nash-Equilibrium)<br />

A strategy profile 〈s ∗ 1, s ∗ 2, . . . , s ∗ n〉 is a Nash<br />

equilibrium if for any agent i, s ∗ i is the best<br />

response to 〈s ∗ 1, s ∗ 2, . . . , s ∗ i−1 , s∗ i+1 , . . . , s∗ n〉.<br />

What are the Nash equilibria in the Battle of the<br />

Sexes? What about Matching Pennies?
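For pure strategies these questions can be answered by brute force. A small sketch (equilibria in mixed strategies, e.g. the one for Matching Pennies, are not found this way):

```python
# Enumerate pure-strategy Nash equilibria of a 2-player game given as a
# dict of (row, col) -> (µ1, µ2): a cell where no player gains by deviating.
def pure_nash(game, n_rows, n_cols):
    eqs = []
    for r in range(n_rows):
        for c in range(n_cols):
            u1, u2 = game[(r, c)]
            if (all(game[(r2, c)][0] <= u1 for r2 in range(n_rows)) and
                    all(game[(r, c2)][1] <= u2 for c2 in range(n_cols))):
                eqs.append((r, c))
    return eqs

bos = {(0, 0): (4, 3), (0, 1): (2, 2), (1, 0): (1, 1), (1, 1): (3, 4)}
pennies = {(0, 0): (1, -1), (0, 1): (-1, 1), (1, 0): (-1, 1), (1, 1): (1, -1)}
print(pure_nash(bos, 2, 2))      # → [(0, 0), (1, 1)]
print(pure_nash(pennies, 2, 2))  # → []
```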



Example 3.20 (Cuban Missile Crisis, Type 4)<br />

This relates to the well-known crisis in October<br />

1962.<br />

                     USSR<br />

                     Withdrawal            Maintain<br />

U.S.  Blockade       (3, 3)                (2, 4)<br />

                     Compromise            USSR victory<br />

      Air strike     (4, 2)                (1, 1)<br />

                     U.S. victory          Nuclear War



Theorem 3.21 (Nash (1950))<br />

Every finite normal form game has a Nash<br />

equilibrium.<br />

Corollary 3.22 (Nash Eq. in constant-sum Games)<br />

In any finite normal form 2-person constant-sum<br />

game, the Nash equilibria are exactly all pairs<br />

〈s 1 , s 2 〉 of maxmin strategies (s 1 for player 1, s 2<br />

for player 2).<br />

All Nash equilibria have the same payoff<br />

(expected utility): the value of the game, which<br />

player 1 gets.



Proof<br />

We use Kakutani’s theorem: Let X be a nonempty subset<br />

of n-dimensional Euclidean space, and f : X −→ 2 X . The<br />

following are sufficient conditions for f to have a fixed<br />

point (i.e. an x ∗ ∈ X with x ∗ ∈ f(x ∗ )):<br />

1 X is compact: any sequence in X has a convergent subsequence with limit in X.<br />

2 X is convex: x, y ∈ X, α ∈ [0, 1] ⇒ αx + (1 − α)y ∈ X.<br />

3 ∀x : f(x) is nonempty and convex.<br />

4 For any sequence of pairs (x i , x ∗ i ) such that x i , x ∗ i ∈ X<br />

and x ∗ i ∈ f(x i ), if lim i→∞ (x i , x ∗ i ) = (x, x ∗ ) then<br />

x ∗ ∈ f(x).<br />




Let X consist of all mixed strategy profiles and let f be the<br />

best response set: f(〈s 1 , . . . , s n 〉) is the set of all best<br />

responses 〈s ′ 1, . . . , s ′ n〉 (where s ′ i is player i’s best response<br />

to 〈s 1 , . . . , s i−1 , s i+1 , . . . s n 〉).<br />

Why is that a subset of an n-dimensional Euclidean<br />

space?<br />

A mixed strategy over k actions (pure strategies) is a<br />

point in the (k − 1)-dimensional simplex (namely the one<br />

satisfying ∑ i=1..k p i = 1). Therefore X is the cartesian product of n<br />

simplices. X is compact and convex (why?).<br />

The function f satisfies the remaining properties listed<br />

above. Thus there is a fixed point and this fixed point is a<br />

Nash equilibrium.



Theorem 3.23 (Brouwer’s Fixpoint Theorem)<br />

Let D be the unit Euclidean ball in R n and let f : D → D be a<br />

continuous mapping. Then there exists x ∈ D with f(x) = x.<br />

Proof<br />

Reduction to the C 1 -differentiable case: Let r : R n → R<br />

be the function<br />

r(x) = r(x 1 , x 2 , . . . , x n ) = √ x 2 1 + x 2 2 + · · · + x 2 n.<br />

Let φ : R n → R given by:<br />

φ(x) = a n ((1/4)r(x) 4 − (1/2)r(x) 2 + 1/4)<br />

on D and equal to 0 in the complement of D,<br />

where the constant a n is chosen such that the<br />

integral of φ over R n equals 1.



Let f : D → D be continuous. Let F : R n → D be the<br />

extension of f to R n , for which we have<br />

F (x) = f(x/||x||) on R n \ D.<br />

For k ∈ N, k ≠ 0, put f k (x) := ∫ R n k n φ(ky)F (x − y)dy.<br />

Show that the restriction of f k to D maps D into D. Show<br />

that the mappings f k are continuously differentiable and<br />

approximate, in the topology of uniform convergence, the<br />

mapping F . Show that if there exists a continuous<br />

mapping f : D → D without fixed points, then there also<br />

exists a continuously differentiable mapping without<br />

fixed points. It follows that it suffices to prove the<br />

Brouwer Fixpoint Theorem only for continuously<br />

differentiable mappings.



Proof for C 1 -differentiable mappings: The proof is by<br />

contradiction.<br />

Assume that the continuously differentiable mapping<br />

f : D → D has no fixed points. Let g : D → ∂D be the<br />

mapping such that for every point x ∈ D the points<br />

f(x), x, g(x) are in that order on a line of R n . The mapping<br />

g is also continuously differentiable and satisfies g(x) = x<br />

for x ∈ ∂D. We write g(x) = (g 1 (x), g 2 (x), . . . , g n (x)) and<br />

get (for x ∈ ∂D and i = 1 . . . n) g i (x 1 , x 2 , . . . , x n ) = x i . Note<br />

dg 1 ∧ dg 2 ∧ · · · ∧ dg n = 0 since g 2 1 + g 2 2 + · · · + g 2 n = 1. Then:<br />

0 ≠ ∫ D dx 1 ∧ dx 2 ∧ · · · ∧ dx n = ∫ ∂D x 1 dx 2 ∧ · · · ∧ dx n<br />

= ∫ ∂D g 1 dg 2 ∧ · · · ∧ dg n = ∫ D dg 1 ∧ dg 2 ∧ · · · ∧ dg n<br />

= ∫ D 0 = 0



It is obvious that there is not always a strategy<br />

that is strictly dominating all others (this is why<br />

the Nash equilibrium has been introduced).<br />

Reducing games<br />

However, often games can be reduced and the<br />

computation of the equilibrium simplified.



Definition 3.24 (Dominating strategy)<br />

A strategy s of agent i strictly dominates another<br />

strategy s ′ of i if s yields i a strictly greater payoff<br />

than s ′ against every strategy choice of the other<br />

agents. A strategy is strictly dominated for an agent<br />

if some other strategy strictly dominates it.<br />

Theorem 3.25 (Church-Rosser)<br />

Given a 2-person 〈n, m〉 strategy game. All<br />

strictly dominated columns, as well as all<br />

strictly dominated rows can be eliminated. This<br />

results in a finite series of reduced games. The final<br />

result does not depend on the order of the<br />

eliminations.



Note that we eliminate only pure strategies.<br />

Such a strategy might be dominated by a mixed<br />

strategy.<br />

L C R<br />

U 〈3, 2〉 〈2, 1〉 〈3, 1〉<br />

M 〈1, 1〉 〈1, 1〉 〈2, 2〉<br />

D 〈0, 1〉 〈4, 2〉 〈0, 1〉<br />

1 Eliminate row M.<br />

2 Eliminate column R.



This leads to<br />

L C<br />

U 〈3, 2〉 〈2, 1〉<br />

D 〈0, 1〉 〈4, 2〉
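The elimination procedure can be sketched in a few lines. Note that this sketch only checks domination by pure strategies: it removes row M, but keeps column R, because (after M is gone) R is only dominated by the mixed strategy playing L and C with probability 1/2 each, exactly the subtlety noted above.

```python
# Iterated elimination of strictly dominated pure strategies
# (in the spirit of Theorem 3.25) for a 2-player bimatrix game.
def dominated(vectors, i, j):
    """Does vector j strictly dominate vector i (componentwise >)?"""
    return all(a < b for a, b in zip(vectors[i], vectors[j]))

def reduce_game(payoff1, payoff2):
    rows = list(range(len(payoff1)))
    cols = list(range(len(payoff1[0])))
    changed = True
    while changed:
        changed = False
        # a row is eliminated if another row gives player 1 strictly more
        # in every remaining column
        row_vecs = {r: [payoff1[r][c] for c in cols] for r in rows}
        for r in list(rows):
            if any(dominated(row_vecs, r, r2) for r2 in rows if r2 != r):
                rows.remove(r)
                changed = True
        # symmetrically for player 2's columns over the remaining rows
        col_vecs = {c: [payoff2[r][c] for r in rows] for c in cols}
        for c in list(cols):
            if any(dominated(col_vecs, c, c2) for c2 in cols if c2 != c):
                cols.remove(c)
                changed = True
    return rows, cols

p1 = [[3, 2, 3], [1, 1, 2], [0, 4, 0]]   # rows U, M, D
p2 = [[2, 1, 1], [1, 1, 2], [1, 2, 1]]   # cols L, C, R
print(reduce_game(p1, p2))  # → ([0, 2], [0, 1, 2]): M eliminated, R survives
```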



3.3 (Im-) Perfect Information



We have previously introduced normal form games<br />

(Definition 3.5). This notion does not allow us to deal with<br />

sequences of actions that are reactions to actions of the<br />

opponent.<br />

Extensive form (tree form) games<br />

Unlike games in normal form, those in extensive form do<br />

not assume that all moves between players are made<br />

simultaneously. This leads to a tree form and allows us to<br />

introduce strategies that take into account the history of<br />

the game.<br />

We distinguish between perfect and imperfect<br />

information games. While the former assume that the<br />

players have complete knowledge about the game, the<br />

latter do not: a player might not know exactly which node<br />

it is in.



The following definition covers a game as a tree:<br />

Definition 3.26 (Extensive form Games, Perfect Inf.)<br />

A finite perfect information game in extensive form is a<br />

tuple G = 〈A, A, H, Z, α, ρ, σ, µ 1 , . . . , µ n 〉 where<br />

A is a set of n players, A is a set of actions<br />

H is a set of non-terminal nodes, Z a set of terminal<br />

nodes, H ∩ Z = ∅,<br />

α : H → 2 A assigns to each node a set of actions,<br />

ρ : H → N assigns to each non-terminal node a player<br />

who chooses an action at that node,<br />

σ : H × A → H ∪ Z assigns to each (node, action) a<br />

successor node (〈h 1 , a 1 〉 ≠ 〈h 2 , a 2 〉 implies σ(h 1 , a 1 ) ≠ σ(h 2 , a 2 )),<br />

µ i : Z → R are the utility functions.



Such games can be visualised as finite trees.<br />

Here is the famous “Sharing Game”.<br />

Figure 2: The Sharing game.



Here is another (generic) game.<br />

Figure 3: A generic game.<br />




Transforming extensive form games into<br />

normal form<br />

Note that the definitions of best response and<br />

Nash equilibria carry over to games in extensive<br />

form. Indeed we have:<br />

Lemma 3.27 (Extensive form ↩→ Normal form)<br />

Each game in extensive form can be transformed<br />

into normal form (such that the strategy spaces are<br />

the same).



Is there a converse of Lemma 3.27?<br />

We consider the prisoner’s dilemma and try to<br />

model a game in extensive form with the same<br />

payoffs and strategy profiles.



In fact, it is not surprising that we do not<br />

succeed in the general case:<br />

Theorem 3.28 (Zermelo, 1913)<br />

Every perfect information game in extensive form<br />

has a pure strategy Nash equilibrium.<br />

We will later introduce imperfect information<br />

games (in extensive form): Slide 234.



Example 3.29 (Unintended Nash equilibria)<br />

We consider the following perfect information<br />

game in extensive form.<br />

Figure 4: Unintended Equilibrium.



This leads to the notion of subgame perfect Nash<br />

equilibria:<br />

Definition 3.30 (Subgame perfect Nash equilibria)<br />

Let G be a perfect information game in extensive form.<br />

Subgame: A subgame of G rooted at node h is the<br />

restriction of G to the descendants of h.<br />

SPE: The subgame perfect Nash equilibria of a<br />

perfect information game G in extensive form<br />

are those Nash equilibria of G, that are also<br />

Nash equilibria for all subgames G ′ of G.



Figure 5: The Centipede game.



Definition 3.31 (Extensive form Games, Imperfect Inf.)<br />

A finite imperfect information game in<br />

extensive form is a tuple<br />

G = 〈A, A, H, Z, α, ρ, σ, µ 1 , . . . , µ n , I 1 , . . . , I n 〉<br />

where<br />

〈A, A, H, Z, α, ρ, σ, µ 1 , . . . , µ n 〉 is a perfect<br />

information game in the sense of<br />

Definition 3.26 on Slide 225,<br />

I i is an equivalence relation on<br />

{h ∈ H : ρ(h) = i} such that (h, h ′ ) ∈ I i implies<br />

α(h) = α(h ′ ).



Figure 6: An imperfect game.



Definition 3.32 (Pure strategy in imperfect inf. games)<br />

Given an imperfect information game in<br />

extensive form, a pure strategy for player i is a<br />

vector 〈a 1 , . . . , a k 〉 with a j ∈ α(I (i,j) ), where<br />

I (i,1) , . . . I (i,k) are the k equivalence classes for<br />

agent i.



Figure 7: Prisoner’s dilemma.<br />

There is a pure strategy Nash equilibrium.



NF game ↩→ Imperfect game<br />

Each game in normal form can be transformed<br />

into an imperfect information game in extensive<br />

form.<br />

Each imperfect information game in extensive<br />

form can be transformed into a game in normal<br />

form.<br />

This is obvious if we consider pure strategies.<br />

But what about mixed strategies?<br />

What is the set of mixed strategies for an<br />

imperfect game?



SPE: What about subgame perfect equilibria<br />

(the analogue of Definition 3.30 on<br />

Slide 232) for imperfect games?<br />

First try: In each information set, we have a set<br />

of subgames (a forest). Why not<br />

ask that a strategy should be a best<br />

response in all subgames of that<br />

forest?



Figure 8: Subgames in Imperfect Games.



Nash equilibria: (L, U) and (R, D).<br />

In one subgame, U dominates D, in the other<br />

D dominates U.<br />

But (R, D) seems to be the unique choice:<br />

both players can put themselves into the<br />

others place and reason accordingly.<br />

Requiring that a strategy is best response<br />

to all subgames is too strong.



We are not giving the precise definitions but<br />

state the following result.<br />

Theorem 3.33 (Existence of Sequential Equilibria)<br />

Each imperfect information game in extensive form<br />

with perfect recall has a sequential equilibrium.



3.4 Repeated Games



Often games are not just played once. They<br />

are repeated finitely often (until consensus is<br />

reached).<br />

Sometimes, infinitely repeated games can<br />

be used to define equilibria.<br />

In repeated games, it makes sense to make one’s<br />

choices dependent on the previous game (or<br />

the whole history).



Figure 9: Iterated Prisoner’s Dilemma.



Axiomatic Bargaining<br />

We assume two agents 1, 2, each with a utility<br />

function µ i : E → R. If the agents do not agree<br />

on a result e, the fallback e fallback is taken.<br />

Example 3.34 (Sharing 1 Pound)<br />

How to share 1 Pound? Let e fallback = 0.<br />

Agent 1 offers ρ (0 < ρ < 1). Agent 2 agrees!<br />

Such deals are individually rational and each<br />

one is in Nash-equilibrium!<br />

Therefore we need axioms!



Axioms on the global solution<br />

µ ∗ = 〈µ 1 (e ∗ ), µ 2 (e ∗ )〉.<br />

Invariance: Absolute values of the utility<br />

functions do not matter, only relative<br />

values.<br />

Symmetry: Changing the agents does not<br />

influence the solution.<br />

Irrelevant Alternatives: If E is made smaller but<br />

e ∗ still remains, then e ∗ remains the<br />

solution.<br />

Pareto: The players cannot get a higher utility<br />

than µ ∗ = 〈µ 1 (e ∗ ), µ 2 (e ∗ )〉.



Theorem 3.35 (Unique Solution)<br />

The four axioms above uniquely determine a<br />

solution. This solution is given by<br />

e ∗ = arg max e {(µ 1 (e) − µ 1 (e fallback )) × (µ 2 (e) − µ 2 (e fallback ))}.
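For the 1-Pound example this maximisation is easy to carry out on a grid of candidate splits. A minimal sketch (outcome e is agent 1's share, both fallback utilities are 0):

```python
# Nash bargaining solution in the spirit of Theorem 3.35: maximise the
# product of utility gains over the fallback utilities (d1, d2).
def nash_bargaining(outcomes, u1, u2, d1, d2):
    return max(outcomes, key=lambda e: (u1(e) - d1) * (u2(e) - d2))

# Sharing 1 Pound: utilities are the shares themselves, fallback gives 0.
splits = [i / 100 for i in range(101)]
best = nash_bargaining(splits, lambda e: e, lambda e: 1 - e, 0.0, 0.0)
print(best)  # → 0.5
```

With symmetric utilities, the product (e)(1 − e) is maximised at an even split, as the symmetry axiom demands.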



Strategic Bargaining<br />

No axioms: view it as a game!<br />

Example revisited: Sharing 1 Pound Sterling.<br />

Protocol with finitely many steps: The last<br />

offerer just offers ɛ. This should be accepted, so<br />

the last offerer gets 1 − ɛ.<br />

This is unsatisfactory.



Ways out:
1. Add a discount factor δ: in round n, only the fraction δ^(n−1) of the original value is available.
2. Bargaining costs: bargaining is not free, fees have to be paid.



Finite Games: Suppose δ = 0.9. Then the outcome depends on the number of rounds.

    Round   1's share   2's share   Total value   Offerer
    ...
    n − 3   0.819       0.181       0.9^(n−4)     2
    n − 2   0.91        0.09        0.9^(n−3)     1
    n − 1   0.9         0.1         0.9^(n−2)     2
    n       1           0           0.9^(n−1)     1
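These shares follow from backward induction; the sketch below (function name and round count are illustrative) works backwards from the last round, where the offerer takes everything.

```python
# Backward induction for the finite alternating-offers game with discount
# factor 0.9 (illustrative sketch reproducing the shares in the table).

def offerer_shares(rounds, delta=0.9):
    """Map round -> share of that round's offerer. The responder in round r
    is the offerer of round r + 1 and must be offered at least delta times
    what she would get there."""
    share = {rounds: 1.0}                 # the last offerer takes everything
    for r in range(rounds - 1, 0, -1):
        share[r] = 1 - delta * share[r + 1]
    return share

s = offerer_shares(rounds=10)
# Offerer's share in rounds n-3 .. n: 0.181, 0.91, 0.1, 1.0 (as in the table)
print([round(s[r], 3) for r in (7, 8, 9, 10)])
```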

Infinite Games: discount factor δ_1 for agent 1, δ_2 for agent 2.



Problem: A Nash equilibrium of a sequential game may exist in the first stages but rely on behaviour in later subgames that is no equilibrium there.
Solution: We consider subgame perfect Nash equilibria: Nash equilibria that remain Nash equilibria in every possible subgame.



Theorem 3.36 (Unique solution for infinite games)
In a discounted infinite-round setting, there exists a unique subgame perfect Nash equilibrium:
Agent 1 gets (1 − δ_2)/(1 − δ_1 δ_2). Agent 2 gets the rest.
Agreement is reached in the first round.

Proof.
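The proof is omitted on the slide; as a numerical sanity check (illustrative sketch, not the proof), agent 1's share x = (1 − δ_2)/(1 − δ_1 δ_2) is exactly the fixed point of x = 1 − δ_2 (1 − δ_1 x), i.e. agent 2 is indifferent between accepting now and making the next (discounted) offer herself.

```python
# Sanity check of Theorem 3.36 (illustrative, not the omitted proof):
# agent 1's share x solves x = 1 - delta2 * (1 - delta1 * x).

def rubinstein_share(delta1, delta2):
    """Agent 1's share in the unique subgame perfect equilibrium."""
    return (1 - delta2) / (1 - delta1 * delta2)

d1, d2 = 0.9, 0.8
x = rubinstein_share(d1, d2)
assert abs(x - (1 - d2 * (1 - d1 * x))) < 1e-12   # fixed-point property
print(round(x, 4))  # 0.7143: the more patient agent 1 gets the larger share
```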



Bargaining Costs
Agent 1 pays c_1, agent 2 pays c_2.
Protocol: After each round, the roles change and the fee is subtracted (c_1 from agent 1, c_2 from agent 2). Therefore the game is finite.
c_1 = c_2: Any split is a subgame perfect Nash equilibrium.
c_1 < c_2: Only one subgame perfect Nash equilibrium: agent 1 gets all.
c_1 > c_2: Only one subgame perfect Nash equilibrium: agent 1 gets c_2, agent 2 gets 1 − c_2.



3. Decision Making (1) 5. References<br />

3.5 References



[FudTir] Fudenberg, D. and J. Tirole (1991). Game Theory. MIT Press.
[Nash50] Nash, J. (1950). Equilibrium points in n-person games. Proceedings of the National Academy of Sciences of the United States of America 36, 48–49.
[OsbRub] Osborne, M. and A. Rubinstein (1994). A Course in Game Theory. MIT Press.



[Rosenthal73] Rosenthal, R. (1973). A class of games possessing pure-strategy Nash equilibria. International Journal of Game Theory 2, 65–67.



4. Decision Making (2)<br />

Chapter 4. Decision Making (2)

4.1 Voting
4.2 Tactical Voting
4.3 Auctions
4.4 Market Mechanisms
4.5 References



Outline (1)<br />

We deal with voting systems and discuss<br />

Arrow's theorem in different settings.<br />

A problem we have to face is that agents may lie to get what they want:

Tactical Voting is an important area in MAS.<br />

The Gibbard/Satterthwaite theorem is<br />

similar to Arrow’s theorem and deals with<br />

non-manipulable voting systems.



Outline (2)<br />

We end this section with a short overview of General Market Criteria, which play an important role in economics.



4. Decision Making (2) 1. Voting<br />

4.1 Voting



Voting procedure<br />

Agents give input to a mechanism: the outcome<br />

is taken as a solution for the agents.<br />

Non-ranking voting: Each agent votes for<br />

exactly one candidate. The winner is<br />

the one with a majority of votes.<br />

Approval voting: Each agent can cast a vote for<br />

as many candidates as she wishes (at<br />

most one for each candidate).<br />

Ranking voting: Each agent expresses his full<br />

preference over the candidates.



Definition 4.1 (Condorcet Condition, Condorcet Set)
The Condorcet condition is the following: if a candidate x is chosen, then for any other candidate y it must be the case that at least half the voters prefer x to y.
The Condorcet set is the set of candidates that meet the Condorcet condition.

           1st  2nd  3rd
    w_1:   A    B    C
    w_2:   B    C    A
    w_3:   C    B    A

Figure 10: A tie (each candidate is ranked first once), but Condorcet helps: B beats both A and C pairwise.



           1st  2nd  3rd
    w_1:   A    B    C
    w_2:   B    C    A
    w_3:   C    A    B

Figure 11: Condorcet rules out all candidates.

Comparing A and B: majority for A.
Comparing A and C: majority for C.
Comparing B and C: majority for B.
Desired preference ordering: A > B > C > A.
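A small sketch (illustrative helper names) computes the Condorcet set for both figures: in Figure 10 it singles out B, while in Figure 11 the cycle leaves it empty.

```python
# Condorcet sets for Figures 10 and 11 (illustrative sketch): a candidate x
# is in the set iff, for every other y, at least half the voters prefer x to y.

def condorcet_set(profiles):
    candidates = profiles[0]
    def prefers(voter, x, y):
        return voter.index(x) < voter.index(y)
    return [x for x in candidates
            if all(2 * sum(prefers(v, x, y) for v in profiles) >= len(profiles)
                   for y in candidates if y != x)]

fig10 = [["A", "B", "C"], ["B", "C", "A"], ["C", "B", "A"]]
fig11 = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
print(condorcet_set(fig10))  # ['B']: the plurality tie is resolved
print(condorcet_set(fig11))  # []: the cycle rules out every candidate
```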



Lemma 4.2
In approval voting, at least one of the winners is in the Condorcet set.

Definition 4.3 (Borda Protocol)
Let O be the set of candidates. Each agent gives his best candidate |O| points, the second best |O| − 1 points, etc. After all agents have voted, the points are summed up across all voters. The candidate with the highest count wins.



Does Borda voting satisfy the Condorcet condition?

    Agent   Preferences
    1       a ≻ b ≻ c ≻ d
    2       b ≻ c ≻ d ≻ a
    3       c ≻ d ≻ a ≻ b
    4       a ≻ b ≻ c ≻ d
    5       b ≻ c ≻ d ≻ a
    6       c ≻ d ≻ a ≻ b
    7       a ≻ b ≻ c ≻ d

Borda count: c wins with 20, b: 19, a: 18, d loses with 13.
Borda count without d: a wins with 15, b: 14, c loses with 13.
Winner turns loser and loser turns winner.
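Both counts can be reproduced with a short sketch (illustrative names); restricting the profile to {a, b, c} flips the result.

```python
# Borda counts for the 7-agent profile above, with and without candidate d
# (illustrative sketch).

def borda(profiles, candidates):
    """Best of the remaining candidates gets len(candidates) points,
    the next one point fewer, and so on; summed over all agents."""
    score = {c: 0 for c in candidates}
    for ranking in profiles:
        restricted = [c for c in ranking if c in candidates]
        for points, c in enumerate(reversed(restricted), start=1):
            score[c] += points
    return score

PREFS = [list("abcd"), list("bcda"), list("cdab")] * 2 + [list("abcd")]
print(borda(PREFS, "abcd"))  # {'a': 18, 'b': 19, 'c': 20, 'd': 13}
print(borda(PREFS, "abc"))   # {'a': 15, 'b': 14, 'c': 13}
```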



Binary protocol: Pairwise comparison. Take any<br />

two candidates and determine the winner. The<br />

winner enters the next round, where it is<br />

compared with one of the remaining<br />

candidates.<br />

Which ordering should we use?



Figure 12: Four different orderings and four alternatives.
With the last ordering, d wins, although all agents prefer c over d.
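The effect of the agenda can be illustrated on the 7-voter profile from the Borda example (the agendas below are my own illustrations, not necessarily the four orderings of Figure 12): one agenda elects c, another elects d, even though every agent prefers c over d.

```python
# Sequential pairwise (binary protocol) voting on the 7-voter profile from
# the Borda example; agendas here are illustrative, not Figure 12's.

PREFS = [list("abcd"), list("bcda"), list("cdab")] * 2 + [list("abcd")]

def pairwise_winner(x, y, prefs=PREFS):
    """Majority comparison of two candidates (7 voters, so no ties)."""
    votes_x = sum(v.index(x) < v.index(y) for v in prefs)
    return x if 2 * votes_x > len(prefs) else y

def binary_protocol(agenda):
    """The survivor of each comparison meets the next candidate."""
    winner = agenda[0]
    for challenger in agenda[1:]:
        winner = pairwise_winner(winner, challenger)
    return winner

print(binary_protocol(["a", "b", "c", "d"]))  # c
print(binary_protocol(["c", "b", "a", "d"]))  # d, although c beats d 7-0
```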



Formal model for social choice
We want to define precisely what a potential model for voting, given preferences over the set of candidates, could be. Given such a model, we want to investigate fair elections.
Let A be the set of agents and O the set of possible outcomes. (O could be A, a set of laws, or a set of candidates.)

Voting based on total orders Ord_tot
The voting of agent i is described by a binary relation ≺_i ⊆ O × O, which we assume to be irreflexive, transitive and total: ties are not allowed! We denote by Ord_tot the set of all such binary relations.



Often, not all subsets of O are votable, only a subset V ⊆ 2^O \ {∅}. The simplest scenario is V = {{o} : o ∈ O}.
Each v ∈ V represents a possible “set of candidates”. The voting model then has to select some of the elements of v.
Each agent votes independently of the others. But we also allow that only a subset is considered. Let therefore

    U ⊆ ∏_{i=1}^{|A|} Ord_tot.

The set U represents the set of people (and their preferences over the candidates) participating in the election and casting their votes.



We now define what a selection process, an election mechanism, really is.

Definition 4.4 (Social choice: 2 versions)
1. A social choice function is any function
       C* : V × U → O;  (v, 〈≺_1, …, ≺_|A|〉) ↦ o.
2. A social choice correspondence is any function
       W* : V × U → 2^O;  (v, 〈≺_1, …, ≺_|A|〉) ↦ v′.



What are fair conditions for an election process? Note that our preferences are total.

Definition 4.5 (Unanimity, Monotonicity)
A social choice function C* satisfies unanimity if for any profile l = (≺_1, …, ≺_|A|), v ∈ V and outcome o ∈ v:

    if for all 1 ≤ i ≤ |A|: o = max_{≺_i} v, then C*(v, l) = o.

A social choice function C* satisfies monotonicity if for any profile l = (≺_1, …, ≺_|A|), v ∈ V and outcome o = C*(v, l) ∈ v: whenever l′ = (≺′_1, …, ≺′_|A|) is a different profile such that for all o′ and i

    o′ ≺_i o implies o′ ≺′_i o,

then C*(v, l′) = o.



Finally, we call C* dictatorial if there exists an agent i such that

    for all l = (≺_1, …, ≺_|A|):  C*(v, l) = max_{≺_i} v.

Theorem 4.6 (May (1952))
If there are only two candidates, there is a social choice function which is not dictatorial but yet satisfies unanimity and monotonicity.



Theorem 4.7 (Muller-Satterthwaite (1977))<br />

If there are at least 3 candidates, then any social<br />

choice function satisfying unanimity and<br />

monotonicity must be dictatorial.<br />

What about majority voting?



A variant of the above theorem is for correspondences. In that case, we require correspondences to satisfy the following two conditions (sometimes called the prime directive):
1. W*(v, 〈≺_1, …, ≺_|A|〉) ≠ ∅, and
2. W*(v, 〈≺_1, …, ≺_|A|〉) ⊆ v.
Such a function simply selects a subset of v: the elected members of the list v.



IIA: For all v ∈ V: if (∀i ∈ A : ≺_i|_v = ≺′_i|_v) then
    W*(v, 〈≺_1, …, ≺_|A|〉) = W*(v, 〈≺′_1, …, ≺′_|A|〉).
Unanimity: If o ∈ v and (∀i ∈ A : o′ ≺_i o) then o′ ∉ W*(v, 〈≺_1, …, ≺_|A|〉).
Consistency: If {o, o′} ⊆ v and W*(v, u) ∩ {o, o′} = {o} and {o, o′} ⊆ v′, then o′ ∉ W*(v′, u).



Theorem 4.8 (Social choice correspondence)<br />

Any social choice correspondence satisfying<br />

independence of irrelevant alternatives, unanimity<br />

and consistency must be dictatorial.



We are now relaxing our assumptions considerably (and thereby strengthening Arrow's theorem).

Voting based on partial orders
The voting of agent i is described by a binary relation ≺_i ⊆ O × O, which we assume to be irreflexive, transitive and antisymmetric: ties are now allowed! We denote by Ord_par the set of all such binary relations.



Definition 4.9 (Social welfare function)
A social welfare function is any function

    f* : U → Ord_par;  (≺_1, …, ≺_|A|) ↦ ≺*.

For each V ⊆ 2^O \ {∅}, the function f* w.r.t. U induces a choice function C_〈≺_1,…,≺_|A|〉 as follows:

    C_〈≺_1,…,≺_|A|〉 : V → V;  v ↦ max_{≺*} v,

where max_{≺*} v is the set of all maximal elements of v according to the restriction ≺*|_v.
Each tuple u = (≺_1, …, ≺_|A|) determines the election for all possible v ∈ V.



What are desirable properties for f*?

Pareto-Efficiency: for all o, o′ ∈ O: (∀i ∈ A : o ≺_i o′) implies o ≺* o′.
Independence of Irrelevant Alternatives: for all o, o′ ∈ O:

    (∀i ∈ A : o ≺_i o′ iff o ≺′_i o′) ⇒ (o ≺* o′ iff o ≺′* o′).

Note that this implies in particular

    (∀i ∈ A : ≺_i|_v = ≺′_i|_v) ⇒ ∀o, o′ ∈ v, ∀v′ ∈ V s.t. v ⊆ v′: (o ≺*|_{v′} o′ iff o ≺′*|_{v′} o′).

Majority Vote
The simple majority vote protocol does not satisfy independence of irrelevant alternatives.



We consider 7 voters (A = {w_1, w_2, …, w_7}) and O = {a, b, c, d}, V = {{a, b, c, d}, {a, b, c}}. The columns of the following table give the rank of each candidate for each voter under two different preference profiles: the first profile ≺_i and, in parentheses, the second profile ≺′_i.

           w_1    w_2    w_3    w_4    w_5    w_6    w_7
    a      1 (2)  1 (2)  1 (1)  1 (1)  2 (2)  2 (2)  2 (2)
    b      2 (3)  2 (3)  2 (2)  2 (2)  1 (1)  1 (1)  1 (1)
    c      3 (4)  3 (4)  3 (3)  3 (3)  3 (3)  3 (3)  3 (3)
    d      4 (1)  4 (1)  4 (4)  4 (4)  4 (4)  4 (4)  4 (4)

Let ≺* be the solution generated by the ≺_i and ≺′* the solution generated by the ≺′_i. Then we have for i = 1, …, 7: b ≺_i a iff b ≺′_i a, but b ≺* a and a ≺′* b. The latter holds because on the whole set O, candidate a is ranked first 4 times and b only 3 times under ≺, while under ≺′ a is ranked first only 2 times but b is still ranked first 3 times. The former holds because we even have ≺_i|_{a,b,c} = ≺′_i|_{a,b,c}.
The introduction of the irrelevant (concerning the relative ordering of a and b) alternative d changes everything: the original majority for a is split, and a drops below one of the less preferred alternatives (b).
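This counterexample can be verified mechanically (sketch with illustrative names): voter by voter, the relative order of a and b is the same in both profiles, yet the majority winner changes once d is moved to the top for w_1 and w_2.

```python
# Verifying the IIA counterexample (illustrative sketch): a and b keep the
# same relative order voter by voter, yet the plurality winner flips.

def plurality_winner(profiles):
    """Candidate with the most first places (no ties occur in this example)."""
    firsts = [p[0] for p in profiles]
    return max("abcd", key=firsts.count)

black = [list("abcd")] * 4 + [list("bacd")] * 3                   # profile 1
red = [list("dabc")] * 2 + [list("abcd")] * 2 + [list("bacd")] * 3  # profile 2

# Every voter orders a and b identically in both profiles ...
assert all((x.index("a") < x.index("b")) == (y.index("a") < y.index("b"))
           for x, y in zip(black, red))
# ... but the "irrelevant" alternative d flips the winner:
print(plurality_winner(black), plurality_winner(red))  # a b
```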



Before turning to Arrow's theorem, we state the following surprising fact.

Lemma 4.10
Let o ∈ O and suppose every agent i puts o either on top (greatest element w.r.t. ≺_i) or at the bottom (smallest element w.r.t. ≺_i).
Then o is either the greatest or the smallest element of ≺*.



Proof.
Suppose not, i.e. there are a, b ∈ O (a ≠ o, b ≠ o) such that a ≺* o ≺* b and thus a ≺* b.
Where are a and b located in the rankings ≺_i?
We construct rankings ≺′_i by moving a above b (where it is not there already) without disturbing the preferences between o and a and between o and b. By IIA, this modification does not change the ranking between o and a or between o and b for the aggregate, thus a ≺′* o ≺′* b and hence a ≺′* b still holds.
But the new profile satisfies b ≺′_i a for all agents i; therefore, by unanimity, b ≺′* a, a contradiction.



Theorem 4.11 (Arrow (1951))
If the social welfare function f* is (1) Pareto efficient and (2) independent of irrelevant alternatives, then there always exists a dictator: for all U ⊆ ∏_{i=1}^{|A|} Ord_par,

    ∃i ∈ A : ∀o, o′ ∈ O : o ≺_i o′ iff o ≺* o′.

To be more precise: for all U ⊆ ∏_{i=1}^{|A|} Ord_par,

    ∃i ∈ A : ∀〈≺_1, …, ≺_|A|〉 ∈ U : ∀o, o′ ∈ O : o ≺_i o′ iff o f*(〈≺_1, …, ≺_|A|〉) o′.



Proof (of Arrow's theorem).
The proof is the third proof given by John Geanakoplos (1996) and is based on the following lemma.

Lemma 4.12 (Strict Neutrality)
Consider two pairs of alternatives a, b and α, β.
Suppose each voter strictly prefers a to b or b to a, i.e. for all i: a ≺_i b or b ≺_i a.
Suppose further that each voter has the same preference for α, β as she has for a, b.
Then either a ≺* b and α ≺* β, or b ≺* a and β ≺* α.



Proof (of the lemma).
Since there are at least three alternatives, we may assume that the pair (a, b) is distinct from (α, β). We assume w.l.o.g. that b ≼* a. We now construct a different profile 〈≺′_1, …, ≺′_|A|〉 obtained as follows (for all i):
If a ≠ α, then we change ≺_i by moving α just above a.
If b ≠ β, then we change ≺_i by moving β just below b.
This can be done while maintaining the old preferences between α and β (as the preferences between a and b are strict).



By unanimity, we have a ≺′* α (for α ≠ a) and β ≺′* b (for β ≠ b). Using transitivity (β ≺′* b ≼′* a ≺′* α), we get β ≺′* α. By IIA, we also get β ≺* α (because α ≺′_i β iff α ≺_i β).
By independence of irrelevant alternatives, we can exchange the roles of (a, b) and (α, β) and get b ≺* a (note that there is no equality sign ≼*: the relation is strict).



The proof of Arrow's theorem considers two alternatives a, b and the profile where a ≺_i b for all agents i. By unanimity, a ≺* b. Note that this reasoning is true for all rankings with the same relative preference between a and b.
We now consider a sequence of profiles 〈≺^i_1, …, ≺^i_|A|〉 for i = 0, …, |A|, starting with the one described above (i = 0), where in step i we let all agents numbered ≤ i change their profile by moving a above b (leaving all other rankings untouched).



There must be one step, let us call it n*, where a ≺^{n*−1} b but b ≺^{n*} a (because of unanimity and strict neutrality). Here we denote by ≺^{n*−1} the ordering obtained from 〈≺^{n*−1}_1, …, ≺^{n*−1}_|A|〉 and, similarly, for ≺^{n*}.
Note again that this reasoning is true not just for one profile 〈≺^i_1, …, ≺^i_|A|〉, but for all such profiles with the same relative ranking of a and b.
We claim that agent n* is a dictator.



Take any pair of alternatives α, β and assume w.l.o.g. that agent n* ranks β below α.
Take c ∉ {α, β} and consider the new profile 〈≺′_1, …, ≺′_|A|〉 obtained as follows from 〈≺^{n*}_1, …, ≺^{n*}_|A|〉:
for 1 ≤ i < n*: we put c on top of each ≺_i;
for i = n*: we put c in between α and β;
for n* < i ≤ |A|: we put c at the bottom of each ≺_i.
We are changing the profile 〈≺^{n*}_1, …, ≺^{n*}_|A|〉 by moving c around in a very particular way.



We apply strict neutrality to the pairs (c, β) and (b, a). Because both pairs have the same relative ranking in 〈≺′_1, …, ≺′_|A|〉, we get β ≺′* c.
We also apply strict neutrality to the pairs (α, c) and (a, b). Because both pairs have the same relative ranking in 〈≺′_1, …, ≺′_|A|〉, we get c ≺′* α.
By transitivity, β ≺′* α; since c was inserted without disturbing α and β, IIA yields β ≺^{n*} α.



Ways out (of Arrow's theorem):
1. The choice function is not always defined.
2. Independence of irrelevant alternatives is dropped.



Coombs' method: Each voter ranks all candidates in linear order. If there is no candidate ranked first by a majority of all voters, the candidate which is ranked last (by a majority) is eliminated. The last remaining candidate wins.

d'Hondt's method: Each voter casts his votes. Seats are allocated according to the quotient V/(s + 1) (V the number of votes received, s the number of seats already allocated).

Proportional approval voting: Each voter gives points 1, 1/2, 1/3, …, 1/n for his candidates (he can choose as many or as few as he likes). The winning candidates are those where the sum of all points is maximal (across all voters).
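A sketch of d'Hondt's rule (the vote counts in the example are illustrative): each seat in turn goes to the party with the currently highest quotient V/(s + 1).

```python
# d'Hondt seat allocation (illustrative vote counts): each seat in turn
# goes to the party maximising the quotient V / (s + 1).

def dhondt(votes, seats):
    """votes: {party: vote count}; returns {party: seats won}."""
    allocated = {p: 0 for p in votes}
    for _ in range(seats):
        best = max(votes, key=lambda p: votes[p] / (allocated[p] + 1))
        allocated[best] += 1
    return allocated

print(dhondt({"A": 100, "B": 80, "C": 30}, seats=8))  # {'A': 4, 'B': 3, 'C': 1}
```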



4. Decision Making (2) 2. Tactical Voting<br />

4.2 Tactical Voting



Lying and manipulation
What if agents vote tactically? That is, they know the voting design and do not vote truthfully, but such that their preferred choice is elected after all.
Is it possible to come up with an implementation of a social choice function (or correspondence) that cannot be manipulated?

Suppose Ray hates playing baseball but knows that his fellows do like it. How can he vote to avoid ending up playing baseball?


Example 4.13 (Babysitting, Shoham)
You are babysitting 4 kids (Will, Liam, Vic and Ray). They can choose among a: going to the video arcade, b: playing baseball, and c: going for a leisurely car ride. The kids give their true preferences as follows:

           Will  Liam  Vic  Ray
    1st    b     b     a    c
    2nd    a     a     c    a
    3rd    c     c     b    b

You announce to use majority voting, breaking ties alphabetically.
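A sketch of the announced rule (plurality over top choices with alphabetical tie-break; names illustrative): if Ray votes truthfully for c, baseball (b) wins; if he tactically votes a instead, the a/b tie breaks alphabetically and he gets his second-best outcome.

```python
# Plurality with alphabetical tie-breaking, as announced by the babysitter
# (illustrative sketch).

def winner(votes):
    """Most first-place votes; ties broken alphabetically."""
    return min(set(votes), key=lambda c: (-votes.count(c), c))

truthful = ["b", "b", "a", "c"]   # Will, Liam, Vic, Ray vote their top choice
tactical = ["b", "b", "a", "a"]   # Ray pretends to prefer the arcade

print(winner(truthful))  # b -- Ray's least preferred outcome
print(winner(tactical))  # a -- lying gets Ray his second-best outcome
```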



Definition 4.14 (Mechanism)
A mechanism w.r.t. a set of agents A and a set of outcomes O is a pair 〈Act, M〉 where
1. Act = A_1 × … × A_|A|, where A_i is the set of actions available to agent i;
2. M : Act → Π(O), where Π(O) is the set of all probability distributions over the set of outcomes.
This is a very general definition: it allows arbitrary actions. What about our babysitter example?



What is the relation between a mechanism and a game?
Let µ = 〈µ_1, …, µ_i, …, µ_n〉, where µ_i : O → R is a real-valued utility (payoff) function for player i (see Definition 3.5).

Mechanisms and games
If µ is a utility profile and 〈Act, M〉 is a mechanism w.r.t. A and O, then 〈N, A, O, M, µ〉 is an |A|-person normal form game.



What is the difference between a social choice function and a mechanism?
A social choice function is often considered to be truthful: it is based on the real preferences.
A mechanism is an implementation (or not!) of a social choice function.



Definition 4.15 (Mechanism as an Implementation)
Let A and O be fixed, and let a social choice function C* be given.
We say that a mechanism 〈Act, M〉 implements the function C* in dominant strategies if for all utility profiles µ the game 〈N, A, O, M, µ〉 has an equilibrium in dominant strategies, and for any such equilibrium 〈s*_1, s*_2, …, s*_n〉 we have:

    M(〈s*_1, s*_2, …, s*_n〉) = C*(µ).



What about our example? Is there a mechanism implementing our social choice function in dominant strategies?

Lying
There are situations where an agent does not want to reveal its true preferences. Lying might pay off: not only to get the desired result, but also to ensure that critical information is not disclosed.



Often, mechanisms are used that are much more restricted.

Definition 4.16 (Direct Mechanism)
A direct mechanism w.r.t. a set of agents A and a set of outcomes O is a mechanism 〈Act, M〉 with A_i = {µ_i : µ_i is a utility function}.
But: agents may lie and not reveal their true utilities.

Truthful or strategy-proof mechanism
A mechanism is truthful (later also called strategy-proof) in dominant strategies if, for any utility profile, in the resulting game it is a dominant strategy for each agent to announce its true utility function.



Theorem 4.17 (Revelation)<br />

If there exists an implementation of a social choice rule in<br />

dominant strategies, then there is also a direct and<br />

truthful mechanism implementing the same function.



Relaxing our assumptions
We assume that the preferences of our agents satisfy irreflexivity, transitivity and antisymmetry: again, ties are allowed. For a preference ≺ we define

    top(≺) := {o ∈ O : there is no o′ with o ≺ o′}.

We also define a ⊲⊳_≺ b iff a and b are incomparable w.r.t. ≺.



We call a social correspondence strategy-proof if it is best for each agent to reveal its preferences truthfully: deviating does not pay off.

Gibbard/Satterthwaite
If a social correspondence is strategy-proof and each outcome is in principle votable, then there is a dictator.

The original theorem was based on total orders. The stronger version here is due to Pini/Rossi/Venable/Walsh (AAMAS 2006).


What do we mean by each outcome is in principle<br />

votable? This condition is also called citizen sovereignty:<br />

Definition 4.18 (Citizen Sovereignty)<br />

A social correspondence<br />

W ∗ : Ord par → 2 O ; 〈≺ 1 , . . . , ≺ |A| 〉 ↦→ v ′<br />

satisfies citizen sovereignty (or is onto) if for any v ′ ⊂ O<br />

there is a 〈≺ 1 , . . . , ≺ |A| 〉 ∈ Ord par such that<br />

W ∗ (〈≺ 1 , . . . , ≺ |A| 〉) = v ′


Intuitively, strategy-proof means that the social<br />

correspondence is non-manipulable: no agent can<br />

ensure a desired result by tactical voting, i.e. by<br />

not disclosing its true preferences.


Definition 4.19 (Strategy-proof social correspondence)<br />

A social correspondence<br />

W ∗ : Ord par → 2 O ; 〈≺ 1 , . . . , ≺ |A| 〉 ↦→ v ′<br />

is called strategy-proof, if the following two conditions<br />

hold for every agent i, for any two profiles<br />

p = 〈≺ 1 , . . . , ≺ i , . . . , ≺ |A| 〉 and p ′ = 〈≺ 1 , . . . , ≺ ′ i, . . . , ≺ |A| 〉<br />

that only differ in agent i’s preference:<br />

1 ∀a ∈ W ∗ (p) \ W ∗ (p ′ ), ∀b ∈ W ∗ (p ′ ):<br />

if a ⊲⊳ ≺ i b then a ⊲⊳ ≺ ′ i b or a ≺ ′ i b, and<br />

if a ≺ i b then a ≺ ′ i b.<br />

2 ∀a ∈ W ∗ (p) \ W ∗ (p ′ ), ∃b ∈ W ∗ (p ′ ) such that:<br />

if b ≺ i a then a ⊲⊳ ≺ ′ i b or a ≺ ′ i b, and<br />

if a ⊲⊳ ≺ i b then a ≺ ′ i b.


Here are two examples for social correspondences.<br />

1 Let W ∗ 1 : Ord par → 2 O ;<br />

W ∗ 1 (〈≺ 1 , . . . , ≺ |A| 〉) := ⋃ |A| i=1 top(≺ i )<br />

2 Let W ∗ 2 : Ord par → 2 O ;<br />

W ∗ 2 (〈≺ 1 , . . . , ≺ |A| 〉) := top(≺ pareto )<br />

where a≺ pareto b iff a≺ i b for all 1 ≤ i ≤ |A|, otherwise<br />

a, b are incomparable.<br />

Exercise<br />

Are these correspondences strategy-proof? Prove or give<br />

counterexamples.
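The two correspondences above can be sketched in Python (the pair encoding of a preference ≺ i as a set of pairs (a, b) meaning a ≺ i b is my own):<br />

```python
# Sketches of W*1 and W*2; each preference is a set of pairs (a, b)
# meaning a ≺_i b (b strictly preferred to a by agent i).
def top(outcomes, prec):
    return {o for o in outcomes if not any((o, b) in prec for b in outcomes)}

def w1(outcomes, profile):
    """W*1: union of every agent's top outcomes."""
    out = set()
    for prec in profile:
        out |= top(outcomes, prec)
    return out

def w2(outcomes, profile):
    """W*2: top of the pareto order (a ≺ b iff a ≺_i b for ALL agents)."""
    pareto = {(a, b) for a in outcomes for b in outcomes
              if all((a, b) in prec for prec in profile)}
    return top(outcomes, pareto)

outcomes = {"x", "y", "z"}
profile = [{("x", "y"), ("x", "z")},     # agent 1: x ≺ y, x ≺ z
           {("y", "x"), ("z", "x")}]     # agent 2: y ≺ x, z ≺ x
assert w1(outcomes, profile) == {"x", "y", "z"}
assert w2(outcomes, profile) == {"x", "y", "z"}   # pareto order empty here
```

Note how opposed preferences make the pareto order empty, so W*2 returns all outcomes, while W*1 already returns everything because the two top sets cover O.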


To formulate our version of the Gibbard-Satterthwaite<br />

Theorem, we need to introduce the notion of a weak<br />

dictator.<br />

Definition 4.20 (Weak dictator)<br />

An agent i is called a weak dictator for the social<br />

correspondence W ∗ , if for all profiles 〈≺ 1 , . . . , ≺ |A| 〉 the<br />

following holds:<br />

W ∗ (〈≺ 1 , . . . , ≺ |A| 〉) ∩ top(≺ i ) ≠ ∅


Exercise<br />

A strong dictator is one where for all profiles<br />

〈≺ 1 , . . . , ≺ |A| 〉 the following holds:<br />

W ∗ (〈≺ 1 , . . . , ≺ |A| 〉) = top(≺ i ).<br />

Can you think of a definition in between strong and weak<br />

dictator?


Theorem 4.21 (Gibbard/Satterthwaite, partial order)<br />

Let |O| ≥ 3, |A| ≥ 2 and let W ∗ be any social<br />

correspondence based on partial orderings<br />

W ∗ : Ord par → 2 O<br />

If W ∗ is strategy-proof and satisfies citizen-sovereignty (W ∗<br />

is onto), then there always exists a weak dictator.


4.3 Auctions


While voting binds all agents, an auction is<br />

always a deal between two parties.<br />

Types of auctions:<br />

first-price open cry: (English auction), as usual.<br />

first-price sealed bid: one bids without<br />

knowing the other bids.<br />

Dutch auction: (descending auction) the seller<br />

lowers the price until it is taken.<br />

second-price sealed bid: (Vickrey auction)<br />

Highest bidder wins, but the price is<br />

the second highest bid!


Three different auction settings:<br />

private value: Value depends only on the<br />

bidder (cake).<br />

common value: Value depends only on other<br />

bidders (treasury bills).<br />

correlated value: Value depends partly on one’s<br />

own valuation, partly on the other bidders’.


What is the best strategy in Vickrey auctions?<br />

Theorem 4.22 (Private-value Vickrey auctions)<br />

The dominant strategy of a bidder in a<br />

Private-value Vickrey auction is to bid the true<br />

valuation.<br />

Therefore it is equivalent to English auctions.<br />

Vickrey auctions are used to<br />

allocate computation resources in operating<br />

systems,<br />

allocate bandwidth in computer networks,<br />

control building heating.
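Theorem 4.22 can be checked by brute force; this minimal sketch (names and rival-bid setup are my own) simulates a second-price sealed-bid auction and verifies that no deviation from the true valuation ever improves bidder 1's utility:<br />

```python
import random

# Second-price sealed-bid (Vickrey) auction: highest bid wins, the
# winner pays the second-highest bid.
def vickrey(bids):
    """bids: {agent: bid}. Returns (winner, price = second-highest bid)."""
    ranked = sorted(bids, key=bids.get, reverse=True)
    return ranked[0], bids[ranked[1]]

def utility(agent, valuation, bids):
    winner, price = vickrey(bids)
    return valuation - price if winner == agent else 0.0

# For any fixed rival bids, deviating from the true valuation never helps.
random.seed(0)
v1 = 0.7
for _ in range(100):
    rival = {"b2": random.random(), "b3": random.random()}
    truthful = utility("b1", v1, {"b1": v1, **rival})
    for dev in (0.0, 0.2, 0.5, 0.9, 1.0):
        assert utility("b1", v1, {"b1": dev, **rival}) <= truthful + 1e-12
```

The key point the simulation illustrates: the winner's payment does not depend on her own bid, so the bid only controls *whether* she wins, and winning is desirable exactly when her valuation exceeds the highest rival bid.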


Are first-price auctions better for the<br />

auctioneer than second-price auctions?<br />

Theorem 4.23 (Expected Revenue)<br />

All 4 types of protocols produce the same expected<br />

revenue to the auctioneer (assuming (1) private<br />

value auctions, (2) values are independently<br />

distributed and (3) bidders are risk-neutral).


Why are second price auctions not so popular<br />

among humans?<br />

1 Lying auctioneer.<br />

2 When the results are published,<br />

subcontractors know the true valuations<br />

and what they saved. So they might want<br />

to share the profit.


Inefficient Allocation<br />

Auctioning heterogeneous, interdependent<br />

items.<br />

Example 4.24 (Task Allocation)<br />

Two delivery tasks t 1 , t 2 . Two agents.


The global optimal solution is not reached by<br />

auctioning independently and truthful<br />

bidding.<br />

t 1 goes to agent 2 (for a price of 2) and t 2 goes to<br />

agent 1 (for a price of 1.5).<br />

Even if agent 2 considers (when bidding for t 2 )<br />

that she already got t 1 (so she bids<br />

cost({t 1 , t 2 }) − cost({t 1 }) = 2.5 − 1.5 = 1) she<br />

will get it only with a probability of 0.5.


What about full lookahead? blackboard.<br />

Therefore:<br />

It pays off for agent 1 to bid more for t 1 (up to<br />

1.5 more than truthful bidding).<br />

It does not pay off for agent 2, because agent<br />

2 does not make a profit at t 2 anyway.<br />

Agent 1 bids 0.5 for t 1 (instead of 2), agent<br />

2 bids 1.5. Therefore agent 1 gets it for 1.5.<br />

Agent 1 also gets t 2 for 1.5.


Lying at Vickrey<br />

Does it make sense to counterspeculate at<br />

private value Vickrey auctions?<br />

Vickrey auctions were invented to avoid<br />

counterspeculation.<br />

But what if the private value for a bidder is<br />

uncertain?<br />

The bidder might be able to determine it, but<br />

she needs to invest c.


Example 4.25 (Incentive to counterspeculate)<br />

Suppose bidder 1 does not know the (private-)<br />

value v 1 of the item to be auctioned. To<br />

determine it, she needs to invest cost. We also<br />

assume that v 1 is uniformly distributed:<br />

v 1 ∈ [0, 1].<br />

For bidder 2, the private value v 2 of the item is<br />

fixed: 0 ≤ v 2 < 1/2. So his dominant strategy is to<br />

bid v 2 .<br />

Should bidder 1 try to invest cost to<br />

determine his private value? How does this<br />

depend on knowing v 2 ?


blackboard.<br />

Answer: Bidder 1 should invest cost if and only if<br />

v 2 ≥ (2 · cost) 1/2 .
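The threshold can be verified numerically. Assumptions (mine, matching Example 4.25): v 1 is uniform on [0, 1]; without investing, bidder 1 bids her expected value 1/2 and hence always outbids v 2 < 1/2; with investing, she learns v 1 and bids truthfully:<br />

```python
import math

# Expected profits of bidder 1 in the Vickrey auction of Example 4.25.
def profit_without_invest(v2):
    return 0.5 - v2                      # wins surely, pays v2, E[v1] = 1/2

def profit_with_invest(v2, cost):
    # bids v1, wins iff v1 > v2; E[(v1 - v2)^+] = (1 - v2)^2 / 2
    return (1 - v2) ** 2 / 2 - cost

cost = 0.02
threshold = math.sqrt(2 * cost)          # invest iff v2 >= (2*cost)^(1/2)
for v2 in (0.05, 0.1, threshold, 0.3, 0.45):
    invest_pays = profit_with_invest(v2, cost) >= profit_without_invest(v2)
    assert invest_pays == (v2 >= threshold - 1e-12)
```

The difference of the two expected profits simplifies to v 2 ²/2 − cost, which is nonnegative exactly when v 2 ≥ √(2 · cost).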


4.4 Market Mechanisms


A theory for efficiently allocating goods and<br />

resources among agents, based on market<br />

prices.<br />

Goods: Given n > 0 goods g (coffee, mirror sites,<br />

parameters of an airplane design). Distinct goods<br />

g ≠ g ′ are different commodities, but units of the<br />

same good g are indistinguishable.<br />

Prices: The market has prices p = [p 1 , p 2 , ..., p n ] ∈ R n :<br />

p i is the price of the good i.


Consumers: Consumer i has µ i (x) encoding its<br />

preferences over consumption bundles<br />

x i = [x i1 , ..., x in ] t , where x ig ∈ R + is consumer<br />

i’s allocation of good g. Each consumer also<br />

has an initial endowment e i = [e i1 , ..., e in ] t ∈ R n .<br />

Producers: Use some commodities to produce others:<br />

y j = [y j1 , ..., y jn ] t , where y jg ∈ R is the amount<br />

of good g that producer j produces. Y j is a set<br />

of such vectors y.<br />

Profit of producer j: p × y j , where y j ∈ Y j .<br />

Profits: The profits are divided among the consumers<br />

(given predetermined proportions ∆ ij ): ∆ ij is<br />

the fraction of producer j that consumer i<br />

owns (stocks). Profits are divided according to<br />

∆ ij .<br />



Definition 4.26 (General Equilibrium)<br />

(p ∗ , x ∗ , y ∗ ) is in general equilibrium, if the<br />

following holds:<br />

I. The markets are in equilibrium:<br />

∑ i x ∗ i = ∑ i e i + ∑ j y ∗ j<br />

II. Consumer i maximises preferences<br />

according to the prices:<br />

x ∗ i = arg max {x i ∈ R n + | cond i } µ i (x i )<br />

where cond i stands for<br />

p ∗ × x i ≤ p ∗ × e i + ∑ j ∆ ijp ∗ × y j .


III. Producer j maximises profit wrt. the<br />

market<br />

y ∗ j = arg max {y j ∈ Y j } p ∗ × y j
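Conditions I and cond i are easy to check numerically on a candidate equilibrium. This toy pure-exchange instance (no producers, so all y-terms and profit shares vanish; the numbers are illustrative, not from the lecture) does so:<br />

```python
# Toy pure-exchange economy: verify market clearing (condition I) and
# the budget condition cond_i for a candidate (p*, x*).
p_star = [1.0, 2.0]                      # candidate prices for 2 goods
e      = [[1.0, 0.0], [0.0, 1.0]]        # endowments of consumers 1, 2
x_star = [[0.5, 0.25], [0.5, 0.75]]      # candidate allocation

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# I. markets clear: sum_i x*_ig = sum_i e_ig for every good g
for g in range(2):
    assert abs(sum(x[g] for x in x_star) - sum(ei[g] for ei in e)) < 1e-12

# cond_i: p* . x_i <= p* . e_i (profit shares vanish without producers)
for xi, ei in zip(x_star, e):
    assert dot(p_star, xi) <= dot(p_star, ei) + 1e-12
```

Verifying II and III in full would additionally require the utility functions µ i and production sets Y j and an optimisation step; the checks above only confirm feasibility of the candidate.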


Theorem 4.27 (Pareto Efficiency)<br />

Each general equilibrium is pareto efficient.<br />

Theorem 4.28 (Coalition Stability)<br />

Each general equilibrium with no producers is<br />

coalition-stable: no subgroup can increase their<br />

utilities by deviating from the equilibrium and<br />

building their own market.


Theorem 4.29 (Existence of an Equilibrium)<br />

Let the sets Y j be closed, convex and bounded<br />

above. Let µ i be continuous, strictly convex and<br />

strongly monotone. Assume further that at least<br />

one bundle x i is producible with only positive<br />

entries x il .<br />

Under these assumptions a general equilibrium<br />

exists.


Meaning of the assumptions<br />

Formal definitions: blackboard.<br />

Convexity of Y j : Economies of scale in<br />

production do not satisfy it.<br />

Continuity of the µ i : Not satisfied in bandwidth<br />

allocation for video conferences.<br />

Strictly convex µ i : Not satisfied if the preference<br />

for a good increases the more the consumer<br />

already has of it (drugs, alcohol, dulce de leche).


In general, more than one equilibrium<br />

may exist.<br />

Theorem 4.30 (Uniqueness)<br />

If the society-wide demand for each good is<br />

non-decreasing in the prices of the other goods,<br />

then a unique equilibrium exists.<br />

Positive example: increasing price of meat<br />

forces people to eat potatoes (pasta).<br />

Negative example: increasing price of bread<br />

implies that the butter consumption decreases.


4.5 References


[Geana96] Geanakoplos, J. (1996, April).<br />

Three brief proofs of Arrow’s impossibility theorem.<br />

Cowles Foundation Discussion Papers 1123R3,<br />

Cowles Foundation, Yale University.<br />

available at<br />

http://ideas.repec.org/p/cwl/cwldpp/1123r3.html.<br />

[sand-99] Sandholm, T. W. (1997).<br />

Chapter 5: Distributed Rational Decision Making.<br />

In G. Weiss (Ed.), <strong>Multiagent</strong> <strong>Systems</strong>. Cambridge,<br />

MA: MIT Press.


Chapter 5. Nets and coalitions<br />


5.1 General Contract Nets<br />

5.2 Four Types of Nets<br />

5.3 Coalition Formation<br />

5.4 Payoff Division<br />

5.5 References


We define the task allocation problem in precise terms<br />

and present different types of contracts between agents.<br />

We show that no IR-contract leads to the global<br />

optimum, even if all types are allowed.<br />

We then consider abstract coalition formation for<br />

characteristic function games (CFG) and introduce<br />

several algorithms for searching the coalition structure<br />

graph. We introduce the core of a CFG and end with the<br />

Shapley value for payoff division.<br />

This chapter is based mainly on work done by Tuomas<br />

Sandholm.


5.1 General Contract Nets


How to distribute tasks?<br />

Global Market Mechanisms. Implementations<br />

use a single centralised mediator.<br />

Announce-bid-award cycle. Distributed<br />

Negotiation.<br />

We need the following:<br />

1 Define a task allocation problem in precise<br />

terms.<br />

2 Define a formal model for making bidding<br />

and awarding decisions.


Definition 5.1 (Task-Allocation Problem)<br />

A task allocation problem is given by<br />

1 a set of tasks T ,<br />

2 a set of agents A,<br />

3 a cost function cost i : 2 T −→ R ∪ {∞}<br />

(stating the costs that agent i incurs by<br />

handling some tasks), and<br />

4 the initial allocation of tasks<br />

〈T init 1 , . . . , T init |A| 〉,<br />

where T = ⋃ i∈A T init i and T init i ∩ T init j = ∅ for i ≠ j.


Definition 5.2 (Accepting Contracts, Allocating Tasks)<br />

A contractee q accepts a contract if it gets paid more than<br />

the marginal cost of handling the tasks of the contract<br />

MC add (T contract |T q ) = def cost q (T contract ∪ T q ) − cost q (T q ).<br />

A contractor r is willing to allocate the tasks T contract from<br />

its current task set T r to a contractee, if it has to pay less<br />

than it saves by handling them itself:<br />

MC remove (T contract |T r ) = def cost r (T r ) − cost r (T r − T contract ).


Definition 5.3 (The Protocol)<br />

Agents suggest contracts to others and make<br />

their decisions according to the above MC add<br />

and MC remove sets.<br />

Agents can be both contractors and contractees.<br />

Tasks can be recontracted.<br />

The protocol is domain independent.<br />

Can only improve at each step: Hill-climbing<br />

in the space of all task allocations.<br />

Maximum is social welfare: − ∑ i∈A cost i(T i ).<br />

Anytime algorithm!
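The hill-climbing character of the protocol can be sketched as a centralised greedy loop over O-contracts (a simplification of the protocol, which is distributed and allows further contract types; all names and numbers are mine):<br />

```python
import itertools

def social_welfare(alloc, costs):
    # welfare = - sum of the agents' costs, as on the slide above
    return -sum(costs[a](tasks) for a, tasks in alloc.items())

def o_contract_hill_climb(alloc, costs):
    """Greedily apply welfare-improving O-contracts until none remains."""
    improved = True
    while improved:
        improved = False
        for i, j in itertools.permutations(alloc, 2):
            for t in list(alloc[i]):
                if t not in alloc[i]:
                    continue             # already moved earlier in this pass
                new = {a: set(ts) for a, ts in alloc.items()}
                new[i].discard(t)
                new[j].add(t)
                if social_welfare(new, costs) > social_welfare(alloc, costs):
                    alloc, improved = new, True   # each step strictly improves
    return alloc

costs = {1: lambda ts: 2.0 * len(ts), 2: lambda ts: 1.0 * len(ts)}
final = o_contract_hill_climb({1: {"t1"}, 2: set()}, costs)
assert final == {1: set(), 2: {"t1"}}
```

Because every accepted contract strictly increases welfare, interrupting the loop at any point still yields a valid (if suboptimal) allocation: the anytime property noted above. The later lemmas show that restricting to O-contracts can nevertheless get stuck in local optima.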


Figure 13: TSP as Task Allocation Problem.


5.2 Four Types of Nets


Definition 5.4 (O-, C-, S-, M- Contracts)<br />

A contract is called of type<br />

O (Original): only one task is moved: 〈T i,j , ρ i,j 〉,<br />

|T i,j | = 1.<br />

C (Cluster): a set of tasks is moved: 〈T i,j , ρ i,j 〉,<br />

|T i,j | > 1.<br />

S (Swap): if a pair of agents swaps a pair of tasks:<br />

〈T i,j , T j,i , ρ i,j , ρ j,i 〉, |T i,j | = |T j,i | = 1.<br />

M (Multi): if more than two agents are involved<br />

in an atomic exchange of tasks: 〈T, ρ〉,<br />

both are |A| × |A| matrices. At least 3<br />

elements are non-empty, |T i,j | ≤ 1.


Lemma 5.5 (O-Path reaches Global Optimum)<br />

A path of O-contracts always exists from any task allocation<br />

to the optimal one. The length of the shortest such path is at<br />

most |T |.<br />

Does that solve our problem? (Lookahead)<br />

Definition 5.6 (Task Allocation Graph)<br />

The task allocation graph has as vertices all possible task<br />

allocations (i.e. |A| |T | ) and directed edges from one vertex<br />

to another if there is a possible contract leading from one<br />

to the other.<br />

For O-contracts, searching the graph using<br />

breadth-first search takes how much time?


Lemma 5.7 (Allocation graph for O contracts is sparse)<br />

We assume that there are at least 2 agents and 2<br />

tasks. We consider the task allocation graph for O<br />

contracts. Then the fraction<br />

(number of edges) / (number of edges in the fully connected graph)<br />

converges to 0 both for |T | → ∞ as well as<br />

|A| → ∞


Lemma 5.8 (IR-Path may not reach Global Optimum)<br />

There are instances where no path of IR O<br />

contracts exists from the initial allocation to the<br />

optimal one. The length of the shortest IR path (if<br />

it exists) may be greater than |T |.<br />

But the shortest IR path is never greater than<br />

|A| |T | − (|A| − 1)|T |.<br />

Problem: local maxima.<br />

A contract may be individually rational but the<br />

task allocation is not globally optimal.


Lemma 5.9 (No Path)<br />

There are instances where no path of C-contracts<br />

(IR or not) exists from the initial allocation to the<br />

optimal one.


A task allocation is called k-optimal if no<br />

beneficial C-contract with clusters of k tasks can<br />

be made between any two agents.<br />

Let m < n. Does<br />

m-optimality imply n-optimality;<br />

n-optimality imply m-optimality?


Lemma 5.10 (No Path)<br />

There are instances where no path of<br />

S-contracts (IR or not) exists from the initial<br />

allocation to the optimal one.<br />

Lemma 5.11 (No Path)<br />

There are instances where no path of<br />

M-contracts (IR or not) exists from the initial<br />

allocation to the optimal one.


Lemma 5.12 (Reachable allocations for S contracts)<br />

We assume that there are at least 2 agents and 2 tasks. We<br />

consider the task allocation graph for S contracts. Given any<br />

vertex v the fraction<br />

(number of vertices reachable from v) / (number of all vertices)<br />

converges to 0 both for |T | → ∞ as well as for |A| → ∞.<br />

Proof.<br />

S contracts preserve the number of tasks of each agent. So<br />

any vertex fixes the numbers of tasks t 1 , . . . , t |A| held by the<br />

agents. How many allocations share these counts? Exactly<br />

|T |!/(t 1 ! · · · t |A| !) many (a multinomial coefficient). Divided<br />

by |A| |T | , this is the probability that a uniformly random<br />

allocation has exactly these counts; it tends to 0 in both limits.


Theorem 5.13 (Each Avoids Local Optima of the Others)<br />

For each of the 4 types there exist task allocations<br />

where no IR contract with the remaining 3<br />

types is possible, but an IR contract with the<br />

fourth type is.


Proof.<br />

Consider O contracts (as fourth type). One task<br />

and 2 agents: T 1 = {t 1 }, T 2 = ∅.<br />

c 1 (∅) = 0, c 1 ({t 1 }) = 2, c 2 (∅) = 0, c 2 ({t 1 }) = 1. The<br />

O contract of moving t 1 would decrease global<br />

cost by 1. No C-, S-, or M-contract is possible.<br />

To show the same for C or S contracts, two<br />

agents and two tasks suffice.<br />

For M-contracts 3 agents and 3 tasks are<br />

needed.


Theorem 5.14 (O-, C-, S-, M- Global Optima)<br />

There are instances of the task allocation problem<br />

where no IR sequence from the initial task<br />

allocation to the optimal one exists using O-,<br />

C-, S-, and M- contracts.<br />

Proof.<br />

Construct cost functions such that the deal<br />

where agent 1 gives one task to agent 2 and agent<br />

2 gives 2 tasks to agent 1 is the only one<br />

increasing welfare. This deal is not possible with<br />

O-, C-, S-, or M-contracts.


Corollary 5.15 (O-, C-, S-, M- Global Optima)<br />

There are instances of the task allocation problem<br />

where no IR sequence from the initial task<br />

allocation to the optimal one exists using any pair<br />

or triple of O-, C-, S-, or M- contracts.


Definition 5.16 (OCSM Nets)<br />

An OCSM-contract is a pair 〈T , ρ〉 of |A| × |A|<br />

matrices. An element T i,j stands for the set of<br />

tasks that agent i gives to agent j. ρ i,j is the<br />

amount that i pays to j.<br />

How many OCSM contracts are there?<br />

How much space is needed to represent<br />

one?


Theorem 5.17 (OCSM-Nets Suffice)<br />

Let |A| and |T | be finite. If a protocol allows<br />

OCSM-contracts, any hill-climbing algorithm<br />

finds the globally optimal task allocation in a<br />

finite number of steps without backtracking.


Proof.<br />

An OCSM contract can move from any task<br />

allocation to any other (in one step). So moving<br />

to the optimum is IR. Any hill-climbing algorithm<br />

strictly improves welfare. As there are only<br />

finitely many allocations, the theorem<br />

follows.


Theorem 5.18 (OCSM-Nets are Necessary)<br />

If a protocol does not allow a certain OCSM<br />

contract, then there are instances of the task<br />

allocation problem where no IR-sequence exists<br />

from the initial allocation to the optimal one.


Proof.<br />

If one OCSM contract is not allowed, then the<br />

task allocation graph contains two vertices<br />

without an edge. We let the initial and the<br />

optimal allocation be these two vertices. We<br />

construct it in such a way, that all adjacent<br />

vertices to the vertex with the initial allocation<br />

have lower social welfare. So there is no way out<br />

of the initial allocation.


[AnderssonS98] consider the multiagent version<br />

of the TSP and apply different sorts of<br />

contracts to it.<br />

Several salesmen visit several cities on the unit<br />

square. Each city must be visited by exactly one<br />

salesman. They all have to return home and<br />

want to minimise their travel costs.<br />

salesman = agent, task = city


Experiments with up to 8 agents and 8 tasks. Initial<br />

allocation randomly chosen.<br />

Ratio bound (welfare of obtained local optimum<br />

divided by global optimum) and mean ratio bound<br />

(over 1000 TSP instances) were computed (all for fixed<br />

number of agents and tasks). Global optimum was<br />

computed using IDA ∗ .<br />

A protocol consisting of 5 intervals is considered. In<br />

each interval, a particular contract type was<br />

considered (and all possible contracts with that type):<br />

1024 different sequences.<br />

In each interval a particular order for the agents and<br />

the tasks is used. First all contracts involving agent 1.<br />

Then all involving agent 2 etc. Thus it makes sense to<br />

have subsequent intervals with the same contract type.


Table 1: The best Contract sequences.


Table 2: Best, average and worst Contracts.


Figure 14: Ratio bounds (second graph is zoomed in).


Figure 15: Ratio bounds for the 4 best and single type sequences.



Figure 16: Contracts performed and tried.



1 Best protocol is 3.1% off the optimum.<br />

2 12 best protocols are between 3% and 4% off<br />

the optimum.<br />

3 Single type contracts do not behave well.<br />

4 More contracts are tried and performed in<br />

the second interval (compared with the first).<br />

5 The more mixture between contract types,<br />

the higher the social welfare.



5. Nets and coalitions 3. Coalition Formation<br />

5.3 Coalition Formation



Idea: Consider a protocol (to build<br />

coalitions) as a game and consider<br />

Nash-equilibrium.<br />

Problem: Nash-Eq is too weak!<br />

Definition 5.19 (Strong Nash Equilibrium)<br />

A profile is in strong Nash-Eq if there is no<br />

subgroup that can deviate by changing<br />

strategies jointly in a manner that increases the<br />

payoff of all its members, given that<br />

nonmembers stick to their original choice.<br />

This is often too strong and does not exist.



Definition 5.20 (Characteristic Function Game (CFG))<br />

In a CFG the value of a coalition S is given by a<br />

characteristic function v_S.<br />

Thus it is independent of the nonmembers.<br />

But<br />

1 Positive Externalities: Overlapping goals.<br />

Nonmembers perform actions and move the<br />

world closer to the coalition’s goal state.<br />

2 Negative Externalities: Shared resources.<br />

Nonmembers may use the resources so that<br />

not enough is left.



Definition 5.21 (Coalition Formation in CFG’s)<br />

Coalition Formation in CFG’s consists of:<br />

Forming CS: formation of coalitions such that<br />

within each coalition agents coordinate<br />

their activities. This partitioning is<br />

called coalition structure CS.<br />

Solving Optimisation Problem: For each<br />

coalition the tasks and resources of the<br />

agents have to be pooled. Maximise<br />

monetary value.<br />

Payoff Division: Divide the value of the<br />

generated solution among agents.



Definition 5.22 (Super-additive Games)<br />

A game is called super-additive, if<br />

v_{S∪T} ≥ v_S + v_T,<br />

where S, T ⊆ A and S ∩ T = ∅.<br />

Lemma 5.23<br />

Coalition formation for super-additive games is<br />

trivial: the grand coalition maximises welfare.<br />

Conjecture<br />

All games are super-additive.



The conjecture is wrong, because the coalition<br />

process is not for free:<br />

communication costs, penalties, time limits.<br />

Definition 5.24 (Sub-additive Games)<br />

A game is called sub-additive, if<br />

v_{S∪T} ≤ v_S + v_T,<br />

where S, T ⊆ A and S ∩ T = ∅.<br />

Coalition formation for sub-additive games is<br />

trivial: each agent stays alone (singletons).<br />
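Both definitions can be checked by brute force for small agent sets. A minimal Python sketch — the two characteristic functions are illustrative assumptions, not from the slides: `treasure` models the Sierra Madre game below with v_S = |S| // 2, and `sqrt_game` is a diminishing-returns game:

```python
import math
from itertools import combinations

def subsets(agents):
    """All subsets of `agents`, as frozensets (including the empty set)."""
    agents = list(agents)
    return [frozenset(c) for r in range(len(agents) + 1)
            for c in combinations(agents, r)]

def is_superadditive(v, agents):
    """Check v(S ∪ T) >= v(S) + v(T) for all disjoint S, T ⊆ A."""
    sets = subsets(agents)
    return all(v(S | T) >= v(S) + v(T)
               for S in sets for T in sets if not S & T)

def is_subadditive(v, agents):
    """Check v(S ∪ T) <= v(S) + v(T) for all disjoint S, T ⊆ A."""
    sets = subsets(agents)
    return all(v(S | T) <= v(S) + v(T)
               for S in sets for T in sets if not S & T)

# Sierra Madre game: a coalition of k agents carries k // 2 gold pieces.
treasure = lambda S: len(S) // 2
# A concave "diminishing returns" game: sub- but not super-additive.
sqrt_game = lambda S: math.sqrt(len(S))

print(is_superadditive(treasure, range(4)))   # True
print(is_subadditive(sqrt_game, range(4)))    # True
print(is_superadditive(sqrt_game, range(4)))  # False
```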



Example 5.25 (Treasure of Sierra Madre Game)<br />

There are n people finding a treasure of many<br />

gold pieces in the Sierra Madre. Each piece can<br />

be carried by two people, not by a single person.<br />

Example 5.26 (3-player majority Game)<br />

There are 3 people that need to agree on<br />

something. If they all agree, there is a payoff of<br />

1. If just 2 agree, they get a payoff of α<br />

(0 ≤ α ≤ 1). The third player gets nothing.<br />

What do the v_S look like?
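A possible answer, sketched in Python; the exact payoff conventions (e.g. that a lone gold-finder carries nothing, and that payoffs are counted per coalition) are assumptions read off the two examples:

```python
def treasure_v(S):
    """Treasure of Sierra Madre: each gold piece needs exactly two
    carriers, so |S| people can carry |S| // 2 pieces."""
    return len(S) // 2

def majority_v(alpha):
    """3-player majority game: payoff 1 if all three agree,
    alpha (0 <= alpha <= 1) for a two-player coalition, 0 otherwise."""
    def v(S):
        if len(S) == 3:
            return 1.0
        if len(S) == 2:
            return alpha
        return 0.0
    return v

v = majority_v(0.8)
print(v({1, 2, 3}), v({1, 2}), v({1}))  # 1.0 0.8 0.0
print(treasure_v({1, 2, 3, 4, 5}))      # 2
```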



Definition 5.27 (Payoff Vector, Core of a game)<br />

A payoff vector for a CFG is a tuple 〈x_1, . . ., x_n〉<br />

such that x_i ≥ 0 and ∑_{i=1}^{n} x_i = v_A.<br />

The core of a CFG is the set of all payoff vectors<br />

such that the following holds:<br />

∀S ⊆ A : ∑_{i∈S} x_i ≥ v_S.<br />

Thus the core corresponds to the strong Nash<br />

equilibrium mentioned in the beginning.<br />

What about the core in the above two<br />

examples?
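One way to answer this mechanically: a Python sketch that tests core membership by enumerating all coalitions. The characteristic functions below encode the 3-player majority game for α = 1 and α = 0; for α = 1 every pair can secure the full payoff on its own, so the core is empty, while for α = 0 the equal split is in the core:

```python
from itertools import combinations

def in_core(x, v, agents):
    """Is payoff vector x (dict: agent -> payoff) in the core of v?
    Requires sum_i x_i = v(A) and, for every coalition S,
    sum_{i in S} x_i >= v(S)."""
    agents = list(agents)
    if abs(sum(x.values()) - v(frozenset(agents))) > 1e-9:
        return False
    return all(sum(x[i] for i in S) >= v(frozenset(S)) - 1e-9
               for r in range(1, len(agents) + 1)
               for S in combinations(agents, r))

# alpha = 1: any pair gets 1 on its own, so the equal split is blocked.
v1 = lambda S: 1.0 if len(S) >= 2 else 0.0
equal = {1: 1/3, 2: 1/3, 3: 1/3}
print(in_core(equal, v1, [1, 2, 3]))  # False

# alpha = 0: pairs secure nothing, the equal split is in the core.
v0 = lambda S: 1.0 if len(S) == 3 else 0.0
print(in_core(equal, v0, [1, 2, 3]))  # True
```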



Maximise the social welfare of the agents A by<br />

finding a coalition structure<br />

CS∗ = arg max_{CS ∈ part(A)} Val(CS),<br />

where<br />

Val(CS) := ∑_{S∈CS} v_S.<br />

How many coalition structures are there?<br />



A lot: Ω(|A|^{|A|/2}). Enumerating is feasible for |A| < 15.<br />

Figure 17: Number of Coalition (Structures).



How can we approximate Val(CS∗)?<br />

Choose a set N (a subset of all partitions of A) and<br />

pick the best coalition structure seen so far:<br />

CS∗_N = arg max_{CS∈N} Val(CS).<br />
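The search space can be made concrete with a small Python sketch that enumerates all coalition structures (set partitions) and picks CS∗ by exhaustive search — feasible, as noted above, only for small |A|:

```python
def partitions(agents):
    """All set partitions (coalition structures) of `agents`."""
    agents = list(agents)
    if not agents:
        yield []
        return
    head, rest = agents[0], agents[1:]
    for part in partitions(rest):
        # head joins one of the existing coalitions ...
        for k in range(len(part)):
            yield part[:k] + [part[k] + [head]] + part[k + 1:]
        # ... or founds a new coalition of its own
        yield part + [[head]]

def best_structure(v, agents):
    """CS* = arg max over all coalition structures of Val(CS)."""
    return max(partitions(agents),
               key=lambda cs: sum(v(frozenset(S)) for S in cs))

# The number of structures is the Bell number: already 15 for 4 agents.
print(len(list(partitions([1, 2, 3, 4]))))  # 15
v = lambda S: len(S) ** 2                   # a super-additive game
print(best_structure(v, [1, 2, 3]))         # the grand coalition
```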



Figure 18: Coalition Structure Graph.



We want our approximation to be as good as<br />

possible: find a small k and a small N such that<br />

Val(CS∗) / Val(CS∗_N) ≤ k.<br />

k is the bound (best value would be 1) and N is<br />

the part of the graph that we have to search<br />

exhaustively.



We consider 3 search algorithms:<br />

MERGE: Breadth-first search from the top.<br />

SPLIT: Breadth-first search from the bottom.<br />

Coalition-Structure-Search (CSS1): First the<br />

bottom 2 levels are searched, then a<br />

breadth-first search from the top.<br />

MERGE might not even get a bound, without looking at<br />

all coalitions.<br />

SPLIT gets a good bound (k = |A|) after searching the<br />

bottom 2 levels (see below). But then it can get slow.<br />

CSS1 combines the good features of MERGE and SPLIT.



Theorem 5.28 (Minimal Search to get a bound)<br />

To establish a bound k, it suffices to search the lowest two<br />

levels of the CS-graph. With this search, the<br />

bound k = |A| can be taken. This bound is tight<br />

and the number of nodes searched is 2^{|A|−1}.<br />

No other search algorithm can establish the<br />

bound k while searching through fewer than<br />

2^{|A|−1} nodes.<br />



Proof.<br />

There are at most |A| coalitions included in CS∗.<br />

Thus<br />

Val(CS∗) ≤ |A| max_S v_S ≤ |A| max_{CS∈N} Val(CS) = |A| Val(CS∗_N),<br />

since every coalition S occurs in some CS ∈ N of the bottom two levels.<br />

Number of coalitions at the second lowest level: 2^{|A|} − 2.<br />

Number of coalition structures at the second<br />

lowest level: ½(2^{|A|} − 2) = 2^{|A|−1} − 1.<br />

Thus the number of nodes visited is: 2^{|A|−1}.<br />



What exactly does the last theorem mean? Let<br />

n_min be the smallest size of N such that a bound<br />

k can be established.<br />

Positive result: the fraction n_min / (number of<br />

partitions of A) approaches 0 for |A| → ∞.<br />

Negative result: To determine a bound k, one<br />

needs to search through exponentially<br />

many coalition structures.<br />



Algorithm (CS-Search-1)<br />

The algorithm comes in 3 steps:<br />

1 Search the bottom two levels of the<br />

CS-graph.<br />

2 Do a breadth-first search from the top of the<br />

graph.<br />

3 Return the CS with the highest value.<br />

This is an anytime algorithm.



Theorem 5.29 (CS-Search-1 up to Layer l)<br />

With the algorithm CS-Search-1 we get the<br />

following bound for k after searching through<br />

layer l:<br />

k = ⌈|A|/h⌉ if |A| ≡ h − 1 (mod h) and |A| ≡ l (mod 2),<br />

k = ⌊|A|/h⌋ otherwise,<br />

where h =_def ⌊(|A| − l)/2⌋ + 2.<br />

Thus, for l = |A| (check the top node), k<br />

switches from |A| to |A|/2.<br />
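The bound can be transcribed directly into Python; `bound_k` is a sketch following the theorem's formula, with n standing for |A|:

```python
import math

def bound_k(n, l):
    """Worst-case ratio bound k of CS-Search-1 after layer l has been
    searched, for n = |A| agents (transcription of the theorem above)."""
    h = (n - l) // 2 + 2
    if n % h == h - 1 and n % 2 == l % 2:
        return math.ceil(n / h)
    return math.floor(n / h)

# Once the top node (l = n) has been checked, the bound drops
# from |A| to roughly |A| / 2:
print(bound_k(8, 8))  # 4
print(bound_k(9, 9))  # 5
```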



Experiments<br />

6-10 agents, values were assigned to each<br />

coalition using the following alternatives<br />

1 values were uniformly distributed between 0<br />

and 1;<br />

2 values were uniformly distributed between 0<br />

and |A|;<br />

3 values were super-additive;<br />

4 values were sub-additive.



Figure 19: MERGE.



Figure 20: SPLIT.



Figure 21: CS-Search-1.



Figure 22: Subadditive Values.



Figure 23: Superadditive Values.



Figure 24: Coalition values chosen uniformly from [0, 1].



Figure 25: Coalition values chosen uniformly from [0, |S|].



Figure 26: Comparing CS-Search-1 with SPLIT.



1 Is CS-Search-1 the best anytime algorithm?<br />

2 The search establishing the best k for n′ > n is perhaps not<br />

the same search that establishes the best k for n.<br />

3 CS-Search-1 does not use any information<br />

while searching. Perhaps k can be made<br />

smaller by not only considering Val(CS) but<br />

also the values v_S within the searched CS′.<br />



5. Nets and coalitions 4. Payoff Division<br />

5.4 Payoff Division



We assume now that games are superadditive,<br />

therefore the grand coalition will be formed.<br />

The payoff division should be fair between the<br />

agents, otherwise they leave the coalition.<br />

Definition 5.30 (Dummies, Interchangeable)<br />

Agent i is called a dummy, if<br />

for all coalitions S with i ∉ S: v_{S∪{i}} − v_S = v_{{i}}.<br />

Agents i and j are called interchangeable, if<br />

for all coalitions S with i ∈ S and j ∉ S: v_{(S\{i})∪{j}} = v_S.



Three axioms:<br />

Symmetry: If i and j are interchangeable, then<br />

x i = x j .<br />

Dummies: For all dummies i: x i = v {i} .<br />

Additivity: For any two games v, w:<br />

x_i^{v⊕w} = x_i^v + x_i^w,<br />

where v ⊕ w denotes the game defined<br />

by (v ⊕ w)_S = v_S + w_S.



Theorem 5.31 (Shapley-Value)<br />

There is only one payoff division satisfying the above three<br />

axioms. It is called the Shapley value of agent i and is<br />

defined by<br />

x_i = (1/|A|!) ∑_{S ⊆ A\{i}} |S|! (|A| − |S| − 1)! (v_{S∪{i}} − v_S).<br />

|A|! is the number of all possible joining orders of the<br />

agents (to form the grand coalition).<br />

(v_{S∪{i}} − v_S) is i's marginal contribution when added to set S.<br />

There are |S|! ways for S to be built before i joins, and<br />

(|A| − |S| − 1)! ways for the remaining agents to join after i.<br />

The Shapley value averages the marginal contributions of agent i<br />

over all joining orders.<br />

It can be estimated by sampling random joining orders and<br />

averaging i's marginal contributions.
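For small games the Shapley value can be computed exactly by iterating over all joining orders, which mirrors the averaging argument above. A sketch; the characteristic function is the 3-player majority game of Example 5.26 with α = 0:

```python
from itertools import permutations
from math import factorial

def shapley(v, agents):
    """Shapley values by averaging marginal contributions over all
    |A|! joining orders -- feasible only for a handful of agents."""
    agents = list(agents)
    phi = {i: 0.0 for i in agents}
    for order in permutations(agents):
        S = frozenset()
        for i in order:
            phi[i] += v(S | {i}) - v(S)   # i's marginal contribution
            S = S | {i}
    n_fact = factorial(len(agents))
    return {i: total / n_fact for i, total in phi.items()}

# 3-player majority game with alpha = 0: only the grand coalition
# pays, and by symmetry each player gets 1/3.
v = lambda S: 1.0 if len(S) == 3 else 0.0
print(shapley(v, [1, 2, 3]))
```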



5. Nets and coalitions 5. References<br />

5.5 References



[AnderssonS98] Andersson, M. and T. Sandholm (1998).<br />

Sequencing of contract types for anytime task<br />

reallocation.<br />

In AMET, pp. 54–69.<br />

[AnderssonS00] Andersson, M. and T. Sandholm (2000).<br />

Contract type sequencing for reallocative<br />

negotiation.<br />

In ICDCS, pp. 154–160.



[SandholmLAST98] Sandholm, T., K. Larson,<br />

M. Andersson, O. Shehory, and F. Tohmé (1998).<br />

Anytime coalition structure generation with worst<br />

case guarantees.<br />

In AAAI/IAAI, pp. 46–53.<br />

[SandholmLAST99] Sandholm, T., K. Larson,<br />

M. Andersson, O. Shehory, and F. Tohmé (1999).<br />

Coalition structure generation with worst case<br />

guarantees.<br />

Artif. Intell. 111(1-2), 209–238.



6. Propositional Logic<br />

Chapter 6. Propositional Logic<br />


6.1 Why Logic?<br />

6.2 Sentential Logic<br />

6.3 Sudoku<br />

6.4 Calculi for SL<br />

6.5 Wumpus in SL



We make the case for using formal logic as a<br />

representation formalism for reasoning about the<br />

world.<br />

We present classical propositional logic to introduce<br />

the standard language and formalism.<br />

The semantics of propositional logic is the basis of all<br />

other logics. It can be used to model many things: we<br />

show its versatility by modeling Sudoku-puzzles.<br />

We also describe two calculi for this logic and briefly<br />

consider correctness and completeness issues.<br />

We illustrate how to use propositional logic with the<br />

well-known Wumpus world.



6. Propositional Logic 1. Why Logic?<br />

6.1 Why Logic?



Why logic at all?<br />

framework for thinking about systems<br />

makes one realise many assumptions<br />

. . . and then we can:<br />

investigate them, accept or reject them<br />

relax some of them and still use a part of the<br />

formal and conceptual machinery



Why logic at all?<br />

verification: check specification against<br />

implementation<br />

executable specification<br />

planning as model checking



Symbolic AI: Symbolic representation,<br />

e.g. sentential or first order logic.<br />

Agent as a theorem prover.<br />

Traditional: Theory about agents.<br />

Implementation as stepwise process<br />

(Software Engineering) over many<br />

abstractions.<br />

Symbolic AI: View the theory itself as<br />

executable specification.<br />

Internal state: Knowledge Base (KB),<br />

often simply called D (database).



Our running example for propositional logic is The<br />

Wumpus-World.


[Figure: Exploring the Wumpus world. Two pairs of 4×4 grids, (a) and (b),<br />

showing what the agent knows after successive moves.<br />

Legend: A = Agent, B = Breeze, G = Glitter, Gold, OK = Safe square,<br />

P = Pit, S = Stench, V = Visited, W = Wumpus.]



6. Propositional Logic 2. Sentential Logic<br />

6.2 Sentential Logic



Definition 6.1 (Sent. Logic L SL , Language L ⊆ L SL )<br />

The language L SL of propositional (or sentential) logic<br />

consists of<br />

□ and ⊤: falsum and verum,<br />

p, q, r, x_1, x_2, . . ., x_n, . . .: a countable set AT of<br />

SL-constants,<br />

¬, ∧, ∨, →: the sentential connectives (¬ is unary, all<br />

others are binary operators),<br />

(, ): the parentheses to help readability.<br />

Generally we consider only a finite set of SL-constants.<br />

They define a language L ⊆ L SL . The set of L-formulae<br />

F ml L is defined inductively.



Definition 6.2 (Semantics, Valuation, Model)<br />

A valuation v for a language L ⊆ L SL is a mapping from<br />

the set of SL-constants defined through L and {⊤, □} into<br />

the set {true, false} with v(□) = false, v(⊤) = true.<br />

Each valuation v can be uniquely extended to a<br />

function ¯v : F ml L → {true, false} so that:<br />

¯v(¬ϕ) = true, if ¯v(ϕ) = false; false, if ¯v(ϕ) = true.<br />

¯v(ϕ ∧ γ) = true, if ¯v(ϕ) = true and ¯v(γ) = true; false, else.<br />

¯v(ϕ ∨ γ) = true, if ¯v(ϕ) = true or ¯v(γ) = true; false, else.



Definition (continued)<br />

¯v(ϕ → γ) = true, if ¯v(ϕ) = false or (¯v(ϕ) = true and ¯v(γ) = true); false, else.<br />

Thus each valuation v uniquely defines a ¯v. We call ¯v<br />

an L-structure or model A_v. From now on we will speak of<br />

models and valuations.<br />

A model determines for each formula if it is true or false.<br />

The process of mapping a set of L-formulae into<br />

{true, false} is called semantics.
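The truth definition above translates directly into a recursive evaluator. A sketch; the tuple-based formula representation and connective names are our own encoding, not part of the lecture:

```python
def evaluate(phi, v):
    """Evaluate a formula under a valuation v (dict: constant -> bool).
    Formulas are 'verum', 'falsum', SL-constant strings, or tuples
    ('not', p), ('and', p, q), ('or', p, q), ('imp', p, q)."""
    if phi == 'verum':
        return True
    if phi == 'falsum':
        return False
    if isinstance(phi, str):
        return v[phi]
    op, *args = phi
    if op == 'not':
        return not evaluate(args[0], v)
    if op == 'and':
        return evaluate(args[0], v) and evaluate(args[1], v)
    if op == 'or':
        return evaluate(args[0], v) or evaluate(args[1], v)
    if op == 'imp':
        return (not evaluate(args[0], v)) or evaluate(args[1], v)
    raise ValueError(f'unknown connective {op!r}')

# (p ∧ q) → p is a tautology: true under all four valuations.
phi = ('imp', ('and', 'p', 'q'), 'p')
print(all(evaluate(phi, {'p': a, 'q': b})
          for a in (True, False) for b in (True, False)))  # True
```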



Definition 6.3 (Validity of a Formula, Tautology)<br />

1 A formula ϕ ∈ F ml L holds under the<br />

valuation v if ¯v(ϕ) = true. We also write<br />

¯v |= ϕ or simply v |= ϕ.<br />

2 A theory is a set of formulae: T ⊆ F ml L . v<br />

satisfies T if ¯v(ϕ) = true for all ϕ ∈ T . We<br />

write v |= T .<br />

3 An L-formula ϕ is called an L-tautology if<br />

v |= ϕ holds for all possible valuations<br />

(models) v in L.<br />

From now on we suppress the language L, because it is<br />

obvious from context. Nevertheless it needs to be carefully<br />

defined.



Definition 6.4 (Consequence Set Cn(T ))<br />

A formula ϕ follows from T if for all models v<br />

with v |= T also v |= ϕ holds. We write: T |= ϕ.<br />

We call<br />

Cn L (T ) = def {ϕ ∈ F ml L : T |= ϕ},<br />

or simply Cn(T ), the semantic consequence<br />

operator.



Lemma 6.5 (Properties of Cn(T ))<br />

The semantic consequence operator has the following<br />

properties:<br />

1 T -expansion: T ⊆ Cn(T ),<br />

2 Monotony: T ⊆ T ′ ⇒ Cn(T ) ⊆ Cn(T ′ ),<br />

3 Closure: Cn(Cn(T )) = Cn(T ).<br />

Lemma 6.6 (ϕ ∉ Cn(T))<br />

ϕ ∉ Cn(T ) if and only if there is a model v with v |=<br />

T and ¯v(ϕ) = false.<br />




Definition 6.7 (MOD(T), Cn(U))<br />

If T ⊆ F ml L then we denote with MOD(T ) the set of all<br />

L-structures A which are models of T :<br />

MOD(T ) = def {A : A |= T }.<br />

If U is a set of models, we consider all those sentences,<br />

which are valid in all models of U. We call this set Cn(U):<br />

Cn(U) = def {ϕ ∈ F ml L : ∀v ∈ U : ¯v(ϕ) = true}.<br />

MOD is obviously dual to Cn:<br />

Cn(MOD(T )) = Cn(T ), MOD(Cn(T )) = MOD(T ).



Definition 6.8 (Completeness of a Theory T )<br />

T is called complete if for each formula ϕ ∈ F ml: T |= ϕ or<br />

T |= ¬ϕ holds.<br />

Attention:<br />

Do not mix up this last condition with the property of a<br />

valuation (model) v: each model is complete in the<br />

above sense.



Definition 6.9 (Consistency of a Theory)<br />

T is called consistent if there is a valuation (model) v with<br />

¯v(ϕ) = true for all ϕ ∈ T .<br />

Lemma 6.10 (Ex Falso Quodlibet)<br />

T is consistent if and only if Cn(T ) ≠ F ml L .



6. Propositional Logic 3. Sudoku<br />

6.3 Sudoku



For some time now, Sudoku puzzles have been<br />

quite popular.<br />

Table 3: A simple Sudoku (S_1)<br />




Can they be solved with sentential logic?<br />

Idea: Given a Sudoku-Puzzle S, construct a<br />

language L Sudoku and a theory T S ⊆ F ml LSudoku<br />

such that<br />

MOD(T S ) = Solutions of the puzzle S<br />

Solution<br />

In fact, we construct a theory T Sudoku and for<br />

each (partial) instance of a 9 × 9 puzzle S a<br />

particular theory T S such that<br />

MOD(T Sudoku ∪ T S ) = {S : S is a solution of S}



We introduce the following language L Sudoku :<br />

1 eins i,j , 1 ≤ i, j ≤ 9,<br />

2 zwei i,j , 1 ≤ i, j ≤ 9,<br />

3 drei i,j , 1 ≤ i, j ≤ 9,<br />

4 vier i,j , 1 ≤ i, j ≤ 9,<br />

5 fuenf i,j , 1 ≤ i, j ≤ 9,<br />

6 sechs i,j , 1 ≤ i, j ≤ 9,<br />

7 sieben i,j , 1 ≤ i, j ≤ 9,<br />

8 acht i,j , 1 ≤ i, j ≤ 9,<br />

9 neun i,j , 1 ≤ i, j ≤ 9.<br />

This completes the language, the syntax.<br />

How many symbols are there?



We distinguished between the puzzle S and a<br />

solution S of it.<br />

What is a model (or valuation) in the sense of<br />

Definition 6.2?<br />

Table 4: How to construct a model S?



We have to give our symbols a meaning: the<br />

semantics!<br />

eins i,j means i, j contains a 1<br />

zwei i,j means i, j contains a 2<br />

.<br />

neun i,j means i, j contains a 9<br />

To be precise: given a 9 × 9 square that is<br />

completely filled in, we define our valuation v as<br />

follows (for all 1 ≤ i, j ≤ 9).<br />

v(eins_{i,j}) = true, if 1 is at position (i, j); false, else.



v(zwei_{i,j}) = true, if 2 is at position (i, j); false, else.<br />

v(drei_{i,j}) = true, if 3 is at position (i, j); false, else.<br />

v(vier_{i,j}) = true, if 4 is at position (i, j); false, else.<br />

etc.<br />

v(neun_{i,j}) = true, if 9 is at position (i, j); false, else.<br />

Therefore any 9 × 9 square can be seen as a<br />

model or valuation with respect to the language<br />

L Sudoku .



How does T S look like?<br />

T_S = { eins_{1,4}, eins_{5,8}, eins_{6,6},<br />

zwei_{2,2}, zwei_{4,8},<br />

drei_{6,8}, drei_{8,3}, drei_{9,4},<br />

vier_{1,7}, vier_{2,5}, vier_{3,1}, vier_{4,3}, vier_{8,2}, vier_{9,8},<br />

. . .<br />

neun_{3,4}, neun_{5,2}, neun_{6,9} }.



What should the theory T Sudoku look like (s.t.<br />

models of T Sudoku ∪ T S correspond to solutions of the<br />

puzzle)?<br />

First square: T 1<br />

1 eins 1,1 ∨ . . . ∨ eins 3,3<br />

2 zwei 1,1 ∨ . . . ∨ zwei 3,3<br />

3 drei 1,1 ∨ . . . ∨ drei 3,3<br />

4 vier 1,1 ∨ . . . ∨ vier 3,3<br />

5 fuenf 1,1 ∨ . . . ∨ fuenf 3,3<br />

6 sechs 1,1 ∨ . . . ∨ sechs 3,3<br />

7 sieben 1,1 ∨ . . . ∨ sieben 3,3<br />

8 acht 1,1 ∨ . . . ∨ acht 3,3<br />

9 neun 1,1 ∨ . . . ∨ neun 3,3



The formulae on the last slide say that<br />

1 The number 1 must appear somewhere in the<br />

first square.<br />

2 The number 2 must appear somewhere in the<br />

first square.<br />

3 The number 3 must appear somewhere in the<br />

first square.<br />

4 etc<br />

Does that mean that each number 1, . . . , 9<br />

occurs exactly once in the first square?



No! We also have to say that no cell contains two<br />

different numbers (together with the nine formulae<br />

above this forces each number to occur exactly once):<br />

T ′ 1:<br />

1 ¬(eins i,j ∧ zwei i,j ), 1 ≤ i, j ≤ 3,<br />

2 ¬(eins i,j ∧ drei i,j ), 1 ≤ i, j ≤ 3,<br />

3 ¬(eins i,j ∧ vier i,j ), 1 ≤ i, j ≤ 3,<br />

4 etc<br />

5 ¬(zwei i,j ∧ drei i,j ), 1 ≤ i, j ≤ 3,<br />

6 ¬(zwei i,j ∧ vier i,j ), 1 ≤ i, j ≤ 3,<br />

7 ¬(zwei i,j ∧ fuenf i,j ), 1 ≤ i, j ≤ 3,<br />

8 etc<br />

How many formulae are these?
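The count can be checked with a short Python sketch (ours, not from the slides): one formula per unordered pair of distinct numbers and per cell of the square:

```python
from itertools import combinations

# Exclusion formulae of T'_1: ¬(num_a i,j ∧ num_b i,j) for every
# unordered pair (a, b) of distinct numbers and every cell (i, j)
# of the first 3x3 square.
cells = [(i, j) for i in range(1, 4) for j in range(1, 4)]
pairs = list(combinations(range(1, 10), 2))   # C(9,2) = 36 pairs
exclusions = [(a, b, cell) for (a, b) in pairs for cell in cells]
print(len(exclusions))                        # 36 * 9 = 324
```

So T ′ 1 alone contains 324 formulae, and the same holds for each of the other eight squares.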



Second square: T 2<br />

1 eins 1,4 ∨ . . . ∨ eins 3,6<br />

2 zwei 1,4 ∨ . . . ∨ zwei 3,6<br />

3 drei 1,4 ∨ . . . ∨ drei 3,6<br />

4 vier 1,4 ∨ . . . ∨ vier 3,6<br />

5 fuenf 1,4 ∨ . . . ∨ fuenf 3,6<br />

6 sechs 1,4 ∨ . . . ∨ sechs 3,6<br />

7 sieben 1,4 ∨ . . . ∨ sieben 3,6<br />

8 acht 1,4 ∨ . . . ∨ acht 3,6<br />

9 neun 1,4 ∨ . . . ∨ neun 3,6<br />

And all the other formulae from the previous<br />

slides (adapted to this case): T ′ 2<br />




The same has to be done for all 9 squares.



What is still missing:<br />

Rows: Each row should contain exactly the<br />

numbers from 1 to 9 (no number<br />

twice).<br />

Columns: Each column should contain exactly<br />

the numbers from 1 to 9 (no number<br />

twice).



First Row: T Row 1<br />

1 eins 1,1 ∨ eins 1,2 ∨ . . . ∨ eins 1,9<br />

2 zwei 1,1 ∨ zwei 1,2 ∨ . . . ∨ zwei 1,9<br />

3 drei 1,1 ∨ drei 1,2 ∨ . . . ∨ drei 1,9<br />

4 vier 1,1 ∨ vier 1,2 ∨ . . . ∨ vier 1,9<br />

5 fuenf 1,1 ∨ fuenf 1,2 ∨ . . . ∨ fuenf 1,9<br />

6 sechs 1,1 ∨ sechs 1,2 ∨ . . . ∨ sechs 1,9<br />

7 sieben 1,1 ∨ sieben 1,2 ∨ . . . ∨ sieben 1,9<br />

8 acht 1,1 ∨ acht 1,2 ∨ . . . ∨ acht 1,9<br />

9 neun 1,1 ∨ neun 1,2 ∨ . . . ∨ neun 1,9



Analogously for all other rows, e.g.<br />

Ninth Row: T Row 9<br />

1 eins 9,1 ∨ eins 9,2 ∨ . . . ∨ eins 9,9<br />

2 zwei 9,1 ∨ zwei 9,2 ∨ . . . ∨ zwei 9,9<br />

3 drei 9,1 ∨ drei 9,2 ∨ . . . ∨ drei 9,9<br />

4 vier 9,1 ∨ vier 9,2 ∨ . . . ∨ vier 9,9<br />

5 fuenf 9,1 ∨ fuenf 9,2 ∨ . . . ∨ fuenf 9,9<br />

6 sechs 9,1 ∨ sechs 9,2 ∨ . . . ∨ sechs 9,9<br />

7 sieben 9,1 ∨ sieben 9,2 ∨ . . . ∨ sieben 9,9<br />

8 acht 9,1 ∨ acht 9,2 ∨ . . . ∨ acht 9,9<br />

9 neun 9,1 ∨ neun 9,2 ∨ . . . ∨ neun 9,9<br />

Is that sufficient? What if a row contains<br />

several 1's?



First Column: T Column 1<br />

1 eins 1,1 ∨ eins 2,1 ∨ . . . ∨ eins 9,1<br />

2 zwei 1,1 ∨ zwei 2,1 ∨ . . . ∨ zwei 9,1<br />

3 drei 1,1 ∨ drei 2,1 ∨ . . . ∨ drei 9,1<br />

4 vier 1,1 ∨ vier 2,1 ∨ . . . ∨ vier 9,1<br />

5 fuenf 1,1 ∨ fuenf 2,1 ∨ . . . ∨ fuenf 9,1<br />

6 sechs 1,1 ∨ sechs 2,1 ∨ . . . ∨ sechs 9,1<br />

7 sieben 1,1 ∨ sieben 2,1 ∨ . . . ∨ sieben 9,1<br />

8 acht 1,1 ∨ acht 2,1 ∨ . . . ∨ acht 9,1<br />

9 neun 1,1 ∨ neun 2,1 ∨ . . . ∨ neun 9,1<br />

Analogously for all other columns.<br />

Is that sufficient? What if a column contains<br />

several 1's?<br />




All put together:<br />

T Sudoku = T 1 ∪ T ′ 1 ∪ . . . ∪ T 9 ∪ T ′ 9 ∪<br />

T Row 1 ∪ . . . ∪ T Row 9 ∪<br />

T Column 1 ∪ . . . ∪ T Column 9<br />
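The size of the "each number occurs somewhere" part of T Sudoku can be checked with a Python sketch (the numeric variable encoding is our own choice, not from the lecture):

```python
# Generate the "number n occurs somewhere" clauses of T_Sudoku:
# one clause per number and per row, column and 3x3 square.
def var(n, i, j):                      # atom "number n at cell (i, j)"
    return (n - 1) * 81 + (i - 1) * 9 + j

clauses = []
for n in range(1, 10):
    for i in range(1, 10):             # T_Row i
        clauses.append([var(n, i, j) for j in range(1, 10)])
    for j in range(1, 10):             # T_Column j
        clauses.append([var(n, i, j) for i in range(1, 10)])
    for bi in (1, 4, 7):               # the nine squares T_1 .. T_9
        for bj in (1, 4, 7):
            clauses.append([var(n, i, j)
                            for i in range(bi, bi + 3)
                            for j in range(bj, bj + 3)])
print(len(clauses))                    # 9 numbers * (9 + 9 + 9) = 243
```

Together with the exclusion formulae T ′ 1 , . . . , T ′ 9 these clauses can be handed to any SAT solver.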



Here is a more difficult one.<br />

Table 5: A difficult Sudoku S schwer



The above encoding is formulated strictly<br />

in propositional logic.<br />

Theorem provers, even if they consider only<br />

propositional theories, often use predicates,<br />

variables etc.<br />

smodels uses a predicate logic formulation,<br />

including variables. But as there are no<br />

function symbols, such an input can be seen<br />

as a compact representation.<br />

It allows one to use a few rules as a shorthand for<br />

thousands of rules using propositional<br />

constants.



For example smodels uses the following<br />

constructs:<br />

1 row(0..8) is a shorthand for row(0),<br />

row(1), ..., row(8).<br />

2 val(1..9) is a shorthand for val(1),<br />

val(2), ..., val(9).<br />

3 The constants 1, ..., 9 will be treated as<br />

numbers (so there are operations available to<br />

add, subtract or divide them).



1 Statements in smodels are written as<br />

head :- body<br />

This corresponds to an implication<br />

body → head.<br />

2 An important feature in smodels is that all<br />

atoms that do not occur in any head are<br />

automatically false.<br />

For example the theory<br />

p(X, Y, 5) :- row(X), col(Y)<br />

means that the whole grid is filled with 5’s<br />

and only with 5's: e.g. ¬p(X, Y, 1) is true for all<br />

X, Y , as well as ¬p(X, Y, 2) etc.



More constructs in smodels<br />

1 1 { p(X,Y,A) : val(A) } 1<br />

:- row(X), col(Y)<br />

this makes sure that each entry of the grid<br />

contains exactly one number (val()).<br />

2 1 { p(X,Y,A) : row(X) : col(Y)<br />

: eq(div(X,3), div(R,3))<br />

: eq(div(Y,3), div(C,3)) } 1<br />

:- val(A), row(R), col(C)<br />

this rule ensures that in each of the 9 squares<br />

each number from 1 to 9 occurs exactly once.<br />

3 More detailed info is on the web-page of this<br />

lecture.<br />
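What the cardinality rule 1 { p(X,Y,A) : val(A) } 1 abbreviates for a single cell can be written out propositionally in a small Python sketch (ours; "-" marks negation):

```python
from itertools import combinations

# "Exactly one of these atoms is true" as plain clauses: one
# "at least one" clause plus one "at most one" clause per pair.
def exactly_one(atoms):
    at_least = [list(atoms)]
    at_most = [["-" + a, "-" + b] for a, b in combinations(atoms, 2)]
    return at_least + at_most

cell = [f"p(1,1,{v})" for v in range(1, 10)]
print(len(exactly_one(cell)))   # 1 + C(9,2) = 37 clauses for one cell
```

One smodels rule thus stands for 37 clauses per cell, i.e. roughly 3000 clauses over the whole grid.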




6. Propositional Logic 4. Calculi for SL<br />

6.4 Calculi for SL



Definition 6.11 (Axiom, Inference Rule)<br />

Axioms in SL are the following formulae:<br />

1 φ → ⊤, □ → φ, ¬⊤ → □, □ → ¬⊤,<br />

2 (φ ∧ ψ) → φ, (φ ∧ ψ) → ψ,<br />

3 φ → (φ ∨ ψ), ψ → (φ ∨ ψ),<br />

4 ¬¬φ → φ, (φ → ψ) → ((φ → ¬ψ) → ¬φ),<br />

5 φ → (ψ → φ), φ → (ψ → (φ ∧ ψ)).<br />

Note that φ, ψ stand for arbitrarily complex<br />

formulae (not just constants).



Definition (continued)<br />

The only inference rule in SL is modus ponens:<br />

MP : F ml × F ml → F ml : (ϕ, ϕ → ψ) ↦→ ψ,<br />

or, for short,<br />

(MP) from ϕ and ϕ → ψ infer ψ<br />

(ϕ, ψ are arbitrarily complex formulae).<br />



Definition 6.12 (Proof)<br />

A proof of a formula ϕ from a theory T ⊆ F ml L<br />

is a sequence ϕ 1 , . . . , ϕ n of formulae such that<br />

ϕ n = ϕ and for all i with 1 ≤ i ≤ n one of the<br />

following conditions holds:<br />

ϕ i is a substitution instance of an axiom,<br />

ϕ i ∈ T ,<br />

there is ϕ l , ϕ k = (ϕ l → ϕ i ) with l, k < i. Then<br />

ϕ i is the result of the application of modus<br />

ponens on the predecessor-formulae of ϕ i .<br />

We write: T ⊢ ϕ (ϕ can be derived from T ).
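Definition 6.12 can be turned into a small proof checker (a sketch of ours; for simplicity, axiom steps are trusted rather than checked for being substitution instances):

```python
# Check a proof sequence: implications are tuples ("->", phi, psi);
# each step is tagged "axiom" (trusted here), "hyp" (member of T),
# or "mp" (modus ponens on two earlier formulae).
def check_proof(steps, theory):
    derived = []
    for tag, phi in steps:
        if tag == "axiom":
            ok = True
        elif tag == "hyp":
            ok = phi in theory
        else:  # modus ponens: need earlier phi_l and (phi_l -> phi)
            ok = any(("->", prem, phi) in derived for prem in derived)
        if not ok:
            return False
        derived.append(phi)
    return True

T = ["p", ("->", "p", "q")]
proof = [("hyp", "p"), ("hyp", ("->", "p", "q")), ("mp", "q")]
print(check_proof(proof, T))          # True: T ⊢ q
print(check_proof([("mp", "q")], T))  # False: no premises derived yet
```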



We show that:<br />

1 ⊢ ⊤,<br />

2 ⊢ ¬□<br />

3 A ⊢ A ∨ B,<br />

4 ⊢ A ∨ ¬A,<br />

5 the rule<br />

(R) from A → ϕ and ¬A → ψ infer ϕ ∨ ψ<br />

can be derived.



Theorem 6.13 (Completeness)<br />

A formula follows semantically from a theory T if<br />

and only if it can be derived:<br />

T |= ϕ if and only if T ⊢ ϕ<br />

Theorem 6.14 (Compactness)<br />

A formula semantically follows from a theory T if<br />

and only if it already follows from a finite subset<br />

of T :<br />

Cn(T ) = ⋃ {Cn(T ′ ) : T ′ ⊆ T, T ′ finite}.



Although the axioms from above and modus ponens<br />

suffice it is reasonable to consider more general systems.<br />

Therefore we introduce the following term:<br />

Definition 6.15 (Rule System MP + D)<br />

Let D be a set of general inference rules, i.e. mappings<br />

which assign a formula ψ to a finite set of formulae<br />

ϕ 1 , ϕ 2 , . . . , ϕ n . We write<br />

ϕ 1 , ϕ 2 , . . . , ϕ n / ψ.<br />

MP + D is the rule system which emerges from adding<br />

the rules in D to modus ponens. For W ⊆ F ml, let<br />

Cn D (W )<br />

in the following be the set of all formulae ϕ which can be<br />

derived from W using the inference rules of MP + D.<br />



We bear in mind that Cn(W ) is defined semantically but<br />

Cn D (W ) is defined syntactically (using the notion of<br />

proof). Both sets are equal according to the<br />

completeness-theorem in the special case D = ∅.<br />

Lemma 6.16 (Properties of Cn D )<br />

Let D be a set of general inference rules and W ⊆ F ml. Then:<br />

1 Cn(W ) ⊆ Cn D (W ).<br />

2 Cn D (Cn D (W )) = Cn D (W ).<br />

3 Cn D (W ) is the smallest set which is closed with respect to<br />

D and contains W .<br />




Question:<br />

What is the difference between an inference rule ϕ / ψ<br />

and the implication ϕ → ψ?<br />

Assume we have a set T of formulae and we choose two<br />

constants p, q ∈ L. We could either consider<br />

(1) T together with MP and the rule p / q<br />

or<br />

(2) T ∪ {p → q} together with MP<br />



1. Case: Cn {p/q} (T ),<br />

2. Case: Cn(T ∪ {p → q}).<br />

If T = {¬q}, then we have in (2):<br />

¬p ∈ Cn(T ∪ {p → q}), but not in (1).<br />
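The asymmetry can be made concrete with a tiny forward-chaining closure (a sketch of ours; "-q" stands for ¬q): a rule only fires from its premises to its conclusion, so from T = {¬q} the rule p/q derives nothing new, while the implication p → q also supports contraposition semantically.

```python
def close_under(facts, rules):
    """Smallest superset of facts closed under the rules
    (each rule is a pair: list of premises, conclusion)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# Case (1): T = {¬q} with the rule p / q -- the rule never fires,
# so in particular ¬p is NOT derived:
print(close_under({"-q"}, [(["p"], "q")]))   # {'-q'}
```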



It is well-known that any formula φ can be<br />

written as a conjunction of disjunctions<br />

(φ 1,1 ∨ . . . ∨ φ 1,m 1 ) ∧ . . . ∧ (φ n,1 ∨ . . . ∨ φ n,m n ).<br />

The φ i,j are just constants or negated constants.<br />

The n disjunctions φ i,1 ∨ . . . ∨ φ i,m i are called clauses of<br />

φ.<br />

Normalform<br />

Instead of working on arbitrary formulae, it is<br />

sometimes easier to work on finite sets of<br />

clauses.



A resolution calculus for SL<br />

Given be a set M of clauses of the form<br />

A ∨ ¬B ∨ C ∨ . . . ∨ ¬E<br />

Such a clause is also written as the set<br />

{A, ¬B, C, . . . , ¬E}.<br />

We define the following inference rule:<br />

Definition 6.17 (SL resolution)<br />

Deduce the clause C 1 ∪ C 2 from C 1 ∪ {A} and C 2 ∪ {¬A}.<br />

Question:<br />

Is this calculus correct and complete?



Answer:<br />

No!<br />

But every problem of the kind “T |= φ” is equivalent to<br />

“T ∪ {¬φ} is unsatisfiable”<br />

or rather to<br />

T ∪ {¬φ} ⊢ □<br />

(⊢ stands for the calculus introduced above).<br />

Theorem 6.18 (Completeness of Resolution Refutation)<br />

If M is an unsatisfiable set of clauses then the empty clause<br />

□ can be derived from M using only the resolution rule.<br />

We also say that resolution is refutation complete.
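The resolution rule of Definition 6.17 and refutation can be implemented directly on clauses-as-sets (a sketch of ours; "-" marks a negated constant):

```python
# Propositional resolution on clauses represented as frozensets.
def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def resolvents(c1, c2):
    """All clauses C1 ∪ C2 obtained by resolving on one literal."""
    return [(c1 - {lit}) | (c2 - {neg(lit)})
            for lit in c1 if neg(lit) in c2]

def refutes(clauses):
    """True iff the empty clause is derivable, i.e. unsatisfiable."""
    clauses = set(map(frozenset, clauses))
    while True:
        new = set()
        for a in clauses:
            for b in clauses:
                if a != b:
                    for r in resolvents(a, b):
                        if not r:          # empty clause derived
                            return True
                        new.add(frozenset(r))
        if new <= clauses:                 # closed under resolution
            return False
        clauses |= new

# {p, p -> q, ¬q} is unsatisfiable:
print(refutes([["p"], ["-p", "q"], ["-q"]]))   # True
```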



6. Propositional Logic 5. Wumpus in SL<br />

6.5 Wumpus in SL


[Figure: the 4 × 4 wumpus world: pits with breezes on the<br />
adjacent squares, the wumpus with stench on the adjacent<br />
squares, the gold, and the START square at (1,1).]<br />


[Figure: the agent's knowledge of the wumpus world after the<br />
first moves, shown as annotated 4 × 4 grids.<br />
Legend: A = Agent, B = Breeze, G = Glitter/Gold,<br />
OK = Safe square, P = Pit, S = Stench, V = Visited,<br />
W = Wumpus.]<br />



Language definition:<br />

S i,j there is a stench at (i, j)<br />

B i,j there is a breeze at (i, j)<br />

P it i,j (i, j) is a pit<br />

Gl i,j (i, j) glitters<br />

W i,j (i, j) contains the Wumpus<br />

General knowledge:<br />

¬S 1,1 −→ (¬W 1,1 ∧ ¬W 1,2 ∧ ¬W 2,1 )<br />

¬S 2,1 −→ (¬W 1,1 ∧ ¬W 2,1 ∧ ¬W 2,2 ∧ ¬W 3,1 )<br />

¬S 1,2 −→ (¬W 1,1 ∧ ¬W 1,2 ∧ ¬W 2,2 ∧ ¬W 1,3 )<br />

S 1,2 −→ (W 1,3 ∨ W 1,2 ∨ W 2,2 ∨ W 1,1 )



Knowledge after the 3rd move:<br />

¬S 1,1 ∧ ¬S 2,1 ∧ S 1,2 ∧ ¬B 1,1 ∧ ¬B 2,1 ∧ ¬B 1,2<br />

Question:<br />

Can we deduce that the wumpus is located at (1,3)?<br />

Answer:<br />

Yes. Either via resolution or using our Hilbert-calculus.
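The deduction can also be checked purely semantically by enumerating valuations (a Python sketch of ours; the reduction of the general knowledge under the observations ¬S 1,1 , ¬S 2,1 and S 1,2 is spelled out in the code):

```python
from itertools import product

# Every model of the wumpus knowledge makes W_{1,3} true.
atoms = ["W11", "W12", "W21", "W22", "W31", "W13"]

def kb(v):
    return (not v["W11"] and not v["W12"] and not v["W21"]       # from ¬S 1,1
            and not v["W22"] and not v["W31"]                    # from ¬S 2,1
            and (v["W13"] or v["W12"] or v["W22"] or v["W11"]))  # from S 1,2

models = [dict(zip(atoms, bits))
          for bits in product([False, True], repeat=len(atoms))
          if kb(dict(zip(atoms, bits)))]
print(len(models), all(v["W13"] for v in models))   # 1 True
```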



Programming versus knowledge engineering<br />

programming | knowledge engineering<br />
choose language | choose logic<br />
write program | define knowledge base<br />
write compiler | implement calculus<br />
run program | derive new facts<br />



7. Modal Logic<br />

Chapter 7. Modal Logic<br />

Modal Logic<br />

7.1 Reasoning about Knowledge<br />

7.2 Kripke Semantics<br />

7.3 Axioms for Modal Logics<br />

7.4 Normative Reasoning<br />

7.5 References



We are extending propositional logic by modality<br />

operators. They can express beliefs and knowledge.<br />

We introduce the semantics of modal logics, based on<br />

Kripke-structures.<br />

Finally we consider variants of modal logics<br />

representing obligations and permissions.



Modal logic is an extension of classical logic by<br />

new connectives □ and ♦: necessity and<br />

possibility.<br />

“□p is true” means p is necessarily true, i.e.<br />

true in every possible scenario,<br />

“♦p is true” means p is possibly true, i.e.<br />

true in at least one possible scenario.<br />

Obviously, independently of the precise<br />

definition, the following holds<br />

♦p ↔ ¬□¬p



Various modal logics:<br />

knowledge → epistemic logic,<br />

beliefs → doxastic logic,<br />

obligations → deontic logic,<br />

actions → dynamic logic,<br />

time → temporal logic,<br />

and combinations of the above:<br />

most famous multimodal logics:<br />

BDI logics of beliefs, desires, intentions (and<br />

time).



Most of it can be translated to classical logic;<br />

. . . but it looks horribly UGLY there;<br />

. . . and in most cases it’s not automatable<br />

any more.<br />

Remember Frege’s notation of predicate logic:<br />
[Figure: a formula in Frege’s two-dimensional notation.]<br />



But:<br />

MSPASS is a theorem prover implementing<br />

many modal logics,<br />

also description logics, relational calculus, etc<br />

built upon SPASS, a resolution prover for<br />

first-order logic with equality,<br />

check out http:<br />

//www.cs.man.ac.uk/~schmidt/mspass/.



7. Modal Logic 1. Reasoning about Knowledge<br />

7.1 Reasoning about Knowledge



Example 7.1 (Muddy children – Shoham’s version)<br />

A group of n children enters the house after having played<br />

in the mud outside. They are greeted by their father, who<br />

notices that k of them have mud on their foreheads (no kid<br />

can see whether she herself has a muddy forehead, but<br />

they can see all other foreheads).<br />

Can the kids determine by pure thinking whether they<br />

have a muddy forehead?<br />

The father announces: At least one of you has mud on<br />

her forehead.<br />

He also says If you know (can prove) that your forehead is<br />

muddy, then raise your hands now.<br />

Nothing happens. The father keeps repeating the<br />

question.<br />

After exactly k rounds, all the children with muddy<br />

foreheads raise their hands.
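The protocol can be simulated (a Python sketch of ours): a world is a tuple of mud flags, and child i cannot distinguish two worlds that agree everywhere except possibly on her own flag.

```python
from itertools import product

def knows_own_flag(i, w, worlds):
    """Does child i, in world w, know her own mud flag?"""
    vals = {w2[i] for w2 in worlds
            if w2[:i] + w2[i + 1:] == w[:i] + w[i + 1:]}
    return len(vals) == 1

def rounds_until_hands(actual, n):
    # the father's announcement rules out the all-clean world:
    worlds = {w for w in product((0, 1), repeat=n) if any(w)}
    rnd = 1
    while True:
        if all(knows_own_flag(i, actual, worlds)
               for i in range(n) if actual[i]):
            return rnd
        # nobody raised a hand: drop every world in which some
        # muddy child would already have known
        worlds = {w for w in worlds
                  if not any(knows_own_flag(i, w, worlds) and w[i]
                             for i in range(n))}
        rnd += 1

# with k muddy foreheads the hands go up after exactly k rounds:
print([rounds_until_hands(w, 3)
       for w in [(1, 0, 0), (1, 1, 0), (1, 1, 1)]])   # [1, 2, 3]
```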



How is that possible? The announcement of the<br />

father does not reveal anything, or does it?<br />



Definition 7.2 (Partition Model)<br />

An n-agent partition model over a language L is a tuple<br />

(W, I 1 , . . . , I n ), where<br />

W: is a set of possible worlds, i.e. a set of<br />

L-structures (each L-sentence φ is either true<br />

in a given w ∈ W or not).<br />

I i : each I i is a partition of W:<br />

I i = {W i1 , W i2 , . . . , W ir } with W ij ∩ W ik = ∅ for<br />

j ≠ k and W i1 ∪ . . . ∪ W ir = W.<br />

We also use the notation (w ∈ W)<br />

I i (w) := {w ′ : w ∈ W ij and w ′ ∈ W ij for some j}<br />

all worlds in the partition containing world w,<br />

according to agent i’s view of the universe.



The worlds in W can be propositional valuations<br />

or even first-order structures. In this lecture, we<br />

will mainly consider the propositional version.<br />

The language L in our example consists just of<br />

the propositional constants mud 1 , . . . , mud n .<br />

How can we formalise the notion of one agent<br />

knowing something? And reason about what<br />

agents know and draw conclusions?



We introduce new operators K i with the intuitive<br />

meaning K i φ “agent i knows that φ holds”. We<br />

can now formulate statements in a language<br />

L(K 1 , . . . , K n ) containing these operators.<br />

It remains to define a precise semantics for this<br />

notion.



Definition 7.3 (Semantics for partition models)<br />

Let an n-agent partition model<br />

A = (W, I 1 , . . . , I n ) over a language L be given.<br />

Then we define the relation |= for L(K 1 , . . . , K n )<br />

formulae as follows:<br />

for φ ∈ L: A, w |= φ if and only if w |= φ,<br />

A, w |= K i φ if and only if for all worlds w ′ , if<br />

w ′ ∈ I i (w) then A, w ′ |= φ.
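Definition 7.3 is directly executable (a sketch of ours; worlds are indices, a formula is a callable on worlds, and I i (w) is the block of agent i's partition containing w):

```python
# Evaluate K_i phi in an n-agent partition model.
def I(partition, w):
    return next(block for block in partition if w in block)

def K(i, phi, worlds, partitions):
    """The worlds in which agent i knows phi."""
    return [w for w in worlds
            if all(phi(w2) for w2 in I(partitions[i], w))]

worlds = [0, 1]
p = lambda w: w == 0                      # "p" holds exactly in world 0
partitions = [[{0, 1}],                   # agent 0 cannot tell 0 from 1
              [{0}, {1}]]                 # agent 1 can
print(K(0, p, worlds, partitions), K(1, p, worlds, partitions))  # [] [0]
```

Agent 1 knows p in world 0; agent 0, who considers both worlds possible, knows it nowhere.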



Muddy children revisited (blackboard).



7. Modal Logic 2. Kripke Semantics<br />

7.2 Kripke Semantics



A slight generalisation of partition models leads<br />

to modal logic.<br />

How can one look at a partition? It is like a set of<br />

equivalence classes: for agent i, all worlds in<br />

one partition are equivalent. They are all<br />

possible scenarios.<br />

So let’s generalise and introduce a binary<br />

relation R on all worlds: w 1 Rw 2 meaning that<br />

world w 2 can be accessed (is reachable) from<br />

world w 1 .



An Example<br />

[Figure: a Kripke structure with worlds q 0 (x=0), q 1 (x=1)<br />
and q 2 (x=2), and accessibility relations for the two agents<br />
s and c.]<br />
x . = 1 → K s x . = 1<br />

x . = 2 → ¬K s x . = 2<br />

x . = 2 → K s ¬x . = 1<br />

x . = 2 → K c K s ¬x . = 1<br />

x . = 2 → ¬K s K c K s ¬x . = 1



Definition 7.4 (Modal Logic with n modalities)<br />

The language L modaln of modal logic with n<br />

modal operators □ 1 , . . . , □ n is the smallest set<br />

containing the propositional constants of L, and<br />

with formulae φ, ψ also the formulae<br />

□ i φ, ¬φ, φ ∧ ψ. We treat ∨, →, ↔, ♦ as macros<br />

(defined as usual).<br />

Note that the □ operators can be nested:<br />

(□ 1 □ 2 □ 1 p) ∨ □ 3 ¬p



Definition 7.5 (Semantics of Modal Logic)<br />

The truth of a L modal1 -formula is evaluated relative to a world w and a<br />

Kripke structure 〈W, R〉, where W is a set of (propositional) models<br />

(possible worlds) and R is a binary relation on them, the accessibility<br />

relation.<br />

The relation |= is defined as follows:<br />

〈W, R〉 |= w p if p is primitive and w |= p,<br />

〈W, R〉 |= w φ ∧ ψ if 〈W, R〉 |= w φ and 〈W, R〉 |= w ψ.<br />

〈W, R〉 |= w ¬φ if not 〈W, R〉 |= w φ<br />

〈W, R〉 |= w □φ if for any w ′ ∈ W with wRw ′ : 〈W, R〉 |= w ′ φ
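The clauses of Definition 7.5 translate into a small recursive model checker (a sketch of ours):

```python
# Formulas are atoms (strings) or tuples ("not", f), ("and", f, g),
# ("box", f); W maps each world to the set of atoms true there,
# R is the accessibility relation as a set of pairs.
def holds(W, R, w, f):
    if isinstance(f, str):
        return f in W[w]
    if f[0] == "not":
        return not holds(W, R, w, f[1])
    if f[0] == "and":
        return holds(W, R, w, f[1]) and holds(W, R, w, f[2])
    if f[0] == "box":                      # f[1] must hold in every
        return all(holds(W, R, v, f[1])    # world accessible from w
                   for (u, v) in R if u == w)
    raise ValueError(f)

W = {"w1": {"p"}, "w2": set()}
R = {("w1", "w1"), ("w1", "w2")}
print(holds(W, R, "w1", ("box", "p")))   # False: w2 is accessible, p fails
print(holds(W, R, "w2", ("box", "p")))   # True: no accessible worlds
```

Note the second case: □φ holds vacuously in a world without successors.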



7. Modal Logic 3. Axioms for Modal Logics<br />

7.3 Axioms for Modal Logics



As in classical logic, one can ask about a<br />

complete axiom system. Is there a calculus that<br />

allows one to derive all sentences true in all Kripke<br />

models?<br />

Definition 7.6 (System K)<br />

The system K is an extension of the propositional<br />

calculus by the axiom<br />

Axiom K (□φ ∧ □(φ → ψ)) → □ψ<br />

and the inference rule<br />

(Necessitation) from ϕ infer □ϕ.<br />



Theorem 7.7 (Soundness/completeness of System K)<br />

System K is sound and complete with respect to<br />

arbitrary Kripke models.<br />

If we allow n modalities, the theorem as well as<br />

the definitions extend in an obvious way. The<br />

calculus is then called System K n to account for<br />

the n modalities.



Note that we have not assumed any<br />

properties of the accessibility relation R: it is<br />

just any binary relation.<br />

How does our partition model fit into this<br />

framework?<br />

A partition is nothing else than an<br />

equivalence relation!<br />

Assuming that R is an equivalence relation,<br />

what additional statements (axioms) are true<br />

in all Kripke models?



To which axioms do the following properties<br />

lead?<br />

Reflexivity: xRx for all x.<br />

Transitivity: xRy and yRz implies xRz for all<br />

x, y, z.<br />

Euclidean: xRy and xRz implies yRz for all<br />

x, y, z.<br />

Serial: For all x there is a y with xRy.<br />

Symmetry: For all x and y: xRy implies yRx.
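For finite relations these properties are easy to test (a Python sketch of ours), which also allows a spot-check of the correspondence between equivalence relations and the reflexive, transitive, euclidean ones:

```python
# Frame properties of a finite relation R (set of pairs) over a domain D.
def reflexive(D, R):
    return all((x, x) in R for x in D)

def transitive(D, R):
    return all((x, z) in R for (x, y1) in R for (y2, z) in R if y1 == y2)

def euclidean(D, R):
    return all((y, z) in R for (x1, y) in R for (x2, z) in R if x1 == x2)

def serial(D, R):
    return all(any((x, y) in R for y in D) for x in D)

def symmetric(D, R):
    return all((y, x) in R for (x, y) in R)

D = {1, 2, 3}
R = {(1, 1), (2, 2), (3, 3), (1, 2), (2, 1)}   # equivalence: blocks {1,2},{3}
print(all(f(D, R) for f in (reflexive, transitive, euclidean, symmetric)))
```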



Lemma 7.8<br />

A binary relation is an equivalence relation if and<br />

only if it is reflexive, transitive and euclidean.



Definition 7.9 (Extending K by Axioms D, T, 4,5)<br />

The system K is often extended by (a subset of)<br />

the following axioms:<br />

K (K i φ ∧ K i (φ → ψ)) → K i ψ<br />

D ¬K i (ϕ ∧ ¬ϕ)<br />

T K i ϕ → ϕ<br />

4 K i ϕ → K i K i ϕ<br />

5 ¬K i ϕ → K i ¬K i ϕ<br />

named as above for historical reasons. The system<br />

consisting of KT45 is also called S5.



Theorem 7.10 (Sound/complete Subsystems of KDT45)<br />

Let X be any subset of {D, T, 4, 5, ϕ → K i ¬K i ¬ϕ} and let<br />

X ′ be the corresponding subset of<br />

{serial, reflexive, transitive, euclidean, symmetry}.<br />

Then K ∪ X is sound and complete with respect to Kripke<br />

models whose accessibility relation satisfies X ′ .<br />

Corollary 7.11 (Sound-, completeness of KT45 (S5))<br />

System KT45 is sound and complete with respect to Kripke<br />

models whose accessibility relation is reflexive,<br />

transitive and euclidean.



Exercise: Show that<br />

1 The axiom D follows from KT45.<br />

2 Show that KD45 is not equivalent to K45:<br />

axiom D does not follow from K45.<br />

3 Show that ¬K i ¬ϕ does not follow from<br />

KDT45.<br />

4 Show that K i ϕ → ϕ is not a tautology in K.<br />

5 Show that K i ϕ → K i K i ϕ is not a tautology in<br />

K+K i ϕ → ϕ.



Up to now we have been reading K i ϕ (the box<br />

operator □ i ) as “agent i knows that ϕ”. What if<br />

we interpret the operator as belief?<br />

Under such an interpretation axiom T has to be<br />

dropped. But all other axioms make sense.<br />

Axiom system KD45 is called the standard<br />

logic of beliefs. Axiom K is called logical<br />

omniscience, axiom D is called consistency,<br />

axiom 4 (resp. axiom 5) is called positive<br />

(resp. negative) introspection.
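The Kripke semantics behind these axioms can be made concrete for finite models. Below is a minimal evaluator (the tuple encoding of formulas and all names are my own assumptions, not the lecture's); on an equivalence-relation model it witnesses an instance of negative introspection (axiom 5):

```python
# Sketch: evaluating K_i φ at a world of a finite Kripke model. K[i] is
# agent i's accessibility relation; val maps each world to its true atoms.

def holds(model, w, phi):
    worlds, K, val = model
    op = phi[0]
    if op == "atom":
        return phi[1] in val[w]
    if op == "not":
        return not holds(model, w, phi[1])
    if op == "and":
        return holds(model, w, phi[1]) and holds(model, w, phi[2])
    if op == "K":          # K_i φ: φ holds in every world agent i considers possible
        i, sub = phi[1], phi[2]
        return all(holds(model, v, sub) for v in worlds if (w, v) in K[i])
    raise ValueError(op)

worlds = {1, 2}
K = {"a": {(1, 1), (1, 2), (2, 1), (2, 2)}}   # an equivalence relation (S5-style)
val = {1: {"p"}, 2: set()}
model = (worlds, K, val)

# agent a does not know p (p fails in accessible world 2) ...
print(holds(model, 1, ("K", "a", ("atom", "p"))))                       # False
# ... and, the relation being euclidean, a knows that it does not know p
print(holds(model, 1, ("K", "a", ("not", ("K", "a", ("atom", "p"))))))  # True
```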



7. Modal Logic 4. Normative Reasoning<br />

7.4 Normative Reasoning



7. Modal Logic 4. Normative Reasoning<br />

Deontic logic is concerned with norms and with<br />

normative reasoning, e.g. in law. The operators<br />

used denote obligation, permission,<br />

prohibition etc.<br />

We describe two systems: system OS<br />

([vonWright51]) and KD ([aqvist84]).



7. Modal Logic 4. Normative Reasoning<br />

Definition 7.12 (System OS)<br />

The system OS is based on two modalities<br />

(deontic operators): O, meaning obligation,<br />

and P meaning permission. It is an extension of<br />

the propositional calculus by the axioms<br />

OS1 Op ↔ ¬P¬p<br />

OS2 Pp ∨ P¬p<br />

OS3 P(p ∨ q) ↔ Pp ∨ Pq<br />

and the inference rule<br />

(OS4) from ϕ ↔ ψ infer Pϕ ↔ Pψ.



7. Modal Logic 4. Normative Reasoning<br />

It can be shown that this system can be seen as a<br />

modal logic with O as primitive modality. But<br />

then, because of necessitation, one has to accept<br />

the derivable sentence<br />

(O⊤) O(p ∨ ¬p)<br />

(stating the existence of an empty normative<br />

system): von Wright originally rejected this as an<br />

axiom.



7. Modal Logic 4. Normative Reasoning<br />

The KD System<br />

The KD System is a von Wright-type system<br />

including the F (forbidden) operator and<br />

consisting of the following additional axioms<br />

and rules (again the classical axioms and modus<br />

ponens are included).<br />

(KD1) O(p → q) → (Op → Oq)<br />

(KD2) Op → Pp<br />

(KD3) Pp ≡ ¬O¬p<br />

(KD4) Fp ≡ ¬Pp<br />

(KD5) O-necessitation: from p infer Op.



7. Modal Logic 4. Normative Reasoning<br />

Axiom (KD1) is the so-called K-axiom; (KD2) is<br />

the D-axiom, stating that obligatory implies<br />

permitted; (KD3) states that permission is the<br />

dual of obligation and (KD4) says that forbidden<br />

is not permitted. (KD1) holds for any modal<br />

necessity operator.<br />

O is the basic operator (all others are definable<br />

with it). The axioms reflect the idea that<br />

something is obligatory if it holds in all perfect<br />

(ideal) worlds (relative to the world where one<br />

is).



7. Modal Logic 5. References<br />

7.5 References



7. Modal Logic 5. References<br />

[aqvist84] Åqvist, L. (1984).<br />

Deontic logic.<br />

In D. M. Gabbay and F. Guenther (Eds.), Handbook of<br />

Philosophical Logic, Vol II, pp. 605–714. Reidel.<br />

[Alur et al. 2002] R. Alur, T. A. Henzinger, and<br />

O. Kupferman.<br />

Alternating-time Temporal Logic.<br />

Journal of the ACM, 49:672–713, 2002.



7. Modal Logic 5. References<br />

[Broersen et al., 2001a] Broersen, J., Dastani, M., Huang,<br />

Z., and van der Torre, L. (2001a).<br />

The BOID architecture: conflicts between beliefs,<br />

obligations, intentions and desires.<br />

In Müller, J., Andre, E., Sen, S., and Frasson, C.,<br />

editors, Proceedings of the Fifth International<br />

Conference on Autonomous Agents, pages 9–16. ACM<br />

Press.<br />

[Cohen and Levesque, 1990] Cohen, P. and Levesque, H.<br />

(1990).<br />

Intention is choice with commitment.<br />

Artificial Intelligence, 42:213–261.



7. Modal Logic 5. References<br />

[Emerson 1990] E. A. Emerson.<br />

Temporal and modal logic.<br />

Handbook of Theoretical Computer Science, volume B,<br />

995–1072. Elsevier, 1990.<br />

[Fagin et al. 1995] Fagin, R., Halpern, J. Y., Moses, Y. &<br />

Vardi, M. Y. (1995).<br />

Reasoning about Knowledge.<br />

MIT Press: Cambridge, MA.



7. Modal Logic 5. References<br />

[GiunchigliaTraverso00] Fausto Giunchiglia and Paolo<br />

Traverso.<br />

Planning as model checking.<br />

In Susanne Biundo and Maria Fox (Eds.), Recent<br />

Advances in AI Planning, 5th European Conference on<br />

Planning, ECP’99, Durham, UK, September 8-10, 1999,<br />

Proceedings, 1–20, LNCS 1809, Springer 2000.<br />

[HiDBoHoMe98] Hindriks, K. V., F. S. de Boer, W. van der<br />

Hoek, and J.-J. C. Meyer (1998).<br />

Formal semantics of the core of AGENT-0.<br />

In ECAI’98 Workshop on Practical Reasoning and<br />

Rationality, pp. 20–29.



7. Modal Logic 5. References<br />

[Huth00] Huth, M. & Ryan, M. (2000).<br />

Logic in Computer Science: Modeling and reasoning<br />

about systems.<br />

Cambridge University Press.<br />

[Jamroga, 2004] Jamroga, W. (2004).<br />

Strategic planning through model checking of ATL<br />

formulae.<br />

In L. Rutkowski et al., editors, Artificial Intelligence and Soft<br />

Computing: Proceedings of ICAISC’04, volume 3070 of<br />

Lecture Notes in Computer Science, pages 879–884.<br />

Springer Verlag.



7. Modal Logic 5. References<br />

[Jamroga/Dix1] Wojtek Jamroga and Jürgen Dix.<br />

Model Checking Strategic Abilities of Agents under<br />

Incomplete Information.<br />

In Mario Coppo, Elena Lodi and G. Michele Pinna<br />

(Eds.), Proceedings of the Italian Conference on<br />

Theoretical Computer Science (ICTCS ’05), pages<br />

295–308. LNCS 3701. Springer, 2005.



7. Modal Logic 5. References<br />

[Jamroga/Dix2] Wojtek Jamroga and Jürgen Dix.<br />

Do Agents Make Model Checking Explode<br />

(Computationally)?<br />

M. Pěchouček and P. Petta and L.Z. Varga (Eds.),<br />

Proceedings of the 4th International Central and<br />

Eastern European Conference on Multi-Agent<br />

Systems (CEEMAS ’05), pages 398–407. LNCS 3690.<br />

Springer, 2005.



7. Modal Logic 5. References<br />

[kripke63b] Kripke, S. (1963a).<br />

Semantical analysis of modal logic I. Normal<br />

propositional calculi.<br />

Zeitschrift für math. Logik und Grundlagen der<br />

Mathematik 9, 67–96.<br />

[kripke63a] Kripke, S. (1963b).<br />

Semantical considerations on modal logic.<br />

Acta Philosophica Fennica 16, 83–94.



7. Modal Logic 5. References<br />

[kripke65] Kripke, S. (1965).<br />

Semantical analysis of modal logic II. Non-normal<br />

modal propositional calculi.<br />

In Addison, Henkin, and Tarski (Eds.), The theory of<br />

models, Amsterdam, North-Holland, pp. 206–220.<br />

[Pistore and Traverso, 2001] Pistore, M. and Traverso, P.<br />

(2001).<br />

Planning as model checking for extended goals in<br />

non-deterministic domains.<br />

In Proceedings of IJCAI, pages 479–486.



7. Modal Logic 5. References<br />

[Rao and Georgeff1991] Rao, A. S. and M. Georgeff<br />

(1991).<br />

Modeling Rational Agents within a BDI-Architecture.<br />

In J. F. Allen, R. Fikes, and E. Sandewall (Eds.),<br />

Proceedings of the International Conference on<br />

Knowledge Representation and Reasoning, Cambridge,<br />

MA, pp. 473–484. Morgan Kaufmann.



7. Modal Logic 5. References<br />

[vonWright51] von Wright, G. H. (1951).<br />

Deontic logic.<br />

Mind 60, 1–15.<br />

Reprinted in G. H. von Wright, Logical Studies, pp.<br />

58–74. Routledge and Kegan Paul, 1957.<br />

[Wooldridge, 2000] Wooldridge, M. (2000).<br />

Reasoning about Rational Agents.<br />

MIT Press : Cambridge, Mass.



8. Action/Time as Modalities<br />

Chapter 8. Action/Time as<br />

Modalities<br />

Action/Time as Modalities<br />

8.1 Time: LTL, CTL<br />

8.2 Action: Dynamic Logic<br />

8.3 ATL<br />

8.4 BDI<br />

8.5 Verification<br />

8.6 References



8. Action/Time as Modalities<br />

Up to now:<br />

Several operators K i , each defines an<br />

accessibility relation on worlds.<br />

Description of static systems: no possibility<br />

of change<br />

MAS are dynamic:<br />

As an example of a temporal logic, we<br />

introduce CTL.<br />

Then we add programming constructs like<br />

[a; b] and [φ?], and define dynamic logic, where<br />

we can talk about programs and their<br />

behaviour.<br />

We end this section by introducing ATL:<br />

Temporal logic meets game theory.



8. Action/Time as Modalities 1. Time: LTL, CTL<br />

8.1 Time: LTL, CTL



8. Action/Time as Modalities 1. Time: LTL, CTL<br />

Reasoning about Time: CTL<br />

Ideas:<br />

The accessibility relation can be seen as representing<br />

time.<br />

time: linear vs. branching<br />

(Figure: a linear sequence of time moments vs. a branching tree of<br />

moments, each beginning in a state labelled start.)



8. Action/Time as Modalities 1. Time: LTL, CTL<br />

Typical temporal operators<br />

○ϕ ϕ is true in the next moment in time<br />

□ϕ ϕ is true in all future moments<br />

♦ϕ ϕ is true in some future moment<br />

ϕ U ψ ϕ is true until the moment when ψ<br />

becomes true<br />

□((¬passport ∨ ¬ticket) → ○¬board_flight)<br />

send(msg, rcvr) → ♦receive(msg, rcvr)



8. Action/Time as Modalities 1. Time: LTL, CTL<br />

Temporal logic was originally developed in order to<br />

represent tense in natural language.<br />

Within Computer Science, it has achieved a significant<br />

role in the formal specification and verification of<br />

concurrent and distributed systems.<br />

Much of this popularity has been achieved because a<br />

number of useful concepts can be formally, and concisely,<br />

specified using temporal logics, e.g.<br />

safety properties<br />

liveness properties<br />

fairness properties



8. Action/Time as Modalities 1. Time: LTL, CTL<br />

Safety:<br />

“something bad will not happen”<br />

“something good will always hold”<br />

Typical examples:<br />

□¬bankrupt<br />

□(fuelOK ∨ ○fuelOK)<br />

and so on . . .<br />

Usually: □¬....



8. Action/Time as Modalities 1. Time: LTL, CTL<br />

Liveness:<br />

“something good will happen”<br />

Typical examples:<br />

♦rich<br />

roL → ♦roP<br />

and so on . . .<br />

Usually: ♦....



8. Action/Time as Modalities 1. Time: LTL, CTL<br />

Combinations of safety and liveness possible:<br />

□(roL → ♦roP )<br />

♦□roP



8. Action/Time as Modalities 1. Time: LTL, CTL<br />

Fairness: Often only really useful when<br />

scheduling processes, responding to messages,<br />

etc.<br />

Fairness (strong):<br />

“if something is attempted/requested, then<br />

it will be successful/allocated”<br />

Typical examples:<br />

□(attempt → ♦success)<br />

□♦attempt → □♦success



8. Action/Time as Modalities 1. Time: LTL, CTL<br />

Linear Time Logic: Semantics<br />

Definition 8.1 (Models of LTL)<br />

A model of LTL is a sequence of time moments. We call<br />

such models paths, and denote them by λ.<br />

Atomic propositions are evaluated at particular time<br />

moments; λ[i] denotes the i-th moment of the path. We also<br />

use the notation λ[i . . . j] for the moments between i and j<br />

(including both), and λ[i . . . ∞] for the infinite suffix of the<br />

path consisting of all time moments from i on into the future.



8. Action/Time as Modalities 1. Time: LTL, CTL<br />

Linear Time Logic: Semantics<br />

Evaluating a formula ϕ on a path means evaluating it at<br />

the first element of the path.<br />

Definition 8.2 (Semantics of LTL)<br />

λ[0 . . . ∞] |= p iff p is true at moment λ[0];<br />

λ[0 . . . ∞] |= ○ϕ iff λ[1 . . . ∞] |= ϕ;<br />

λ[0 . . . ∞] |= ♦ϕ iff λ[i . . . ∞] |= ϕ for some i ≥ 0;<br />

λ[0 . . . ∞] |= □ϕ iff λ[i . . . ∞] |= ϕ for all i ≥ 0;<br />

λ[0 . . . ∞] |= ϕ U ψ iff λ[i . . . ∞] |= ψ for some i ≥ 0, and<br />

λ[j . . . ∞] |= ϕ for all 0 ≤ j < i.<br />

Note that:<br />

□ϕ ≡ ¬♦¬ϕ, ♦ϕ ≡ ¬□¬ϕ, ♦ϕ ≡ ⊤ U ϕ
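The clauses of Definition 8.2 can be executed on ultimately periodic paths, i.e. a finite prefix followed by a loop that repeats forever. This lasso encoding is an assumption made here to keep infinite paths finitely presentable; all names are illustrative:

```python
# Sketch: LTL evaluation on a lasso path. path is a list of sets of atoms;
# positions >= prefix_len repeat forever with period len(path) - prefix_len,
# so checking a window of len(path) absolute positions suffices.

def sat(path, prefix_len, i, phi):
    n = len(path)
    loop_len = n - prefix_len
    # normalize the index: inside the loop, the suffix repeats periodically
    pos = i if i < prefix_len else prefix_len + (i - prefix_len) % loop_len
    op = phi[0]
    if op == "atom":
        return phi[1] in path[pos]
    if op == "not":
        return not sat(path, prefix_len, i, phi[1])
    if op == "next":                            # ○φ
        return sat(path, prefix_len, i + 1, phi[1])
    if op == "eventually":                      # ♦φ: n positions cover all suffixes
        return any(sat(path, prefix_len, j, phi[1]) for j in range(i, i + n))
    if op == "always":                          # □φ
        return all(sat(path, prefix_len, j, phi[1]) for j in range(i, i + n))
    if op == "until":                           # φ U ψ
        return any(sat(path, prefix_len, j, phi[2])
                   and all(sat(path, prefix_len, k, phi[1]) for k in range(i, j))
                   for j in range(i, i + n))
    raise ValueError(op)

# position 0: {req}; then the loop {ack}, {} repeats forever
path = [{"req"}, {"ack"}, set()]
print(sat(path, 1, 0, ("eventually", ("atom", "ack"))))              # True
print(sat(path, 1, 0, ("always", ("eventually", ("atom", "ack")))))  # True
```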



8. Action/Time as Modalities 1. Time: LTL, CTL<br />

Branching Time: CTL<br />

CTL: Computational Tree Logic.<br />

Reasoning about possible computations of a system<br />

Time is branching: we want all alternative paths<br />

included!<br />

Models: states (time points, situations), transitions<br />

(changes)<br />

Paths: courses of action, computations.



8. Action/Time as Modalities 1. Time: LTL, CTL<br />

CTL models: transition systems<br />

Definition 8.3 (Transition System)<br />

A transition system is a pair<br />

〈Q, −→〉<br />

where Q is a non-empty set of states and −→ ⊆ Q × Q is a<br />

transition relation.<br />

Note that, formally, a transition relation is just a modal<br />

accessibility relation.



8. Action/Time as Modalities 1. Time: LTL, CTL<br />

Definition 8.4 (Paths)<br />

A path λ is an infinite sequence of states that can<br />

be affected by subsequent transitions.<br />

A path must be full, i.e. either infinite, or ending<br />

in a state with no outgoing transition.<br />

Usually, we assume that the transition relation is<br />

serial (time flows forever).<br />

Then, all paths are infinite.



8. Action/Time as Modalities 1. Time: LTL, CTL<br />

Reasoning about Time: CTL<br />

Path quantifiers: A (for all paths), E (there is a path);<br />

Temporal operators: ○ (nexttime), ♦ (sometime), □<br />

(always) and U (until);<br />

“Vanilla” CTL: every temporal operator must be<br />

immediately preceded by exactly one path quantifier;<br />

CTL*: no syntactic restrictions;<br />

Reasoning in “vanilla” CTL can be automated.



8. Action/Time as Modalities 1. Time: LTL, CTL<br />

Example: Rocket and Cargo<br />

A rocket and a cargo,<br />

The rocket can be moved between London<br />

(proposition roL) and Paris (proposition roP ),<br />

The cargo can be in London (caL), Paris<br />

(caP ), or inside the rocket (caR),<br />

The rocket can be moved only if it has its fuel<br />

tank full (fuelOK),<br />

When it moves, it consumes fuel, and nofuel<br />

holds after each flight.



8. Action/Time as Modalities 1. Time: LTL, CTL<br />

Example: Rocket and Cargo<br />

(Figure: the 12-state transition system of the rocket example; each state<br />

records the rocket’s position (roL/roP ), the cargo’s position<br />

(caL/caR/caP ) and the fuel status (fuelOK/nofuel).)<br />

roL → E♦roP<br />

A□(roL ∨ roP )<br />

roL → A○(roP → nofuel)



8. Action/Time as Modalities 1. Time: LTL, CTL<br />

Definition 8.5 (Semantics of CTL*: state formulae)<br />

M, q |= Eϕ iff, by definition, there is a path λ<br />

starting from q such that M, λ |= ϕ;<br />

M, q |= Aϕ iff, by definition, for all paths λ<br />

starting from q, we have M, λ |= ϕ.<br />

Definition 8.6 (Semantics of CTL*: path formulae)<br />

Exactly like for LTL!
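The automation mentioned above can be sketched for a fragment of "vanilla" CTL by fixpoint computation over a finite transition system. The encoding and the tiny three-state model below are illustrative, not the rocket example itself:

```python
# Sketch: CTL model checking by fixpoints. check returns the set of states
# of the transition system satisfying a formula.

def check(states, trans, val, phi):
    op = phi[0]
    if op == "atom":
        return {q for q in states if phi[1] in val[q]}
    if op == "not":
        return states - check(states, trans, val, phi[1])
    if op == "EX":                      # E○φ: some successor satisfies φ
        sub = check(states, trans, val, phi[1])
        return {q for (q, r) in trans if r in sub}
    if op == "EF":                      # E♦φ: least fixpoint of Z = φ ∨ E○Z
        z = check(states, trans, val, phi[1])
        while True:
            z2 = z | {q for (q, r) in trans if r in z}
            if z2 == z:
                return z
            z = z2
    if op == "AG":                      # A□φ ≡ ¬E♦¬φ
        return states - check(states, trans, val, ("EF", ("not", phi[1])))
    raise ValueError(op)

states = {1, 2, 3}
trans = {(1, 2), (2, 3), (3, 3)}
val = {1: {"roL"}, 2: {"roL"}, 3: {"roP"}}
print(check(states, trans, val, ("EF", ("atom", "roP"))))   # all three states
print(check(states, trans, val, ("AG", ("atom", "roL"))))   # empty set
```

The remaining CTL operators (E□, EU, A○, etc.) are handled analogously, via greatest fixpoints or dualities.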



8. Action/Time as Modalities 1. Time: LTL, CTL<br />

Example: Rocket and Cargo<br />

(Figure: the 12-state rocket-and-cargo transition system again, used to<br />

evaluate the formula below.)<br />

E♦caP



8. Action/Time as Modalities 1. Time: LTL, CTL<br />

How to express that there is no possibility of a<br />

deadlock?



8. Action/Time as Modalities 2. Action: Dynamic Logic<br />

8.2 Action: Dynamic Logic



8. Action/Time as Modalities 2. Action: Dynamic Logic<br />

1st. Idea: Consider actions or programs a. Each<br />

action a defines a transition<br />

(accessibility relation) from worlds into<br />

worlds.<br />

2nd. Idea: We need statements about the<br />

outcome of actions:<br />

[a]φ: “after every execution of a, φ<br />

holds”;<br />

〈a〉φ: “after some execution of a, φ<br />

holds”.<br />

Naturally, 〈a〉φ ≡ ¬[a]¬φ.



8. Action/Time as Modalities 2. Action: Dynamic Logic<br />

3rd. Idea: Programs/actions can be combined<br />

(sequentially, nondeterministically,<br />

iteratively):<br />

[a; b]φ → [c]φ ′<br />

meaning “if after execution of a and<br />

then b the formula φ holds, then after<br />

executing c the formula φ ′ holds”.



8. Action/Time as Modalities 2. Action: Dynamic Logic<br />

Models for DL: Labelled Transition <strong>Systems</strong><br />

Definition 8.7 (Labelled Transition System)<br />

A labelled transition system is a pair<br />

〈Q, { −→a : a ∈ L }〉<br />

where Q is a non-empty set of states, L is a<br />

non-empty set of labels, and −→a ⊆ Q × Q for<br />

each a ∈ L.



8. Action/Time as Modalities 2. Action: Dynamic Logic<br />

The Rocket Example: Adding Actions<br />

(Figure: the rocket transition system with transitions now labelled by<br />

actions, e.g. fuel.)<br />

[fuel]fuelOK



8. Action/Time as Modalities 2. Action: Dynamic Logic<br />

Definition 8.8 (Dynamic Logic: Models)<br />

A model of propositional dynamic logic is given<br />

by a labelled transition system and a valuation of<br />

propositions.<br />

Definition 8.9 (Composite labels)<br />

We assume that the set of labels forms a<br />

Kleene algebra 〈L, ;, ∪, ∗〉. We also assume that<br />

the set of labels contains constructs of the form<br />

“φ?”, whenever φ is a formula not involving any<br />

modalities.



8. Action/Time as Modalities 2. Action: Dynamic Logic<br />

“;” means sequential composition,<br />

∪ means nondeterministic choice,<br />

∗ means finite iteration (regular expr.),<br />

[φ?] means test.<br />

Thus we assume that the labels obey the<br />

following conditions:<br />

s −→a;b t iff s −→a u and u −→b t for some u,<br />

s −→a∪b t iff s −→a t or s −→b t,<br />

−→a∗ is the reflexive and transitive closure of −→a,<br />

s −→φ? t iff t = s and s |= M φ.



8. Action/Time as Modalities 2. Action: Dynamic Logic<br />

What does this have to do with programs?<br />

if φ then a else b ≡ (φ?;a) ∪ (¬φ?;b)<br />

while φ do a ≡ (φ?;a) ∗ ; ¬φ?



8. Action/Time as Modalities 2. Action: Dynamic Logic<br />

(Figure: the labelled rocket transition system, illustrating the<br />

composite programs (nofuel?;fuel) and (fuelOK?;load).)<br />

caR → [(nofuel?; fuel) ∪ (fuelOK?; load)]caR



8. Action/Time as Modalities 3. ATL<br />

8.3 ATL



8. Action/Time as Modalities 3. ATL<br />

ATL: What Agents Can Achieve<br />

ATL: Alternating-time Temporal Logic [Alur et al. 2002]<br />

Temporal logic meets game theory<br />

Main idea: cooperation modalities<br />

〈〈A〉〉Φ: coalition A has a collective strategy to<br />

enforce Φ



8. Action/Time as Modalities 3. ATL<br />

〈〈jamesbond〉〉♦win:<br />

“James Bond has an infallible plan to<br />

eventually win”<br />

〈〈jamesbond, bondsgirl〉〉fun U shot:<br />

“James Bond and the current Bond girl have<br />

a collective way of having fun until someone<br />

shoots at them”



8. Action/Time as Modalities 3. ATL<br />

ATL Models: Concurrent Game Structures<br />

Definition 8.10 (Concurrent Game Structure)<br />

A concurrent game structure is a tuple<br />

M = 〈Agt, Q, π, Act, d, o〉, where:<br />

Agt: a finite set of all agents<br />

Q: a set of states<br />

π: a valuation of propositions<br />

Act: a finite set of (atomic) actions<br />

d : Agt × Q → P(Act) defines actions available to an<br />

agent in a state<br />

o: a deterministic transition function that assigns<br />

outcome states q ′ = o(q, α 1 , . . . , α k ) to states and tuples<br />

of actions



8. Action/Time as Modalities 3. ATL<br />

Rocket Example: Simultaneous Actions<br />

(Figure: the rocket example as a concurrent game structure; transitions<br />

are labelled by tuples of simultaneous actions such as<br />

〈load 1 , nop 2 , fuel 3 〉, one action per agent.)
caP 12



8. Action/Time as Modalities 3. ATL<br />

Definition 8.11 (Strategy)<br />

A strategy is a conditional plan.<br />

We represent strategies with functions<br />

s a : Q → Act.<br />

The function out(q, S A ) returns the set of all<br />

paths that may result from agents A<br />

executing strategy S A from state q onward.



8. Action/Time as Modalities 3. ATL<br />

Definition 8.12 (Semantics of ATL)<br />

1 M, q |= 〈〈A〉〉Φ if, by definition, there is a<br />

collective strategy S A such that, for every<br />

path λ ∈ out(q, S A ), we have M, λ |= Φ
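Definition 8.12 can be sketched for one-step goals 〈〈A〉〉○φ, where a strategy reduces to a single joint action of the coalition. The two-agent matching-pennies-style structure below is an illustrative assumption, not from the slides:

```python
# Sketch: does coalition A have a joint action at state q such that, no
# matter what the remaining agents do, the outcome satisfies the goal?

from itertools import product

def coalition_can_enforce(agents, A, q, d, o, goal):
    others = [i for i in agents if i not in A]
    for choice in product(*(d(i, q) for i in A)):
        fixed = dict(zip(A, choice))
        if all(goal(o(q, {**fixed, **dict(zip(others, resp))}))
               for resp in product(*(d(i, q) for i in others))):
            return True
    return False

# Agent 1 wins iff both agents play the same action.
agents = [1, 2]
d = lambda i, q: ["heads", "tails"]
o = lambda q, acts: "win" if acts[1] == acts[2] else "lose"

# Alone, agent 1 cannot enforce a win; together the grand coalition can.
print(coalition_can_enforce(agents, [1], "start", d, o, lambda q: q == "win"))     # False
print(coalition_can_enforce(agents, [1, 2], "start", d, o, lambda q: q == "win"))  # True
```

Full ATL model checking iterates this one-step test through fixpoint computations, analogously to CTL.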



8. Action/Time as Modalities 3. ATL<br />

(Figure: the rocket concurrent game structure once more, used to<br />

evaluate the formula below.)<br />

nofuel → 〈〈3〉〉□nofuel



8. Action/Time as Modalities 3. ATL<br />

Practical Importance of Temporal and Strategic Logics:<br />

Automatic verification in principle possible<br />

(model checking).<br />

Can be used for automated planning.<br />

Executable specifications can be used for<br />

programming.<br />

see Slide 565.



8. Action/Time as Modalities 4. BDI<br />

8.4 BDI



8. Action/Time as Modalities 4. BDI<br />

Beliefs, Desires, Intentions<br />

BDI according to Cohen and Levesque:<br />

Mental primitives: beliefs and goals,<br />

Separate operators and relations for each<br />

agent<br />

Time and action: LTL and DL.<br />

Altogether: multi-modal logic



8. Action/Time as Modalities 4. BDI<br />

Operator Meaning<br />

B i ϕ agent i believes ϕ<br />

G i ϕ agent i has goal of ϕ<br />

○ α action α will happen next<br />

Done α action α has just happened<br />

We also use the constructors “;” and “?” as<br />

introduced previously.



8. Action/Time as Modalities 4. BDI<br />

The primitive constructs above allow us to define the<br />

following operators:<br />

Derived Operator Definition Meaning<br />

♦α ∃x(○ x;α?) sometime α<br />

□α ¬♦¬α always α<br />

(Later ϕ) ¬ϕ ∧ ♦ϕ strict sometime<br />

(Before ϕ , ψ) ϕ → □ψ ϕ holds before ψ<br />

Examples:<br />

G citizen □safe a<br />

G police B citizen □safe citizen



8. Action/Time as Modalities 4. BDI<br />

All goals have to be dropped eventually:<br />

♦¬(G_i (Later ϕ))<br />

How to define a persistent goal?<br />

P-Goal_i ϕ =def (G_i (Later ϕ)) ∧ (B_i ¬ϕ) ∧ (Before ((B_i ϕ) ∨ (B_i □¬ϕ)), ¬(G_i (Later ϕ)))<br />

How to define intention?<br />

Intend_i ϕ =def P-Goal_i [Done_i (B_i (○ϕ))? ; ϕ]



BDI according to Rao and Georgeff:<br />

Mental primitives: beliefs, desires and intentions.<br />

Time: CTL.



[Figure: a BDI model of a simple card game. Worlds W1–W4 are CTL structures over states q0 labelled with hasA and win (with moves AK1, AQ1, AK2, AQ2, QK2, KQ2); belief relations B_a and B_opt and desire relations D_a connect the worlds.]



hasA ∧ Bel_a hasA : the agent has the Ace, and he is aware of it;<br />

Bel_a E ○ win : he believes there is a way to win in the next step;



Des_a A ○ win : he desires that every path leads to a victory, so he does not have to worry about his decisions;<br />

However, he does not believe it is possible, so his desires are rather unrealistic: ¬Bel_a A ○ win;<br />

Moreover, he believes that a real optimist would believe that winning forever is inevitable: Bel_a Bel_opt A ○ □win.



What do we use these frameworks for?<br />

Analysis & Design<br />

Modeling systems (the frameworks provide intuitive conceptual structures and a systematic approach);<br />

Specifying desirable properties of systems.<br />

Verification & Exploration<br />

Reasoning about concrete systems;<br />

Correctness testing.



What do we use these frameworks for?<br />

Automatic Generation of Behaviours<br />

Programming with executable specifications;<br />

Automatic planning.<br />

Philosophy of Mind and Agency<br />

Characterization of mental attitudes;<br />

Discussion of rational agents;<br />

Testing rationality assumptions.



8. Action/Time as Modalities 5. Verification<br />

8.5 Verification



Model Checking Formulae of CTL and ATL<br />

Model checking: does ϕ hold in model M and state q?<br />

Nice results: model checking CTL and ATL is tractable!



Theorem (Clarke, Emerson & Sistla 1986)<br />

CTL model checking is P-complete, and can be done in time linear in the size of the model and the length of the formula.<br />

Theorem (Alur, Kupferman & Henzinger 1998)<br />

ATL model checking is P-complete, and can be done in time linear in the size of the model and the length of the formula.



So, let's model-check!<br />

Not as easy as it seems.



Agent Planning with ATL<br />

Automated verification of ATL properties: can the agents do it?<br />

Planning: how can they do it?<br />

Planning as model checking: CTL (Giunchiglia and Traverso, 1999).<br />

Agent planning with ATL: even more natural.



function plan(ϕ)<br />

Returns the subset of Q for which formula ϕ holds, together with a (conditional) plan to achieve ϕ. The plan is sought within the context of a concurrent game structure S = 〈Agt, Q, Π, π, o〉.<br />

case ϕ ∈ Π : return {〈q, −〉 | ϕ ∈ π(q)}<br />

case ϕ = ¬ψ : P_1 := plan(ψ); return {〈q, −〉 | q ∉ states(P_1)}<br />

case ϕ = ψ_1 ∨ ψ_2 : P_1 := plan(ψ_1); P_2 := plan(ψ_2); return {〈q, −〉 | q ∈ states(P_1) ∪ states(P_2)}<br />

case ϕ = 〈〈A〉〉○ψ : return pre(A, states(plan(ψ)))<br />

case ϕ = 〈〈A〉〉□ψ :<br />

P_1 := plan(true); P_2 := plan(ψ); Q_3 := states(P_2);<br />

while states(P_1) ⊈ states(P_2) do P_1 := P_2|states(P_1); P_2 := pre(A, states(P_1))|Q_3 od;<br />

return P_2|states(P_1)<br />

case ϕ = 〈〈A〉〉ψ_1 U ψ_2 :<br />

P_1 := ∅; Q_3 := states(plan(ψ_1)); P_2 := plan(true)|states(plan(ψ_2));<br />

while states(P_2) ⊈ states(P_1) do P_1 := P_1 ⊕ P_2; P_2 := pre(A, states(P_1))|Q_3 od;<br />

return P_1<br />

end case



Auxiliary functions:<br />

Weakest precondition: pre(A, Q_1) = {〈q, σ_A〉 | ∀σ_{Σ\A} : o(q, σ_A, σ_{Σ\A}) ∈ Q_1}<br />

States for which plan P is defined: states(P) = {q ∈ Q | ∃σ 〈q, σ〉 ∈ P}<br />

Extending plan P_1 with P_2: P_1 ⊕ P_2 = P_1 ∪ {〈q, σ〉 ∈ P_2 | q ∉ states(P_1)}<br />

P restricted to the states from Q_1: P|Q_1 = {〈q, σ〉 ∈ P | q ∈ Q_1}
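The fixed-point construction behind these clauses can be sketched executably. The following Python fragment is an illustration only: the three-state, two-agent game structure is invented, and plan_eventually implements the 〈〈A〉〉♦ψ case (the U-clause with ψ_1 = true) on top of pre:<br />

```python
from itertools import product

# Hypothetical concurrent game structure: 2 agents, 3 states, actions "a"/"b".
AGENTS = (0, 1)
STATES = (0, 1, 2)
ACTS = {0: ("a", "b"), 1: ("a", "b")}

def o(q, joint):
    """Transition function o(q, joint action): agent 0 alone controls
    progress; playing "b" moves one state up, "a" stays put."""
    return min(q + 1, 2) if joint[0] == "b" else q

def profiles(agents):
    """All joint choices for a set of agents, as dicts agent -> action."""
    agents = sorted(agents)
    return [dict(zip(agents, combo))
            for combo in product(*(ACTS[i] for i in agents))]

def pre(A, Q1):
    """pre(A, Q1): for each state q, a choice sigma_A of coalition A that
    forces the next state into Q1, whatever the remaining agents do."""
    others = [i for i in AGENTS if i not in A]
    result = {}
    for q in STATES:
        for sa in profiles(A):
            if all(o(q, {**sa, **so}) in Q1 for so in profiles(others)):
                result.setdefault(q, sa)      # keep one witness choice
    return result

def plan_eventually(A, goal):
    """States satisfying <<A>> eventually-goal, with a conditional plan:
    the least fixed point of goal-states extended by pre(A, .)."""
    plan = {q: None for q in goal}            # None entries: goal reached
    while True:
        new = {q: s for q, s in pre(A, set(plan)).items() if q not in plan}
        if not new:
            return plan
        plan.update(new)

print(plan_eventually({0}, {2}))   # agent 0 can force state 2 from anywhere
print(plan_eventually({1}, {2}))   # agent 1 alone cannot (except in state 2)
```
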



[Figure: a concurrent game structure for the rocket/cargo domain. Twelve states combine the positions of the rocket and the cargo (roL, roP, caL, caR, caP) with the fuel status (fuelOK, nofuel); transitions are labelled with joint actions such as 〈load_1, nop_2, fuel_3〉 or 〈unload_1, unload_2, nop_3〉.]


plan(〈〈1〉〉♦caP) = { 〈9, −〉, 〈10, −〉, 〈11, −〉, 〈12, −〉 }<br />

plan(〈〈1, 2〉〉♦caP) = { 〈2, 1:load·2:nop〉, 〈6, 1:move·2:nop〉, 〈7, 1:unload·2:unload〉, 〈8, 1:unload·2:unload〉, 〈9, −〉, 〈10, −〉, 〈11, −〉, 〈12, −〉 }<br />

plan(〈〈1, 2, 3〉〉♦caP) = { 〈1, 1:load·3:load〉, 〈2, 1:load·3:load〉, 〈3, 1:nop·3:fuel〉, 〈4, 1:move·3:nop〉, 〈5, 1:load·3:fuel〉, 〈6, 1:move·3:nop〉, 〈7, 1:unload·3:nop〉, 〈8, 1:unload·3:nop〉, 〈9, −〉, 〈10, −〉, 〈11, −〉, 〈12, −〉 }



Complexity of Model Checking CTL and ATL<br />

Nice results: model checking CTL and ATL is tractable.<br />

But: the results are relative to the size of the model and of the formula.<br />

Well-known catch (CTL): the size of the model is exponential with respect to a higher-level description.<br />

Another problem: transitions are labelled with joint actions.<br />

So: the number of transitions can be exponential in the number of agents.
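The last point is easy to quantify: each state has one outgoing transition per joint action, so with action repertoires of sizes d_1, . . . , d_k there are |Q| · d_1 · · · d_k labelled transitions. A minimal sketch in Python (the per-agent repertoires 3, 3 and 2 are an assumption, chosen to factorise the 216 transitions of the rocket example):<br />

```python
from math import prod

def num_transitions(num_states, action_counts):
    """Labelled transitions in a concurrent game structure: one outgoing
    transition per state and per joint action profile."""
    return num_states * prod(action_counts)

print(num_transitions(12, [3, 3, 2]))   # 216, as in the rocket example
print(num_transitions(12, [3] * 10))    # 10 agents already give 708588
```
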



3 agents/attributes, 12 states, 216 transitions<br />

[Figure: the rocket/cargo concurrent game structure from the previous slides, repeated: twelve states over roL, roP, caL, caR, caP and fuelOK/nofuel, with joint-action labels such as 〈load_1, nop_2, fuel_3〉.]



Proposition (Jamroga & Dix 2005)<br />

ATL model checking is Σ^P_2-complete with respect to the number of states and agents.



Complexity Classes Σ^P_2, Δ^P_2, Δ^P_3<br />

Σ^P_i: problems solvable in polynomial time by a non-deterministic Turing machine making adaptive queries to a Σ^P_{i−1} oracle.<br />

Σ^P_2 = NP^NP: problems solvable in polynomial time by a non-deterministic Turing machine making adaptive queries to an NP oracle.<br />

Δ^P_2 = P^NP: a problem is in Δ^P_2 = P^NP if it can be solved in deterministic polynomial time with subcalls to an NP oracle. We also have Δ^P_3 := P^(NP^NP) and Δ^P_1 = P.



Summary of Complexity Results<br />

              m, l               n, k, l             n_local, k, l<br />
CTL           P-compl. [1]       P-compl. [1]        PSPACE-compl. [2]<br />
ATL           P-compl. [3]       Σ^P_2-compl. [4]    EXPTIME-compl. [7]<br />
Pos-ATL_ir    NP-compl. [5,6]    Σ^P_2-compl. [6]    ?<br />
ATL_ir        Δ^P_2-compl. [8]   Δ^P_3-compl. [8]    ?<br />

[1] Clarke, Emerson & Sistla (1986), Automatic verification of finite-state concurrent systems using temporal logic specifications.<br />

[2] Kupferman, Vardi & Wolper (2000), An automata-theoretic approach to branching-time model checking.<br />

[3] Alur, Henzinger & Kupferman (2002), Alternating-time Temporal Logic.<br />

[4] Jamroga & Dix (2005), Do agents make model checking explode?<br />

[5] Schobbens (2004), Alternating-time logic with imperfect recall.<br />

[6] Jamroga & Dix (2005), Model checking strategic abilities of agents under incomplete information.<br />

[7] Van der Hoek, Lomuscio & Wooldridge (2006), On the complexity of practical ATL model checking.<br />

[8] Jamroga & Dix (2007), Model checking abilities of agents: A closer look.



8. Action/Time as Modalities 6. References<br />

8.6 References



[aqvist84] Åqvist, L. (1984). Deontic logic. In D. M. Gabbay and F. Guenther (Eds.), Handbook of Philosophical Logic, Vol. II, pp. 605–714. Reidel.<br />

[Alur et al. 2002] R. Alur, T. A. Henzinger, and O. Kupferman. Alternating-time Temporal Logic. Journal of the ACM, 49:672–713, 2002.



[Broersen et al., 2001a] Broersen, J., Dastani, M., Huang, Z., and van der Torre, L. (2001a). The BOID architecture: conflicts between beliefs, obligations, intentions and desires. In Müller, J., Andre, E., Sen, S., and Frasson, C., editors, Proceedings of the Fifth International Conference on Autonomous Agents, pages 9–16. ACM Press.<br />

[Cohen and Levesque, 1990] Cohen, P. and Levesque, H. (1990). Intention is choice with commitment. Artificial Intelligence, 42:213–261.



[Emerson 1990] E. A. Emerson. Temporal and modal logic. In Handbook of Theoretical Computer Science, volume B, pages 995–1072. Elsevier, 1990.<br />

[Fagin et al. 1995] Fagin, R., Halpern, J. Y., Moses, Y., and Vardi, M. Y. Reasoning about Knowledge. MIT Press: Cambridge, MA.



[GiunchigliaTraverso00] Fausto Giunchiglia and Paolo Traverso. Planning as model checking. In Susanne Biundo and Maria Fox (Eds.), Recent Advances in AI Planning, 5th European Conference on Planning, ECP'99, Durham, UK, September 8–10, 1999, Proceedings, pages 1–20. LNCS 1809, Springer, 2000.<br />

[HiDBoHoMe98] Hindriks, K. V., F. S. de Boer, W. van der Hoek, and J.-J. C. Meyer (1998). Formal semantics of the core of AGENT-0. In ECAI'98 Workshop on Practical Reasoning and Rationality, pp. 20–29.



[Huth00] Huth, M. and Ryan, M. Logic in Computer Science: Modelling and reasoning about systems. Cambridge University Press.<br />

[Jamroga, 2004] Jamroga, W. (2004). Strategic planning through model checking of ATL formulae. In Rutkowski, L., et al., editors, Artificial Intelligence and Soft Computing: Proceedings of ICAISC'04, volume 3070 of Lecture Notes in Computer Science, pages 879–884. Springer Verlag.



[Jamroga/Dix1] Wojtek Jamroga and Jürgen Dix. Model Checking Strategic Abilities of Agents under Incomplete Information. In Mario Coppo, Elena Lodi and G. Michele Pinna (Eds.), Proceedings of the Italian Conference on Theoretical Computer Science (ICTCS '05), pages 295–308. LNCS 3701. Springer, 2005.



[Jamroga/Dix2] Wojtek Jamroga and Jürgen Dix. Do Agents Make Model Checking Explode (Computationally)? In M. Pěchouček, P. Petta and L. Z. Varga (Eds.), Proceedings of the 4th International Central and Eastern European Conference on Multi-Agent Systems (CEEMAS '05), pages 398–407. LNCS 3690. Springer, 2005.<br />

[Jamroga/Dix3] Wojtek Jamroga and Jürgen Dix (2007). Model Checking Abilities of Agents: A Closer Look. Journal of Computing Systems, to appear.



[kripke63b] Kripke, S. (1963a). Semantical analysis of modal logic I. Normal propositional calculi. Zeitschrift für math. Logik und Grundlagen der Mathematik 9, 67–96.<br />

[kripke63a] Kripke, S. (1963b). Semantical considerations on modal logic. Acta Philosophica Fennica 16, 83–94.



[kripke65] Kripke, S. (1965). Semantical analysis of modal logic II. Non-normal modal propositional calculi. In Addison, Henkin, and Tarski (Eds.), The Theory of Models, pp. 206–220. North-Holland, Amsterdam.<br />

[Pistore and Traverso, 2001] Pistore, M. and Traverso, P. (2001). Planning as model checking for extended goals in non-deterministic domains. In Proceedings of IJCAI, pages 479–486.



[Rao and Georgeff 1991] Rao, A. S. and M. Georgeff (1991). Modeling Rational Agents within a BDI-Architecture. In J. F. Allen, R. Fikes, and E. Sandewall (Eds.), Proceedings of the International Conference on Knowledge Representation and Reasoning, Cambridge, MA, pp. 473–484. Morgan Kaufmann.



[vonWright51] von Wright, G. H. (1951). Deontic logic. Mind 60, 1–15. Reprinted in G. H. von Wright, Logical Studies, pp. 58–74. Routledge and Kegan Paul, 1957.<br />

[Wooldridge, 2000] Wooldridge, M. (2000). Reasoning about Rational Agents. MIT Press: Cambridge, MA.



9. Agents based on FOL<br />

Chapter 9. Agents based on FOL<br />

Agents based on FOL<br />

9.1 First Order Logic (FOL)<br />

9.2 Situation Calculus<br />

9.3 Successor State Axioms<br />

9.4 (Con-)Golog<br />

9.5 (Concurrent-) MetaTeM<br />

9.6 Comparison<br />

9.7 References



We are dealing with deductive reasoning agents. While the first subsections give a general introduction (from propositional logic to the situation calculus and the frame problem), the next one, (Con-)Golog, describes a programming language based on the situation calculus. We also present MetaTeM, a first-order modal logic for expressing theories that can be directly executed. We end this chapter with a simple application and a comparison of how it is handled by the different frameworks we have introduced.



Symbolic AI: symbolic representation, e.g. sentential or first order logic. Using deduction: the agent as a theorem prover.<br />

Traditional: a theory about agents; implementation as a stepwise process (Software Engineering) over many abstractions.<br />

Symbolic AI: view the theory itself as an executable specification.<br />

Internal state: a Knowledge Base (KB), often simply called D (database).



Our running example for the situation calculus is the Wumpus world (as in propositional logic). You'll have to model the vacuum world in a similar way.<br />

[Figure: the 4×4 Wumpus world. The agent starts at square (1,1); pits produce a breeze in adjacent squares, the Wumpus a stench, and the gold glitters on its square.]<br />

[Figure: two stages (a) and (b) of the agent's exploration, annotated with its knowledge of the Wumpus world. Legend: A = Agent, B = Breeze, G = Glitter/Gold, OK = Safe square, P = Pit, S = Stench, V = Visited, W = Wumpus.]



9. Agents based on FOL 1. First Order Logic (FOL)<br />

9.1 First Order Logic (FOL)



Definition 9.1 (First order logic L_FOL, L ⊆ L_FOL)<br />

The language L_FOL of first order logic (Prädikatenlogik erster Stufe) is defined as follows:<br />

x, y, z, x_1, x_2, . . . , x_n, . . . : a countable set Var of variables.<br />

for each k ∈ N: P^k_1, P^k_2, . . . , P^k_n, . . . : a countable set Pred_k of k-dimensional predicate symbols (the 0-dimensional predicate symbols are the propositional constants from At of L_PL; we suppose that ⊥ and ⊤ are available).<br />

for each k ∈ N: f^k_1, f^k_2, . . . , f^k_n, . . . : a countable set Funct_k of k-dimensional function symbols.<br />

¬, ∧, ∨, →: the sentential connectives.<br />

(, ): parentheses.<br />

∀, ∃: quantifiers.<br />




Definition (continued)<br />

The 0-dimensional function symbols are called individual constants; for them we suppress the parentheses. In general we will need, as in propositional logic, only a certain subset of the predicate and function symbols. These define a language L ⊆ L_FOL (analogously to Definition 2 on page 2). The set of predicate and function symbols used is also called the signature Σ.



Definition (continued)<br />

The concepts of an L-term t and an L-formula ϕ are defined inductively.<br />

Term: L-terms t are defined as follows:<br />

1. each variable is an L-term.<br />

2. if f^k is a k-dimensional function symbol from L and t_1, . . . , t_k are L-terms, then f^k(t_1, . . . , t_k) is an L-term.<br />

The set of all L-terms that one can create from the set X ⊆ Var is called Term_L(X) or Term_Σ(X). With X = ∅ we get the set of ground terms Term_L(∅), short: Term_L.<br />


Definition (continued)<br />

Formula: L-formulae ϕ are also defined inductively:<br />

1. if P^k is a k-dimensional predicate symbol from L and t_1, . . . , t_k are L-terms, then P^k(t_1, . . . , t_k) is an L-formula.<br />

2. for every L-formula ϕ, (¬ϕ) is an L-formula.<br />

3. for all L-formulae ϕ and ψ, (ϕ ∧ ψ) and (ϕ ∨ ψ) are L-formulae.<br />

4. if x is a variable and ϕ an L-formula, then (∃x ϕ) and (∀x ϕ) are L-formulae.<br />


Definition (continued)<br />

Atomic L-formulae are those which are composed according to 1.; we call them At_L(X) (X ⊆ Var). The set of all L-formulae with respect to X is called Fml_L(X). Positive formulae (Fml^+_L(X)) are those which are composed using only 1., 3. and 4.<br />

If an L-formula ϕ occurs within another L-formula ψ, then ϕ is called a subformula of ψ.



An illustrating example<br />

Example 9.2 (From semigroups to rings)<br />

We consider L = {0, 1, +, ·, ≤, =}, where 0, 1 are constants, +, · binary operations and ≤, = binary relations. What can one express in this language?<br />

Ax 1: ∀x∀y∀z x + (y + z) = (x + y) + z<br />

Ax 2: ∀x (x + 0 = x) ∧ (0 + x = x)<br />

Ax 3: ∀x∃y (x + y = 0) ∧ (y + x = 0)<br />

Ax 4: ∀x∀y x + y = y + x<br />

Ax 5: ∀x∀y∀z x · (y · z) = (x · y) · z<br />

Ax 6: ∀x∀y∀z x · (y + z) = x · y + x · z<br />

Ax 7: ∀x∀y∀z (y + z) · x = y · x + z · x<br />

Axiom 1 describes a semigroup, axioms 1–2 describe a monoid, axioms 1–3 a group, and axioms 1–7 a ring.
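On a finite structure such axioms can be checked mechanically by exhaustive evaluation. A small sketch (the structure Z_3 with + and · taken mod 3 is my own example, not from the lecture):<br />

```python
from itertools import product

# Interpret L = {0, 1, +, .} in the structure Z_3 = {0, 1, 2} mod 3.
U = range(3)
add = lambda x, y: (x + y) % 3
mul = lambda x, y: (x * y) % 3

# Ax 1-4: (Z_3, +) is a commutative group.
ax1 = all(add(x, add(y, z)) == add(add(x, y), z) for x, y, z in product(U, repeat=3))
ax2 = all(add(x, 0) == x and add(0, x) == x for x in U)
ax3 = all(any(add(x, y) == 0 and add(y, x) == 0 for y in U) for x in U)
ax4 = all(add(x, y) == add(y, x) for x, y in product(U, repeat=2))
# Ax 5-7: multiplication is associative and distributes over +.
ax5 = all(mul(x, mul(y, z)) == mul(mul(x, y), z) for x, y, z in product(U, repeat=3))
ax6 = all(mul(x, add(y, z)) == add(mul(x, y), mul(x, z)) for x, y, z in product(U, repeat=3))
ax7 = all(mul(add(y, z), x) == add(mul(y, x), mul(z, x)) for x, y, z in product(U, repeat=3))

print(all([ax1, ax2, ax3, ax4, ax5, ax6, ax7]))   # True: Z_3 is a ring
```
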



Definition 9.3 (L-structure A = (U_A, I_A))<br />

An L-structure or L-interpretation is a pair A =def (U_A, I_A), where U_A is an arbitrary non-empty set, called the basic set (the universe or individual range) of A. Further, I_A is a mapping which<br />

assigns to each k-dimensional predicate symbol P^k in L a k-dimensional predicate over U_A,<br />

assigns to each k-dimensional function symbol f^k in L a k-dimensional function on U_A.<br />

In other words: the domain of I_A is exactly the set of predicate and function symbols of L.



Definition (continued)<br />

The range of I_A consists of the predicates and functions on U_A. We write: I_A(P) = P^A, I_A(f) = f^A.<br />

Let ϕ be an L_1-formula and A =def (U_A, I_A) an L-structure. A is called matching with ϕ if I_A is defined for all predicate and function symbols that appear in ϕ, i.e. if L_1 ⊆ L.



Definition 9.4 (Variable assignment ϱ)<br />

A variable assignment ϱ over an L-structure A = (U_A, I_A) is a function ϱ : Var → U_A; x ↦ ϱ(x).



Definition 9.5 (Semantics of first order logic)<br />

Let ϕ be a formula, A a structure matching with ϕ, and ϱ a variable assignment over A. For each term t that can be built from components of ϕ, we define inductively the value of t in the structure A, written A(t):<br />

1. for a variable x, A(x) =def ϱ(x).<br />

2. if t has the form t = f^k(t_1, . . . , t_k), with t_1, . . . , t_k terms and f^k a k-dimensional function symbol, then A(t) =def f^A(A(t_1), . . . , A(t_k)).



Definition (continued)<br />

Now we define inductively the logical value of a formula ϕ in A:<br />

1. if ϕ =def P^k(t_1, . . . , t_k) with terms t_1, . . . , t_k and the k-dimensional predicate symbol P^k, then A(ϕ) =def true if (A(t_1), . . . , A(t_k)) ∈ P^A, and false otherwise.<br />

2. if ϕ =def ¬ψ, then A(ϕ) =def true if A(ψ) = false, and false otherwise.<br />

3. if ϕ =def (ψ ∧ η), then A(ϕ) =def true if A(ψ) = true and A(η) = true, and false otherwise.



Definition (continued)<br />

4. if ϕ =def (ψ ∨ η), then A(ϕ) =def true if A(ψ) = true or A(η) = true, and false otherwise.<br />

5. if ϕ =def ∀x ψ, then A(ϕ) =def true if for all d ∈ U_A: A_[x/d](ψ) = true, and false otherwise.<br />

6. if ϕ =def ∃x ψ, then A(ϕ) =def true if there is a d ∈ U_A with A_[x/d](ψ) = true, and false otherwise.<br />

In cases 5. and 6. the notation [x/d] was used: for d ∈ U_A, let A_[x/d] be the structure A′ that is identical with A except for the definition of x^A′: x^A′ =def d (independently of whether I_A is defined for x or not).
of the fact that I A is defined for x or not).



Definition (continued)<br />

We write:<br />

A |= ϕ[ϱ] for A(ϕ) = true: A is a model for ϕ with<br />

respect to ϱ.<br />

If ϕ does not contain free variables, then A |= ϕ[ϱ] is<br />

independent of ϱ. We then simply suppress ϱ.<br />

If there is at least one model for ϕ, then ϕ is called<br />

satisfiable or consistent.



Definition 9.6 (Tautology)<br />

1 A theory is a set of formulae without free<br />

variables: T ⊆ F ml L . A satisfies T if A |= ϕ<br />

holds for all ϕ ∈ T . We write A |= T .<br />

2 An L-formula ϕ is called an L-tautology, if for all<br />

matching L-structures A the following holds:<br />

A |= ϕ.<br />

From now on we mostly suppress the language<br />

L, because it is clear from the context.<br />

Nevertheless, it always has to be defined.<br />




Definition 9.7 (Consequence set Cn(T ))<br />

A formula ϕ follows from T , if for all<br />

structures A with A |= T also A |= ϕ holds.<br />

We then write: T |= ϕ.<br />

We denote with<br />

Cn L (T ) = def {ϕ ∈ F ml L : T |= ϕ},<br />

or simply Cn(T ), the semantic consequence<br />

operator.



Lemma 9.8 (Properties of Cn(T ))<br />

The semantic consequence operator has the<br />

following properties<br />

1 T -extension: T ⊆ Cn(T ),<br />

2 Monotonicity: T ⊆ T ′ ⇒ Cn(T ) ⊆ Cn(T ′ ),<br />

3 Closure: Cn(Cn(T )) = Cn(T ).<br />

Lemma 9.9 (ϕ ∉ Cn(T))<br />

ϕ ∉ Cn(T ) if and only if there is a structure A<br />

with A |= T and A |= ¬ϕ.



Definition 9.10 (MOD(T ), Cn(U))<br />

If T ⊆ F ml L , then we denote by MOD(T ) the set<br />

of all L-structures A which are models of T :<br />

MOD(T ) = def {A : A |= T }.<br />

If U is a set of structures, then we can consider all<br />

sentences that are true in all structures of U. We<br />

also call this set Cn(U):<br />

Cn(U) = def {ϕ ∈ F ml L : ∀A ∈ U : A |= ϕ}.<br />

MOD is obviously dual to Cn:<br />

Cn(MOD(T )) = Cn(T ), MOD(Cn(T )) = MOD(T ).
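For a small propositional toy language, Lemma 9.8 and the MOD/Cn duality can be checked by brute force over all models. This is our own stand-in for the first-order definitions — the two-atom language and all names below are assumptions:

```python
# Brute-force Cn and MOD for propositional logic over atoms p, q.
from itertools import product

MODELS = [dict(zip("pq", bits)) for bits in product([False, True], repeat=2)]

# A small candidate language: formula name -> truth function on a model.
LANG = {
    "p":    lambda m: m["p"],
    "q":    lambda m: m["q"],
    "p&q":  lambda m: m["p"] and m["q"],
    "p|q":  lambda m: m["p"] or m["q"],
    "p->q": lambda m: (not m["p"]) or m["q"],
}

def mod(T):                              # MOD(T): all models satisfying every formula in T
    return [m for m in MODELS if all(LANG[f](m) for f in T)]

def cn_of_models(U):                     # Cn(U): formulas true in all structures of U
    return {f for f in LANG if all(LANG[f](m) for m in U)}

def cn(T):                               # Cn(T) = Cn(MOD(T))
    return cn_of_models(mod(T))

T = {"p"}
assert T <= cn(T)                        # T-extension
assert cn(T) <= cn(T | {"q"})            # monotonicity
assert cn(cn(T)) == cn(T)                # closure
assert mod(cn(T)) == mod(T)              # MOD(Cn(T)) = MOD(T)
print(sorted(cn(T)))                     # ['p', 'p|q']  (within this tiny language)
```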



Definition 9.11 (Completeness of a theory T )<br />

T is called complete, if for each formula ϕ ∈ F ml L : T |= ϕ<br />

or T |= ¬ϕ holds.<br />

Attention:<br />

Do not confuse this last condition with the property of a<br />

valuation v (or a model): every valuation is complete in the<br />

above sense.<br />

Lemma 9.12 (Ex Falso Quodlibet)<br />

T is consistent if and only if Cn(T ) ≠ F ml L .



An illustrating example<br />

Example 9.13 (Natural numbers in different languages)<br />

N P r = (N, 0 N , + N , = N ) („Presburger Arithmetic”),<br />

N P A = (N, 0 N , + N , ·N , = N ) („Peano Arithmetic”),<br />

N P A ′ = (N, 0 N , 1 N , + N , ·N , = N ) (variant of N P A ).<br />

These structures each describe the natural numbers, but in<br />

different languages.<br />

Question:<br />

If the language is bigger, then we can express more.<br />

Is L P A ′ more expressive than L P A ?



Answer:<br />

No, because one can replace the 1 N by an<br />

L P A -formula: there is an L P A -formula φ(x) such that<br />

for each variable assignment ρ the following<br />

holds:<br />

N P A ′ |= ρ φ(x) if and only if ρ(x) = 1 N<br />

Thus we can define a macro for the 1.<br />

Each formula of L P A ′ can be transformed into an<br />

equivalent formula of L P A .
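The slide does not spell out φ(x); one concrete choice (our own illustration, not from the lecture) exploits that in N the equation x · x = x holds exactly for 0 and 1:

```latex
% In N, x * x = x holds exactly for x = 0 and x = 1, so
\varphi(x) \;:=\; x \cdot x \doteq x \;\wedge\; \neg\,(x \doteq 0)
% Then N_{PA'} \models_\rho \varphi(x) iff \rho(x) = 1^N.
```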



Question:<br />

Is L P A perhaps more expressive than L P r , or can<br />

the multiplication be defined somehow?<br />

We will see later that L P A is indeed more<br />

expressive:<br />

the set of sentences valid in N P r is decidable<br />

whereas<br />

the set of sentences valid in N P A is not even<br />

recursively enumerable.



We have already defined a sound and complete calculus<br />

for propositional logic L P L . Such calculi also exist for first<br />

order logic L F OL .<br />

Theorem 9.14 (Completeness of first order logic)<br />

A formula follows semantically from a theory T if and only if<br />

it can be derived:<br />

T ⊢ ϕ if and only if T |= ϕ<br />

Theorem 9.15 (Compactness of first order logic)<br />

A formula can be derived from a theory T if and only if it can be derived<br />

from a finite subset of T :<br />

Cn(T ) = ⋃ {Cn(T ′ ) : T ′ ⊆ T, T ′ finite}.



9. Agents based on FOL 2. Situation Calculus<br />

9.2 Situation Calculus



Question:<br />

How do we axiomatize the Wumpus-world in<br />

FOL?<br />

function KB-AGENT( percept) returns an action<br />

static: KB, a knowledge base<br />

t, a counter, initially 0, indicating time<br />

TELL(KB, MAKE-PERCEPT-SENTENCE( percept, t))<br />

action ← ASK(KB, MAKE-ACTION-QUERY(t))<br />

TELL(KB, MAKE-ACTION-SENTENCE(action, t))<br />

t ← t + 1<br />

return action
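The KB-AGENT loop can be sketched in Python. The knowledge base below is a deliberately trivial stub standing in for real TELL/ASK inference; the class, the fixed policy, and all helper names are our own assumptions:

```python
# A minimal KB-agent loop mirroring the pseudocode above.
def make_percept_sentence(percept, t):
    return ("percept", percept, t)

def make_action_query(t):
    return ("action?", t)

def make_action_sentence(action, t):
    return ("did", action, t)

class StubKB:
    """Trivial KB: records sentences, answers queries from a fixed policy."""
    def __init__(self, policy):
        self.sentences = []
        self.policy = policy              # fixed plan, standing in for inference

    def tell(self, sentence):
        self.sentences.append(sentence)

    def ask(self, query):
        _, t = query
        return self.policy[t % len(self.policy)]

def kb_agent_step(kb, percept, t):
    kb.tell(make_percept_sentence(percept, t))
    action = kb.ask(make_action_query(t))     # action <- ASK(KB, ...)
    kb.tell(make_action_sentence(action, t))
    return action

kb = StubKB(policy=["Forward", "Turn_right", "Grab"])
actions = [kb_agent_step(kb, percept, t)
           for t, percept in enumerate(["Stench", "Breeze", "Glitter"])]
print(actions)                            # ['Forward', 'Turn_right', 'Grab']
```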



Idea:<br />

In order to describe actions or their effects consistently we<br />

consider the world as a sequence of situations (snapshots<br />

of the world). Therefore we have to extend each predicate<br />

by an additional argument.<br />

We use the function symbol<br />

do(action, situation)<br />

as the term for the situation which emerges when the<br />

action action is executed in the situation situation.<br />

Actions: T urn_right, T urn_left, F orward, Shoot, Grab,<br />

Release, Climb.



[Figure: the Wumpus world as a sequence of situations, S 0 −Forward→ S 1 −Turn(Right)→ S 2 −Forward→ S 3 ; each snapshot shows the grid with the pits and the gold.]<br />



We also need a memory, a predicate<br />

At(person, location, situation)<br />

with person being either W umpus or Agent and location<br />

being the current position (stored as a pair [i, j]).<br />

Important axioms are the so-called successor-state<br />

axioms; they describe how actions affect situations. The<br />

most general form of these axioms is<br />

true afterwards ⇐⇒ an action made it true<br />

or it was already true and<br />

no action made it false



Axioms about At(p, l, s):<br />

At(p, l, do(F orward, s)) ↔ (l . = Location_ahead(p, s) ∧ ¬W all(l))<br />

At(p, l, s) → Location_ahead(p, s) . =<br />

Location_toward(l, Orient.(p, s))<br />

W all([x, y]) ↔(x . = 0 ∨ x . = 5 ∨ y . = 0 ∨ y . = 5)



Location_toward([x, y], 0) . = [x + 1, y]<br />

Location_toward([x, y], 90) . = [x, y + 1]<br />

Location_toward([x, y], 180) . = [x − 1, y]<br />

Location_toward([x, y], 270) . = [x, y − 1]<br />

Orient.(Agent, S 0 ) . = 90<br />

Orient.(p, do(a, s)) . = d ↔<br />

((a . = T urn_right ∧ d . = mod(Orient.(p, s) − 90, 360))<br />

∨(a . = T urn_left ∧ d . = mod(Orient.(p, s) + 90, 360))<br />

∨(Orient.(p, s) . = d ∧ ¬(a . = T urn_right ∨ a . = T urn_left)))<br />

mod(x, y) is the usual “modulo” function, assigning to<br />

each x a unique value between 0 and y − 1.
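The Location_toward and Orient. axioms translate directly into code. A small sketch of our own (the function names and the tuple representation of locations are assumptions):

```python
# Location_toward and the successor-state axiom for Orient., executed.
# Orientations are degrees: 0 = east, 90 = north, 180 = west, 270 = south.
def location_toward(loc, d):
    x, y = loc
    return {0: (x + 1, y), 90: (x, y + 1), 180: (x - 1, y), 270: (x, y - 1)}[d]

def new_orientation(orient, action):
    # Turning changes the orientation by +/-90 (mod 360);
    # every other action leaves it unchanged (the frame case).
    if action == "Turn_right":
        return (orient - 90) % 360
    if action == "Turn_left":
        return (orient + 90) % 360
    return orient

assert location_toward((1, 1), 0) == (2, 1)
assert location_toward((1, 1), 90) == (1, 2)
assert new_orientation(90, "Turn_right") == 0
assert new_orientation(270, "Turn_left") == 0     # mod(270 + 90, 360) = 0
assert new_orientation(90, "Forward") == 90
```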



Axioms about percepts, useful new notions:<br />

P ercept([Stench, b, g, u, c], s) → Stench(s)<br />

P ercept([a, Breeze, g, u, c], s) → Breeze(s)<br />

P ercept([a, b, Glitter, u, c], s) → At_Gold(s)<br />

P ercept([a, b, g, Bump, c], s) → At_W all(s)<br />

P ercept([a, b, g, u, Scream], s) → W umpus_dead(s)<br />

At(Agent, l, s) ∧ Breeze(s) → Breezy(l)<br />

At(Agent, l, s) ∧ Stench(s) → Smelly(l)



Adjacent(l 1 , l 2 ) ↔ ∃ d l 1<br />

. = Location_toward(l2 , d)<br />

Smelly(l 1 ) → ∃l 2 At(W umpus, l 2 , s)∧<br />

(l 2<br />

. = l1 ∨ Adjacent(l 1 , l 2 ))<br />

P ercept([none, none, g, u, c], s) ∧ At(Agent, x, s) ∧ Adjacent(x, y)<br />

→ OK(y)<br />

(¬At(W umpus, x, s) ∧ ¬P it(x))<br />

→ OK(x)<br />

At(W umpus, l 1 , s) ∧ Adjacent(l 1 , l 2 )<br />

→ Smelly(l 2 )<br />

At(P it, l 1 , s) ∧ Adjacent(l 1 , l 2 )<br />

→ Breezy(l 2 )



Axioms to describe actions:<br />

Holding(Gold, do(Grab, s)) ↔ At_Gold(s)∨<br />

Holding(Gold, s)<br />

Holding(Gold, do(Release, s)) ↔ false<br />

Holding(Gold, do(T urn_right, s)) ↔ Holding(Gold, s)<br />

Holding(Gold, do(T urn_left, s)) ↔ Holding(Gold, s)<br />

Holding(Gold, do(F orward, s)) ↔ Holding(Gold, s)<br />

Holding(Gold, do(Climb, s)) ↔ Holding(Gold, s)<br />

Each effect must be described carefully.



Axioms describing preferences among actions:<br />

Great(a, s) → Action(a, s)<br />

(Good(a, s) ∧ ¬∃b Great(b, s)) → Action(a, s)<br />

(Medium(a, s) ∧ ¬∃b (Great(b, s) ∨ Good(b, s))) → Action(a, s)<br />

(Risky(a, s) ∧ ¬∃b (Great(b, s) ∨ Good(b, s) ∨ Medium(b, s)))<br />

→ Action(a, s)<br />

At(Agent, [1, 1], s) ∧ Holding(Gold, s) → Great(Climb, s)<br />

At_Gold(s) ∧ ¬Holding(Gold, s) → Great(Grab, s)<br />

At(Agent, l, s) ∧ ¬V isited(Location_ahead(Agent, s))∧<br />

OK(Location_ahead(Agent, s)) → Good(F orward, s)<br />

V isited(l) ↔ ∃s At(Agent, l, s)<br />

The goal is not only to find the gold but also to return<br />

safely. We need the additional axioms<br />

Holding(Gold, s) → Go_back(s) etc.
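Read procedurally, the preference axioms implement a strict priority scheme: take a Great action if one exists, else a Good one, else Medium, else Risky. A minimal sketch of that scheme (the quality labels are supplied from outside; the function and its inputs are hypothetical):

```python
# Pick an action according to the Great > Good > Medium > Risky priority.
def choose_action(candidates):
    """candidates: list of (action, quality) pairs for the current situation."""
    for tier in ("Great", "Good", "Medium", "Risky"):
        tier_actions = [a for a, q in candidates if q == tier]
        if tier_actions:
            return tier_actions[0]
    return None

# At [1,1] holding the gold: Climb is Great and wins over a Good Forward.
assert choose_action([("Forward", "Good"), ("Climb", "Great")]) == "Climb"
# No Great/Good action available: fall through to Medium.
assert choose_action([("Shoot", "Risky"), ("Turn_left", "Medium")]) == "Turn_left"
```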



Important problems of knowledge<br />

representation<br />

There are three very important representation-problems<br />

concerning the axiomatization of a changing world:<br />

Frame problem: most actions change only little – we<br />

need many axioms to describe invariant<br />

properties.<br />

It would be ideal to axiomatize only what does<br />

not change and to add a proposition like<br />

“nothing else changes”.



Qualification problem: it is necessary to enumerate all<br />

conditions under which an action is executed<br />

successfully. For example:<br />

∀x (Bird(x) ∧ ¬P enguin(x) ∧ ¬Dead(x)<br />

∧ ¬Ostrich(x) ∧ ¬BrokenW ing(x)<br />

∧ . . .<br />

−→ F lies(x))<br />

It would be ideal to just say “birds normally<br />

fly”.<br />

Ramification problem: How should we handle implicit<br />

consequences of actions? For example<br />

Grab(Gold): Gold can be contaminated. Then<br />

Grab(Gold) is not optimal.



Programming versus knowledge engineering<br />

programming | knowledge engineering<br />

choose language | choose logic<br />

write program | define knowledge base<br />

write compiler | implement calculus<br />

run program | derive new facts



9. Agents based on FOL 3. Successor State Axioms<br />

9.3 Successor State Axioms



A Solution to the Frame Problem?<br />

Where do the successor state axioms come<br />

from?<br />

We have to ask: Which fluents stay<br />

invariant?<br />

relational fluent:<br />

¬broken(x, s) ∧ (x ≠ y ∨ ¬fragile(x, s)) −→<br />

¬broken(x, do(drop(r, y), s))<br />

functional fluent: color(x, s) . = c −→<br />

color(x, do(drop(r, y), s)) . = c<br />

How many such axioms do we need?



We need exactly<br />

2 × #actions × #fluents<br />

Suppose we are given axioms of the form<br />

. . . −→ fluent(x, do(action, s))<br />

. . . −→ ¬fluent(x, do(action, s)),<br />

How can we compute the successor state axioms<br />

automatically?<br />

Note that the above set implicitly assumes that all actions<br />

can be applied: this is an overly optimistic assumption<br />

according to the Qualification Problem.



Qualification Problem Revisited<br />

We assume a predicate Poss to describe the<br />

possibility to apply an action.<br />

Poss(pickup(r, x), s) → ∀z ( ¬holding(r, z, s) ).<br />

But → is too weak. Can we replace it by ↔?



Actions have preconditions: necessary and sufficient<br />

conditions which must hold before an action can be<br />

executed.<br />

A robot r can lift object x in situation s iff r is not<br />

holding anything, is located next to x, and x is not<br />

too heavy.<br />

The effect on Broken when r drops an object (positive<br />

effect axiom):<br />

Poss(drop(r, x), s) ∧ F ragile(x) →<br />

Broken(x, do(drop(r, x), s)).<br />

Repairing an object causes the object not to be broken<br />

any more (negative effect axiom):<br />

Poss(repair(r, x), s) → ¬Broken(x, do(repair(r, x), s)).



What about<br />

Poss(pickup(r, x), s) ↔ ∀z ( ¬holding(r, z, s) ) ∧ ¬heavy(x) ∧<br />

nextto(r, x, s).<br />

Suppose we are given a list of axioms of the<br />

form<br />

Poss(action(⃗x), s) ←→ φ action (⃗x, s)<br />

where φ action (⃗x, s) does not contain any<br />

do-terms.



(1) : fragile(x, s) −→ broken(x, do(drop(r, x), s))<br />

(1 ′ ) : nextto(b, x, s) −→ broken(x, do(explode(b), s))<br />

(2) : −→ ¬broken(x, do(repair(r, x), s))<br />

We assume these are all possibilities for broken, ¬broken.<br />

Then (1), (1 ′ ) are equivalent to<br />

∃r (a . = drop(r, x) ∧ fragile(x, s))∨<br />

∃b (a . = explode(b) ∧ nextto(b, x, s))<br />

−→<br />

broken(x, do(a, s)).<br />

(2) is equivalent to<br />

∃r a . = repair(r, x) −→ ¬broken(x, do(a, s))



Under which conditions could ¬broken(x, s) and<br />

broken(x, do(a, s)) be both true?<br />

(1 ′′ ) : ¬broken(x, s) ∧ broken(x, do(a, s)) −→<br />

∃r (a . = drop(r, x) ∧ fragile(x, s)) ∨<br />

∃b (a . = explode(b) ∧ nextto(b, x, s))<br />

(2 ′ ) : broken(x, s) ∧ ¬broken(x, do(a, s)) −→<br />

∃r a . = repair(r, x)



(1), (1 ′ ), (2), (1 ′′ ), (2 ′ ) are equivalent to the<br />

successor state axiom<br />

broken(x, do(a, s))<br />

←→<br />

∃r (a . = drop(r, x) ∧ fragile(x, s)) ∨<br />

∃b (a . = explode(b) ∧ nextto(b, x, s)) ∨<br />

(broken(x, s) ∧ ¬∃r a . = repair(r, x))<br />

This can be generalised, also for functional<br />

fluents!
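The compiled successor-state axiom for broken can be run directly. In this sketch a situation is just a set of fluent atoms and the tagged-tuple action encoding is our own:

```python
# The successor-state axiom for broken, evaluated over a toy state.
def broken_after(x, a, s):
    """broken(x, do(a, s)): an action made it true, or it was true
    and no action made it false."""
    made_true = (a[0] == "drop" and a[2] == x and ("fragile", x) in s) or \
                (a[0] == "explode" and ("nextto", a[1], x) in s)
    made_false = a[0] == "repair" and a[2] == x
    return made_true or (("broken", x) in s and not made_false)

s = {("fragile", "vase"), ("nextto", "bomb", "cup")}
assert broken_after("vase", ("drop", "r1", "vase"), s)       # drop a fragile object
assert broken_after("cup", ("explode", "bomb", None), s)     # bomb next to the cup
assert not broken_after("vase", ("walk", "r1", None), s)     # frame: stays unbroken
assert not broken_after("vase", ("repair", "r1", "vase"), s | {("broken", "vase")})
```

The last assertion shows the negative case: a broken object that is repaired is no longer broken, while any unrelated action leaves broken unchanged.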



Thus the 2 × #actions × #fluents many axioms<br />

can be rewritten into only<br />

#fluents<br />

many axioms (2 × #fluents if we count each<br />

equivalence twice). But we also need the Poss<br />

axioms: another #actions many.<br />

Altogether, the 2 × #actions × #fluents axioms are<br />

compiled into (modulo a constant factor)<br />

#actions+#fluents.<br />

Some people call this a solution to the frame<br />

problem.



9. Agents based on FOL 4. (Con-)Golog<br />

9.4 (Con-)Golog



From now on we use the symbol<br />

⊃ to denote implication → and<br />

≡ to denote equivalence ↔.<br />

Firstly we have to give a precise definition as to<br />

how the domain we are dealing with is<br />

described: the effects of actions, how to denote<br />

and distinguish actions, the initial state.



Definition 9.16 (Domain Descriptions (Sit. Calc.))<br />

A domain description D in the situation calculus consists<br />

of the following: D = D ap ∪ D ss ∪ D una ∪ D S0 where<br />

D ap = action preconditions<br />

Poss(A(⃗x), s) ≡ Π A (⃗x, s)<br />

D ss = successor state axioms<br />

Poss(a, s) ⊃ [F (⃗x, do(a, s)) ≡ Φ F (⃗x, a, s)]<br />

D una = unique names for actions:<br />

A(⃗x) ≠ B(⃗y) for distinct A and B,<br />

A(⃗u) . = A(⃗v) ⊃ ⃗u . = ⃗v.<br />

D S0 = sentences describing the initial state, where S 0 is<br />

the only situation term (there are no occurrences of<br />

do(_, _)).



Planning in the Situation Calculus<br />

Let G(s) be a goal. Then we are interested in:<br />

D |= ∃s G(s),<br />

and leave this task to a theorem prover.<br />

As a side effect of the derivation we can (often) read<br />

off the correct action sequence (a plan) from the term<br />

bound by s<br />

s . = do(α n , do(α n−1 , . . . do(α 1 , S 0 ) . . .)).<br />

Not quite right: this allows action sequences that are not<br />

executable. For example, if G is a tautology, then any<br />

action sequence will do.<br />

Why is that a bit too simple?



It allows action sequences that are not executable: If G<br />

is a tautology, then any action sequence will do.<br />

It also allows action sequences of the form<br />

a, a −1 , b, b −1 , . . .<br />

Definition 9.17 (Plan)<br />

1 We define a predicate Legal(s) which is true if every<br />

action in the action sequence leading to s is<br />

executable:<br />

Legal(s) ≡ s . = S 0 ∨ ∃a∃s ′ [s . = do(a, s ′ ) ∧ Poss(a, s ′ ) ∧ Legal(s ′ )]<br />

2 A plan is a sequence of ground action terms<br />

α 1 , α 2 , . . . , α n s.t.<br />

D |= ∃s[G(s) ∧ Legal(s) ∧ s . = do(α n , do(α n−1 , . . . do(α 1 , S 0 ) . . .))]



Question: How to find a suitable s?<br />

First approach: Regression planning, that is<br />

planning backwards from the goal to<br />

the initial state.<br />

Second approach: Golog, a programming<br />

language for the situation calculus.



We define a regression operator R which transforms a<br />

formula of the situation calculus and reduces the nesting<br />

of do(α, do(. . .)) terms by 1.<br />

Definition 9.18 (Regression operator R)<br />

1 If A is not a fluent (incl. . =) or A = Poss(a, s),<br />

then R[A] = A<br />

2 If F is a fluent and s a situation variable or<br />

s = S 0 , then R[F (⃗t, s)] = F (⃗t, s).<br />

3 If F is a fluent with successor state axiom<br />

Poss(a, s) ⊃ [F (x 1 , . . . , x n , do(a, s)) ≡ Φ F ],<br />

then R[F (t 1 , . . . , t n , do(α, σ))] is Φ F with<br />

x 1 , . . . , x n , a, s replaced by t 1 , . . . , t n , α, σ.



4 Let W , W 1 , and W 2 be arbitrary formulas.<br />

Then:<br />

R[¬W ] = ¬R[W ];<br />

R[∀v W ] = ∀v R[W ];<br />

R[∃v W ] = ∃v R[W ];<br />

R[W 1 ∧ W 2 ] = R[W 1 ] ∧ R[W 2 ];<br />

R[W 1 ∨ W 2 ] = R[W 1 ] ∨ R[W 2 ];<br />

R[W 1 ⊃ W 2 ] = R[W 1 ] ⊃ R[W 2 ];<br />

R[W 1 ≡ W 2 ] = R[W 1 ] ≡ R[W 2 ].



Example 9.19<br />

G = ∀a∀s[P (A, do(a, do(a ′ , s))) ∧ s . = do(B, S 0 ) ⊃<br />

∃x[P (x, s) ∧ Poss(B, do(a, s)) ∧ R(x) ∧ Q(do(B, s))]].<br />

Let P and Q be fluents, R not a fluent. Let the successor<br />

state axioms for P and Q be<br />

Poss(a, s) ⊃ [P (x, do(a, s)) ≡ Φ P (x, a, s)],<br />

Poss(a, s) ⊃ [Q(do(a, s)) ≡ Φ Q (a, s)].<br />

Then (neither Φ P nor Φ Q contain do(_ , _) terms):<br />

R[G] = ∀a∀s[Φ P (A, a, do(a ′ , s)) ∧ s . = do(B, S 0 ) ⊃<br />

∃x[P (x, s) ∧ Poss(B, do(a, s)) ∧ R(x) ∧ Φ Q (B, s)]].



Linear Regression Let α = A(g 1 , . . . , g k ) be a ground<br />

action with precondition axiom<br />

Poss(A(x 1 , . . . , x k ), s) ≡ Π A (x 1 , . . . , x k , s).<br />

Let precond(α, s) be Π A (g 1 , . . . , g k , s). Let φ(s) be a formula<br />

whose only situation term is the free variable s. (φ(s) may<br />

contain other free variables, but not of type situation.)<br />

The regression of φ(s) through α, denoted as R(α, φ(s)),<br />

is<br />

R[φ(do(α, s))] ∧ precond(α, s).



Theorem 9.20 (Reiter)<br />

Let α 1 , . . . , α n be a sequence of ground actions and G(s) a<br />

wff whose only free variable is s. Then α 1 , . . . , α n is a plan<br />

for G iff<br />

D una ∪ D S0 |= R(α 1 , R(α 2 , . . . R(α n−1 , R(α n , G(s))) · · · ))| s S 0 .



Linear Regression Planning<br />

Input: a domain description D and goal G(s).<br />

Output: an action sequence A.<br />

1. Initialize A (the empty sequence) and Γ<br />

(with G(s)).<br />

2. Check whether D una ∪ D S0 |= Γ| s S 0 . If<br />

yes, then return A.<br />

3. Otherwise<br />

choose a ground action α;<br />

insert it at the beginning of A;<br />

compute the regression of Γ through α;<br />

assign the result to Γ;<br />

goto 2.



Comments on Regression Planning<br />

Because of the previous theorem, the method is both<br />

correct and complete.<br />

The algorithm is nondeterministic. It does not say<br />

which α to choose next. (Can be implemented using<br />

backtracking.)<br />

Does not always terminate, for example, if there is no<br />

plan. The problem whether a plan exists is<br />

undecidable.<br />

The algorithm is fairly dumb. For example, if there is<br />

an action noop with no effects, then this could be<br />

chosen arbitrarily often, i.e. plans need not be optimal.<br />

A simple strategy for goal-oriented search: try those<br />

actions which satisfy one or more conditions which do<br />

not already follow from the initial situation.



Example 9.21<br />

Actions: pickup(x), drop(x), walk(x)<br />

Preconditions D ap :<br />

Poss(pickup(x), s) ≡ At(x, s) ∧ ∀y¬Holding(y, s)<br />

Poss(drop(x), s) ≡ Holding(x, s)<br />

Poss(walk(x), s) ≡ true<br />

Effects:<br />

Poss(pickup(x), s) ⊃ Holding(x, do(pickup(x), s))<br />

Poss(drop(x), s) ⊃ ¬Holding(x, do(drop(x), s))<br />

Poss(drop(x), s) ⊃ Broken(x, do(drop(x), s))<br />

Poss(walk(x), s) ⊃ At(x, do(walk(x), s))<br />

Poss(walk(x), s) ∧ At(y, s) ∧ y ≠ x ⊃ ¬At(y, do(walk(x), s))



These result in successor state axioms D ss :<br />

Poss(a, s) ⊃ [Holding(x, do(a, s)) ≡<br />

a . = pickup(x) ∨ Holding(x, s) ∧ a ≠ drop(x)]<br />

Poss(a, s) ⊃ [Broken(x, do(a, s)) ≡<br />

a . = drop(x) ∨ Broken(x, s)]<br />

Poss(a, s) ⊃ [At(x, do(a, s)) ≡ a . = walk(x)∨<br />

At(x, s) ∧ ¬∃y (y ≠ x ∧ a . = walk(y))]<br />

Initial situation D S0 :<br />

∀x ¬Holding(x, S 0 ), ¬At(A, S 0 ), ¬Broken(A, S 0 )<br />

Goal: ∃s Broken(A, s)<br />

(Break A.)
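For the domain of Example 9.21 a plan for the goal ∃s Broken(A, s) can be found mechanically. The sketch below is our own toy alternative: it searches forward by progressing states through the successor-state axioms rather than by regression, so it is an equivalent search, not the algorithm above:

```python
# Breadth-first forward search over the pickup/drop/walk domain of Example 9.21.
from collections import deque

OBJECTS = ["A"]

def poss(action, st):
    kind, x = action
    if kind == "pickup":                          # At(x) and holding nothing
        return ("At", x) in st and not any(("Holding", y) in st for y in OBJECTS)
    if kind == "drop":                            # must be holding x
        return ("Holding", x) in st
    return True                                   # walk(x) is always possible

def step(action, st):
    """Progress a state through the successor-state axioms D_ss."""
    kind, x = action
    new = set(st)
    if kind == "pickup":
        new.add(("Holding", x))
    elif kind == "drop":
        new.discard(("Holding", x))
        new.add(("Broken", x))                    # dropping breaks the object here
    elif kind == "walk":
        new = {f for f in new if f[0] != "At"}    # at most one location
        new.add(("At", x))
    return frozenset(new)

def plan(goal):
    start = frozenset()                           # D_S0: nothing held, not at A, unbroken
    queue, seen = deque([(start, [])]), {start}
    while queue:
        st, acts = queue.popleft()
        if goal(st):
            return acts
        for kind in ("pickup", "drop", "walk"):
            for x in OBJECTS:
                if poss((kind, x), st):
                    nxt = step((kind, x), st)
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append((nxt, acts + [(kind, x)]))

print(plan(lambda st: ("Broken", "A") in st))
# [('walk', 'A'), ('pickup', 'A'), ('drop', 'A')]
```

Only legal situations are ever generated, because every action passes through `poss` before it is applied; this is the Legal(s) requirement of Definition 9.17 in search form.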



Planning vs. Programming<br />

While regression planning works for small<br />

domains, the search space becomes<br />

unmanageable in larger, more realistic<br />

applications.<br />

We will now consider a more realistic alternative,<br />

where we allow the user to<br />

prespecify (program) how to achieve a task<br />

in as much detail as desired, yet, at the same<br />

time<br />

leave some flexibility in choosing appropriate<br />

actions to the machine.<br />




Complex Actions<br />

Complex actions are macros which are ultimately defined<br />

in terms of primitive actions. This way complex actions<br />

inherit the solution to the frame problem.<br />

Consider the problem of clearing the table in a<br />

blocksworld setting. It would be nice to write something<br />

like<br />

while (∃block) ontable(block) do remove_a_block end-while<br />

proc remove_a_block (Πx)[pickup(x); putaway(x)] end-proc<br />

Note that in such a formulation, we do not care about the<br />

situation argument (that must be there in the fluents). We<br />

want this technical point to be dealt with in the following<br />

macros (therefore the notation a[s]).



The following operators are considered:<br />

Sequence:<br />

First execute A, then B.<br />

Test actions:<br />

Is φ true in the current situation?<br />

If-then-else, While-loops<br />

Nondeterministic actions:<br />

Execute A or B.<br />

Nondeterministic choice of the argument of an<br />

action:<br />

e.g. (Πx)pickup(x): pick up an arbitrary object.<br />

procedures<br />

Names for complex actions, recursion.<br />

With that we obtain a programming language for the<br />

high-level control of a robot.



Complex Actions in the Situation Calc. (1)<br />

DO(a, s, s ′ ) stands for:<br />

The execution of action a in situation s leads<br />

to situation s ′ .<br />

Primitive actions:<br />

DO(a, s, s ′ ) = Poss(a[s], s) ∧ s ′ . = do(a[s], s).<br />

What does a[s] mean? Suppose a is<br />

read(favorite_book(kjeld)) and favorite_book<br />

is a functional fluent. Then a[s] is<br />

read(favorite_book(kjeld, s)).



Sequence:
  DO([a1; a2], s, s′) = ∃s″ (DO(a1, s, s″) ∧ DO(a2, s″, s′)).

Nondeterministic actions:
  DO([a1 | a2], s, s′) = DO(a1, s, s′) ∨ DO(a2, s, s′)

Nondeterministic choice of arguments:
  DO((Πx) a(x), s, s′) = ∃x DO(a(x), s, s′)

Test actions:
  DO(φ?, s, s′) = φ[s] ∧ s ≐ s′

Here φ stands for a situation calculus formula with all situation arguments suppressed; φ[s] stands for φ with the arguments restored. If φ is ∀x (ontable(x) ∧ ¬on(x, A)), then φ[s] is the formula ∀x (ontable(x, s) ∧ ¬on(x, A, s)). If φ is ∃x on(x, favorite_block(nils)), then φ[s] is the formula ∃x on(x, favorite_block(nils, s), s)).

Complex Actions in the Situation Calc. (2)

If-then-else:
  DO([φ, a1, a2], s, s′) = DO([[φ?; a1] | [¬φ?; a2]], s, s′)

∗-operator (execute the action n times, n ≥ 0):
  DO(a∗, s, s′) =
    ∀P [(∀s1 P(s1, s1) ∧ ∀s1, s2, s3 (P(s1, s2) ∧ DO(a, s2, s3) ⊃ P(s1, s3))) ⊃ P(s, s′)]

While-loops:
  DO(⟨φ, a⟩, s, s′) = DO(([φ?; a]∗; ¬φ?), s, s′)
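The macro expansions above can be run directly over a toy domain. The following is an illustrative sketch, not Golog itself: situations are tuples of executed action names, tests are Python predicates, programs are nested tuples, and all names (DO, "prim", ontable, ...) are invented for this example.

```python
def DO(prog, s, domain=("A", "B"), depth=8):
    """Yield every situation s2 such that DO(prog, s, s2) holds."""
    kind = prog[0]
    if kind == "prim":                        # Poss(a,s) and s2 = do(a,s)
        name, poss = prog[1], prog[2]
        if poss(s):
            yield s + (name,)
    elif kind == "seq":                       # exists s1: DO(p1,s,s1) and DO(p2,s1,s2)
        for s1 in DO(prog[1], s, domain, depth):
            yield from DO(prog[2], s1, domain, depth)
    elif kind == "or":                        # DO(p1,s,s2) or DO(p2,s,s2)
        yield from DO(prog[1], s, domain, depth)
        yield from DO(prog[2], s, domain, depth)
    elif kind == "test":                      # phi[s] and s = s2
        if prog[1](s):
            yield s
    elif kind == "pi":                        # exists x: DO(a(x),s,s2)
        for x in domain:
            yield from DO(prog[1](x), s, domain, depth)
    elif kind == "star":                      # zero or more iterations (depth-bounded)
        yield s
        if depth > 0:
            for s1 in DO(prog[1], s, domain, depth - 1):
                if s1 != s:
                    yield from DO(("star", prog[1]), s1, domain, depth - 1)

# blocksworld fluents and actions over the toy situations
def ontable(x, s):
    return ("pickup_" + x) not in s

def pickup(x):
    return ("prim", "pickup_" + x, lambda s, x=x: ontable(x, s))

# "while phi do body" expanded as [[phi?; body]*; (not phi)?]
some_on_table = lambda s: any(ontable(b, s) for b in ("A", "B"))
body = ("pi", lambda x: ("seq", ("test", lambda s, x=x: ontable(x, s)), pickup(x)))
clear = ("seq", ("star", ("seq", ("test", some_on_table), body)),
                ("test", lambda s: not some_on_table(s)))

plans = sorted(set(DO(clear, ())))
```

Running the program from the empty situation yields both orders in which the two blocks can be removed, reflecting the nondeterminism of (Πx).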



Procedures: the definition also uses ∗ (needed because of recursion). Here we give just a few examples.

We define above(x, y) as the transitive closure of on(x, y):

proc above(x, y)
  (x ≐ y)? | [(Πz) on(x, z)?; above(z, y)]
end-proc

We define clean (no arguments) as an action to put away all blocks into the box:

proc clean
  (∀x [block(x) ⊃ in(x, box)])? |
  [(Πx)[(∀y ¬on(y, x))?; put(x, box)]; clean]
end-proc

Note: the ∗-operator needs quantification over predicates (second-order logic), but the semantics of complex actions is completely described within the situation calculus.

if Φ then δ1 else δ2 end-if is a macro for [Φ?; δ1] | [¬Φ?; δ2]

while Φ do δ end-while is a macro for [[Φ?; δ]∗; ¬Φ?]
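These two macros are purely syntactic expansions, which can be written down mechanically, e.g. over a tuple-based abstract program syntax (a sketch with invented names, not Golog syntax):

```python
def IF(phi, d1, d2):
    # if phi then d1 else d2  ==  [phi?; d1] | [(not phi)?; d2]
    return ("or",
            ("seq", ("test", phi), d1),
            ("seq", ("test", lambda s: not phi(s)), d2))

def WHILE(phi, d):
    # while phi do d  ==  [[phi?; d]*; (not phi)?]
    return ("seq",
            ("star", ("seq", ("test", phi), d)),
            ("test", lambda s: not phi(s)))

# a while-loop is just data: a seq whose first part is a star
prog = WHILE(lambda s: len(s) < 2, ("prim", "a"))
```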



Golog: An Agent-Oriented Programming Language

Golog = alGOl in LOGic (semantics = situation calculus).

It generalizes conventional imperative programming languages:

Procedures in “normal” imperative languages are ultimately reduced to machine instructions like assignment statements.
Procedures in Golog are reduced to primitive actions which refer to actions in the real world, like picking up objects, opening doors, moving from one room to another, etc. Instead of machine states we are interested in world states.

Executing complex actions requires logical reasoning about what the world is like and how it evolves. For example, in order to execute

while [∃ block OnTable(block)] do remove_a_block

one needs to evaluate the test [∃ block OnTable(block)] in the current situation.

An Example Golog Program

Action preconditions:
  Poss(pickup(x), s) ≡ ∀z ¬Holding(z, s)
  Poss(putontable(x), s) ≡ Holding(x, s)
  Poss(putonfloor(x), s) ≡ Holding(x, s)

Initial state:
  OnTable(x, S0) ≡ (x ≐ A) ∨ (x ≐ B)
  ∀x ¬Holding(x, S0)

Successor State Axioms:

  Poss(a, s) ⊃ [Holding(x, do(a, s)) ≡
      a ≐ pickup(x) ∨ (Holding(x, s) ∧ a ≠ putontable(x) ∧ a ≠ putonfloor(x))]

  Poss(a, s) ⊃ [OnTable(x, do(a, s)) ≡
      a ≐ putontable(x) ∨ (OnTable(x, s) ∧ a ≠ pickup(x))]

  Poss(a, s) ⊃ [OnFloor(x, do(a, s)) ≡
      a ≐ putonfloor(x) ∨ (OnFloor(x, s) ∧ a ≠ pickup(x))]
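Read operationally, each successor state axiom says how to compute a fluent's value one step forward. A minimal sketch of that reading in plain Python (a progression step, not the official regression operator; all names are this sketch's own):

```python
def do_action(a, s):
    """Compute the fluents holding in do(a, s) from those holding in s."""
    kind, x = a                                   # e.g. ("pickup", "A")
    # Holding: a = pickup(x), or it held before and a is neither put action on x
    holding = {y for y in s["holding"]
               if not (y == x and kind in ("putontable", "putonfloor"))}
    if kind == "pickup":
        holding.add(x)
    # OnTable: a = putontable(x), or it held before and a != pickup(x)
    ontable = {y for y in s["ontable"] if not (y == x and kind == "pickup")}
    if kind == "putontable":
        ontable.add(x)
    # OnFloor: a = putonfloor(x), or it held before and a != pickup(x)
    onfloor = {y for y in s["onfloor"] if not (y == x and kind == "pickup")}
    if kind == "putonfloor":
        onfloor.add(x)
    return {"holding": holding, "ontable": ontable, "onfloor": onfloor}

s0 = {"holding": set(), "ontable": {"A", "B"}, "onfloor": set()}
s1 = do_action(("pickup", "A"), s0)
s2 = do_action(("putonfloor", "A"), s1)
```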



Complex actions:

proc clear_table
  while [∃ block OnTable(block)] do
    (Πb)[OnTable(b)?; remove_from_table(b)]
  end-while
end-proc

proc remove_from_table(block)
  pickup(block); putonfloor(block)
end-proc

The evaluation of clear_table:
  Axioms |= ∃s DO(clear_table, S0, s).
The result is a binding of a term (an action sequence) to s.

For example:

s ≐ do(putonfloor(B), do(pickup(B), do(putonfloor(A), do(pickup(A), S0)))).

[Note: the order of the blocks is not predetermined by the program.]

In general, the evaluation of a Golog procedure results in a trace of the primitive actions to be executed.

A Simple Forward Planner in Golog

So far we have seen how Golog allows the user to tell the system in detail how to solve a task. Golog can also be used to leave it to the system to solve a goal, in the original sense of planning.

The following procedure wspdf(n) finds plans of length at most n by forward search (Reiter et al. 2000):

proc wspdf(n)
  goal? |
  [(n > 0)?; (Πa)[primitive_action(a)?; a]; ¬badSituation?; wspdf(n − 1)]
end-proc

badSituation(s) is a domain-dependent predicate used to define when a situation s should be discarded from further consideration (important for pruning the search space). For simple definitions of badSituation, this can solve problems with plans of length around 20.
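The same depth-bounded search fits in a few lines of Python. This is an illustrative sketch; names such as Action and bad are this sketch's assumptions, not Reiter's code:

```python
from collections import namedtuple

Action = namedtuple("Action", "name poss apply")

def wspdf(n, s, actions, goal, bad):
    """Depth-bounded forward search; returns a plan (list of names) or None."""
    if goal(s):
        return []                        # goal already holds: empty plan
    if n == 0:
        return None                      # depth bound reached
    for a in actions:                    # (Pi a)[primitive_action(a)?; a]
        if a.poss(s):
            s2 = a.apply(s)
            if not bad(s2):              # prune bad situations
                rest = wspdf(n - 1, s2, actions, goal, bad)
                if rest is not None:
                    return [a.name] + rest
    return None

# toy domain: pick every block off the table
pickup = lambda x: Action("pickup_" + x,
                          lambda s, x=x: x in s,
                          lambda s, x=x: s - {x})
plan = wspdf(3, frozenset({"A", "B"}), [pickup("A"), pickup("B")],
             goal=lambda s: not s, bad=lambda s: False)
```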



9. Agents based on FOL · 5. (Concurrent-) MetaTeM

9.5 (Concurrent-) MetaTeM



We present a first-order temporal logic based on discrete, linear models with a finite past and an infinite future, called FML [fisherNormalForm].

(Figure: a discrete, linear sequence of states, with the initial state marked “start”.)

Typical temporal operators used are:

  ○φ : φ is true in the next moment in time
  □φ : φ is true in all future moments
  ♦φ : φ is true in some future moment

Examples:

  □((¬passport ∨ ¬ticket) → ○¬board_flight)
  send(msg, rcvr) → ♦receive(msg, rcvr)

□(requested → ♦received)
□(received → ○acted)
□(acted → ♦□done)

From the above we should be able to infer that it is not the case that the system continually re-sends a request but never sees it completed (□¬done); i.e. the statement

  □requested ∧ □¬done

should be inconsistent.

There are actually many forms of fairness:

  ♦attempt → ♦succeed
  □♦attempt → □♦succeed
  □(attempt → ♦succeed)
  ♦□attempt → □♦succeed

FML introduces two new connectives to classical logic:

until: φ Uψ means that ψ will become true at some future time point t, and in all states strictly between now and t, φ is true;
since: φ Sψ means that ψ became true at some time point t in the past, and in all states strictly between t and now, φ is true.

Syntax of FML.

The formulae of FML are generated as usual, starting from a set Lp of predicate symbols, a set Lv of variable symbols, a set Lc of constant symbols, the quantifiers ∀ and ∃, and the set Lt of terms (constants and variables). The set Fml is defined by:

  If t1, ..., tn are in Lt and p is a predicate symbol of arity n, then p(t1, ..., tn) is in Fml.
  ⊤ (true) and ⊥ (false) are in Fml.
  If A and B are in Fml, then so are ¬A, A ∧ B, AUB, ASB, and (A).
  If A is in Fml and v is in Lv, then ∃v.A and ∀v.A are both in Fml.

The following connectives are defined ([finger-fisher-owens]):

  ○φ : φ is true in the next state [⊥Uφ]
  ⊖φ : the current state is not the initial state, and φ was true in the previous state [⊥Sφ]
  ⊚φ : if the current state is not the initial state, then φ was true in the previous state [¬⊖¬φ]
  ♦φ : φ will be true in some future state [⊤Uφ]
  ⧫φ : φ was true in some past state [⊤Sφ]
  □φ : φ will be true in all future states [¬♦¬φ]
  ■φ : φ was true in all past states [¬⧫¬φ]

Temporal formulae can be classified as follows.

State formulae: a state formula is either a literal or a boolean combination of other state formulae.
Strict future-time formulae:
  If A and B are either state or strict future-time formulae, then AUB is a strict future-time formula.
  If A and B are strict future-time formulae, then ¬A, A ∧ B, and (A) are strict future-time formulae.
Strict past-time formulae: the past-time duals of strict future-time formulae.
Non-strict formulae: include state formulae in their definition.

Semantics of FML.

The models for FML formulae are given by a structure consisting of a sequence of worlds (also called states), together with an assignment of truth values to atomic sentences within states, a domain D which is assumed to be constant for every state, and mappings from elements of the language into denotations. This is as in Definition 8.4 on slide 528.

Definition 9.22 (FML model)
An FML model is a tuple M = ⟨σ, D, hc, hp⟩ where
  σ is the ordered set of states s0, s1, s2, ...,
  hc is a map from the constants into D, and
  hp is a map from N × Lp into Dⁿ → {true, false} (the first argument of hp is the index i of the state si).

Thus, for a particular state si and a particular predicate p of arity n, hp(i, p) assigns truth values to atoms constructed from n-tuples of elements of D.

Variable assignment: hv is a mapping from the variables into elements of D.
Term assignment: given a variable assignment hv and the valuation function hc, a term assignment τvh is a mapping from terms into D, defined in the usual way.

The semantics of FML is given by the |= relation, which assigns the truth value of a formula in a model M at a particular moment in time i and with respect to a variable assignment hv:

  ⟨M, i, hv⟩ |= ⊤
  ⟨M, i, hv⟩ ̸|= ⊥
  ⟨M, i, hv⟩ |= p(x1, ..., xn) iff hp(i, p)(τvh(x1), ..., τvh(xn)) = true
  ⟨M, i, hv⟩ |= ¬φ iff ⟨M, i, hv⟩ ̸|= φ
  ⟨M, i, hv⟩ |= φ ∨ ψ iff ⟨M, i, hv⟩ |= φ or ⟨M, i, hv⟩ |= ψ
  ⟨M, i, hv⟩ |= φ Uψ iff for some k with i < k, ⟨M, k, hv⟩ |= ψ and
        for all j, if i < j < k then ⟨M, j, hv⟩ |= φ
  ⟨M, i, hv⟩ |= φ Sψ iff for some k with 0 ≤ k < i, ⟨M, k, hv⟩ |= ψ and
        for all j, if k < j < i then ⟨M, j, hv⟩ |= φ
  ⟨M, i, hv⟩ |= ∀x.φ iff for all d ∈ D, ⟨M, i, hv[d/x]⟩ |= φ
  ⟨M, i, hv⟩ |= ∃x.φ iff there exists d ∈ D such that ⟨M, i, hv[d/x]⟩ |= φ
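The two binary clauses can be checked directly over a trace. A sketch, under the assumption that finite lists of states stand in for the infinite models, restricted to the propositional case so variable assignments are omitted:

```python
def until(phi, psi, trace, i):
    # phi U psi at i: psi at some k > i, and phi at all j with i < j < k
    for k in range(i + 1, len(trace)):
        if psi(trace[k]):
            if all(phi(trace[j]) for j in range(i + 1, k)):
                return True
    return False

def since(phi, psi, trace, i):
    # phi S psi at i: psi at some k < i, and phi at all j with k < j < i
    for k in range(i):
        if psi(trace[k]):
            if all(phi(trace[j]) for j in range(k + 1, i)):
                return True
    return False

p = lambda s: "p" in s
q = lambda s: "q" in s
trace = [{"p"}, {"p"}, {"q"}]
```

Note that the derived ○φ = ⊥Uφ falls out for free: with φ = ⊥ the between-states condition can only hold vacuously, forcing k = i + 1.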



Concurrent METATEM is a programming language for DAI based on FML ([concMTM14, concMTM13, concMTM16]).

A Concurrent METATEM system contains a number of concurrently executing agents which are able to communicate through message passing. Each agent executes a first-order temporal logic specification of its desired behavior. Each agent has two main components:

  an interface, which defines how the agent may interact with its environment (i.e., other agents);
  a computational engine, which defines how the agent may act.

An agent interface consists of three components:

  a unique agent identifier, which names the agent;
  a set of predicates defining what messages will be accepted by the agent; these are called environment predicates;
  a set of predicates defining messages that the agent may send; these are called component predicates.

Besides environment and component predicates, an agent has a set of internal predicates with no external effect.

The computational engine of an agent is based on the METATEM paradigm of executable temporal logics. The idea behind this approach is to directly execute a declarative agent specification, given as a set of program rules which are temporal logic formulae of the form:

  antecedent about the past → consequent about the future

The intuitive interpretation of such a rule is: on the basis of the past, do the future.

Since METATEM rules must respect the past-implies-future form, FML formulae defining agent rules must be transformed into this form. This is always possible, as demonstrated in [BFG89].
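A toy version of this execution cycle, grossly simplified for illustration (no eventualities, choice, or backtracking, which real METATEM handles): each step checks rule antecedents against the previous state and asserts the consequents in the new one.

```python
def metatem_step(prev, rules):
    """One 'past implies future' step: build the next state from the previous."""
    now = set()
    for past_cond, future_atoms in rules:
        if past_cond(prev):               # antecedent about the past
            now |= set(future_atoms)      # consequent made true now
    return now

# hypothetical rules echoing the request/receive/act example earlier
rules = [
    (lambda s: "requested" in s, {"received"}),
    (lambda s: "received" in s, {"acted"}),
]
trace = [{"requested"}]
for _ in range(2):
    trace.append(metatem_step(trace[-1], rules))
```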



9. Agents based on FOL · 6. Comparison

9.6 Comparison



Comparison via a running example

The following is taken from [MascardiMS04].

(Figure: AUML sequence diagram of the negotiation protocol between Buyer and Seller, with messages contractProposal, accept, refuse, and acknowledge; the protocol may loop with a further contractProposal.)

The notation used in this figure is based on an agent-oriented extension of UML [Parunak-odell, aose54]:

  the diamond with the × inside represents a “xor” connector;
  the protocol can be repeated more than once (note the bottom arrow from the buyer to the seller, labelled with a contractProposal message, which loops around and points back to a higher point on the seller's time line).

The seller agent may receive a contractProposal message from a buyer agent. According to the amount of merchandise required and the price proposed by the buyer, the seller may accept the proposal, refuse it, or try to negotiate a new price by sending a contractProposal message back to the buyer. The buyer agent can do the same (accept, refuse, or negotiate) when it receives a contractProposal message back from the seller.

Behaviour of the seller

If the received message is contractProposal(merchandise, amount, proposedprice), then:

  if there is enough merchandise in the warehouse and the price is greater than or equal to a max value, the seller accepts by sending an accept message to the buyer and concurrently ships the required merchandise to the buyer (if no concurrent actions are available, answering and shipping merchandise will be executed sequentially);

  if there is not enough merchandise in the warehouse or the price is lower than or equal to a min value, the seller agent refuses by sending a refuse message to the buyer;

  if there is enough merchandise in the warehouse and the price is strictly between min and max, the seller sends a contractProposal to the buyer with a proposed price computed as the mean of the price proposed by the buyer and max.
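Before the three formal encodings, the decision rule itself fits in a few lines. A minimal sketch in plain Python; all names (seller_decide, MIN_PRICE, ...) are this sketch's own:

```python
MIN_PRICE = {"orange": 1}
MAX_PRICE = {"orange": 2}
stock = {"orange": 1000}

def seller_decide(merchandise, amount, price):
    """Accept, refuse, or counter-propose, following the three cases above."""
    # refuse: not enough stock, or price at or below the minimum
    if stock.get(merchandise, 0) < amount or price <= MIN_PRICE[merchandise]:
        return ("refuse",)
    # accept: price at or above the maximum (stock is sufficient here)
    if price >= MAX_PRICE[merchandise]:
        stock[merchandise] -= amount      # accept and ship
        return ("accept",)
    # negotiate: counter with the mean of the buyer's price and max
    return ("contractProposal", (price + MAX_PRICE[merchandise]) / 2)
```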



The merchandise to be exchanged is oranges, with minimum and maximum prices of 1 and 2 euros respectively. The initial amount of oranges that the seller possesses is 1000.

Situation-independent function declarations:

  min-price(Merchandise) = Min
  The minimum price the seller is willing to take under consideration for Merchandise is Min.

  max-price(Merchandise) = Max
  The price for Merchandise that the seller accepts without negotiation is equal to or greater than Max.

GOLOG

Primitive action declarations:

  ship(Buyer, Merchandise, Required-amount)
  The seller agent delivers the Required-amount of Merchandise to the Buyer.

  send(Sender, Receiver, Message)
  Sender sends Message to Receiver.



Primitive fluent declarations:

  receiving(Sender, Receiver, Message, S)
  Receiver receives Message from Sender in situation S.

  storing(Merchandise, Amount, S)
  The seller stores Amount of Merchandise in situation S.

Initial situation axioms:

  min-price(orange) = 1
  max-price(orange) = 2
  ∀S, R, M. ¬receiving(S, R, M, s0)
  storing(orange, 1000, s0)

Precondition axioms:

  poss(ship(Buyer, Merchandise, Required-amount), S) ≡
      storing(Merchandise, Amount, S) ∧ Amount ≥ Required-amount
  It is possible to ship merchandise iff there is enough merchandise stored in the warehouse.

  poss(send(Sender, Receiver, Message), S) ≡ ⊤
  It is always possible to send messages.

Successor state axioms:

  receiving(Sender, Receiver, Message, do(send(Sender, Receiver, Message), S)) ≡ ⊤
  Receiver receives Message from Sender in the situation do(send(Sender, Receiver, Message), S), reached by executing send(Sender, Receiver, Message) in S.

  storing(Merchandise, Amount, do(A, S)) ≡
      (A = ship(Buyer, Merchandise, Required-amount) ∧
       storing(Merchandise, Required-amount + Amount, S))
      ∨
      (A ≠ ship(Buyer, Merchandise, Required-amount) ∧
       storing(Merchandise, Amount, S))

The seller has a certain Amount of Merchandise if it had Required-amount + Amount of Merchandise in the previous situation and it shipped Required-amount of Merchandise, or if it had Amount of Merchandise in the previous situation and it did not ship any Merchandise.
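The axiom's two disjuncts correspond to the two branches of a one-step update. A sketch (invented names, illustration only):

```python
def storing_after(action, stored):
    """Stored amounts in do(A, S), from the amounts in S."""
    stored = dict(stored)                     # new situation, new snapshot
    if action is not None and action[0] == "ship":
        _, _buyer, merch, required = action   # ("ship", Buyer, Merchandise, Req)
        stored[merch] -= required             # had required + amount before
    return stored                             # any other action: unchanged

s0 = {"orange": 1000}
s1 = storing_after(("ship", "buyer1", "orange", 10), s0)
s2 = storing_after(("send", "seller", "buyer1", "accept"), s1)
```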



We may think of a buyer agent executing a buyer-life-cycle procedure concurrently with the seller agent's procedure seller-life-cycle. buyer-life-cycle defines the actions the buyer agent takes according to its internal state and the messages it receives. seller-life-cycle is defined in the following way.

proc seller-life-cycle
  if receiving(Buyer, seller, contractProposal(Merchandise, Required-amount, Price), now)
  then
    if storing(Merchandise, Amount, now)
       ∧ Amount ≥ Required-amount
       ∧ Price ≥ max-price(Merchandise)
    then ship(Buyer, Merchandise, Required-amount) ‖
         send(seller, Buyer, accept(Merchandise, Required-amount, Price))
    else
      if storing(Merchandise, Amount, now) ∧ Amount ≥ Required-amount
         ∧ min-price(Merchandise) < Price < max-price(Merchandise)
      then send(seller, Buyer, contractProposal(Merchandise,
                Required-amount, (Price + max-price(Merchandise))/2))
      else nil

Agent-0

timegrain := m
The program is characterized by a time-grain of one minute.

CAPABILITIES := ((DO ?time (ship ?!buyer ?merchandise
                            ?required-amount ?!price))
  (AND (B (?time (stored ?merchandise ?stored-amount)))
       (>= ?stored-amount ?required-amount)))

The agent has the capability of shipping a certain amount of merchandise, provided that, at the time of shipping, it believes that this amount is stored in the warehouse.

INITIAL BELIEFS := (0 (stored orange 1000))
                   (?!time (min-price orange 1))
                   (?!time (max-price orange 2))

The agent has initial beliefs about the minimum price, maximum price, and stored amount of oranges. The initial belief about stored oranges only holds at time 0, since this amount will change during the agent's life, while the beliefs about minimum and maximum prices hold at all times.

COMMITMENT RULES := (COMMIT
  (?buyer REQUEST
    (DO now (ship ?buyer ?merchandise ?req-amnt ?price)))
  (AND (B (now (stored ?merchandise ?stored-amount)))
       (>= ?stored-amount ?req-amnt)
       (B (now (max-price ?merchandise ?max)))
       (>= ?price ?max))
  (?buyer (DO now (ship ?buyer ?merchandise ?req-amnt ?price)))
  (myself (INFORM now ?buyer (accepted ?merchandise ?req-amnt ?price)))
  (myself (DO now (update-merchandise ?merchandise ?req-amnt)))
)

The first commitment rule says that if the seller agent receives a request to ship a certain amount of merchandise at a certain price, and if it believes that the required amount is stored in the warehouse and the proposed price is at least max-price, then the seller agent commits itself to the buyer to ship the merchandise (ship is a private action), and decides (namely, commits to itself) to inform the buyer that its request has been accepted and to update the stored amount of merchandise (update-merchandise is a private action).
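A toy reading of that rule as a single function (not AGENT-0 syntax; the tuple shapes and names are invented here):

```python
def first_rule(msg, beliefs):
    """On a ship request, check the belief conditions and emit the
    three commitments: ship, inform 'accepted', update the stock."""
    kind, buyer, merch, amount, price = msg
    if (kind == "request"
            and beliefs["stored"].get(merch, 0) >= amount
            and price >= beliefs["max_price"][merch]):
        return [(buyer, ("ship", buyer, merch, amount, price)),
                ("myself", ("inform", buyer, ("accepted", merch, amount, price))),
                ("myself", ("update-merchandise", merch, amount))]
    return []                                  # rule does not fire

beliefs = {"stored": {"orange": 1000}, "max_price": {"orange": 2}}
fired = first_rule(("request", "buyer1", "orange", 10, 2), beliefs)
```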



(COMMIT
  (?buyer REQUEST
    (DO now (ship ?buyer ?merchandise ?req-amnt ?price)))
  (OR (AND (B (now (stored ?merchandise ?stored-amount)))
           (< ?stored-amount ?req-amnt))
      (AND (B (now (min-price ?merchandise ?min)))
           (<= ?price ?min)))
  (myself (INFORM now ?buyer (refused ?merchandise ?req-amnt ?price)))
)

The second rule says that if the required amount of merchandise is not present in the warehouse or the price is too low, the seller agent decides to inform the buyer that its request has been refused.

(COMMIT
  (?buyer REQUEST
    (DO now (ship ?buyer ?merchandise ?req-amnt ?price)))
  (AND (B (now (stored ?merchandise ?stored-amount)))
       (>= ?stored-amount ?req-amnt)
       (B (now (max-price ?merchandise ?max)))
       (< ?price ?max)
       (B (now (min-price ?merchandise ?min)))
       (> ?price ?min))
  (myself (DO now (eval-mean ?max ?price ?mean-price)))
  (myself (REQUEST now ?buyer
    (eval-counter-proposal ?merchandise ?req-amnt ?mean-price)))
)

Finally, the third rule says that if the price can be negotiated and there is enough merchandise in the warehouse, the seller agent evaluates the price to propose to the buyer agent (eval-mean is a private action) and decides to send a counter-proposal to it. We are assuming that the buyer agent is able to perform an eval-counter-proposal action to evaluate the seller agent's proposal: the seller agent must know the exact action syntax if it wants the buyer agent to understand and satisfy its request.

Concurrent MetaTeM

The Concurrent METATEM program for the seller agent may be as follows. The interface of the seller agent is:

  seller(contractProposal)[accept, refuse, contractProposal, ship]

meaning that:
  the seller agent, identified by the seller identifier, is able to recognize a contractProposal message with its arguments, which are not specified in the interface;
  the messages that the seller agent is able to broadcast to the environment, including both communicative acts and actions on the environment, are accept, refuse, contractProposal, and ship.

The internal knowledge base of the seller agent contains the following rigid predicates (predicates whose value never changes):

  min-price(orange, 1).
  max-price(orange, 2).

and the following flexible predicates (predicates whose value changes over time):

  storing(orange, 1000).

The program rules of the seller agent are the following (lowercase symbols are constants, uppercase symbols are variables):

∀ Buyer, Merchandise, Req_Amnt, Price
⊖[contractProposal(Buyer, seller, Merchandise, Req_Amnt, Price) ∧
  storing(Merchandise, Old_Amount) ∧ Old_Amount ≥ Req_Amnt ∧
  max-price(Merchandise, Max) ∧ Price ≥ Max] ⇒
[ship(Buyer, Merchandise, Req_Amnt, Price) ∧
  accept(seller, Buyer, Merchandise, Req_Amnt, Price)]

If there was a previous state where Buyer sent a contractProposal message to seller, and in that previous state all the conditions were met to accept the proposal, then accept the Buyer's proposal and ship the required merchandise.



∀ Buyer, Merchandise, Req_Amnt, Price
⊖[contractProposal(Buyer, seller, Merchandise, Req_Amnt, Price) ∧
  storing(Merchandise, Old_Amount) ∧ min-price(Merchandise, Min) ∧
  (Old_Amount < Req_Amnt ∨ Price ≤ Min)] ⇒
refuse(seller, Buyer, Merchandise, Req_Amnt, Price)

If there was a previous state where Buyer sent a contractProposal message to seller, and in that previous state the conditions to accept the Buyer's proposal were not met, then send a refuse message to Buyer.



∀ Buyer, Merchandise, Req_Amnt, Price
⊖[contractProposal(Buyer, seller, Merchandise, Req_Amnt, Price) ∧
  storing(Merchandise, Old_Amount) ∧ min-price(Merchandise, Min) ∧
  max-price(Merchandise, Max) ∧ Old_Amount ≥ Req_Amnt ∧
  Price > Min ∧ Price < Max ∧ New_Price = (Max + Price)/2] ⇒
contractProposal(seller, Buyer, Merchandise, Req_Amnt, New_Price)

If there was a previous state where Buyer sent a contractProposal message to seller, and in that previous state the conditions were met to send a contractProposal back to Buyer, then send a contractProposal message to Buyer with a new proposed price.
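Taken together, the three rules partition the seller's possible reactions to a proposal from the previous state. The following Python sketch condenses them into one decision function (illustrative only; it reads the refuse rule as covering exactly the cases the other two rules leave open, i.e. insufficient stock or a price at or below the minimum):

```python
def seller_decision(stock, min_price, max_price, req_amnt, price):
    """Return the seller's reaction to a contractProposal from the previous state."""
    if stock >= req_amnt and price >= max_price:
        # Rule 1: the offer reaches the maximum price -> ship and accept.
        return ("accept", price)
    if stock < req_amnt or price <= min_price:
        # Rule 2: not enough stock, or the offer is too low -> refuse.
        return ("refuse", None)
    # Rule 3: counter-propose; eval-mean averages the offer with the maximum.
    new_price = (max_price + price) / 2
    return ("contractProposal", new_price)

# Oranges: min-price 1, max-price 2, 1000 in stock, buyer offers 1.5 for 100.
print(seller_decision(1000, 1, 2, 100, 1.5))  # -> ('contractProposal', 1.75)
```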



T Sns Cm Cc N M Sem
ConGolog □ □ □
AGENT-0 □ □ □ □ □
IMPACT □ □ □
Conc. METATEM □ □

T: Temporal.
Sns: Sensing.
Cm: Communication mechanisms.
Cc: Concurrent.
N: Nondeterministic.
M: Modular.
Sem: Formal semantics.

■: Fully supported.
◪: Partially supported.
□: Not supported.



9. Agents based on FOL 7. References

9.7 References



[BFG89] Barringer, H., M. Fisher, D. Gabbay, G. Gough, and R. Owens (1990). METATEM: A framework for programming in temporal logic. In J. W. de Bakker, W. P. de Roever, and G. Rozenberg (Eds.), Proceedings of the 1989 REX Workshop on Stepwise Refinement of Distributed Systems: Models, Formalisms, Correctness, pp. 94–129. Springer-Verlag. LNCS 430.



[DLL] De Giacomo, G., Y. Lespérance, and H. J. Levesque (2000). ConGolog, a concurrent programming language based on the situation calculus. Artificial Intelligence 121, 109–169.

[finger-fisher-owens] Finger, M., M. Fisher, and R. Owens (1993). METATEM at work: Modelling reactive systems using executable temporal logic. In P. Chung, G. L. Lovegrove, and M. Ali (Eds.), Proceedings of the 6th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems (IEA/AIE’93), pp. 209–218. Gordon and Breach Publishing.

[fisherNormalForm] Fisher, M. (1992). A normal form for first-order temporal formulae. In D. Kapur (Ed.), Proceedings of the 11th International Conference on Automated Deduction (CADE’92), pp. 370–384. Springer-Verlag. LNCS 607.



[concMTM13] Fisher, M. (1993). Concurrent METATEM – A language for modeling reactive systems. In A. Bode, M. Reeve, and G. Wolf (Eds.), Proceedings of the 5th International Conference on Parallel Architectures and Languages Europe (PARLE’93), pp. 185–196. Springer-Verlag. LNCS 694.



[concMTM14] Fisher, M. and H. Barringer (1991). Concurrent METATEM processes – A language for distributed AI. In E. Mosekilde (Ed.), Proceedings of the 1991 European Simulation Multiconference. SCS Press.

[concMTM16] Fisher, M. and M. Wooldridge (1993). Executable temporal logic for distributed AI. In K. Sycara (Ed.), Proceedings of the 12th International Workshop on Distributed AI (IWDAI’93), pp. 131–142.



[fipa] Foundation for Intelligent Physical Agents. FIPA home page. http://www.fipa.org/.

[fipa-acl] Foundation for Intelligent Physical Agents (2002). FIPA ACL message structure specification. Approved for standard, 2002-12-06. http://www.fipa.org/specs/fipa00061/.



[HiDBoHoMe98] Hindriks, K. V., F. S. de Boer, W. van der Hoek, and J.-J. C. Meyer (1998). Formal semantics of the core of AGENT-0. In ECAI’98 Workshop on Practical Reasoning and Rationality, pp. 20–29.

[mascardi02] Mascardi, V. (2002). Logic-Based Specification Environments for Multi-Agent Systems. Ph.D. thesis, Computer Science Department of Genova University, Genova, Italy. DISI-TH-2002-04.



[MascardiMS04] Mascardi, V., M. Martelli, and L. Sterling (2004). Logic-based specification languages for intelligent software agents. Theory and Practice of Logic Programming 4(4), 429–494.

[Parunak-odell] Odell, J., H. V. D. Parunak, and B. Bauer (2000a). Extending UML for agents. In G. Wagner, Y. Lespérance, and E. Yu (Eds.), Proceedings of the 2nd Workshop on Agent-Oriented Information Systems (AOIS’00). Held in conjunction with the 17th National Conference on Artificial Intelligence (AAAI’00), pp. 3–17.

[aose54] Odell, J., H. V. D. Parunak, and B. Bauer (2000b). Representing agent interaction protocols in UML. In P. Ciancarini and M. Wooldridge (Eds.), Proceedings of the 1st International Workshop on Agent-Oriented Software Engineering (AOSE’00), pp. 121–140. Springer-Verlag. LNCS 1957.

[Shoham90] Shoham, Y. (1993). Agent-oriented programming. Artificial Intelligence 60, 51–92.



[placa] Thomas, S. R. (1995). The PLACA agent programming language. In M. Wooldridge and N. R. Jennings (Eds.), Proceedings of the 1st International Workshop on Agent Theories, Architectures, and Languages (ATAL’94), pp. 355–370. Springer-Verlag. LNCS 890.

[vonWright51] von Wright, G. H. (1951). Deontic logic. Mind 60, 1–15. Reprinted in G. H. von Wright, Logical Studies, pp. 58–74. Routledge and Kegan Paul, 1957.


