on some of their concepts, we develop a model that allows an agent to make decisions influenced by emotional factors within the ARA framework. The model is essentially multi-attribute decision analytic, see [3], but our agent also entertains models forecasting the evolution of its adversary and the environment surrounding them. We also include models simulating emotions, which have an impact on the agent's utility function. In this way, we aim at better simulating human decision making and, especially, at improving interfacing and interaction with users.

2 Basic Elements

We start by introducing the basic elements of our model. We aim at designing an agent A whose activities we want to regulate and plan. There is another participant B, the user, who interacts with A. The activities of both A and B take place within an environment E. As a motivating example, suppose that we aim at designing a bot A which will interact with a kid B within a given room E.

A makes decisions within a finite set $\mathcal{A} = \{a_1, \ldots, a_m\}$, which possibly includes a do nothing action. B makes decisions within a set $\mathcal{B} = \{b_1, \ldots, b_n\}$, which also includes a do nothing action. $\mathcal{B}$ will be as complete as possible, while simplifying all feasible results down to a finite number. It may be the case that not all user actions are known a priori. This set could grow as the agent learns, adding new user actions, as we discuss in our conclusions. The environment E changes with the user's actions. The agent faces this changing environment, which affects its own behavior. We assume that the environment adopts a state within a finite set $\mathcal{E} = \{e_1, \ldots, e_r\}$.

A has q sensors which provide readings and are the window through which it perceives the world. Sensory information originating in the external environment plays an important role in the intensity of the agent's emotions, which, in turn, affect its decision-making process. Each sensor reading is attached to a time t, so that the sensor reading vector will be $s_t = (s_t^1, \ldots, s_t^q)$. The agent infers the external environmental state e through a transformation function f, so that
$$\hat{e}_t = f(s_t).$$
A also uses the sensor readings to infer what the user has done, based on a (possibly probabilistic) function g,
$$\hat{b}_t = g(s_t).$$

We design our agent with an embedded management-by-exception principle, see [23]. Under normal circumstances, its activities will be planned according to the basic loop shown in Figure 1. This is open to interventions if an exception occurs.

Figure 1: Basic Agent Loop (read sensors $s_t$; interpret state $e_t$; infer action $b_t$; update the forecasting model; choose next action $a_{t+1}$; update clock).

3 ARA Affective Decision Model

Essentially, we shall plan our agent's activities over time within the decision analytic framework, see [3]. We describe, in turn, the forecasting model (which incorporates the ARA elements), the preference model (which incorporates the affective elements) and the optimization part.
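The basic loop of Figure 1 can be sketched in code. The following Python sketch is purely illustrative: the names (`Agent`, `infer_state`, `infer_user_action`, `choose_action`) are our own, not the paper's implementation, and action choice is left as a placeholder for the optimization of Section 3.

```python
import random

# Minimal sketch of the basic agent loop in Figure 1. All names here
# are illustrative placeholders, not part of the paper's implementation.

class Agent:
    def __init__(self, actions, infer_state, infer_user_action):
        self.actions = actions                      # finite set A = {a_1, ..., a_m}
        self.infer_state = infer_state              # f: sensor vector -> state estimate
        self.infer_user_action = infer_user_action  # g: sensor vector -> user-action estimate
        self.history = []                           # data feeding the forecasting models
        self.t = 0

    def step(self, sensors):
        e_hat = self.infer_state(sensors)            # interpret state: e_t = f(s_t)
        b_hat = self.infer_user_action(sensors)      # infer user action: b_t = g(s_t)
        self.history.append((self.t, e_hat, b_hat))  # update the forecasting model's data
        a_next = self.choose_action(e_hat, b_hat)    # choose next action a_{t+1}
        self.t += 1                                  # update clock
        return a_next

    def choose_action(self, e_hat, b_hat):
        # Placeholder policy; Section 3 replaces this with
        # maximization of expected (affective) utility.
        return random.choice(self.actions)
```

A dummy f and g, say thresholding a light sensor and a sound sensor, suffice to exercise the loop.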
3.1 Forecasting Models

The agent maintains forecasting models which suggest with which probabilities the user will act and the environment will react, given the past history of its actions, the user's actions and the evolution of the environment. We describe the general structure of such models.

Assume that, for computational reasons, we limit the agent's memory to two time instants, so that we are interested in computing
$$p(e_t, b_t \mid a_t, (e_{t-1}, a_{t-1}, b_{t-1}), (e_{t-2}, a_{t-2}, b_{t-2})).$$
Extensions to k instants of memory, or to forecasting m steps ahead, follow a similar path. The above may be decomposed through
$$p(e_t \mid b_t, a_t, (e_{t-1}, a_{t-1}, b_{t-1}), (e_{t-2}, a_{t-2}, b_{t-2})) \times p(b_t \mid a_t, (e_{t-1}, a_{t-1}, b_{t-1}), (e_{t-2}, a_{t-2}, b_{t-2})).$$

We assume that the environment is fully under the control of the user. As an example, the user controls the light, the temperature and other features of the room. Moreover, he may plug in the bot to charge its battery, and so on. Only the latest action of the user will affect the evolution of the environment. Thus, we shall assume that
$$p(e_t \mid b_t, a_t, (e_{t-1}, a_{t-1}, b_{t-1}), (e_{t-2}, a_{t-2}, b_{t-2})) = p(e_t \mid b_t, e_{t-1}, e_{t-2}).$$
We term this the environment model.

Similarly, we shall assume that the user has his own behavior evolution, which might be affected by how he reacts to the agent's actions, thus incorporating the ARA principle, so that
$$p(b_t \mid a_t, (e_{t-1}, a_{t-1}, b_{t-1}), (e_{t-2}, a_{t-2}, b_{t-2})) = p(b_t \mid a_t, b_{t-1}, b_{t-2}). \qquad (1)$$
The agent will maintain two models for this purpose. The first one describes the evolution of the user by himself, assuming that he is in control of the whole environment and is not affected by the agent's actions.
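Under the two independence assumptions above, the joint forecast factorizes into the environment model and the term in (1). A minimal Python sketch, where all states, actions and probability tables are invented for illustration:

```python
# Sketch of the factorized two-step-memory forecast:
#   p(e_t, b_t | a_t, history) = p(e_t | b_t, e_{t-1}, e_{t-2})
#                                * p(b_t | a_t, b_{t-1}, b_{t-2}).
# All states, actions and numbers below are illustrative, not the paper's.

# Environment model: (b_t, e_{t-1}, e_{t-2}) -> distribution over e_t.
env_model = {
    ("light_on", "dark", "dark"): {"bright": 0.9, "dark": 0.1},
    ("do_nothing", "dark", "dark"): {"bright": 0.05, "dark": 0.95},
}

# User-behavior term (1): (a_t, b_{t-1}, b_{t-2}) -> distribution over b_t.
user_term = {
    ("beep", "do_nothing", "do_nothing"): {"light_on": 0.6, "do_nothing": 0.4},
}

def joint_forecast(e_t, b_t, a_t, e_hist, b_hist):
    """e_hist = (e_{t-1}, e_{t-2}), b_hist = (b_{t-1}, b_{t-2})."""
    p_b = user_term[(a_t,) + b_hist].get(b_t, 0.0)  # p(b_t | a_t, b_{t-1}, b_{t-2})
    p_e = env_model[(b_t,) + e_hist].get(e_t, 0.0)  # p(e_t | b_t, e_{t-1}, e_{t-2})
    return p_e * p_b

p = joint_forecast("bright", "light_on", "beep",
                   ("dark", "dark"), ("do_nothing", "do_nothing"))
# p = 0.9 * 0.6 = 0.54
```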
We call it the user model, and describe it through
$$p(b_t \mid b_{t-1}, b_{t-2}).$$
The other one refers to the user's reactions to the agent's actions. Indeed, it assumes that the user is fully reactive to the agent, which we describe through
$$p(b_t \mid a_t).$$
We call it the classical conditioning model, with the agent conditioning the user.

We combine both models to recover (1). For example, under appropriate independence conditions, we recover it through
$$p(b_t \mid a_t, b_{t-1}, b_{t-2}) = \frac{p(b_t \mid b_{t-1}, b_{t-2}) \, p(b_t \mid a_t)}{p(b_t)}.$$
Other combinations are required if such conditions do not hold. In general, we could view the problem as one of model averaging, see [9]. In that case,
$$p(b_t \mid a_t, b_{t-1}, b_{t-2}) = p(M_1)\, p(b_t \mid b_{t-1}, b_{t-2}) + p(M_2)\, p(b_t \mid a_t),$$
where $p(M_i)$ denotes the probability that the agent gives to model i, which, essentially, captures how reactive the user is to the agent's actions. Learning about the various models within our implementation is sketched in Section 4.

3.2 Affective Preference Model

We now describe the preference model, which incorporates the affective principles. We shall assume that the agent faces multiple consequences $c = (c_1, c_2, \ldots, c_l)$. At each instant t, such consequences depend on its action $a_t$, the user's action $b_t$ and the future state $e_t$, realized after $a_t$ and $b_t$. Therefore, the consequences will be of the form
$$c_i(a_t, b_t, e_t), \qquad i = 1, \ldots, l.$$
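The consequence structure can be sketched as follows. The concrete attributes and the additive utility below are purely illustrative assumptions on our part; the affective adjustments to the agent's utility are not part of this sketch.

```python
# Sketch of the multi-attribute consequences c(a_t, b_t, e_t).
# The two attributes (battery preservation, user engagement) and the
# additive utility are invented for illustration; they are not the
# paper's affective utility.

# Each consequence c_i is a function of (a_t, b_t, e_t).
consequences = [
    lambda a, b, e: 1.0 if a == "do_nothing" else 0.4,  # c_1: battery preserved
    lambda a, b, e: 1.0 if b != "do_nothing" else 0.0,  # c_2: user engagement
]

def consequence_vector(a_t, b_t, e_t):
    """c(a_t, b_t, e_t) = (c_1(a_t, b_t, e_t), ..., c_l(a_t, b_t, e_t))."""
    return tuple(c(a_t, b_t, e_t) for c in consequences)

def utility(c, weights=(0.3, 0.7)):
    # Standard additive multi-attribute form u(c) = sum_i w_i c_i,
    # used here only as a placeholder.
    return sum(w * ci for w, ci in zip(weights, c))

c = consequence_vector("beep", "clap", "bright")
# c = (0.4, 1.0); utility(c) = 0.3*0.4 + 0.7*1.0 = 0.82
```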