decision making with multiple imperfect decision makers - Institute of ...

More documents

Recommendations

Info

We assume that they are evaluated through a multi-attribute utility function, see [3]. Specifically,without much loss of generality, see [24], we shall adopt an additive formwith w i ≥ 0, ∑ li=1 w i = 1.u(c 1 , c 2 , . . . , c l ) =l∑w i u i (c i ),The consequences might be perceived differently depending on the current emotional state d t of theagent. We shall define it in terms of the level of k basic emotions, through a mixing functioni=1d t = h(em 1 t , em 2 t , . . . , em k t ).[5] and [22] provide many pointers to the literature on mixing emotions. The intensity of these basicemotions, in turn, will be defined in terms of how desirable a situation is, i.e. how much utilityu(c t ) is gained, and how surprising the situation was, see [7] for an assessment of such models. Theexpectations, or surprise, will be defined by comparing the predicted and the actual (inferred) useraction through some distance functionz t = d(¯b t , ˆb t ),where ¯b t is (the most likely) predicted user action. We assume some stability within emotions, in thatcurrent emotions influence future emotions. Thus, we assume a probabilistic evolution of emotionsthroughem i t = r i (em i t−1, u(c t ), z t ).Finally, following [13], we shall actually assume that the utility weights will depend on the emotionalstate, the stock of visceral factors in their notation, so that3.3 Expected Utilityu(c) =l∑w i (d)u i (c i ).i=1The goal of our agent will be to maximize the predictive expected utility. Planning m instants aheadrequires computing maximum expected utility plans defined through:[max ψ(a ∑m]∑t, . . . , a t+m ) =(u(a t+i , b t+i , e t+i )) ×(a t,...,a t+m)(b t,e t)...,(b t+m,e t+m)×p((b t, e t), . . . , (b t+m, e t+m) | (a t, a t+1, . . . , a t+m, (a t−1, b t−1, e t−1), (a t−2, b t−2, e t−2))).i=1assuming utilities to be additive over time. This could be done through dynamic programming. Ifplanning m instants ahead turns out to be very expensive computationally, we could plan just oneperiod ahead. In this case, we aim at solvingmax ψ(a t) = ∑ u(a t , b t , e t ) × p(b t , e t | a t , (a t−1 , b t−1 , e t−1 ), (a t−2 , b t−2 , e t−2 )).a t∈Ab t,e tWe may mitigate the myopia of this approach by adding a term penalizing deviations from someideal agent consequences, as in [21]. In this case, the utility would have the form u(c) − ρ(c, c ∗ )where ρ is a distance and c ∗ is an ideal consequence value.Agents operating in this way may end up being too predictable. We may mitigate such effect bychoosing the next action in a randomized way, with probabilities proportional to the predictive expectedutilities, that isP (a t ) ∝ ψ(a t ), (2)where P (a t ) is the probability of choosing a t .4
4 ImplementationThe above procedures have been implemented within the AISoy1 robot environment(http://www.aisoy.es). Some of the details of the model implemented are described next, with codedeveloped in C++ over Linux.The set A includes the robot’s actions which include cry, tell a joke, ask for being recharged, complain,talk, sing, argue, ask for playing, ask for help and do nothing. On the user’s side, set B, therobot is able to detect several agent’s actions, some of them in a probabilistic way. Among them,the robot detects hug, hit, shout, speak, being recharged, play, being touched, stroke or no action.Regarding the environment (set E), the bot may recognize contextual issues concerning the presenceof noise or music, the level of darkness, the temperature, or its inclination. To do so, the bot hasseveral sensors including a camera to detect objects or persons within a scene, as well as the lightintensity; a microphone used to recognize when the user talks and understand what he says, througha natural language processing component; some touch sensors, to interpret when it has been touchedor hugged or hit; an inclination sensor so as to know when it is lying down or not; and a temperaturesensor. The information provided by these sensors is used by the bot to infer the user’s actions andenvironmental states. Some are based on simple deterministic rules; for example, the hit action isinterpreted through a detection in a touch sensor and a variation in the inclination sensor. Others arebased on probabilistic rules, like those involving voice recognition and processing.The basic forecasting models (environment, user, classical conditioning) are Markov chains and welearn about their transition probabilities based on matrix beta models, see [20]. For expected utilitycomputations and point forecasts we summarize the corresponding row-wise Dirichlet distributionsthrough their means. Learning about probability models is done through Bayesian model averaging,as in [9].The bot aims at satisfying four objectives, which, as in [14], are ordered in hierarchical order ofimportance. They go from a primary security objective, in which the bot cares mainly for its survivaland security (in terms of not being hit, having a sufficient energy level, and being at the righttemperature), to higher objective levels in relation with a social empathy layer, once basic objectivesare sufficiently covered, in relation with social interests of the bot. Weights reflect the importanceof the objectives and component utility functions reflect the aim of fulfilling as quickly as possiblethe primary objectives, up to a certain level. Emotion implementations are based on [7], who comparedifferent appraisal models to obtain the intensity of an emotion. We use four basic emotions(joy, sadness, hope and fear), which are then combined to obtain more complex emotions. Emotionsevolve as Dynamic Linear Models, as in [23].The model is implemented in a synchronous mode. Sensors are read at fixed times (with differenttimings for various sensors). When relevant events are detected, the basic information processingand decision making loop is shot. However, as mentioned it is managed by exception in that ifexceptions to standard behavior occur, the loop is open to interventions through various threads.We plan only one step ahead and choose the action with probabilities proportional to the computedexpected utilities. Memory is limited to the two previous instants.5 DiscussionWe have described a model to control the behavior of an agent in front of an intelligent adversary. Itis multi-attribute decision analytic at its core, but it incorporates forecasting models of the adversary(Adversarial Risk Analysis) and emotion-based behavior (Affective Decision Making). This wasmotivated by our interest in improving the user’s experience interacting with a bot [2], [12] and[15]. We believe though that this model may find many other potential applications in fields likeinterface design, e-learning, entertainment or therapeutical devices.The model should be extended to a case in which the agent interacts with several users, througha process of identification. It could also be extended to a case in which there are several agents,possibly cooperating or competing, depending on their emotional state. Dealing with the possibilityof learning about new user actions, based on repeated readings, and, consequently, augmenting theset B is another challenging problem. Finally, we have shown what is termed a 0-level ARA analysis.We could try to undertake higher ARA levels in modeling the performance of adversaries.5
Page 1 and 2: The 2nd International Workshop onDE
Page 3 and 4: The 2nd International Workshop onDE
Page 5 and 6: Time Title Authors7:30—7:50 Openi
Page 7 and 8: Modeling Humans as Reinforcement Le
Page 9 and 10: solution concept used to predict th
Page 11 and 12: an ɛ-greedy policy parameterizatio
Page 13 and 14: Automated Explanations for MDP Poli
Page 15 and 16: V E = ∑ i∈Eλ π∗s 0(sc i )
Page 17 and 18: Figure 1: User Perception of MDP-Ba
Page 19 and 20: [9] Warren B. Powell. Approximate D
Page 21 and 22: information (technological knowledg
Page 23 and 24: The complete fulfilment of preferen
Page 25 and 26: References[1] J.O. Berger. Statisti
Page 27 and 28: (happiness, anger, fear, disgust, s
Page 30 and 31: David H Wolpert, editors, Decision
Page 32 and 33: on some of their concepts, we devel
Page 36 and 37: AcknowledgmentsResearch supported b
Page 38 and 39: Bayesian Combination of Multiple, I
Page 40 and 41: is not in closed form [5], requirin
Page 42 and 43: where α (k)j is updated by adding
Page 44 and 45: Figure 3: Prototypical confusion ma
Page 46 and 47: Artificial Intelligence Designfor R
Page 48 and 49: the game. Each has its own attribut
Page 50 and 51: 4.3 Experimental Results and Limita
Page 52 and 53: Distributed Decision Making byCateg
Page 54 and 55: q 1a 1 (1) b 1(1)a 2(1)b 2(1)a K(1)
Page 56 and 57: Bayes risk0.250.20.150.10.05Collabo
Page 58 and 59: Decision making and working memory
Page 60 and 61: in WM as well as inhibitory tasks [
Page 62 and 63: Difficulty level will be automatica
Page 64 and 65: Overall, the current literature lea
Page 66 and 67: [31] W K Bickel, R Yi, R D Landes,
Page 68 and 69: Each time instant, the agents first
Page 70 and 71: Figure 2: Two basic decentralized a
Page 72 and 73: [11] M. Kárný and T.V. Guy. Shari
Page 74 and 75: these results yields a design metho
Page 76 and 77: Minimum of (16) is well known from
Page 78 and 79: Algorithm 2 VB-DP variant of the di
Page 80 and 81: Further simplications can be achiev
Page 82 and 83: The basic terms we use are as follo
Page 84 and 85:
3 Connection to the Bayesian soluti
Page 86 and 87:
4 ConclusionThis paper brings an im
Page 88 and 89:
2 Dynamic programming and revisions
Page 90 and 91:
The possible way how to recognize t
Page 92 and 93:
The data used for experiment are da
Page 95:
T.V. Guy, Institute of Information
show all

decision making with multiple imperfect decision makers - Institute of ...

Create successful ePaper yourself

Delete template?

Save as template?