A Framework for Evaluating Early-Stage Human - of Marcus Hutter
ΦDBN Coding and Evaluation
I now construct a code for $s_{1:n}$ given $a_{1:n}$, and for $r_{1:n}$ given $s_{1:n}$ and $a_{1:n}$, which is optimal (minimal) if $s_{1:n}r_{1:n}$ given $a_{1:n}$ is sampled from some MDP. It constitutes our cost function for $\Phi$ and is used to define the $\Phi$ selection principle for DBNs. Compared to the MDP case, reward coding is more complex, and there is an extra dependence on the graphical structure of the DBN.
Recall [Hut09] that a sequence $z_{1:n}$ with counts $\boldsymbol{n}=(n_1,\dots,n_m)$ can within an additive constant be coded in

$$\mathrm{CL}(\boldsymbol{n}) \;:=\; n\,H(\boldsymbol{n}/n) \;+\; \frac{m'-1}{2}\log n \quad \text{if } n>0 \text{ and } 0 \text{ else} \qquad (2)$$

bits, where $n = n_{+} = n_1+\dots+n_m$ and $m' = |\{i : n_i>0\}| \le m$ is the number of non-empty categories, and $H(\boldsymbol{p}) := -\sum_{i=1}^{m} p_i \log p_i$ is the entropy of probability distribution $\boldsymbol{p}$. The code is optimal (within $+O(1)$) for all i.i.d. sources.
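As a quick sanity check of (2), the code length can be computed directly from a count vector. The following is a small sketch (the function name is my own, not the paper's), using $\log_2$ so lengths come out in bits:

```python
from math import log2

def code_length(counts):
    """CL(n) = n*H(n/n) + (m'-1)/2 * log n from eq. (2), in bits; 0 if n == 0."""
    n = sum(counts)
    if n == 0:
        return 0.0
    # entropy term; zero counts contribute nothing (0 log 0 := 0)
    entropy = -sum((k / n) * log2(k / n) for k in counts if k > 0)
    m_prime = sum(1 for k in counts if k > 0)  # number of non-empty categories
    return n * entropy + (m_prime - 1) / 2 * log2(n)
```

For example, counts $(4,4)$ give entropy 1 bit, so $\mathrm{CL} = 8 + \tfrac{1}{2}\log_2 8 = 9.5$ bits, while a deterministic sequence (all mass in one category) costs 0 bits, as expected.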
State/Feature Coding. Similarly to the ΦMDP case, we need to code the temporal "observed" state=feature sequence $x_{1:n}$. I do this by a frequency estimate of the state/feature transition probability. (Within an additive constant, MDL, MML, combinatorial, incremental, and Bayesian coding all lead to the same result.) In the following I will drop the prime in $(u^i,a,x'^i)$ tuples and related situations if/since it does not lead to confusion. Let $T^{ia}_{u^i x^i} = \{t \le n : u^i_{t-1} = u^i,\ a_{t-1} = a,\ x^i_t = x^i\}$ be the set of times $t-1$ at which the features that influence $x^i$ have values $u^i$, the action is $a$, and which lead to feature $i$ having value $x^i$. Let $n^{ia}_{u^i x^i} = |T^{ia}_{u^i x^i}|$ be their number ($n^{i+}_{++} = n\ \forall i$). I estimate each feature probability separately by $\hat P^a(x^i|u^i) = n^{ia}_{u^i x^i} / n^{ia}_{u^i +}$.
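The frequency estimate $\hat P^a(x^i|u^i)$ is just a normalized count table. A minimal sketch of how the counts $n^{ia}_{u^i x^i}$ might be collected from a trajectory (the data layout and function names are my own illustration, not the paper's):

```python
from collections import defaultdict

# counts[(i, u, a)][x] = n^{ia}_{u^i x^i}: how often feature i took value x
# when its parent features had values u and the action was a
counts = defaultdict(lambda: defaultdict(int))

def record(i, u, a, x):
    """Record one observed transition for feature i."""
    counts[(i, u, a)][x] += 1

def p_hat(i, u, a, x):
    """Frequency estimate P^a(x^i | u^i) = n^{ia}_{u x} / n^{ia}_{u +}."""
    row = counts[(i, u, a)]
    total = sum(row.values())  # n^{ia}_{u^i +}
    return row[x] / total if total > 0 else 0.0
```

After recording, say, two transitions to value 1 and one to value 0 under the same context, `p_hat` returns 2/3 for value 1.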
Using (1), this yields

$$\hat P(x_{1:n}|a_{1:n}) \;=\; \prod_{t=1}^{n} \hat T^{a_{t-1}}_{x_{t-1} x_t} \;=\; \prod_{t=1}^{n} \prod_{i=1}^{m} \hat P^{a_{t-1}}(x^i_t \mid u^i_{t-1}) \;=\; \dots \;=\; \exp\Big(-\sum_{i,u^i,a} n^{ia}_{u^i +}\, H\big(\boldsymbol{n}^{ia}_{u^i \bullet} / n^{ia}_{u^i +}\big)\Big)$$
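The last step in this chain is the standard identity that a product of empirical frequencies collapses into an exponential of empirical entropies. A quick numerical check for a single context $(i,u^i,a)$, assuming natural logarithms in $H$ (variable names are my own):

```python
from math import exp, log, prod

# counts n^{ia}_{u x} for one fixed context (i, u, a):
# value 0 observed 3 times, value 1 observed once
counts = [3, 1]
n = sum(counts)
p_hat_row = [k / n for k in counts]      # frequency estimates
# left-hand side: product of \hat P over the 4 observations in this context
lhs = prod(p ** k for p, k in zip(p_hat_row, counts))
# right-hand side: exp(-n * H(n_row / n)) with natural-log entropy
H = -sum(p * log(p) for p in p_hat_row if p > 0)
rhs = exp(-n * H)
```

Both sides equal $(3/4)^3 (1/4)^1$, confirming the identity for this context.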
The length of the Shannon-Fano code of $x_{1:n}$ is just the negative logarithm of this expression. We also need to code each non-zero count $n^{ia}_{u^i x^i}$ to accuracy $O(1/\sqrt{n^{ia}_{u^i +}})$, each of which needs $\frac{1}{2}\log(n^{ia}_{u^i +})$ bits. Together this gives a complete code of length

$$\mathrm{CL}(x_{1:n}|a_{1:n}) \;=\; \sum_{i,u^i,a} \mathrm{CL}(\boldsymbol{n}^{ia}_{u^i \bullet}) \qquad (3)$$

The rewards are more complicated.
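Putting (2) and (3) together, the total state code length is a sum of per-context code lengths over all $(i,u^i,a)$. A compact sketch (the arrangement of the count tables is my own, not the paper's):

```python
from math import log2

def code_length(counts):
    """CL(n) from eq. (2), in bits; 0 if the total count is 0."""
    n = sum(counts)
    if n == 0:
        return 0.0
    H = -sum((k / n) * log2(k / n) for k in counts if k > 0)
    m_prime = sum(1 for k in counts if k > 0)
    return n * H + (m_prime - 1) / 2 * log2(n)

def state_code_length(count_tables):
    """CL(x_{1:n} | a_{1:n}) = sum over contexts (i, u, a) of CL(n^{ia}_{u.}), eq. (3).

    count_tables maps a context key (i, u, a) to the list of counts
    n^{ia}_{u x} over the possible values x of feature i.
    """
    return sum(code_length(row) for row in count_tables.values())
```

A context whose outcomes are deterministic contributes 0 bits, so only genuinely stochastic contexts add to the cost.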
Reward structure. Let $R^a_{xx'}$ be (a model of) the observed reward when action $a$ in state $x$ results in state $x'$. It is natural to assume that the structure of the rewards $R^a_{xx'}$ is related to the transition structure $T^a_{xx'}$. Indeed, this is not restrictive, since one can always consider a DBN with the union of transition and reward dependencies. Usually it is assumed that the "global" reward is a sum of "local" rewards $R^{ia}_{u^i x'^i}$, one for each feature $i$ [KP99]. For simplicity of exposition I assume that the local reward $R^i$ only depends on the feature value $x'^i$ and not on $u^i$ and $a$. Even this is not restrictive and actually may be advantageous as discussed in [Hut09] for MDPs. So I assume

$$R^a_{xx'} \;=\; \sum_{i=1}^{m} R^i_{x'^i} \;=:\; R(x')$$
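The additive decomposition above is straightforward to realize in code. A sketch assuming each local reward is a table indexed by the feature's value (the names and the two example tables are illustrative, not from the paper):

```python
# local_rewards[i][v] = R^i_v: reward contribution of feature i taking value v
local_rewards = [
    {0: 0.0, 1: 1.0},   # feature 0
    {0: 0.0, 1: 0.5},   # feature 1
]

def global_reward(x_next):
    """R(x') = sum_i R^i_{x'^i}: global reward of successor state x'."""
    return sum(local_rewards[i][v] for i, v in enumerate(x_next))
```

For instance, successor state $(1,1)$ collects $1.0 + 0.5 = 1.5$ under these example tables.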
For instance, in the example in the previous section, two local rewards ($R^A_{x'^A} = 1x'^A$