A Framework for Evaluating Early-Stage Human - of Marcus Hutter

AGI Preschool: A Framework for Evaluating Early-Stage Human-like AGIs

Ben Goertzel, Stephan Vladimir Bugaj
Novamente LLC & Singularity Institute for AI, AGI Research Institute
1405 Bernerd Place, Rockville MD 20851, USA
ben@goertzel.org, stephan@bugaj.com

Abstract

A class of environments for teaching and evaluating AGI systems is described, modeled loosely on preschools used for teaching human children and intended specifically for early-stage systems aimed at approaching human-level, human-inspired AGI through a process resembling human developmental psychology. Some particulars regarding implementation of environments in this class in online virtual worlds are presented, along with details on associated evaluation tasks, and discussion of environments potentially appropriate for “AGI preschool graduates,” based on more advanced human schooling.

Introduction

One of the many difficult issues arising in the course of research on human-level AGI is that of “evaluation and metrics,” i.e., AGI intelligence testing. The Turing test (Turing, 1950) approaches the question of how to tell if an AGI has achieved human-level intelligence, and while there are debates and subtleties even with this well-known test (Moor, 2001; French, 1990; Hayes and Ford, 1995), the question we’ll address here is a significantly trickier one: assessing the quality of incremental progress toward human-level AI.

Laird et al. (2009) discuss some of the general difficulties involved in this type of assessment, and some requirements that any viable approach must fulfill. Here, rather than surveying the spectrum of possibilities, we will focus on describing in detail one promising approach: emulation, in a multiuser online virtual world, of an environment similar to preschools used in early human childhood education. Complete specification of an “AGI Preschool” would require much more than a brief conference paper; our goal here is to sketch the idea in broad outline, and give a few examples of the types of opportunities such an environment would afford for instruction, spontaneous learning, and formal and informal evaluation of certain sorts of early-stage AGI systems.

The Need for “Realistic” AGI Testing

One might question the need or importance of a new, overall framework for AGI intelligence testing, such as the AGI Preschool approach described here. After all, there has already been a lot of work on evaluating the capability of various AI systems and techniques. However, we believe that the approaches typically taken have significant shortcomings from an AGI perspective.

Certainly, the AI field has inspired many competitions, each of which tests some particular type or aspect of intelligent behavior. Examples include robot competitions, tournaments of computer chess, poker, backgammon and so forth at computer olympiads, trading-agent competitions, language and reasoning competitions like the Pascal Textual Entailment Challenge, and so on. In addition to these, there are many standard domains and problems used in the AI literature that are meant to capture the essential difficulties in a certain class of learning problems: standard datasets for face recognition, text parsing, supervised classification, theorem-proving, question-answering and so forth.

However, the value of these sorts of tests for AGI is predicated on the hypothesis that the degree of success of an AI program at carrying out some domain-specific task is correlated with the potential of that program for being developed into a robust AGI program with broad intelligence. If AGI and problem-area-specific “narrow AI” are in fact very different sorts of pursuits requiring very different principles, as we suspect (Goertzel and Pennachin, 2006), then these tests are not strongly relevant to the AGI problem.

There are also some standard evaluation paradigms aimed at AI going beyond specific tasks.
For instance, there is a literature on “multitask learning,” where the goal for an AI is to learn one task more quickly given another task solved previously (Caruana, 1997; Thrun and Mitchell, 1995; Ben-David and Schuller, 2003; Taylor and Stone, 2007). This is one of the capabilities an AI agent will need in order to simultaneously learn different types of tasks, as proposed in the Preschool scenario given here. And there is a literature on “shaping,” where the idea is to build up the capability of an AI by training it on progressively more difficult versions of the same tasks (Laud and DeJong, 2003; Walsh and Littman, 2006). Again, this is one sort of capability an AI will need to possess if it is to move up some type of curriculum, such as a school curriculum.

While we applaud the work done on multitask learning and shaping, we feel that exploring these processes using mathematical abstractions, or in the domain of various machine-learning or robotics test problems, may not adequately address the problem of AGI. The problem is that generalization among tasks, or from simpler to more difficult versions of the same task, is a process whose nature may depend strongly on the overall nature of the set of tasks and task-versions involved. Real-world tasks have a subtlety of interconnectedness and developmental course that is captured neither in current mathematical learning frameworks nor in standard AI test problems.

To put it mathematically, we suggest that the universe of real-world human tasks has a host of “special statistical properties” that have implications regarding what sorts of AI programs will be most suitable; and that, while exploring and formalizing the nature of these statistical properties is important, an easier and more reliable approach to AGI testing is to create a testing environment that embodies these properties implicitly, by emulating the cognitively meaningful aspects of the real-world human learning environment.

One way to see this point vividly is to contrast the current proposal with the “General Game Player” (GGP) AI competition, in which AIs seek to learn to play games based on formal descriptions of the rules.[1] Clearly, doing GGP well requires powerful AGI, and doing GGP even at a mediocre level probably requires robust multitask learning and shaping. But we suspect GGP is far inferior to AGI Preschool as an approach to testing early-stage AI programs aimed at roughly humanlike intelligence. This is because, unlike the tasks involved in AGI Preschool, the tasks involved in doing simple instances of GGP seem to have little relationship to humanlike intelligence or real-world human tasks.
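To make the notion of shaping discussed above concrete, here is a minimal sketch of a curriculum loop in which a learner moves to a harder version of a task only after mastering the current one. It is an illustration only and is not taken from the cited shaping papers: the names `run_curriculum` and `ToyLearner`, the sliding-window mastery criterion, and all thresholds are assumptions introduced here.

```python
from typing import Callable, List


def run_curriculum(
    attempt: Callable[[int], bool],
    num_levels: int,
    window: int = 10,
    pass_rate: float = 0.8,
    max_trials: int = 1000,
) -> List[int]:
    """Return the number of trials spent on each level that was mastered.

    attempt(level) runs one trial at the given difficulty and reports
    success. A level counts as mastered once the success rate over the
    last `window` trials reaches `pass_rate` (an assumed criterion).
    """
    trials_per_level: List[int] = []
    budget = max_trials
    for level in range(num_levels):
        recent: List[bool] = []
        trials = 0
        while budget > 0:
            recent.append(attempt(level))
            recent = recent[-window:]  # keep only the most recent trials
            trials += 1
            budget -= 1
            if len(recent) == window and sum(recent) / window >= pass_rate:
                break  # level mastered, move on to the next one
        else:
            # Trial budget ran out before this level was mastered.
            return trials_per_level
        trials_per_level.append(trials)
    return trials_per_level


class ToyLearner:
    """Hypothetical learner whose skill grows with total practice."""

    def __init__(self) -> None:
        self.practice = 0

    def attempt(self, level: int) -> bool:
        self.practice += 1
        # Harder levels need more accumulated practice to succeed.
        return self.practice > 5 * (level + 1)


if __name__ == "__main__":
    print(run_curriculum(ToyLearner().attempt, num_levels=3))
```

In this toy setting, practice accumulated on easier levels carries over to harder ones, which is the kind of effect the shaping literature studies. The point made in the text still applies: how much carries over depends on how the task versions relate to one another, which a toy model like this simply assumes.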
[1] http://games.stanford.edu/

Multiple Intelligences

Intelligence testing is, we suggest, best discussed and pursued in the context of a theoretical interpretation of “intelligence.” As there is not yet any consensus on the best such interpretation (Legg and Hutter (2006) present a summary of over 70 definitions of intelligence from the research literature), this is a somewhat subtle point.

In our own prior work we have articulated a theory based on the notion of intelligence as “the ability to achieve complex goals in complex environments,” a definition that relates closely to Legg and Hutter’s (2007) more rigorous definition in terms of statistical decision theory. However, applying this sort of theory to practical intelligence testing seems very difficult, in that it requires an assessment of the comparative complexity of various real-world tasks and environments. As real-world tasks and environments are rarely well-formalized, one’s only pragmatic option is to assess complexity in terms of human common sense or natural language, which is an approach fraught with “hidden rocks,” though it might prove fruitful if pursued with sufficient care and effort.

Here we have chosen an alternate, complementary approach, taking as our inspiration Gardner’s (1983) multiple intelligences (MI) framework, a psychological approach to intelligence assessment based on the idea that different people have mental strengths in different high-level domains, so that intelligence tests should contain aspects that focus on each of these domains separately. MI does not contradict the “complex goals in complex environments” view of intelligence, but rather may be interpreted as making specific commitments regarding which complex tasks and which complex environments are most important for roughly human-like intelligence.
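For reference, the more rigorous definition mentioned above is usually stated as a complexity-weighted sum of an agent's expected performance over all computable environments. The notation below follows the standard statement of Legg and Hutter's universal intelligence measure as we understand it; the symbols do not appear in this paper and are included only as a reading aid.

```latex
% Universal intelligence of an agent \pi (Legg and Hutter):
%   E          : the set of computable, reward-bounded environments
%   K(\mu)     : Kolmogorov complexity of environment \mu
%   V_\mu^\pi  : expected total reward of agent \pi in environment \mu
\Upsilon(\pi) \;:=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^\pi
```

The weight $2^{-K(\mu)}$ favors simpler environments, and $K$ is not computable, which is one way to see why applying such a definition directly to practical testing is difficult.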
MI does not seek an extreme generality, in the sense that it explicitly focuses on domains in which humans have strong innate capability as well as general-intelligence capability; there could easily be non-human intelligences that would exceed humans according to both the commonsense human notion of “general intelligence” and the generic “complex goals in complex environments” or Hutter/Legg-style definitions, yet would not equal humans on the MI criteria. This strong anthropocentrism of MI is not a problem from an AGI perspective so long as one uses MI in an appropriate way, i.e., only for assessing the extent to which an AGI system displays specifically human-like general intelligence. This restrictiveness is the price one pays for having an easily articulable and relatively easily implementable evaluation framework.

Table 1 summarizes the types of intelligence included in Gardner’s MI theory. Later on, we will suggest that one way to assess an AGI Preschool implementation is to ask how well it covers all the bases outlined in MI theory.
Intelligence Type | Aspects
Linguistic | words and language, written and spoken; retention, interpretation and explanation of ideas and information via language, understands relationship between communication and meaning
Logical-Mathematical | logical thinking, detecting patterns, scientific reasoning and deduction; analyse problems, perform mathematical calculations, understands relationship between cause and effect towards a tangible outcome
Musical | musical ability, awareness, appreciation and use of sound; recognition of tonal and rhythmic patterns, understands relationship between sound and feeling
Bodily-Kinesthetic | body movement control, manual dexterity, physical agility and balance; eye and body coordination
Spatial-Visual | visual and spatial perception; interpretation and creation of images; pictorial imagination and expression; understands relationship between images and meanings, and between space and effect
Interpersonal | perception of other people’s feelings; relates to others; interpretation of behaviour and communications; understands relationships between people and their situations

Table 1. Types of Intelligence, Aspects and Testing
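The suggestion that an AGI Preschool implementation can be assessed by how well it covers the MI types can be sketched as a simple coverage check. This is a hedged illustration, not the authors' evaluation procedure: the function names and the example task tagging are hypothetical, and only the six intelligence types are taken from Table 1.

```python
from typing import Dict, List, Set

# The six intelligence types listed in Table 1.
MI_TYPES: List[str] = [
    "Linguistic",
    "Logical-Mathematical",
    "Musical",
    "Bodily-Kinesthetic",
    "Spatial-Visual",
    "Interpersonal",
]


def mi_coverage(task_tags: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each MI type to the sorted list of tasks that exercise it."""
    coverage: Dict[str, List[str]] = {t: [] for t in MI_TYPES}
    for task, tags in task_tags.items():
        for tag in tags:
            if tag not in coverage:
                raise ValueError(f"unknown intelligence type: {tag}")
            coverage[tag].append(task)
    return {t: sorted(tasks) for t, tasks in coverage.items()}


def uncovered_types(task_tags: Dict[str, Set[str]]) -> List[str]:
    """List the MI types that no task in the environment exercises."""
    return [t for t, tasks in mi_coverage(task_tags).items() if not tasks]


# Hypothetical task tagging, for illustration only.
EXAMPLE_TASKS: Dict[str, Set[str]] = {
    "tell a story about a picture": {"Linguistic", "Spatial-Visual"},
    "sort blocks by size": {"Logical-Mathematical", "Bodily-Kinesthetic"},
    "take turns in a game": {"Interpersonal"},
}

if __name__ == "__main__":
    print(uncovered_types(EXAMPLE_TASKS))
```

A check like this only reports whether each type is touched by at least one task; it says nothing about how well a task exercises that type, which would still require human judgment.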
