Advances in Intelligent Systems Research - of Marcus Hutter

data.

However, TP alone performs very poorly on natural language, a fact which has not escaped opponents of the view that word segmentation is driven by distributional properties rather than innate knowledge about language. Linguistic nativists such as Gambell and Yang (GY05) argue that this failure of TP to scale up to natural language suggests that the statistical segmentation ability that children possess is limited and likely orthogonal to a more powerful segmentation ability driven by innate linguistic knowledge. Gambell and Yang demonstrate that an algorithm based on linguistic constraints (specifically, constraints on the pattern of syllable stress in a word) significantly outperforms TP when segmenting a corpus of phonetically encoded child-directed speech. In fact, VE can further outperform Gambell and Yang's method (F=0.953 vs. F=0.946) even though VE has no prior knowledge of linguistic constraints, suggesting that adding innate knowledge may not be as useful as simply increasing the power of the chunking method.
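To make the contrast concrete, here is a minimal sketch of a TP segmenter in the spirit of Saffran et al.'s statistic: cut wherever the transitional probability into the next syllable is a local minimum. This is an illustration, not the exact procedure used in the studies above; on a Saffran-style artificial language it recovers the words, which is precisely the regime where TP does work.

```python
from collections import Counter

def tp_segment(syllables):
    """Segment a syllable stream by cutting at local minima of the
    transitional probability TP(x -> y) = count(xy) / count(x).
    A toy illustration of the Saffran-style statistic, not the exact
    procedure used in the papers discussed here."""
    pair = Counter(zip(syllables, syllables[1:]))
    uni = Counter(syllables[:-1])
    # tp[i] is the transitional probability from syllable i to i+1
    tp = [pair[(x, y)] / uni[x] for x, y in zip(syllables, syllables[1:])]
    words, cur = [], [syllables[0]]
    for i in range(1, len(syllables)):
        left = tp[i - 1]
        # cut before syllable i if the TP into it is a strict local minimum
        prev = tp[i - 2] if i >= 2 else 1.0
        nxt = tp[i] if i < len(tp) else 1.0
        if left < prev and left < nxt:
            words.append("".join(cur))
            cur = []
        cur.append(syllables[i])
    words.append("".join(cur))
    return words
```

Within a word of this lexicon every TP is 1.0, while TPs across word boundaries are lower, so every boundary becomes a strict local minimum.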

Algorithms like VE and PtM provide a counter-argument to the nativist position by fully explaining the results that Saffran et al. observed while also performing very well at segmenting natural language. When the input is represented symbolically as a sequence of phonemes, VE perfectly segments the simple artificial language generated by Saffran et al. (SAN96), while also performing well in the segmentation of child-directed speech. Miller et al. (MWS09) reinforce this case by replicating the experimental setup of Saffran et al., but feeding the speech input to VE instead of a child. The audio signal had to be discretized before VE could segment it, but VE was able to achieve an accuracy of 0.824.
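The chunking signals VE relies on can be sketched as follows: two experts — one preferring splits that leave frequent chunks, one preferring splits with high boundary entropy — vote inside a sliding window, and cuts go at local maxima of the vote totals. This is a simplified reading of the algorithm; the published version standardizes the expert scores and uses an n-gram trie, and the window size and threshold here are illustrative.

```python
import math
from collections import Counter

def voting_experts(seq, window=3, threshold=None):
    """Simplified sketch of Voting Experts: a frequency expert and a
    boundary-entropy expert vote on split points inside a sliding
    window; cut at local maxima of the votes above a threshold."""
    n = len(seq)
    counts = Counter()                       # n-gram counts up to window size
    for k in range(1, window + 1):
        for i in range(n - k + 1):
            counts[tuple(seq[i:i + k])] += 1

    def h_boundary(chunk):
        """Entropy of the symbol following `chunk` in the sequence."""
        followers = Counter()
        m = len(chunk)
        for i in range(n - m):
            if tuple(seq[i:i + m]) == chunk:
                followers[seq[i + m]] += 1
        total = sum(followers.values())
        if total == 0:
            return 0.0
        return -sum(c / total * math.log2(c / total) for c in followers.values())

    votes = [0] * (n + 1)
    for i in range(n - window + 1):
        win = seq[i:i + window]
        # frequency expert: prefer the split whose two chunks are frequent
        freq_best = max(range(1, window),
                        key=lambda j: math.log(counts[tuple(win[:j])]) +
                                      math.log(counts[tuple(win[j:])]))
        votes[i + freq_best] += 1
        # entropy expert: prefer a left chunk with high boundary entropy
        ent_best = max(range(1, window),
                       key=lambda j: h_boundary(tuple(win[:j])))
        votes[i + ent_best] += 1

    if threshold is None:
        threshold = max(votes) // 2
    cuts = [p for p in range(1, n)
            if votes[p] > threshold and
               votes[p] >= votes[p - 1] and votes[p] >= votes[p + 1]]
    pieces, prev = [], 0
    for c in cuts + [n]:
        pieces.append("".join(seq[prev:c]))
        prev = c
    return pieces
```

Because votes accumulate wherever many windows agree, repeated structure in the sequence produces sharp vote peaks at chunk boundaries even when no chunk is known in advance.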

Vision

Miller and Stoytchev (MS08) applied VE in a hierarchical fashion to perform a visual task similar to optical character recognition (OCR). The input was an image containing words written in a particular font. VE was to first segment this image into short sequences corresponding to letters, and then chunk the short sequences into longer sequences corresponding to words. The image was represented as a sequence of columns of pixels, where each pixel was either black or white. Each of these pixel columns can be represented by a symbol denoting the particular pattern of black and white pixels within it, thus creating a sequence of symbols to serve as input to VE. Depending on the font used, VE scored between F=0.751 and F=0.972 on segmenting this first sequence.
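The column-to-symbol encoding described above can be sketched directly; mapping identical columns to small integer ids is our choice of symbol naming, not necessarily the one used in MS08.

```python
def columns_to_symbols(image):
    """Encode a black-and-white image (a list of rows of 0/1 pixels) as a
    sequence of symbols, one per pixel column: identical columns receive
    the same symbol id. A sketch of the representation described for the
    OCR-like task; the integer ids are our own convention."""
    ncols = len(image[0])
    columns = [tuple(row[c] for row in image) for c in range(ncols)]
    table = {}  # column pattern -> symbol id, assigned on first sight
    return [table.setdefault(col, len(table)) for col in columns]
```

Repeated letter shapes then show up as repeated runs of symbols, which is exactly the kind of structure VE's frequency and entropy experts detect.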

After finding letters, VE had to chunk these letters together into words, which is essentially the same as the well-studied word segmentation problem except with some noise added to the identification of each character. VE was still able to perform the task, with scores ranging from F=0.551 to F=0.754 for the three fonts. With perfect letter identification, VE scored F=0.776.
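The F-scores reported throughout are harmonic means of precision and recall; a common convention in segmentation work is to score the predicted boundary positions against the true ones, as sketched below (the cited papers may score boundaries or whole words, so this is an assumption about the convention):

```python
def boundary_f1(pred, true):
    """F-measure over predicted vs. true boundary positions.
    Sketch of the boundary-scoring convention commonly used for
    segmentation; not necessarily the exact metric of each cited paper."""
    pred, true = set(pred), set(true)
    hits = len(pred & true)
    if hits == 0:
        return 0.0
    p = hits / len(pred)   # precision: fraction of predicted cuts that are real
    r = hits / len(true)   # recall: fraction of real cuts that were found
    return 2 * p * r / (p + r)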

Robot Behaviors

Cohen et al. (CAH07) tested VE on data generated by a mobile robot, a Pioneer 2 equipped with sonar and a pan-tilt-zoom camera running a subsumption architecture. The robot wandered around a large playpen for 20-30 minutes looking for interesting objects, which it would orbit for a few minutes before moving on. At one level of abstraction, the robot engaged in four types of behaviors: wandering, avoiding, orbiting and approaching. Each behavior was implemented by sequences of actions initiated by controllers such as move-forward and center-camera-on-object. The challenge for Voting Experts was to find the boundaries of the four behaviors given only information about which controllers were on or off.
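One plausible way to turn the controllers' on/off information into VE input is to intern each time step's on/off pattern as a single symbol, analogous to the pixel-column encoding in the vision task. The exact encoding used by Cohen et al. is not given here, so the scheme below is an assumption:

```python
def states_to_symbols(steps, controllers):
    """Encode each time step's controller on/off pattern as one symbol so
    that VE can segment the resulting sequence. The actual encoding in
    CAH07 is not specified in this text; this is one plausible scheme."""
    table = {}  # on/off pattern -> symbol id, assigned on first sight
    out = []
    for active in steps:  # each step: the set of controllers currently on
        key = tuple(c in active for c in controllers)
        out.append(table.setdefault(key, len(table)))
    return out
```

Stretches of identical or recurring controller patterns then correspond to behaviors, and behavior boundaries become chunk boundaries for VE.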

This experiment told us that the encoding of a sequence matters: when the coding produced shorter behaviors (average length of 7.95 time steps), VE's performance was comparable to that in earlier experiments (F=0.778), but when the coding produced longer behaviors, performance was very much worse (F=0.183). This is because very long episodes are unique, so most locations in very long episodes have zero boundary entropy and a frequency equal to one. And when the window size is very much smaller than the episode length, there will be a strong bias to cut the sequence inappropriately.

Instruction of an AI Student

The goal of DARPA's Bootstrapped Learning (BL) project is to develop an “electronic student” that can be instructed by human teachers, in a natural manner, to perform complex tasks. Currently, interaction with the electronic student is not very different from high-level programming. Our goal is to replace many of the formal cues or “signposts” that enable the electronic student to follow the teacher, making the interaction between them more natural. VE can largely replace one of these cues: the need to inform the student whenever the teacher's instruction method changes.

In BL, teachers communicate with the student in a language called Interlingua (IL). Some IL messages serve only to notify the student that a “Lesson Epoch” (LE) has ended.

Several curricula have been developed for BL. VE finds LE boundaries with high accuracy in all of them – and can be trained on one and tested on another to good effect. To illustrate, we will present results for the Unmanned Aerial Vehicle (UAV) domain. To study the detection of LE boundaries, a training corpus was generated from version 2.4.01 of the UAV curriculum by removing all of the messages that indicate boundaries between LEs. This training corpus contains a total of 742 LEs. A separate corpus consisting of 194 LEs served as a test corpus. As the teacher should never have to provide LE boundaries, the problem is treated as unsupervised and both the training and test corpora are stripped of all boundary information.

Each individual message in the corpus is a recursive structure of IL objects that together express a variety of relations about the concepts being taught and the state of teaching. LEs are defined more by the structure of the message sequence than by the full content of each message. Thus, we represent each message as a single symbol, formed by concatenating the IL types of the two highest composite IL objects (generally equivalent to the message's type and subtype).

The sequence of structured messages is thus translated into a sequence of symbols.
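The message-to-symbol step can be sketched as follows; the nested-dictionary message format and the field names are invented for illustration, since the actual IL object structure is not given here:

```python
def message_symbol(msg):
    """Collapse a nested message into one symbol by concatenating the
    types of its two outermost composite objects — roughly the message's
    type and subtype as described in the text. The dict-based message
    format and field names here are hypothetical, not the real IL schema."""
    outer = msg["type"]
    inner = msg.get("body", {}).get("type", "")
    return f"{outer}/{inner}"
```

Applying this to every message yields the symbol sequence on which VE detects LE boundaries.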