22.07.2013 Views

PhD Document - Universidad de Las Palmas de Gran Canaria


CHAPTER 1. INTRODUCTION

[Liao et al., 1997], eyeglasses, expression, pose [Beymer, 1993, Gross et al., 2004], image resolution [Wang et al., 2004], aging, etc. Pose experiments, for example, show that performance is stable when the angle between a frontal image and a probe is less than 25 degrees and that performance dramatically falls off when the angle is greater than 40 degrees [Blackburn et al., 2001]. As for the other factors, acceptable performances can be achieved only under certain circumstances, see Table 1.2.

Category                              False reject rate
Same day, same illumination           0.4 %
Same day, different illumination      9 %
Different days                        11 %
Different days over 1.5 years apart   43 %

Table 1.2: Face verification error (for a fixed false alarm rate of 2%) when test conditions differ from training conditions (taken from [Martin et al., 2000]).
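The quantity reported in Table 1.2, a false reject rate at a fixed false alarm rate, can be illustrated with a minimal sketch. It assumes that a verifier produces a similarity score per comparison and accepts a claim when the score exceeds a threshold; the function name `false_reject_rate_at_far` and the scores in the usage note are hypothetical and are not taken from [Martin et al., 2000].

```python
import math


def false_reject_rate_at_far(genuine_scores, impostor_scores, target_far=0.02):
    """Pick a threshold whose false alarm rate does not exceed target_far,
    then measure the false reject rate at that threshold.

    Assumption: higher score means "more likely the same person", and a
    claim is accepted only when score > threshold.
    Returns (threshold, far, frr).
    """
    impostors = sorted(impostor_scores, reverse=True)
    # Largest number of impostor attempts that may be accepted.
    max_accepts = math.floor(target_far * len(impostors))
    if max_accepts >= len(impostors):
        threshold = float("-inf")  # every attempt may be accepted
    else:
        # Scores strictly above this value number at most max_accepts.
        threshold = impostors[max_accepts]
    # False alarm rate: fraction of impostor attempts that are accepted.
    far = sum(s > threshold for s in impostors) / len(impostors)
    # False reject rate: fraction of genuine attempts that are rejected.
    frr = sum(s <= threshold for s in genuine_scores) / len(genuine_scores)
    return threshold, far, frr
```

Calling `false_reject_rate_at_far(genuine, impostor, target_far=0.02)` on scores collected under one test condition (for example, different days) would give one row of a table like the one above. Tied scores at the threshold are rejected here, so the measured false alarm rate can fall below the target.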

Also, speech recognition performance decreases catastrophically during natural spontaneous interaction. Factors like speaking style, hyperarticulation (speaking in a more careful and clarified manner) and the emotional state of the speaker significantly degrade word recognition rates [Oviatt, 2000]. Above all, environmental noise is considered to be the worst obstacle [Wenger, 2003]. The mouth-microphone distance is crucial in this respect. The typical achievable recognition rate for large-vocabulary speaker-independent speech recognition is about 80%-90% in a clean environment, but can be as low as 50% in scenarios such as a cellular phone with background noise.

Sound localization is very hard to achieve in noisy environments. Sound signals tend to interfere with each other and are transformed by collisions with furniture and people [Good and Gilkey, 1996, Shinn-Cunningham, 2003]. In addition, the physical construction of the robot itself can have a negative effect on sound localization (in humans, it has been demonstrated that hearing protection affects sound localization abilities).

In summary, there is the impression (especially among researchers) that performance would degrade to unacceptable levels if conditions were different from those used to test the implementations. In test scenarios, performance is acceptable. However, it would seem that there is little guarantee that it remains at the same levels for future, unseen conditions and samples. How can we explain this negative impression? Note that it does not appear for other types of robots, say industrial manipulators, where the robot performance is somehow "under control". This leads us to the important question: is building a social robot in any sense different from building other kinds of robots?
