04.02.2013 Views

MAS.632 Conversational Computer Systems - MIT OpenCourseWare

MAS.632 Conversational Computer Systems - MIT OpenCourseWare

MAS.632 Conversational Computer Systems - MIT OpenCourseWare

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

106 VOICE COMMUNICATION WITH COMPUTERS<br />

AfIktion Apprapdie.less<br />

outline a number of design considerations and discuss some interactive techniques<br />

in this light.<br />

Most of the limitations of voice output described earlier can be summarized by<br />

the broader observation that speech takes place over a periodof time. The amount<br />

of time required to listen to a voice recording, the necessity of reciting menu items<br />

in sequence, and much of the difficulty associated with the bulky nature of voice<br />

are all a direct consequence of the temporal nature of speech.<br />

Speed is thus one of the primary factors distinguishing voice from other interface<br />

media; for example, most graphics applications can be written with the<br />

assumption that display operations such as drawing lines or text are performed<br />

nearly instantaneously. Much of the time the application is idle while the user<br />

reads text, picks a menu item, etc.; this is punctuated by sporadic user input<br />

and display update. For telephone-based interfaces, the exact opposite is true;<br />

presentation accounts for most of the total time the user interacts with the<br />

application.<br />

Voice response systems must therefore strive to minimize the amount of time<br />

required to retrieve information: they must be brief, selective about content, and<br />

interruptible. Applications must plan what to say so as to avoid utterances that<br />

are irrelevant by the time they are spoken. A real-time speech output system may<br />

have to formulate conflicting goals and choose among them if there is not suffident<br />

time to say everything, which requires estimating how long it will take to<br />

speak an utterance. The awareness that speech takes time (and acceptance that<br />

time is a commodity of which we never have enough) should permeate the design<br />

process for building any speech application. This argues in favor of shortening<br />

spoken menus and precludes many applications that may be quite feasible in text<br />

such as browsing through a newspaper. Time motivates strategies to filter an<br />

application's output requiring careful judgment as to the relevance of a specific<br />

system comment. The remainder of this section explores these design considerations<br />

in more detail.<br />

Because speech is so difficult to employ effectively, discretion is essential in determining<br />

effective voice applications and in choosing the appropriate speech output<br />

device. Poor quality speech, whether synthesized or recorded, quickly becomes<br />

annoying unless it serves a clear purpose. For example, speech output has been<br />

employed effectively as a warning medium in airplane cockpits. On the contrary,<br />

few car drivers appreciate a tinny voice reminding them to close the doors oftheir<br />

automobile. The cockpit is a busier place, and the consequences of an error may<br />

be more severe. A more effective use of speech in the car may be traffic information<br />

(drivers already tune to commuter traffic reports on their radios) or as part<br />

of a navigation system (as described later in this chapter).<br />

In retrospect, another example of a misguided speech interface is the recitation<br />

of a customer's purchases at a cash register. As each item is scanned by the bar<br />

code reader, its description and price are announced. The original motivation for

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!