A Socially Assistive Robot Exercise Coach for the Elderly

Fasola & Matarić. A SAR Exercise Coach for the Elderly

Figure 2. Diagram of system architecture, including the six system modules: vision and world model, speech, user communication, behaviors, action output, and database management.

3.4 System Architecture

The system architecture comprises six independent software modules: vision and world model, speech, user communication, behaviors, robot action, and database management. A diagram of the system architecture, showing the connections among the system modules, is provided in Figure 2. The following briefly describes the role of each system module:

Vision and World Model Module. This module is responsible for providing information regarding the state of the user to the behavior module, allowing the robot to make task-based decisions during interaction; for example, by providing spoken or demonstrative feedback, by prompting for user movement, or by imitating user arm poses. The input for the visual user activity recognition procedure is a monocular USB camera with a frame resolution of 640x480 pixels. The presence of the user, the locations of the user's face and hands, and the user's arm angles are captured by the vision process and stored in the world model for subsequent use. The details of the vision algorithm for user state recognition are provided in Section 3.5.
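The paper does not give implementation details for the world model; a minimal sketch of such a shared user-state store, written and read as described above, might look like the following (all class and field names are hypothetical, not from the paper):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical sketch: the vision process writes the latest user state,
# and the behavior module reads it to make task-based decisions.
# All names are illustrative only.

@dataclass
class UserState:
    present: bool = False                              # is the user in view?
    face_px: Optional[Tuple[int, int]] = None          # face location in the 640x480 frame
    left_hand_px: Optional[Tuple[int, int]] = None     # hand locations in pixels
    right_hand_px: Optional[Tuple[int, int]] = None
    arm_angles_deg: Tuple[float, float] = (0.0, 0.0)   # (left, right) arm angles

class WorldModel:
    """Shared store updated by the vision process, read by behaviors."""

    def __init__(self) -> None:
        self._state = UserState()

    def update_from_vision(self, state: UserState) -> None:
        # The vision process overwrites the stored state each frame.
        self._state = state

    def latest(self) -> UserState:
        # Behaviors query the most recent user state for decision making.
        return self._state

# Example: the vision process detects the user and records face location
# and arm angles; a behavior then reads the state back.
wm = WorldModel()
wm.update_from_vision(UserState(present=True, face_px=(320, 120),
                                arm_angles_deg=(45.0, 90.0)))
```

A real implementation would also need synchronization between the vision and behavior processes (e.g. a lock or message passing), which this sketch omits.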
Speech Module. The speech module is responsible for translating the speech requests from the behaviors, provided as text strings, into spoken words (synthesized audio). This module uses the commercially available NeoSpeech text-to-speech (TTS) engine to synthesize the given text into communicative speech. The speech module plays the synthesized speech through the robot's speakers, and is also responsible for synchronizing the lip movements of the robot to match the start and end of spoken utterances. To accomplish this synchronization, the module analyzes the wave file
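The excerpt is cut off at this point, so the paper's exact wave-file analysis is not shown here. As an illustration only: one simple way to recover utterance timing for lip synchronization is to compute the duration of the synthesized WAV file from its frame count and sample rate, which is enough to schedule mouth open/close events at the start and end of playback. The file name and helper function below are hypothetical:

```python
import wave

def utterance_duration_s(wav_path: str) -> float:
    """Return the duration of a WAV file in seconds.

    A lip-sync controller could open the robot's mouth when playback
    starts and close it after this duration elapses. Illustrative
    sketch only; the paper's actual analysis is not specified in
    this excerpt.
    """
    with wave.open(wav_path, "rb") as wf:
        return wf.getnframes() / float(wf.getframerate())

# Create a half-second silent WAV purely so the sketch is runnable;
# in the real system this file would come from the TTS engine.
with wave.open("utterance.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)                   # 16-bit samples
    wf.setframerate(16000)               # 16 kHz sample rate
    wf.writeframes(b"\x00\x00" * 8000)   # 8000 frames = 0.5 s of silence

duration = utterance_duration_s("utterance.wav")
```

More sophisticated schemes analyze the audio envelope to drive mouth opening frame by frame, but start/end timing alone already matches the synchronization described above.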
