Embedded Software for SoC - Grupo de Mecatrônica EESC/USP

Rapid Configuration & Instruction Selection for an ASIP

4. SPEECH RECOGNITION PROGRAM & ANALYSIS

We use a speech recognition application as a case study to demonstrate the effectiveness of our approach. This application provides user voice control over Unix commands in a Linux shell environment. It employs a template-matching recognition approach, which requires the user to record at least four samples for each Unix command that he or she wants to use. These samples are recorded and stored in a file. Moreover, since this software depends on a user's voice, a test bench is also recorded and stored in another file for the sake of consistency.

The application consists of three main sections: record, pre-process, and recognition. In the “record” section, it first loads the configuration of the speaker, then loads the pre-recorded test bench as input and the pre-recorded samples into memory. After that, it puts the test bench into a queue that passes it on to the next section. In the “pre-process” section, it copies the data from the queue and divides the data into frame-sized segments. Then it performs a filtering process using the Hamming window and applies a single-precision floating-point 256-point FFT algorithm to the input data to minimize the work done in the following recognition section. Afterwards, it calculates the power spectrum of each frame and puts these frames back into the queue. Finally, in the “recognition” section, it implements the template matching approach utilising the Euclidean distance measure, in which it compares the input data with the pre-recorded samples that were loaded into memory during the “record” section. In the next step, it stores the comparison results in another queue. If the three closest matches among the pre-recorded samples are the same command, then the program executes the matched Unix command. However, if the input data does not match the pre-recorded samples, the program goes back to the “record” section to load the input again, and so on. Figure 30-4 shows the block diagram of the speech recognition program.
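
The “pre-process” and “recognition” steps described above can be sketched roughly as follows. This is a minimal illustration, not the program's actual code: it assumes non-overlapping 256-sample frames, uses Python's double-precision floats rather than single precision, and compares flat feature vectors. All names (`FRAME_SIZE`, `power_spectra`, `recognise`, and so on) are hypothetical.

```python
import cmath
import math

FRAME_SIZE = 256  # assumed to match the 256-point FFT named in the text


def hamming(n):
    """Return the n coefficients of a symmetric Hamming window."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def power_spectra(samples):
    """Split samples into frames, window each frame, return its power spectrum."""
    window = hamming(FRAME_SIZE)
    spectra = []
    # Assumption: frames do not overlap; a trailing partial frame is dropped.
    for start in range(0, len(samples) - FRAME_SIZE + 1, FRAME_SIZE):
        frame = samples[start:start + FRAME_SIZE]
        windowed = [s * w for s, w in zip(frame, window)]
        spectrum = fft(windowed)
        # For real input only the first N/2 + 1 bins are non-redundant.
        spectra.append([abs(c) ** 2 for c in spectrum[:FRAME_SIZE // 2 + 1]])
    return spectra


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognise(features, templates):
    """Return the command if the three closest templates agree, else None.

    templates is a list of (command, feature_vector) pairs, one per
    pre-recorded sample.
    """
    ranked = sorted(templates, key=lambda t: euclidean(features, t[1]))
    top3 = [command for command, _ in ranked[:3]]
    if len(top3) == 3 and len(set(top3)) == 1:
        return top3[0]
    return None
```

In this sketch a `None` result corresponds to the case where the program returns to the “record” section to load the input again.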

As this application involves a human interface and operates in sequence, it is necessary to set certain real-time constraints. For example, it is assumed that each voice command should be processed within 1 second after the user finishes his or her command. The “record” section should therefore take less than 0.25 s to put all the data into a queue, while “pre-process” and “recognition” should consume less than 0.5 s and 0.25 s respectively.
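
The three per-section limits add up to the 1-second deadline (0.25 + 0.5 + 0.25 = 1.0). One way such a budget could be checked on a host machine is sketched below. The helper `run_with_budget` and the constant names are hypothetical, and wall-clock timing on a workstation is only a stand-in for measurements on the target processor.

```python
import time

# Per-section budgets in seconds, taken from the constraints stated above.
BUDGET_S = {"record": 0.25, "pre-process": 0.5, "recognition": 0.25}
DEADLINE_S = 1.0


def run_with_budget(stages):
    """Run each (name, callable) stage in sequence and time it.

    Returns a list of (name, elapsed_seconds, within_budget) tuples.
    """
    report = []
    for name, stage in stages:
        start = time.perf_counter()
        stage()
        elapsed = time.perf_counter() - start
        # The text requires each section to take strictly less than its budget.
        report.append((name, elapsed, elapsed < BUDGET_S[name]))
    return report
```

Each stage is passed as a zero-argument callable, so the three sections of the application could be wrapped and timed in order without changing their code.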

Through a profiler, we analysed the speech recognition software in a first approximation. Table 30-2 shows the percentage of execution time taken by selected software functions of this application on different Xtensa processor configurations, which we denote as P1, P2 and P3. P1 is an Xtensa processor with the minimal configurable core options. P2 is a processor with the Vectra DSP Engine and its associated configurable core options. P3 is a processor with a floating-point unit and its associated configurable core options (more configurations are possible, but for the sake of efficacy, only these three configurations are shown here). Moreover, in Table 30-2, the application spent 13.4% of time calling the single precision square root function in Xtensa
