
Elektronika 2009-11.pdf - Instytut Systemów Elektronicznych



The use of speech recognition and user verification in closed-circuit television systems

(Zastosowanie rozpoznawania mowy i weryfikacji użytkownika w systemach telewizji przemysłowej)

dr inż. MARIUSZ KUBANEK

Politechnika Częstochowska, Instytut Informatyki Teoretycznej i Stosowanej

Speech recognition systems and systems that verify a person's identity from their speech are widely used. Speech is the most natural way for humans to communicate with each other. Over the past decade, much work has been done in human-machine communication to incorporate speech as a new modality in multimedia applications. Two areas have received the greatest interest: speech recognition, in which the aim is for the machine to extract and understand the linguistic message in the speech, and speaker recognition, where the goal is to identify, recognize, or verify the speaker who produced the speech. Speech recognition systems are used in mobile phones for voice dialling, in operating systems for voice control of applications, in text editors for dictation, in cars for recognizing voice commands, and so on. Speech-based user identification and verification are most often used in access control systems. The variety of applications of automatic speech recognition, in human-computer interfaces, telephony, and robotics, has driven research by a large scientific community [1,3].

The most important problem in speech recognition and in speaker identification or verification is a suitable coding of the audio signal [4]. In general, speech coding is a procedure for representing a digitized speech signal with as few bits as possible while maintaining a reasonable level of speech quality. Speech coding has matured to the point where it now constitutes an important application area of signal processing. Owing to the increasing demand for speech communication, speech coding technology has received growing interest from the research, standardization, and business communities. Advances in microelectronics and the wide availability of low-cost programmable processors and dedicated chips have enabled rapid technology transfer from research to product development; this encourages the research community to investigate alternative speech coding schemes, with the objective of overcoming existing deficiencies and limitations. The standardization community pursues the establishment of standard speech coding methods for various applications that will be widely accepted and implemented by industry. The business community capitalizes on the ever-increasing demand and opportunities for speech processing products in consumer, corporate, and network environments [1,4].

This work proposes using a speech recognition method to control the movement of the camera in a closed-circuit television (CCTV) system, and a user verification method to log on to this system. To extract the audio features of a person's speech, a modified mechanism of cepstral speech analysis was applied. Twenty-dimensional MFCCs (Mel-Frequency Cepstral Coefficients) were used as the standard audio features for acoustic speech coding. Speech recognition is performed using hidden Markov models.
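The feature-extraction step can be illustrated with a minimal sketch of a conventional MFCC front end that produces 20 coefficients per frame, as in the text. The modifications the author made to cepstral analysis are not described here, so this is the standard textbook pipeline (pre-emphasis, Hamming window, power spectrum, mel filterbank, logarithm, DCT), not the author's method. The 8000 Hz sampling rate and 256-sample frames are borrowed from the silence-removal stage described below; the 128-sample hop, 26 mel filters, pre-emphasis coefficient 0.97, and the function names `mfcc`, `mel_filterbank`, and `dct_ii` are assumptions chosen for illustration.

```python
import numpy as np


def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_bins))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank


def dct_ii(x, n_out):
    """Unnormalized DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    return x @ basis.T


def mfcc(signal, sample_rate=8000, frame_len=256, hop=128, n_filters=26, n_ceps=20):
    """Return an array of shape (n_frames, n_ceps) of MFCC feature vectors.

    Assumed parameters: hop=128, n_filters=26, pre-emphasis 0.97, Hamming window.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    # Pre-emphasis boosts high frequencies (assumed coefficient 0.97).
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Overlapping frames, each multiplied by a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=frame_len)) ** 2 / frame_len
    # Mel filterbank energies, floored to avoid log(0), then log-compressed.
    energies = power @ mel_filterbank(n_filters, frame_len, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # Decorrelate with a DCT and keep the first n_ceps cepstral coefficients.
    return dct_ii(log_energies, n_ceps)
```

Each row of the returned array is one 20-dimensional feature vector. The sequence of such vectors for one isolated word is the kind of observation sequence a word-level hidden Markov model would be trained on and scored against.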

Preliminary signal processing

Analysis of the audio channel should begin with filtering the signal to remove the components that constitute disturbances. In a working system operating in near-ideal conditions, the preliminary filtering stage could be skipped to speed up processing. In real operating conditions, however, the audio speech signal is often considerably disturbed, so preliminary filtering was applied in this work [2].

In the isolated-word recognition system used to control the movement of the closed-circuit television camera, short but clear pauses of silence must be made between individual words during recording. Given this kind of recognition, the stage that follows preliminary filtering is extracting the clean, proper audio signal by removing the silence before and after it [2]. In this work, two combined methods of removing redundant silence were applied. The first is based on computing the signal energy and rejecting all samples that do not exceed a given energy threshold. The input signal is divided into frames of 256 samples; the frame size depends on the sampling frequency, which here was 8000 Hz. The energy of each frame is then computed.
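The energy-based part of the silence-removal procedure can be sketched in a few lines. The 256-sample frames, the 8000 Hz sampling rate (so each frame lasts 256/8000 = 32 ms), and the threshold of twice the summed energy of the first three frames come from the text; the threshold rule itself is stated in the sentences that follow. Defining frame energy as the sum of squared samples, using non-overlapping frames, and keeping everything between the first and last above-threshold frame (rather than discarding every sub-threshold frame) are assumptions, as are the hypothetical function names `frame_energies` and `trim_silence`.

```python
import math


def frame_energies(samples, frame_len=256):
    """Energy (sum of squared samples) of each non-overlapping frame.

    Assumption: frames do not overlap, and a trailing partial frame is ignored.
    """
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len])
        for i in range(n_frames)
    ]


def trim_silence(samples, frame_len=256, n_noise_frames=3, factor=2.0):
    """Cut leading and trailing low-energy frames from an isolated-word recording.

    The threshold is factor * (sum of the energies of the first n_noise_frames
    frames), i.e. twice the summed energy of the first three frames by default,
    on the assumption that the recording starts with silence.
    """
    energies = frame_energies(samples, frame_len)
    if len(energies) <= n_noise_frames:
        return []  # too short to estimate the threshold and still contain a word
    threshold = factor * sum(energies[:n_noise_frames])
    above = [i for i, e in enumerate(energies) if e > threshold]
    if not above:
        return []  # nothing exceeded the threshold: treat the recording as silence
    # Assumption: keep everything from the first to the last above-threshold frame.
    return samples[above[0] * frame_len:(above[-1] + 1) * frame_len]


if __name__ == "__main__":
    sr = 8000  # sampling frequency used in the text
    quiet = [0.001 * math.sin(2 * math.pi * 50 * n / sr) for n in range(4 * 256)]
    word = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(8 * 256)]
    trimmed = trim_silence(quiet + word + quiet)
    print(len(trimmed) / sr, "s of speech kept")  # 2048 samples = 0.256 s
```

Because each of the first three frames contributes to the threshold, none of them can exceed twice their combined energy, so the threshold frames themselves are never mistaken for the start of the word.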

With a suitably chosen threshold, it is possible to mark the frame that contains the beginning of the recorded word: this is the frame whose energy crosses the threshold. Since the first few frames at the start of a recording contain only silence, the energy threshold can be determined from these initial frames. In this work, the threshold was taken as twice the sum of the energies of the first three frames.

The second method counts how many times the sample values change from smaller to larger and vice versa, in

ELEKTRONIKA 11/2009 65
