Elektronika 2009-11.pdf - Instytut Systemów Elektronicznych
The use of speech recognition and user verification<br />
in closed-circuit television systems<br />
(Zastosowanie rozpoznawania mowy i weryfikacji użytkownika<br />
w systemach telewizji przemysłowej)<br />
dr inż. MARIUSZ KUBANEK<br />
Politechnika Częstochowska, Instytut Informatyki Teoretycznej i Stosowanej
Speech recognition systems and systems that verify persons on the
basis of independent speech are widely used. Speech is the most
natural way for humans to communicate with each other. Over the past
decade, much work has been done in man-machine communication to
incorporate speech as a new modality in multimedia applications. Two
areas have received the greatest interest: speech recognition, in
which the aim is for the machine to extract and understand the
linguistic message in the speech, and speaker recognition, where the
goal is to identify, recognize or verify the speaker responsible for
producing the speech. Speech recognition systems are used in mobile
phones for voice dialing, in operating systems for voice control of
applications, in text editors for dictating sentences, in cars for
recognizing voice commands, etc. User identification and verification
based on speech are most often used in access control systems. The
variety of applications of automatic speech recognition systems, for
human-computer interfaces, telephony, or robotics, has driven the
research of a large scientific community [1,3].
The most important problem in speech recognition and in speaker
identification or verification is suitable coding of the audio signal
[4]. In general, speech coding is a procedure for representing a
digitized speech signal using as few bits as possible while
maintaining a reasonable level of speech quality. Speech coding has
matured to the point where it now constitutes an important application
area of signal processing. Due to the increasing demand for speech
communication, speech coding technology has received growing interest
from the research, standardization, and business communities. Advances
in microelectronics and the wide availability of low-cost programmable
processors and dedicated chips have enabled rapid technology transfer
from research to product development; this encourages the research
community to investigate alternative schemes for speech coding, with
the objective of overcoming deficiencies and limitations. The
standardization community pursues the establishment of standard speech
coding methods for various applications that will be widely accepted
and implemented by industry. The business community capitalizes on the
ever-increasing demand and opportunities in the consumer, corporate
and network environments for speech processing products [1,4].
This work proposes using a speech recognition method to control the
movement of the camera of a closed-circuit television system, and a
user verification method to log on to this system. To extract the
audio features of a person's speech, a modified mechanism of cepstral
speech analysis was applied. Twenty-dimensional MFCC (Mel Frequency
Cepstral Coefficients) were used as the standard audio features for
acoustic speech coding. Speech recognition is performed using hidden
Markov models.
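To make the feature-extraction step concrete, the following is a minimal sketch of a textbook MFCC computation for a single frame. It is not the modified cepstral mechanism used in this work, whose details are not given here. Only the 20 coefficients and the 8000 Hz sampling rate come from the text; the 256-sample frame, the Hamming window, the 26 triangular mel filters, the naive DFT and all function names are assumptions chosen for illustration. The code uses only the standard library; a practical system would use an FFT routine instead of the direct DFT.

```python
import cmath
import math


def hz_to_mel(f):
    """Common mel-scale mapping (assumed, not taken from the paper)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hamming-windowed frame, naive DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters equally spaced on the mel scale from 0 to Nyquist."""
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [lo + (hi - lo) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):        # rising slope
            row[k] = (k - left) / (center - left)
        for k in range(center, right):       # falling slope
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def mfcc_frame(frame, bank, n_coeffs=20):
    """First n_coeffs DCT-II coefficients (index 0 included) of log mel energies."""
    spec = power_spectrum(frame)
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
             for row in bank]
    n = len(log_e)
    return [sum(e * math.cos(math.pi * c * (2 * j + 1) / (2 * n))
                for j, e in enumerate(log_e))
            for c in range(n_coeffs)]


# Usage: one 256-sample frame of a 440 Hz tone sampled at 8000 Hz.
bank = mel_filterbank(26, 256, 8000)
tone = [math.sin(2 * math.pi * 440 * i / 8000) for i in range(256)]
features = mfcc_frame(tone, bank)   # list of 20 floats
```

Each frame of an utterance would yield one such 20-element vector, and the resulting sequence of vectors is the kind of observation sequence a hidden Markov model can be trained on.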
Preliminary signal processing
Analysis of the audio channel should begin with filtration of the
signal, removing the components that are disturbances. In a system
working in conditions close to ideal, the preliminary filtration stage
could be skipped in order to speed up operation. In real working
conditions the speech signal is often considerably disturbed, so
preliminary filtration was applied in this work [2].
In a system that recognizes isolated words to control the movement of
a closed-circuit television camera, short but clear pauses of silence
must be made between individual words during recording. Given this
kind of recognition, the next stage after preliminary filtration is to
extract the clean, proper audio signal by removing the silence before
and after it [2]. In this work, two methods of removing redundant
silence were applied jointly. The first is based on calculating the
energy of the signal and rejecting all samples that do not exceed the
accepted energy threshold. The input signal is divided into frames of
256 samples. The frame size depends on the sampling frequency; a
sampling frequency of 8000 Hz was used. The energy of every frame is
then computed. With a suitably chosen threshold it is possible to mark
the frame that contains the beginning of the recorded word: it is the
frame whose energy crosses the accepted threshold. Since the first few
frames at the start of a recording contain only silence, the energy
threshold can be determined from the initial frames. In this work,
twice the sum of the energies of the first three frames was accepted
as the threshold. The second method counts the number of changes in
the sample values of the signal from smaller to larger and back, in
ELEKTRONIKA 11/2009 65
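The energy-based silence removal described on this page can be sketched as follows. This is one reading of the description, not the author's implementation: the 256-sample frame and the threshold of twice the summed energy of the first three frames come from the text, while the definition of frame energy as a sum of squared samples, the trimming of whole frames, and the names `frame_energies` and `trim_silence` are assumptions.

```python
FRAME_LEN = 256  # samples per frame, as stated in the text (8000 Hz sampling)


def frame_energies(samples, frame_len=FRAME_LEN):
    """Energy of each complete frame, taken here as the sum of squared samples."""
    n_frames = len(samples) // frame_len
    return [sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len])
            for i in range(n_frames)]


def trim_silence(samples, frame_len=FRAME_LEN, n_noise_frames=3):
    """Drop leading and trailing frames whose energy does not exceed the threshold.

    The threshold is twice the summed energy of the first n_noise_frames
    frames, which are assumed to contain only silence.
    """
    energies = frame_energies(samples, frame_len)
    threshold = 2.0 * sum(energies[:n_noise_frames])
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return []  # no frame crosses the threshold: nothing but silence
    start = active[0] * frame_len
    end = (active[-1] + 1) * frame_len
    return samples[start:end]


# Usage: four quiet frames, two loud frames, three quiet frames.
recording = [0.01] * (4 * 256) + [0.5] * (2 * 256) + [0.01] * (3 * 256)
word = trim_silence(recording)   # keeps only the two loud frames (512 samples)
```

Deriving the threshold from the opening frames lets it adapt to the background noise of each recording, at the cost of requiring the speaker to stay silent for the first few frames.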