Proceedings Fonetik 2009 - Institutionen för lingvistik


Proceedings, FONETIK 2009, Dept. of Linguistics, Stockholm University

[...]mation is provided on the magnitude of these “minute involuntary changes” but the wording conveys the impression that these are very subtle changes in the amplitude and time structure of the speech signal. A reasonable assumption is to expect the order of magnitude of such "involuntary changes" to be at least one or two orders of magnitude below typical values for speech signals, inevitably leading to the first issue along the series of ungrounded claims made by Nemesysco. If the company's reference to "minute changes" is to be taken seriously, then such changes are at least 20 dB below the speech signal's level and therefore masked by typical background noise. For a speech waveform captured by a standard microphone in a common reverberant room, the magnitude of these "minute changes" would be comparable to that of the disturbances caused by reflections of the acoustic energy from the walls, ceiling and floor of the room. In theory, it could be possible to separate the amplitude fluctuations caused by room acoustics from fluctuations associated with the presumed “involuntary changes” but the success of such separation procedure is critically dependent on the precision with which the acoustic signal is represented and on the precision and adequacy of the models used to represent the room acoustics and the speaker's acoustic output. This is a very complex problem that requires multiple sources of acoustic information to be solved. Also the reliability of the solutions to the problem is limited by factors like the precision with which the speaker's direct wave-front (originating from the speaker’s mouth, nostrils, cheeks, throat, breast and other radiating surfaces) and the room acoustics can be described.
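The link between "orders of magnitude" and the 20 dB figure mentioned above can be checked with a short calculation. The sketch below assumes that the orders of magnitude refer to amplitude (so that level is 20·log10 of the ratio); the function name is illustrative and not taken from the paper or the patent.

```python
import math


def amplitude_ratio_to_db(ratio: float) -> float:
    """Convert an amplitude ratio to a level difference in decibels."""
    return 20.0 * math.log10(ratio)


# Assumption: "one or two orders of magnitude" is read as an amplitude ratio
# of 1/10 or 1/100 relative to typical speech-signal values.
for orders in (1, 2):
    ratio = 10.0 ** (-orders)
    level = amplitude_ratio_to_db(ratio)
    print(f"{orders} order(s) of magnitude below the signal: {level:.0f} dB")
```

Under this reading, one order of magnitude corresponds to 20 dB below the speech level and two orders to 40 dB below it.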
Yet another issue raised by such “sound signatures” is that they are not even physically possible given the masses and the forces involved in speech production. The inertia of the vocal tract walls, velum, vocal folds and the very characteristics of the phonation process lead to the inevitable conclusion that Nemesysco’s claims of picking up that type of "sound signatures" from the speaker’s speech waveform are simply not realistic. It is also possible that these “minute changes” are thought of as spreading over several periods of vocal-fold vibration. In this case they would be observable but typically not “involuntary”. Assuming for a moment that the signal picked up by Nemesysco’s system would not be contaminated with room acoustics and background noise, the particular temporal profile of the waveform is essentially created by the vocal tract’s response to the pulses generated by the vocal folds’ vibration. However, these pulses are neither “minute” nor “involuntary”. The changes observed in the details of the waveforms can simply be the result of the superposition of pulses that interfere at different delays. In general, the company’s descriptions of the methods and principles are circular, inconclusive and often incorrect. This conveys the impression of superficial knowledge of acoustic phonetics, obviously undermining the credibility of Nemesysco’s claims that the LVA-technology performs a sophisticated analysis of the speech signal. As to the claim that the products marketed by Nemesysco would actually be able to detect the speaker’s emotional status, there is no known independent evidence to support it.
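The remark about superposed pulses interfering at different delays can be illustrated with a toy model. The sketch below is not from the paper or the patent: it assumes a vocal tract reduced to a single damped resonance, and the 500 Hz resonance, the decay rate and the two pulse rates are arbitrary illustrative values.

```python
import math


def impulse_response(t: float, resonance_hz: float = 500.0,
                     decay_per_s: float = 150.0) -> float:
    """Toy vocal-tract response to one glottal pulse: a damped sinusoid."""
    if t < 0.0:
        return 0.0
    return math.exp(-decay_per_s * t) * math.sin(2.0 * math.pi * resonance_hz * t)


def waveform_after_pulse(f0_hz: float, times_s: list[float],
                         n_pulses: int = 50) -> list[float]:
    """Sum the responses to the latest pulse and to the earlier pulses.

    Earlier pulses occurred k periods before, so their decaying responses
    overlap the current one at delays that depend on the pulse rate f0_hz.
    """
    period = 1.0 / f0_hz
    return [
        sum(impulse_response(t + k * period) for k in range(n_pulses))
        for t in times_s
    ]


# Same time offsets after a pulse, two slightly different pulse rates.
times = [0.0005 * i for i in range(8)]
wave_100 = waveform_after_pulse(100.0, times)
wave_110 = waveform_after_pulse(110.0, times)
for t, a, b in zip(times, wave_100, wave_110):
    print(f"t = {t * 1000:.1f} ms   f0 = 100 Hz: {a:+.3f}   f0 = 110 Hz: {b:+.3f}")
```

In this toy model, changing only the pulse rate alters the fine detail of the waveform, because the tails of earlier responses add to the current one with a different phase.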
Given the current state of knowledge, unless the company is capable of presenting scientifically sound arguments or at least producing independent and replicable empirical data showing that there is a significant difference between their systems’ hit and false-alarm rates, Nemesysco’s claims are unsupported.

How LVA-technology works

This section examines the core principles of Nemesysco’s LVA-technology, as available in the Visual Basic Code in the method’s patent.

Digitizing the speech signal

For a method claiming to use information from minute details in the speech wave, it is surprising that the sampling frequency and the sample sizes are as low as 11.025 kHz and 8 bits per sample. By itself, this sampling frequency is acceptable for many analysis purposes but, without knowing which information the LVA-technology is supposed to extract from the signal, it is not possible to determine whether 11.025 kHz is appropriate or not. In contrast, the 8 bit samples inevitably introduce clearly audible quantization errors that preclude the analysis of “minute details”. With 8 bit samples only 256 levels are available to encode the sampled signal’s amplitude, rather than the 65536 quantization levels associated with a 16 bit sample. In acoustic terms this reduction in sample length is associated with a 48 dB increase of
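The level counts and the 48 dB figure quoted for 8 bit versus 16 bit samples can be verified with a short calculation. This is a minimal sketch assuming uniform (linear) quantization; the function names are illustrative and do not come from the patent's Visual Basic code.

```python
import math


def quantization_levels(bits: int) -> int:
    """Number of distinct amplitude levels available with `bits` per sample."""
    return 2 ** bits


def level_ratio_db(bits_high: int, bits_low: int) -> float:
    """Ratio between the two level counts, expressed in dB (20 * log10)."""
    ratio = quantization_levels(bits_high) / quantization_levels(bits_low)
    return 20.0 * math.log10(ratio)


print(quantization_levels(8))            # 256
print(quantization_levels(16))           # 65536
print(round(level_ratio_db(16, 8), 1))   # 48.2
```

Each bit removed halves the number of levels, which corresponds to about 6 dB, so going from 16 to 8 bits gives roughly 8 × 6 = 48 dB.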
