Proceedings Fonetik 2009 - Institutionen för lingvistik


Proceedings, FONETIK 2009, Dept. of Linguistics, Stockholm University

the background noise relative to what would have been possible using a 16-bit/sample representation. It is puzzling that such crude signal representations are used by a technology claiming to work on “details”. But the degradation of the amplitude resolution becomes even worse, as a “filter” that introduces a coarser quantization in steps of 3 units reduces the 256 levels of the 8-bit representation to only 85 quantization levels (ranging from -42 to +42). This very low sample resolution (about 6.4 bits/sample), which results in terrible sound quality, is the basis for all the subsequent signal processing carried out by the LVA-technology. The promise of an analysis of “minute” details in the speech waveform cannot be taken seriously. Figure 1 displays a visual analogue of the signal degradation introduced by the LVA-technology.

Figure 1. Visual analogues of the LVA-technology’s speech signal input. The 256×256 pixel image, corresponding to 16-bit samples, is sampled down to 16×16 pixels (8-bit samples) and finally down-sampled to approximately 9×9 pixels, representing the ±42 levels of amplitude encoding used by the LVA-technology.

The core analysis procedure

In the next step, the LVA-technology scans that crude speech signal representation for “thorns” and “plateaus” using triplets of consecutive samples.

"Thorns"

According to Nemesysco’s definition, a thorn is counted every time the middle sample is higher than the maximum of the first and third samples, provided all three samples are above an arbitrary threshold of +15. Similarly, a thorn is also detected when the middle sample value is lower than the minimum of the first and third samples in the triplet and all three samples are below -15. In short, thorns are local maxima if the triplet is above +15 and local minima if the triplet is below -15. Incidentally, this is not compatible with the illustration provided in fig. 2 of the patent, where any local maxima or minima are counted as thorns, provided the three samples fall outside the (-15; +15) threshold interval.

"Plateaus"

Potential plateaus are detected when the samples in a triplet have a maximum absolute amplitude deviation of less than 5 units. The ±15 threshold is not used in this case, but to count as a plateau the number of samples in the sequence must be between 5 and 22. The number of occurrences of plateaus and their lengths are the information stored for further processing.

A blind technology

Although Nemesysco presents a rationale for the choice of these “thorns” and “plateaus” that simply does not make sense from a signal processing perspective, there are several interesting properties associated with these peculiar variables. The crucial temporal information is completely lost during this analysis. Thorns and plateaus are simply counted within arbitrary chunks of the poorly represented speech signal, which means that a vast class of waveforms created by shuffling the positions of the thorns and plateaus are indistinguishable from each other in terms of totals of thorns and plateaus. Many of these waveforms may not even sound like speech at all. This inability to distinguish between different waveforms is a direct consequence of the information loss caused by the signal degradation and the loss of temporal information. In addition, the absolute values of the amplitudes of the thorns can be arbitrarily increased up to the ±42 maximum level, creating yet another variant of physically different waveforms that are interpreted as identical from the LVA-technology’s perspective.
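
The quantization step and the thorn and plateau definitions described above can be illustrated with a short Python sketch. This is not Nemesysco's code; it is a minimal reading of the description in the text, and all function and constant names are hypothetical. Two points are assumptions rather than facts from the source: the 3-unit "filter" is modelled as division by 3 truncated toward zero (which yields exactly the 85 levels from -42 to +42), and a plateau is modelled as a run of consecutive flat triplets whose total length in samples lies between 5 and 22 inclusive.

```python
import math

THORN_THRESHOLD = 15        # the +/-15 threshold given in the text
PLATEAU_TOLERANCE = 5       # a triplet is "flat" if its spread is below 5 units
PLATEAU_MIN_LEN = 5         # plateau length limits, in samples
PLATEAU_MAX_LEN = 22


def requantize(sample_8bit):
    """Map a signed 8-bit sample (-128..127) onto the -42..+42 range.

    Assumption: the 3-unit step is a division by 3 truncated toward zero.
    """
    return math.trunc(sample_8bit / 3)


def count_thorns(samples):
    """Count thorns over all triplets of consecutive samples."""
    count = 0
    for a, b, c in zip(samples, samples[1:], samples[2:]):
        if b > max(a, c) and min(a, b, c) > THORN_THRESHOLD:
            count += 1  # local maximum, whole triplet above +15
        elif b < min(a, c) and max(a, b, c) < -THORN_THRESHOLD:
            count += 1  # local minimum, whole triplet below -15
    return count


def plateau_lengths(samples):
    """Return the lengths (in samples) of runs that qualify as plateaus.

    Assumption: a run grows by one sample for each further flat triplet;
    the text does not spell out how runs are extended.
    """
    lengths = []
    run = 0  # number of samples in the current flat run
    for i in range(len(samples) - 2):
        triplet = samples[i:i + 3]
        if max(triplet) - min(triplet) < PLATEAU_TOLERANCE:
            run = 3 if run == 0 else run + 1
        else:
            if PLATEAU_MIN_LEN <= run <= PLATEAU_MAX_LEN:
                lengths.append(run)
            run = 0
    if PLATEAU_MIN_LEN <= run <= PLATEAU_MAX_LEN:
        lengths.append(run)
    return lengths


# Two different waveforms: the thorn sits at the start or at the end.
wave_a = [20, 30, 20, 0, 0, 0, 0, 0, 0]
wave_b = [0, 0, 0, 0, 0, 0, 20, 30, 20]
print(count_thorns(wave_a), plateau_lengths(wave_a))  # 1 [6]
print(count_thorns(wave_b), plateau_lengths(wave_b))  # 1 [6]
```

Under these assumptions the two example waveforms produce identical totals, and raising the thorn peak from 30 to 42 leaves the totals unchanged as well, which is the loss of temporal and amplitude information discussed in the text.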
