A Time-Frequency Way for Improving the Quality of the ... - Wseas.us


A Time-Frequency Way for Improving the Quality of the Synthesized Speech Signal in Classical LPC Compression Method

CERNAIANU LEONARDO, SERBANESCU ALEXANDRU, QUINQUIS ANDRE

Telecommunication Section, Military Technical Academy, George Cosbuc Street, 81-83, Bucharest, ROMANIA
EIA, ENSIETA, 2, rue François Verny, 29806 Brest CEDEX 9, France

Abstract: - The LPC method is classical in speech compression; nevertheless, it gives poor sound quality. Even with this drawback, the LPC method is currently used because of its compression capacity. We have looked for a way of improving the quality of the synthesized speech signal.

Key-Words: - Adaptive, Wavelet Packets, Huffman, LPC, Speech, Compression

1 Introduction
The classical LPC method is a first step in the world of speech processing. The LPC filter, the LPC coefficients, and the Levinson and Durbin algorithms are well known ([1], [2] and [3]). Many applications use this method of speech synthesis today, such as LPC vocoders and mobile phones. But the quality of the synthesized speech was never very good. That is the reason why we propose a speech synthesis method that takes the LPC method as its starting point but gives a good synthesized sound thanks to the time-frequency algorithm we use: the Wavelet Packets decomposition.

2 Problem Formulation
The LPC method computes the coefficients of the AR model by minimizing the mean square error between the input and synthesized signals. The resulting prediction error tends to a decorrelated signal as the AR model order increases. We propose to also transmit a compressed version of this error. The prediction error has the same length as the initial signal, so it can be said that there is no gain in transmitting the error instead of the original signal. Indeed, the proposed method gives better results if applied directly on the speech signal.
With the LPC-Wavelet Packets method the compression rate decreases, but the gain is an improvement in the synthesized signal quality.

3 Problem Solution
The main idea of the algorithm is to send a compressed form of the prediction error. To do that, we used a modified form of the Wavelet Packets (WP) algorithm combined with an adaptive Huffman compression.

3.1 The adaptive Wavelet Packets algorithm
The classical WP algorithm uses two types of filters for signal filtering: a low-pass filter synthesized from the mother wavelet function and the associated mirror filter. This is done at each decomposition level, beginning with level 1 and finishing with level N. Each cell at level k has $2^{N-k}$ WP coefficients. That means that, at each level, the spectrum portions considered are half as wide as at the previous level.

In most applications, the decomposition algorithms reach level 3 or 5. In practice, the spectral properties of the prediction error can vary a lot, so the decomposition level cannot be the same for all signals. It has to be determined for each input speech word.
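The cell sizes described above can be checked with a minimal sketch of a full wavelet packet table. It assumes the Haar filter pair purely for brevity (the paper does not say which mother wavelet was used), and the names `haar_split` and `wavelet_packet_table` are hypothetical.

```python
import math


def haar_split(cell):
    """Split an even-length cell into low-pass and high-pass halves (Haar filters)."""
    s = math.sqrt(2.0)
    low = [(cell[i] + cell[i + 1]) / s for i in range(0, len(cell), 2)]
    high = [(cell[i] - cell[i + 1]) / s for i in range(0, len(cell), 2)]
    return low, high


def wavelet_packet_table(signal, levels):
    """Return table[k] = list of cells at level k (level 0 is the signal itself)."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two (zero-pad first)")
    if levels > n.bit_length() - 1:
        raise ValueError("too many levels for this signal length")
    table = [[list(signal)]]
    for _ in range(levels):
        next_level = []
        for cell in table[-1]:
            # Every cell of the previous level is filtered again (low and high),
            # which is what distinguishes wavelet packets from a plain wavelet tree.
            low, high = haar_split(cell)
            next_level.extend([low, high])
        table.append(next_level)
    return table


if __name__ == "__main__":
    table = wavelet_packet_table([1, 2, 3, 4, 5, 6, 7, 8], levels=3)
    for k, cells in enumerate(table):
        print(k, len(cells), [len(c) for c in cells])
```

For a signal of 8 samples (N = 3), level k holds 2^k cells of 2^(3-k) coefficients each, so every level carries the same total number of coefficients as the input.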


We have developed an algorithm that determines the decomposition level automatically. We carried out many experiments consisting of Wavelet Packets decomposition on the best basis, using various kinds of signals, mother functions and cost functions. For strict control of the synthesized sound quality, we used the Donoho thresholding method, which is presented in the next sub-section.

We observed that the number of coefficients retained after thresholding, as a function of the decomposition level, follows a convex, roughly parabolic law. That means that there is an optimal decomposition level for each input signal, and we can determine this level by carrying out the decomposition step by step until we find that, for level k, the number of coefficients needed is greater than for level k-1. This algorithm is illustrated in fig. 1. The synthesized signal quality is the same for every decomposition level because of the Donoho thresholding method, but the compression varies from one level to another.

3.1.1 The Donoho thresholding method
The idea of the Donoho thresholding method [4] is to have a synthesized signal carrying only a fraction (α) of the original signal energy (E). We keep the greatest coefficients until the energy of the retained coefficients is αE. To choose the value of α, we can approximate the signal-to-noise ratio using the following equation:

$$\mathrm{SNR} = 10\log_{10}\frac{E}{E - \alpha E} = -10\log_{10}(1 - \alpha) \qquad (1)$$

As can be seen, another advantage of the proposed method is that we can impose the desired signal-to-noise ratio. This is very important for signal transmission because we can choose the quality of the received signal. The noise considered here is not the channel noise but the difference between the input and synthesized signals.

Fig.1 The best decomposition level determination algorithm

Even with the Donoho thresholding method, the number of WP coefficients is equal to the length of the input signal after it has been zero-padded to a power-of-two length.
We cannot eliminate the coefficients we do not keep because, in the synthesis algorithm, we need the initial position of the retained coefficients in the decomposition table. So we set all the canceled coefficients to zero and try to compress this vector. We also need a vector to indicate the position of the coefficients in the decomposition table. Because we used the best basis determination algorithm, this vector (bb) keeps the information about the location of the best basis in the decomposition table. To build the bb vector, we scan the decomposition table from left to right and from top to bottom, adding a 1 if the cell belongs to the best basis and a 0 if not. Therefore, we have to compress the thresholded coefficients vector (wpth) and the bb vector.
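The energy-based thresholding and equation (1) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `energy_threshold` and `snr_db` are hypothetical, and the handling of ties between equal-magnitude coefficients is an assumption.

```python
import math


def energy_threshold(coeffs, alpha):
    """Keep the largest-magnitude coefficients until their energy reaches alpha * E.

    All other coefficients are set to zero, so positions are preserved.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    total = sum(c * c for c in coeffs)
    target = alpha * total
    # Indices sorted by decreasing magnitude (ties keep their original order).
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = [0.0] * len(coeffs)
    energy = 0.0
    for i in order:
        if energy >= target:
            break
        kept[i] = coeffs[i]
        energy += coeffs[i] ** 2
    return kept


def snr_db(alpha):
    """Approximate SNR in dB from equation (1): -10 * log10(1 - alpha)."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    return -10.0 * math.log10(1.0 - alpha)


if __name__ == "__main__":
    wp = [4, -3, 0.5, 0.1, 2, -0.2]
    print(energy_threshold(wp, 0.9))
    print(round(snr_db(0.9), 3), round(snr_db(0.99), 3))
```

Read the other way round, equation (1) gives the α needed for a target quality: α = 0.9 corresponds to about 10 dB and α = 0.99 to about 20 dB.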

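The stopping rule of section 3.1 (fig. 1) can be sketched independently of the decomposition itself. Here `count_at_level` is a hypothetical callable that returns the number of coefficients retained after thresholding at a given level; how it is computed (wavelet, cost function, best basis search) is left out, and `max_level` is an assumed safety bound.

```python
def best_level(count_at_level, max_level):
    """Increase the level until the retained-coefficient count rises.

    Returns the last level before the count increased, i.e. level k-1 when
    level k needs more coefficients than level k-1.
    """
    best = 1
    best_count = count_at_level(1)
    for k in range(2, max_level + 1):
        count = count_at_level(k)
        if count > best_count:
            break  # level k is worse than level k-1: stop here
        best, best_count = k, count
    return best


if __name__ == "__main__":
    # Made-up counts following a convex law with a minimum at level 3.
    counts = {1: 120, 2: 90, 3: 70, 4: 85, 5: 60}
    print(best_level(counts.__getitem__, max_level=5))
```

Because the search stops at the first increase, it relies on the convex behaviour reported in section 3.1; with a non-convex curve it would return only a local optimum.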

3.1.2 The Wavelet Packets coefficients compression method
We can multiply the wpth vector by a constant without changing the signal information, in either form or frequency. Knowing this, we normalized the input signal between -1 and 1. This gives WP coefficients of magnitude less than one. Then we multiplied the WP coefficients vector by 100 and synthesized the signal. The synthesized signal was the input signal amplified by 100.

We apply the Donoho thresholding method, amplify the wpth vector by 100, round all values to the nearest integer and synthesize the signal. The resulting error is very small (under 0.2 dB). Therefore we can build an amplified version of the wpth vector (wptha) which has integer values, and we can compress this vector using one of the classical compression algorithms without significant deterioration. We used two compression methods: the RLC (Run Length Coding) and Huffman algorithms [5].

To avoid sending the bb vector separately, we first compress the wptha vector using a modified form of the RLC method. We obtain a vector (wpthac) which has two kinds of elements: the normal ones (the elements of the wptha vector) and some number-zero groups. The method we use to build this vector is very simple: for every part of the wptha vector which contains only zeros, we count the zeros, delete that part and put in its place the counted value followed by one single zero, as in the example of fig. 2.

Fig.2 The RLC compression algorithm

After the RLC compression we want to add the bb vector to the wpthac vector so as to keep only one vector. But the wpthac vector also has negative values. It can contain a -1, so if we add the best basis vector to the wpthac vector we can obtain another zero, and we can no longer distinguish the number-zero groups. To solve this, it is necessary to translate the wpthac vector so that it has only positive values. We do this by adding to each element of the wpthac vector the absolute value of its most negative element plus one. We do not touch the number-zero groups (see fig. 3 for an example).

Fig.3 The wpthac translation method

Bearing in mind the amplification by 100, we can change each coefficient of the translated vector (wpthact) into an even one. The error resulting from this operation is insignificant. Afterwards, we add the best basis vector to the resulting vector (wpthactp), as in the example of fig. 4.

Fig.4 The combination of the wpthactp and bb vectors

As fig. 4 shows, we can extract the bb vector in the synthesis process by looking at the parity (even or odd) of the wpthactpb elements.

We still have a vector which has only integer values, so we can compress it using the Huffman algorithm. The main problem is that we must treat as data some delimiters between the coefficients. In most cases, a blank character is used. Proceeding like this, the quantity of data we need to compress doubles, and we have ten digit characters (0 to 9) plus the blank character to compress.
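The modified RLC step and the translation to positive values can be sketched as below. This is a minimal illustration under stated assumptions: the function names are hypothetical, the offset is taken as zero when the vector has no negative element (a case the text does not cover), and the later even/odd combination with the bb vector is not shown because the alignment used in fig. 4 is not recoverable from the text.

```python
def rlc_encode(wptha):
    """Replace every run of zeros by the pair (run length, 0)."""
    out = []
    i = 0
    while i < len(wptha):
        if wptha[i] == 0:
            j = i
            while j < len(wptha) and wptha[j] == 0:
                j += 1
            out.extend([j - i, 0])  # number-zero group
            i = j
        else:
            out.append(wptha[i])
            i += 1
    return out


def rlc_decode(wpthac):
    """Inverse of rlc_encode: an element followed by 0 is a run length."""
    out = []
    i = 0
    while i < len(wpthac):
        if i + 1 < len(wpthac) and wpthac[i + 1] == 0:
            out.extend([0] * wpthac[i])
            i += 2
        else:
            out.append(wpthac[i])
            i += 1
    return out


def translate(wpthac):
    """Shift the coefficients (not the number-zero groups) so all are positive.

    Returns the shifted vector and the offset, which the decoder needs.
    """
    lowest = min(wpthac, default=0)
    offset = 1 - lowest if lowest < 0 else 0  # |most negative element| + 1
    out = []
    i = 0
    while i < len(wpthac):
        if i + 1 < len(wpthac) and wpthac[i + 1] == 0:
            out.extend(wpthac[i:i + 2])  # number-zero group: left untouched
            i += 2
        else:
            out.append(wpthac[i] + offset)
            i += 1
    return out, offset


if __name__ == "__main__":
    wptha = [5, 0, 0, 0, -1, 2, 0, 0, -3, 0]
    wpthac = rlc_encode(wptha)
    print(wpthac)
    print(translate(wpthac))
```

Decoding works because a zero can appear only as the marker of a number-zero group: every zero of wptha belongs to a run, and after translation every coefficient is at least 1.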


To eliminate blank characters from compression, we devised the following method: we complement the wpthactpb vector with the length in digits of each number it contains and eliminate all the blank characters, as illustrated in fig. 5.

Fig.5 The way we eliminate the blank character

Now we can apply the Huffman compression algorithm to the resulting wpthactpbc vector. The advantage is that we generate the Huffman tree using just ten characters (0 to 9). All the additional data needed by the synthesis algorithm (which includes the length of the vector added in the blank character elimination step) can be sent or stored as a header (in the transmission).

In fig. 6 we show an example of a signal synthesized with this method, compared with the input signal (the Romanian word "shase") and the signal synthesized using the classical LPC method.

Fig.6 The input signal "shase" (top), the LPC-Wavelet Packets-RLC-Huffman synthesized signal (middle) and the LPC synthesized signal (bottom)

In fig. 7 and fig. 8 we plot the correlation and coherence functions calculated between the input signal and the synthesized LPC-WP-RLC-Huffman signal, and between the input signal and the synthesized LPC signal.

Fig.7 The correlation between the input signal and synthesized LPC signal (top) and between the input signal and the synthesized LPC-Wavelet Packets-RLC-Huffman signal (bottom)

Fig.8 The coherence between the input signal and synthesized LPC signal (top) and between the input signal and the synthesized LPC-Wavelet Packets-RLC-Huffman signal (bottom)

4 The LPC-WP-RLC-Huffman compression algorithm steps

Fig.9 The compression algorithm
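The blank-elimination step of section 3.1.2 (fig. 5) followed by Huffman coding over the ten digit symbols can be sketched as follows. This is an illustration under assumptions, not the authors' layout: here the digit lengths are returned as a separate list rather than merged into the vector, the function names are hypothetical, and a standard static Huffman code built with `heapq` stands in for the adaptive variant mentioned in section 3.

```python
import heapq
from collections import Counter


def pack_digits(values):
    """Concatenate the decimal digits of non-negative integers without blanks."""
    digits = "".join(str(v) for v in values)
    lengths = [len(str(v)) for v in values]  # needed to split the digits again
    return digits, lengths


def unpack_digits(digits, lengths):
    """Recover the integers from the digit string and the per-number lengths."""
    out, pos = [], 0
    for n in lengths:
        out.append(int(digits[pos:pos + n]))
        pos += n
    return out


def huffman_code(text):
    """Build a prefix code {symbol: bit string} from the symbol frequencies."""
    freq = Counter(text)
    if not freq:
        raise ValueError("cannot build a code for empty input")
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, partial code table).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(text, code):
    return "".join(code[ch] for ch in text)


def decode(bits, code):
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    digits, lengths = pack_digits([9, 3, 0, 12, 104, 2, 0])
    code = huffman_code(digits)
    bits = encode(digits, code)
    print(digits, lengths, len(bits))
    print(unpack_digits(decode(bits, code), lengths))
```

Without the blank delimiter the alphabet contains only the digits 0 to 9, at the price of carrying the digit lengths as side information.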


5 Practical results
The synthesized sound quality and the compression rate may be chosen by selecting the α parameter. If we select a greater value for α, we obtain a better approximation of the input signal but a smaller compression rate. For "shase", the compressed version of the input sound needs 4.24 times less memory than the original signal.

The variation of the compression rate (C) as a function of the α parameter is almost linear, as shown in fig. 10.

Fig. 10 The variation of the compression rate as a function of the α parameter for "shase"

6 Conclusion
The synthesized signal we obtain is closer to the input signal than in the classical LPC method, and we preserve a good compression rate. The synthesized sound quality and the compression rate can be strictly controlled. This algorithm can be used in both LPC-WP-RLC and LPC-WP-RLC-Huffman form, the second one being recommended for data storage.

References:
[1] J.D. Markel, A.H. Gray, Linear prediction of speech, Springer-Verlag, 1976
[2] L.R. Rabiner, R.W. Schafer, Digital processing of speech signals, Prentice Hall, 1978
[3] A. Gersho, R.M. Gray, Vector quantization and signal compression, Kluwer Academic Publishers, 1992
[4] D. Donoho, WaveLab reference manual, Stanford University, 1995
[5] Thomas J. Lynch, Data Compression - Techniques and applications, Van Nostrand Reinhold, 1985
