Signal Analysis Research (SAR) Group - RNet - Ryerson University

criterion function. The confusion matrix and the final recognition results are presented in Table II and Table III, respectively. The abbreviations in Table II stand for the six emotions: anger (Ang), fear (Fea), disgust (Dis), happiness (Hap), sadness (Sad), and surprise (Sur). FS in Table III stands for Feature Selection. As Table III shows, the best performance (81.3%) is achieved by the fuzzy-pairwise LS-SVM using the features selected by the Forward Selection algorithm. Table II shows that the most difficult emotion to recognize in our experiment is surprise, and the easiest are sadness and happiness. Fear and sadness have the highest probability of being confused with each other.

TABLE II. CONFUSION MATRIX OF THE LS-SVM CLASSIFIER (FUZZY PAIRWISE WITH FEATURE SELECTION)

        Recognized Emotions (%)
        Ang     Fea     Dis     Hap     Sad     Sur
Ang     83.3    0       2.7     6.4     2.7     4.6
Fea     1.8     71.9    7.4     1.8     13      3.7
Dis     4.6     5.5     79.6    0       3.7     6.4
Hap     1.8     1.8     0       92.4    1.8     1.8
Sad     0       6.1     0.9     0       90.5    2.3
Sur     11.1    9.2     5.5     4.6     13.8    55.5

TABLE III. FINAL RECOGNITION RESULTS

Classifier                    Recognition Rate
One-Vs-All SVM                44.9%
fuzzy One-Vs-All SVM          53.6%
Pairwise SVM                  74.5%
fuzzy pairwise SVM            78.4%
fuzzy pairwise SVM, FS        81.3%
fuzzy pairwise LDA            37.7%

VI. CONCLUSION

In this contribution, we introduced a set of new acoustic features that are used for the first time in AER. For classification we used LS-SVM, a recent and powerful classifier with many advantages over other conventional and popular classifiers such as Neural Networks. We also implemented different schemes to adapt our binary classifiers to a multi-category problem. The result of a Linear Classifier is compared with the LS-SVM performance. We achieved an overall classification accuracy of 81.3% with fuzzy-pairwise LS-SVM.

Authorized licensed use limited to: Ryerson University Library. Downloaded on July 7, 2009 at 11:29 from IEEE Xplore. Restrictions apply.
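The observations drawn from Table II can be reproduced with a short script. The following is a minimal sketch, assuming that the rows of Table II are the true emotions and the columns the recognized ones. The helper names (`per_class_rate`, `most_confused_pair`) are illustrative and do not come from the paper. Confusion between two emotions is measured here as the sum of the two off-diagonal entries, which is one possible reading of "confused with each other".

```python
# Values copied from Table II (percentages).
# Assumption: rows are true emotions, columns are recognized emotions.
LABELS = ["Ang", "Fea", "Dis", "Hap", "Sad", "Sur"]
CONFUSION = [
    [83.3, 0.0, 2.7, 6.4, 2.7, 4.6],
    [1.8, 71.9, 7.4, 1.8, 13.0, 3.7],
    [4.6, 5.5, 79.6, 0.0, 3.7, 6.4],
    [1.8, 1.8, 0.0, 92.4, 1.8, 1.8],
    [0.0, 6.1, 0.9, 0.0, 90.5, 2.3],
    [11.1, 9.2, 5.5, 4.6, 13.8, 55.5],
]


def per_class_rate(matrix):
    """Return the diagonal of the matrix: the recognition rate per emotion."""
    return [matrix[i][i] for i in range(len(matrix))]


def most_confused_pair(matrix, labels):
    """Return the pair (a, b) maximizing matrix[a][b] + matrix[b][a]."""
    best_pair, best_score = None, -1.0
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            score = matrix[i][j] + matrix[j][i]
            if score > best_score:
                best_pair, best_score = (labels[i], labels[j]), score
    return best_pair, best_score


rates = per_class_rate(CONFUSION)
hardest = LABELS[rates.index(min(rates))]
easiest = LABELS[rates.index(max(rates))]
pair, score = most_confused_pair(CONFUSION, LABELS)
mean_rate = sum(rates) / len(rates)

print("hardest:", hardest)
print("easiest:", easiest)
print("most confused pair:", pair, score)
print("unweighted mean of per-class rates:", round(mean_rate, 1))
```

The unweighted mean of the diagonal (about 78.9%) need not equal the 81.3% overall rate in Table III. One possible reason is unequal class sizes, which this excerpt does not state.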
A WATERMARKING METHOD FOR SPEECH SIGNALS BASED ON THE TIME–WARPING SIGNAL PROCESSING CONCEPT

Cornel Ioana (1), Arnaud Jarrot (2), André Quinquis (2), Sridhar Krishnan (3)

(1) LIS Laboratory, BP 46, 961 rue de la Houille Blanche, 38402 Saint-Martin d’Hères cedex, FRANCE. phone: +33(0) 476 826 422, email: cornel.ioana@lis.inpg.fr
(2) E3I2 Laboratory (EA 3876) – ENSIETA, 2 Rue François Verny, 29806, Brest, FRANCE. phone: +33(0) 298 348 720, emails: [jarrotar, quinquis]@ensieta.fr
(3) Department of Electrical Engineering – Ryerson University, 350 Victoria Street, Toronto, CANADA. phone: 416.979.5000 x6086, email: krishnan@ee.ryerson.ca

ABSTRACT

This paper deals with the watermarking of speech signals, which consists in introducing an imperceptible mark into a signal. To this end, we suggest using an amplitude-modulated signal that mimics a formantic structure present in the signal. This makes it possible to exploit the time–masking effect that occurs when two signals are close in the time–frequency plane. Based on this embedding scheme, a watermark extraction method using nonstationary linear filtering and matched filter detection is proposed in order to recover the information carried by the watermark. Numerical results obtained on a real speech signal show that the watermark is likely inaudible and that the information it carries is easily retrievable.

Index Terms— Watermarking, Time–warping signal processing, Time–frequency analysis.

1. INTRODUCTION

Today’s digital media have opened the door to an information era where the true value of a product is generally dissociated from any physical medium. While this enables a high degree of flexibility in distribution, the commerce of data without any physical medium raises serious copyright issues. Data can be easily duplicated, turning piracy into a simple data copy process. In order to secure the identity of the owner of a media item, one solution consists in hiding digital subcodes inside the data, since no physical medium can be used for this purpose. This problem is generally referred to as watermarking [1]. The main rules in the watermarking context are:

• The watermark should not be discernible from the media, in order to preserve the integrity of the media.
• The watermark should be easily retrievable. Given a priori information, the inserted watermark should be recoverable, as should the digital subcodes it carries.
• The watermark should be robust to attacks (e.g. compression or noise insertion), since these phenomena often occur in media transmission.

In this paper we propose a watermarking procedure that attempts to exploit the time–frequency region available between two formants. For the watermark, we suggest using an amplitude-modulated signal whose carrier frequency is modulated according to the modulation law of a formant. In this way, the time–frequency content of the watermark follows the time–frequency content of the formant, which makes it possible to place the watermark signal very close to the formant. As will be seen, this embedding strategy makes the watermark likely imperceptible from an acoustical point of view. The recovery of the watermark is ensured by nonstationary linear filtering and a matched filtering method. Numerical results show that the watermark can be easily recovered, as can the coded sequence it carries.

The paper is organized as follows. Section 2 is devoted to a short presentation of the time–warping signal processing concept. Based on this concept, a new watermarking procedure is proposed in Section 3. Numerical results presented in Section 4 illustrate the benefits of the proposed technique. Concluding remarks are given in Section 5.

2. TIME–WARPING SIGNAL PROCESSING CONCEPT

2.1. Non-unitary Time–Warping Operators

Let $x(t) \in L^2(\mathbb{R})$ be a square-integrable signal.
The set of unitary time–warping operators $\{W,\ w(t) \in C^1,\ \dot{w}(t) \ge 0 : x(t) \to (Wx)(t)\}$ is defined in [2] by

$$(Wx)(t) = |\dot{w}(t)|^{1/2}\, x(w(t)), \qquad (1)$$

where $\dot{w}(t)$ stands for the derivative of the warping function $w(t)$ with respect to $t$. Properties of this transformation include linearity and unitary equivalence, since the envelope $|\dot{w}(t)|^{1/2}$ preserves the energy of the signal at the output of $W$.

1424407281/07/$20.00 ©2007 IEEE. ICASSP 2007, II-201.
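The energy-preservation property of Eq. (1) can be checked numerically. The following is a minimal sketch under stated assumptions: the Gaussian test signal and the cubic warping function $w(t) = t + 0.3t^3$ are illustrative choices that do not come from the paper, and the function names `warp` and `energy` are hypothetical. The warping function was chosen so that $\dot{w}(t) > 0$ everywhere and $w$ maps the real line onto itself.

```python
import math


def warp(x, w, w_dot):
    """Return the function t -> |w'(t)|^(1/2) * x(w(t)), following Eq. (1)."""
    return lambda t: math.sqrt(abs(w_dot(t))) * x(w(t))


def energy(f, t_min=-10.0, t_max=10.0, n=20001):
    """Approximate the integral of |f(t)|^2 over [t_min, t_max] (trapezoidal rule)."""
    dt = (t_max - t_min) / (n - 1)
    samples = [abs(f(t_min + k * dt)) ** 2 for k in range(n)]
    return dt * (sum(samples) - 0.5 * (samples[0] + samples[-1]))


# Illustrative choices (assumptions, not taken from the paper):
x = lambda t: math.exp(-t * t)          # Gaussian test signal
w = lambda t: t + 0.3 * t ** 3          # warping function, strictly increasing
w_dot = lambda t: 1.0 + 0.9 * t * t     # its derivative, always positive

wx = warp(x, w, w_dot)
e_x = energy(x)    # energy of the original signal
e_wx = energy(wx)  # energy of the warped signal

print("energy of x:  ", e_x)
print("energy of Wx: ", e_wx)
```

The two energies agree up to numerical integration error because the substitution $u = w(t)$ turns $\int |\dot{w}(t)|\,|x(w(t))|^2\,dt$ into $\int |x(u)|^2\,du$. Without the factor $|\dot{w}(t)|^{1/2}$, the warped signal would generally have a different energy.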