

3. $n_K > n_F$

In this situation $n_K - n_F$ candidates will stay unassigned; in our situation there can be one extra candidate. We are looking for the function (4) that assigns formant numbers to the limited set of candidates according to the matrix $B_F^T$, where each column represents a candidate and each row is related to a formant number. The matrix elements are found according to equation (5).

$$f_{F \to K} : F_i \to f^K_j \qquad (4)$$

$$B_F^T(i, j) = 0 \iff f_{F \to K}(i) = j \qquad (5)$$

The best-fit, minimal-cost algorithm that finds the related transformation takes into account the candidate frequencies $f^K$ and the mean frequencies of the previously assigned formants $\bar{F} = \{\bar{F}_1, \bar{F}_2, \cdots, \bar{F}_{n_F}\}$, where for the $n$-th sample: $\bar{F}_i(n) = \frac{1}{M} \sum_{m=1}^{M} F_i(n - m)$. In our program we assume $M = 9$.
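A running mean like $\bar{F}_i(n)$ can be kept per formant track with a fixed-length buffer. The sketch below is illustrative only; the buffer layout and the function name are our assumptions, not the paper's code:

```python
from collections import deque

M = 9  # averaging window length used in the paper

def update_means(history, new_frequencies):
    """Append the newly assigned frequencies to each formant track and
    return the running means over at most the last M samples."""
    for track, f in zip(history, new_frequencies):
        track.append(f)  # deque(maxlen=M) silently drops the oldest sample
    return [sum(track) / len(track) for track in history]

# hypothetical usage with three tracked formants F1..F3
history = [deque(maxlen=M) for _ in range(3)]
means = update_means(history, [720.0, 1240.0, 2580.0])
```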

In the algorithm the sum of the costs of changing the previous formant frequencies ($\bar{F}_i$) into the new ones ($f^K_j$) is minimized according to equation (6):

$$\max_{B_F} \sum_{i=1}^{n_F} \sum_{j=1}^{n_K} B_F(i, j) \cdot C_F(i, j) \qquad (6)$$

where $C_F(i, j) = |\bar{F}_i - f^K_j|$. Since the matrix elements are $0$ exactly for the assigned pairs (equation (5)), maximizing the summed cost of the unassigned pairs is equivalent to minimizing the summed cost of the assigned ones.

$$\forall_{j_1, j_2 \in \{1,2,\dots,n_K\}} \, \forall_{i_1, i_2 \in \{1,2,\dots,n_F\}} \text{ for which } f_{K \to F}(j_1) = i_1 \wedge f_{K \to F}(j_2) = i_2 :$$
$$f^K_{j_1} < f^K_{j_2} \iff i_1 < i_2 \qquad (7)$$

$$\forall_{j_1, j_2 \in \{1,2,\dots,n_K\}} \, \forall_{i_1, i_2 \in \{1,2,\dots,n_F\}} \text{ for which } f_{F \to K}(i_1) = j_1 \wedge f_{F \to K}(i_2) = j_2 :$$
$$i_1 < i_2 \iff f^K_{j_1} < f^K_{j_2} \qquad (8)$$

There is a natural assumption that formant candidates should be assigned to formants with respect to their order (equations (7), (8)). Under this constraint an assignment is fully determined by which elements are kept, so we only need to check all combinations of $\min\{n_F, n_K\}$-element subsets of the $\max\{n_F, n_K\}$-element set, as in the sketch below.
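The following Python sketch enumerates exactly those order-preserving assignments and keeps the cheapest one; the function and variable names are ours, and minimizing the assigned cost directly stands in for the equivalent $\max_{B_F}$ formulation of equation (6):

```python
from itertools import combinations

def assign_formants(means, candidates):
    """Assign sorted candidate frequencies f^K to formant numbers.

    means      -- running mean frequencies of the tracked formants (F_bar)
    candidates -- sorted candidate frequencies for the current frame
    Returns the chosen index subset and the minimal total cost, where
    the cost of a pair is C_F(i, j) = |F_bar_i - f^K_j|.
    """
    n_f, n_k = len(means), len(candidates)
    k = min(n_f, n_k)
    best, best_cost = None, float("inf")
    # order-preserving assignments correspond one-to-one to
    # k-element subsets of the larger set (equations (7), (8))
    for subset in combinations(range(max(n_f, n_k)), k):
        if n_k >= n_f:  # extra candidates: choose which ones to keep
            cost = sum(abs(means[i] - candidates[j])
                       for i, j in enumerate(subset))
        else:           # too few candidates: choose the receiving formants
            cost = sum(abs(means[i] - candidates[j])
                       for j, i in enumerate(subset))
        if cost < best_cost:
            best, best_cost = subset, cost
    return best, best_cost

# hypothetical frame: four candidates competing for three formant tracks
print(assign_formants([700.0, 1200.0, 2500.0],
                      [250.0, 710.0, 1190.0, 2520.0]))
```

With these numbers the spurious low-frequency candidate at 250 Hz is left unassigned, since dropping it yields the smallest total deviation from the running means.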

3.4 Video methods

Within our system novel computer vision techniques were used to automatically segment the eyes, mouth and nose regions [5]. The following parameters are tracked within the system: the distance between the eyes $L_0$, the lips height $L_H$ and width $L_W$, and the distance $L_J$ between the line joining the eyes and the bottom of the jaw. From the lips shape the mouth opening area is estimated. The jaw angle and the area of the mouth opening were chosen as representatives of the facial expression descriptors.
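As an illustration of how such a descriptor could be computed (the paper does not give the estimator; the ellipse approximation and the normalization by $L_0$ are our assumptions), one might write:

```python
import math

def mouth_opening_area(l_h, l_w, l_0):
    """Approximate the mouth opening as an ellipse with axes L_H and L_W,
    normalized by the squared inter-eye distance L_0 for scale invariance
    (hypothetical estimator, not the authors' formula)."""
    return math.pi * (l_h / 2.0) * (l_w / 2.0) / (l_0 ** 2)
```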

3.5 Face elements tracking

The facial elements tracking algorithms start from a color space conversion, RGB to HSV. Then, for each pixel (with respect to the assumed margins), the values of 5 features are computed. Binarization into 5 regions is made according to adaptive thresholds, which are updated according to the region sizes. The related regions specify red colored pixels (RED), dark pixels (DARK), moving objects (MOV), vertical differences (VDIFF) and horizontal differences (HDIFF).
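A minimal sketch of this pipeline is given below, assuming OpenCV and NumPy; the concrete feature definitions, the target region size and the threshold update rule are our guesses at the idea, not the authors' exact formulas:

```python
import cv2
import numpy as np

def segment_regions(frame_bgr, prev_gray, thresholds, target_fill=0.05):
    """Binarize the five feature maps (RED, DARK, MOV, VDIFF, HDIFF) and
    adapt each threshold toward a target region size (assumed update rule)."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)

    features = {
        "RED":   s * (np.minimum(h, 180.0 - h) < 15),  # saturated, hue near red
        "DARK":  255.0 - v,                            # low brightness
        "MOV":   np.abs(gray - prev_gray),             # frame difference
        "VDIFF": np.abs(cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)),
        "HDIFF": np.abs(cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)),
    }
    masks = {}
    for name, feature in features.items():
        masks[name] = feature > thresholds[name]
        # adaptive update: raise the threshold if the region grew too
        # large, lower it if the region shrank below the target size
        if masks[name].mean() > target_fill:
            thresholds[name] *= 1.05
        else:
            thresholds[name] *= 0.95
    return masks
```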

