20.07.2013 Views

Notes on computational linguistics.pdf - UCLA Department of ...

Stabler - Lx 185/209 2003

(4) There are quite a few language resources available online. One of them is Roger Mitton’s (1992) phonetic
dictionary in the Oxford Text Archive. It has 70645 words of various kinds with phonetic transcriptions
of British English. The beginning of the listing looks like this:

    ’neath         niT          T-$       1
    ’shun          SVn          W-$       1
    ’twas          tw0z         Gf$       1
    ’tween         twin         Pu$,T-$   1
    ’tween-decks   ’twin-deks   Pu$       2
    ’twere         tw3R         Gf$       1
    ’twill         twIl         Gf$       1
    ’twixt         twIkst       T-$       1
    ’twould        twUd         Gf$       1
    ’un            @n           Qx$       1
    A              eI           Ki$       1
    A’s            eIz          Kj$       1
    A-bombs        ’eI-b0mz     Kj$       2
    A-level        ’eI-levl     K6%       3
    A-levels       ’eI-levlz    Kj%       3
    AA             ,eI’eI       Y>%       2
    ABC            ,eI,bi’si    Y>%       3

The second column is a phonetic transcription of the word spelled in the first column. (Columns 3 and 4
give the syntactic category and the number of syllables.)
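
To make the column layout concrete, here is a minimal Python sketch that splits listing lines into the four columns just described. The `Entry` type and `parse_line` function are hypothetical names introduced for illustration, and the sketch assumes whitespace-separated columns with no internal spaces. That holds for the sample lines shown here but would not hold for multi-word entries, and the actual file layout may differ.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    spelling: str
    phones: str      # transcription, stress marks still included
    category: str    # syntactic category code(s), e.g. "Pu$,T-$"
    syllables: int


def parse_line(line: str) -> Entry:
    """Split one listing line into its four columns.

    Assumption: columns are separated by whitespace and contain no spaces
    themselves (true of the sample below, not of multi-word entries).
    """
    spelling, phones, category, syllables = line.split()
    return Entry(spelling, phones, category, int(syllables))


# A few lines from the listing, typed with plain ASCII apostrophes.
sample = """\
'twas tw0z Gf$ 1
'tween twin Pu$,T-$ 1
A-level 'eI-levl K6% 3
"""
entries = [parse_line(line) for line in sample.splitlines()]
```

Each `Entry` then gives direct access to the transcription (`entries[0].phones`) and the other fields by name.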

The phonetic transcription has notations for 43 sounds. My guesses on the translation:

    Mitton    IPA  example    Mitton  IPA  example
    i         i    bead       N       ŋ    sing
    I         ɪ    bid        T       θ    thin
    e         ɛ    bed        D       ð    then
    &         æ    bad        S       ʃ    shed
    A         a    bard       Z       ʒ    beige
    0 (zero)  ɒ    cod        O       ɔ    cord
    U         ʊ    good       u       u    food
    p         p               t       t
    k         k               b       b
    d         d               g       g
    V         ʌ               m       m
    n         n               f       f
    v         v               s       s
    z         z               3       ɜ    bird
    r         r               l       l
    w         w               h       h
    j         j               @       ə    about
    eI        eɪ   day        @U      oʊ   go
    aI        aɪ   eye        aU      aʊ   cow
    oI        oɪ   boy        I@      ɪə   beer
    e@        ɛə   bare       U@      ʊə   tour
    R              far

The phonetic entries also mark primary stress with an apostrophe, and secondary stress with a comma.
Word boundaries in compound forms are indicated with a +, unless they are spelled with a hyphen or
space, in which case the phonetic entries do the same.

bookclub above board air-raid<br />
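
As a rough illustration of these conventions, the following Python sketch strips stress marks and boundary markers from a transcription and returns the remaining sound symbols one character at a time. The function name `segments` is hypothetical. Treating every remaining character as a separate symbol is an assumption: two-character notations such as eI come out as two symbols, which may or may not be what a given application wants.

```python
from __future__ import annotations

# Primary stress (apostrophe, ASCII or typographic) and secondary stress (comma).
STRESS = {"'", "\u2019", ","}
# Word boundaries inside compounds: +, hyphen, or space.
BOUNDARY = {"+", "-", " "}


def segments(transcription: str) -> list[str]:
    """Return the sound symbols of a transcription, one character each,
    with stress marks and compound boundaries removed."""
    skip = STRESS | BOUNDARY
    return [c for c in transcription if c not in skip]
```

For example, `segments("'eI-levl")` drops the stress mark and the hyphen and keeps the six symbols `e I l e v l`.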

a. Mitton’s dictionary is organized by spelling, rather than by phonetic transcription, but it would be
easy to reverse. Write a program that maps phonetic sequences like this

    ['D','@',k,'&',t,'I',z,'O',n,'D','@',m,'&',t]

to word sequences like this:

    [the, cat, is, on, the, mat].
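
One possible shape for such a program, sketched in Python rather than as a definitive solution: reverse the dictionary into a map from symbol sequences to spellings, then search for every way of cutting the input into dictionary words. The names `LEXICON` and `parses` are hypothetical, and the five-word lexicon is a toy stand-in whose transcriptions are simply read off the example sequence above. A real version would be built from the full dictionary and would have to cope with homophones and many competing segmentations.

```python
from __future__ import annotations

from typing import Iterator

# Toy reversed lexicon (an assumption for illustration): symbol sequence -> spellings.
LEXICON: dict[tuple[str, ...], list[str]] = {
    ("D", "@"): ["the"],
    ("k", "&", "t"): ["cat"],
    ("I", "z"): ["is"],
    ("O", "n"): ["on"],
    ("m", "&", "t"): ["mat"],
}
MAX_LEN = max(len(key) for key in LEXICON)  # longest transcription in the lexicon


def parses(phones: list[str], start: int = 0) -> Iterator[list[str]]:
    """Yield every way of splitting phones[start:] into lexicon words."""
    if start == len(phones):
        yield []  # nothing left to cover: one (empty) segmentation
        return
    # Try each candidate word ending after `start`, then recurse on the rest.
    for end in range(start + 1, min(start + MAX_LEN, len(phones)) + 1):
        for word in LEXICON.get(tuple(phones[start:end]), []):
            for rest in parses(phones, end):
                yield [word] + rest


phones = ["D", "@", "k", "&", "t", "I", "z", "O", "n", "D", "@", "m", "&", "t"]
result = list(parses(phones))
```

The generator returns all segmentations rather than the first one, since a phonetic sequence can in general be cut into words in more than one way. With the toy lexicon, the example sequence has a single segmentation.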

b. As in the previous problem (3), connect this translator to the recognizer, so that we can recognize
certain phonetic sequences as sentences.
