01.03.2013 Views

word boundary- hypothesisation in hindi speech - Speech and ...


For English, several studies were carried out, by both linguists and speech scientists, to identify lexical clues which can be used to detect word boundaries [Shipman and Zue 1982]. The clues used were basically constraints on sequences of phonemes, also known as phonotactic constraints. In one study [Lamel and Zue 1984], sequences of consonants of the form C+ (+ indicating a sequence of one or more consonants) were identified. These sequences were used to hypothesise some of the word boundaries in several texts. It was also suggested that such clues can be used to detect word boundaries at a broadclass level. To identify the exact location of the word boundary in the consonant string (for example, a two-consonant sequence C1C2 can contain a word boundary in three positions: before the sequence, within the sequence or after the sequence), it was suggested that additional knowledge such as acoustic-phonetics can be used. Though results on actual texts were not reported, the number of word boundaries that can be detected by these clues appears to be limited.
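The positional ambiguity described above can be made concrete with a minimal sketch. It assumes phonemes are given as a list of symbols and that a set of vowel symbols is known; the function names `consonant_runs` and `boundary_candidates` are hypothetical and do not come from the cited studies. For every maximal consonant run it lists all positions at which a word boundary could fall, so a run of k consonants yields k + 1 candidates.

```python
def consonant_runs(phonemes, vowels):
    """Return (start, end) index pairs of maximal consonant runs (end exclusive)."""
    runs = []
    start = None
    for i, p in enumerate(phonemes):
        if p not in vowels:
            if start is None:
                start = i          # a new consonant run begins here
        elif start is not None:
            runs.append((start, i))  # a vowel closes the current run
            start = None
    if start is not None:
        runs.append((start, len(phonemes)))  # run reaches the end of the string
    return runs


def boundary_candidates(phonemes, vowels):
    """For each consonant run, list every position where a boundary could fall.

    Position p means "between phonemes[p-1] and phonemes[p]".
    """
    return [(s, e, list(range(s, e + 1)))
            for s, e in consonant_runs(phonemes, vowels)]


# Toy example (assumed symbols): the two-consonant run "b d" in "a b d a"
# has three candidate positions: before, within and after the run.
print(boundary_candidates(list("abda"), {"a"}))  # [(1, 3, [1, 2, 3])]
```

Choosing among these candidates is exactly where the additional knowledge mentioned above (such as acoustic-phonetics) would be needed; the sketch only enumerates them.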

A more recent and exhaustive study on the use of phoneme sequence constraints was done by Harrington [Harrington, Johnson and Cooper 1987], in which sequences of the types CV, VC and CVC were considered. In this study, all word-internal sequences of the given type were extracted from a dictionary. All possible sequences that can occur across word boundaries were also found by considering all possible pairings of the words. From these word boundary sequences the word-internal sequences were removed. The remaining sequences were thus sequences which occur only across word boundaries, and these were used to hypothesise word boundaries. It was found that nearly 45% of the word boundaries can be detected in an English text represented in phonemic form, with incorrect hypotheses below 4%.
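The procedure described above (collect word-internal sequences, collect sequences spanning word pairs, subtract, then scan the text) can be sketched as follows. This is a simplified illustration and not the cited authors' implementation: it uses fixed-length phoneme n-grams instead of the CV, VC and CVC types, and the toy lexicon, its phoneme symbols and the function names are all assumptions made for the example.

```python
from itertools import product


def ngrams(seq, n):
    """All length-n phoneme sequences occurring inside one word."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def boundary_only_sequences(lexicon, n):
    """Map each n-gram that occurs only across word boundaries to the set of
    boundary offsets (position of the boundary inside the n-gram)."""
    internal = set()
    for word in lexicon:
        internal |= ngrams(word, n)

    spanning = {}
    for left, right in product(lexicon, repeat=2):  # all pairings of words
        joined = left + right
        cut = len(left)
        for i in range(len(joined) - n + 1):
            if i < cut < i + n:  # window straddles the word boundary
                spanning.setdefault(tuple(joined[i:i + n]), set()).add(cut - i)

    # Remove every sequence that can also occur inside a word.
    return {g: offs for g, offs in spanning.items() if g not in internal}


def hypothesise(phonemes, boundary_seqs, n):
    """Hypothesise a boundary wherever a boundary-only n-gram occurs and the
    boundary offset inside it is unambiguous."""
    hyps = set()
    for i in range(len(phonemes) - n + 1):
        offs = boundary_seqs.get(tuple(phonemes[i:i + n]))
        if offs and len(offs) == 1:
            hyps.add(i + next(iter(offs)))
    return sorted(hyps)


# Toy lexicon (assumed): each word is a tuple of one-letter phoneme symbols.
LEXICON = [tuple("kat"), tuple("dog"), tuple("sit")]
SEQS = boundary_only_sequences(LEXICON, 2)
print(hypothesise(tuple("katsitdog"), SEQS, 2))  # [3, 6]
```

In the toy case both boundaries are found because the bigrams "t s" and "t d" never occur inside a word. With a realistic dictionary many cross-boundary sequences also occur word-internally and are removed, which is consistent with only a fraction of the boundaries being detected.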

In a later study [Harrington, Watson and Cooper 1989], the above sequences were used to hypothesise word boundaries in strings represented using broadclasses. It
