word boundary- hypothesisation in hindi speech - Speech and ...
For English, several studies were carried out, both by linguists and speech
scientists, to identify lexical clues which can be used to detect word boundaries
[Shipman and Zue 1982]. The clues used were basically constraints on sequences of
phonemes, also known as phonotactic constraints. In one study [Lamel and Zue 1984],
sequences of consonants of the form C+ (+ indicating a sequence of one or more
consonants) were identified. These sequences were used to hypothesise some of the
word boundaries in several texts. It was also suggested that such clues can be used
to detect word boundaries at a broadclass level. To identify the exact location of the
word boundary in the consonant string (for example, a two-consonant sequence C1C2
can contain a word boundary in three positions: before the sequence, within the
sequence or after the sequence), it was suggested that additional knowledge such as
acoustic-phonetics can be used. Though results on actual texts were not reported, the
number of word boundaries that can be detected by these clues appears to be limited.
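
The ambiguity in locating a boundary inside a consonant string can be illustrated with a minimal sketch. It assumes a toy representation in which each phoneme is one character and any non-vowel counts as a consonant; the names `VOWELS`, `consonant_runs` and `boundary_positions` are hypothetical and are not taken from the cited studies.

```python
VOWELS = set("aeiou")  # assumption: toy vowel inventory, one character per phoneme


def consonant_runs(phonemes):
    """Return (start, end) index pairs of maximal consonant runs (the C+ sequences)."""
    runs = []
    start = None
    for i, p in enumerate(phonemes):
        if p not in VOWELS:
            if start is None:
                start = i          # a new consonant run begins here
        elif start is not None:
            runs.append((start, i))  # a vowel closes the current run
            start = None
    if start is not None:
        runs.append((start, len(phonemes)))
    return runs


def boundary_positions(cluster):
    """List every placement of a single word boundary '#' in or around a cluster."""
    return [cluster[:i] + ["#"] + cluster[i:] for i in range(len(cluster) + 1)]


print(consonant_runs(list("bigdog")))   # [(0, 1), (2, 4), (5, 6)]
print(boundary_positions(["g", "d"]))   # three candidate placements for C1C2
```

A cluster of n consonants gives n + 1 candidate placements, which is why the phoneme sequence alone cannot fix the boundary and further knowledge is needed.
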
A more recent and exhaustive study on the use of phoneme sequence
constraints was carried out by Harrington and colleagues [Harrington, Johnson and
Cooper 1987], in which sequences of the types CV, VC and CVC were considered. In this
study, all word-internal sequences of the given type were extracted from a dictionary.
All possible sequences that can occur across word boundaries were then found by
considering all possible pairings of the words. From these word boundary sequences
the word-internal sequences were removed. The remaining sequences were thus those
which occur only across word boundaries, and these were used to hypothesise word
boundaries. It was found that nearly 45% of the word boundaries can be detected in an
English text represented in a phonemic form, with incorrect hypotheses below 4%.
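
The procedure described above (collect word-internal sequences, collect sequences spanning word junctions, keep only the latter) can be sketched as follows. This is a toy illustration under stated assumptions, not the implementation used in the cited study: each phoneme is one character, the vowel inventory and the three-word lexicon are invented, and the function names are hypothetical.

```python
from itertools import product

VOWELS = set("aeiou")  # assumption: toy inventory, one character per phoneme


def cv_class(phoneme):
    return "V" if phoneme in VOWELS else "C"


def windows(phonemes, pattern):
    """Yield (start, window) for each window whose C/V classes match the pattern."""
    n = len(pattern)
    for i in range(len(phonemes) - n + 1):
        w = tuple(phonemes[i:i + n])
        if "".join(cv_class(p) for p in w) == pattern:
            yield i, w


def boundary_only_sequences(lexicon, pattern):
    """Map each sequence seen only across word junctions to its boundary offsets."""
    internal = {w for word in lexicon for _, w in windows(word, pattern)}
    across = {}
    for left, right in product(lexicon, repeat=2):  # all ordered word pairings
        cut = len(left)
        for i, w in windows(left + right, pattern):
            if i < cut < i + len(pattern):  # window straddles the junction
                across.setdefault(w, set()).add(cut - i)
    return {w: offs for w, offs in across.items() if w not in internal}


def hypothesise(phonemes, boundary_only, pattern):
    """Return sorted candidate boundary indices in an unsegmented phoneme string."""
    hits = set()
    for i, w in windows(phonemes, pattern):
        for off in boundary_only.get(w, ()):
            hits.add(i + off)
    return sorted(hits)


lexicon = ["so", "it", "man"]  # hypothetical toy dictionary
vc = boundary_only_sequences(lexicon, "VC")
cv = boundary_only_sequences(lexicon, "CV")
print(hypothesise("somanit", vc, "VC"))  # [2]
print(hypothesise("somanit", cv, "CV"))  # [5]
```

In this toy case the string "somanit" ("so man it") has boundaries at indices 2 and 5; the VC sequences find the first and the CV sequences the second. With a realistic dictionary many junction sequences also occur word-internally and are discarded, so only part of the boundaries can be recovered this way.
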
In a later study [Harrington, Watson and Cooper 1989], the above sequences
were used to hypothesise word boundaries in strings represented using broadclasses. It