
are usually achieved by combining the class-based model and the n-gram model. There exist many variations of class-based models, which often differ in the process of forming the classes. So-called soft classes allow one word to belong to multiple classes. A description of several variants of class-based models can be found in [24].
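For concreteness, one standard hard-class formulation (in the style of Brown et al.; [24] covers further variants, so this particular factorization is an illustrative choice rather than the only one) decomposes the bigram probability as

P(w_i | w_{i-1}) ≈ P(c(w_i) | c(w_{i-1})) · P(w_i | c(w_i)),

where c(w) denotes the class of word w. The combination with the n-gram model mentioned above is then typically a linear interpolation, P(w_i | h) = λ P_ngram(w_i | h) + (1 − λ) P_class(w_i | h), with the weight λ tuned on held-out data.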

While the perplexity improvements given by class-based models are usually moderate, these techniques have a noticeable effect on the word error rate in speech recognition, especially when only a small amount of training data is available. This makes class-based models quite attractive as opposed to cache models, which usually work well only in experiments concerning perplexity.

The disadvantages of class-based models include high computational complexity during inference (for statistical classes) or reliance on expert knowledge (for manually assigned classes). More seriously, the improvements tend to vanish with an increased amount of training data [24]. Thus, class-based models are more often found in research papers than in real applications.

From a critical point of view, there are several theoretical difficulties involving class-based models:

• The assumption that words belong to some higher-level classes is intuitive, but usually no special theoretical explanation is given of the process by which the classes are constructed; in the end, the number of classes is usually just a tunable parameter that is chosen based on performance on development data (see the sketch following this list)

• Most techniques attempt to cluster individual words in the vocabulary, but the idea is not extended to n-grams: by thinking about character-level models, it is obvious that with an increasing amount of training data, classes can be successful only if a longer context can be captured by a single class (several characters in this case)
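To make these points concrete, below is a minimal Python sketch of a hard-class bigram model. It is an assumption-laden illustration, not a method from [24]: the clustering step is a toy frequency-binning stand-in for a real algorithm such as Brown clustering, num_classes is exactly the kind of tunable parameter criticized above, and smoothing is omitted.

from collections import Counter, defaultdict

def frequency_classes(sentences, num_classes):
    # Toy class assignment by frequency rank; a stand-in for a real
    # clustering algorithm. num_classes is the tunable parameter
    # discussed in the first point above.
    counts = Counter(w for sent in sentences for w in sent)
    ranked = [w for w, _ in counts.most_common()]
    return {w: min(i * num_classes // len(ranked), num_classes - 1)
            for i, w in enumerate(ranked)}

def train_class_bigram(sentences, word2class):
    # Hard-class bigram model without smoothing (a simplification):
    # P(w_i | w_{i-1}) ~ P(c(w_i) | c(w_{i-1})) * P(w_i | c(w_i))
    class_bigram = defaultdict(Counter)  # class -> counts of following classes
    class_count = Counter()              # occurrences of each class
    word_count = Counter()               # occurrences of each word
    for sent in sentences:
        cs = [word2class[w] for w in sent]
        for c_prev, c_cur in zip(cs, cs[1:]):
            class_bigram[c_prev][c_cur] += 1
        class_count.update(cs)
        word_count.update(sent)

    def prob(w_prev, w_cur):
        c_prev, c_cur = word2class[w_prev], word2class[w_cur]
        total = sum(class_bigram[c_prev].values())
        p_class = class_bigram[c_prev][c_cur] / total if total else 0.0
        return p_class * word_count[w_cur] / class_count[c_cur]

    return prob

# Hand-assigned classes for the demo; frequency_classes(sents, 3) would
# also work, but frequency binning gives less interpretable clusters
# on a toy corpus.
sents = [["the", "cat", "runs"], ["a", "dog", "runs"]]
w2c = {"the": 0, "a": 0, "cat": 1, "dog": 1, "runs": 2}
prob = train_class_bigram(sents, w2c)
print(prob("the", "dog"))  # 0.5: unseen bigram, but its class transition is seen

The printed probability is nonzero although "the dog" never occurs in the training data, which illustrates why class-based sharing helps most when training data is scarce; conversely, once every word bigram has reliable counts of its own, such sharing adds little, matching the observation above that the improvements vanish with more data.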

2.3.3 Structured Language Models

Statistical language modeling was criticized heavily by linguists from the first days of its existence. The already mentioned Chomsky's statement that "the notion of probability of a sentence is completely useless one" can nowadays easily be seen as a big mistake, due to the indisputable success of applications that involve n-gram models. However, further objections from the linguistic community usually address the inability of n-gram models to represent longer-term patterns that clearly exist between words in a sentence.
