
Table 7.5: Additional results on the Microsoft Sentence Completion Challenge task.

Model                  Accuracy [%]
filtered KN5           47.7
filtered RNNME-100     48.8
RNNME combination      55.4

n-gram models. These results are summarized in Table 7.5, where models trained on modified training data are denoted as filtered. Combination of RNNME models trained on the original and the filtered training data then provides the best result on this task so far, about 55% accuracy.
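The combination reported in Table 7.5 can be thought of as interpolating the models when scoring each candidate sentence. The sketch below is only illustrative: the prob(word, history) interface and the equal weight lam=0.5 are assumptions for the sake of the example, not the exact setup used to obtain the numbers above.

import math

def sentence_logprob(words, model_a, model_b, lam=0.5):
    # interpolate the two models at the word level, then sum log-probabilities;
    # model.prob(word, history) is a hypothetical interface
    total = 0.0
    for i, w in enumerate(words):
        p = lam * model_a.prob(w, words[:i]) + (1.0 - lam) * model_b.prob(w, words[:i])
        total += math.log(p)
    return total

def pick_completion(candidates, model_a, model_b, lam=0.5):
    # choose the candidate sentence with the highest interpolated score
    return max(candidates, key=lambda ws: sentence_logprob(ws, model_a, model_b, lam))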

As the task itself is very interesting and shows what language modeling research can focus on in the future, the next chapter will include some of my ideas on how good test sets for measuring the quality of language models should be created.

7.4 Speech Recognition of Morphologically Rich Languages

N-gram language models usually work quite well for English, but not as well for other languages. The reason is that for morphologically rich languages, the number of word units is much larger, as new words are formed easily using simple rules, for example by adding new word endings. Having two or more separate sources of information (such as stem and ending) in a single token increases the number of parameters in n-gram models that have to be estimated from the training data. Thus, higher order n-gram models usually do not give much improvement. Another problem is that for these languages, much less training data is usually available.
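To see why the parameter space grows so quickly, consider a toy inflection scheme (the stems and endings below are made up for illustration only, not data from the experiments): the number of surface forms grows roughly with the product of stems and endings, so n-gram counts over surface forms become much sparser for the same amount of text.

stems = ["dom", "les", "stol"]            # hypothetical stems
endings = ["", "u", "em", "y", "ech"]     # hypothetical case endings

# every stem can combine with every ending, producing a separate token each time
word_forms = {stem + end for stem in stems for end in endings}
print(len(stems), "stems x", len(endings), "endings ->", len(word_forms), "surface forms")

# an n-gram model over these surface forms must estimate statistics for
# combinations of full forms, even though the stem and the ending carry
# largely independent pieces of information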

To illustrate the problem, we have used the Penn Treebank Corpus as described in Chapter 4, and added two bits of random information to every token. This should increase the entropy of a model trained on these modified data by more than two bits, as it is not possible to revert the process (the information that certain words are similar has to be obtained just from the statistical similarity of occurrence).
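A minimal sketch of this corpus modification is shown below, assuming plain-text Penn Treebank files with one sentence per line; the file names and the "-N" suffix format are placeholders, not the exact procedure used in the experiments.

import random

random.seed(1)
with open("ptb.train.txt") as fin, open("ptb.train.2bits.txt", "w") as fout:
    for line in fin:
        # append two random bits (a value 0-3) to every token, splitting each
        # word into four variants that the model cannot tell apart a priori
        tokens = [f"{w}-{random.randint(0, 3)}" for w in line.split()]
        fout.write(" ".join(tokens) + "\n")

A model that clustered the four variants of each word perfectly would lose exactly two bits per token (perplexity multiplied by 4); any additional loss comes from failing to recover the similarity of the variants from co-occurrence statistics alone.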

As n-gram models cannot perform any clustering, it must be expected that their performance will degrade significantly. On the other hand, RNN models can perform clustering well; thus, the increase of entropy should be lower. Results with simple RNN models with the same architecture and KN5 models with no discounts are shown in Table 7.6.
