
As can be seen in Table 7.3, the usual data compression programs such as gzip do not work very well on text data; the more advanced compressors compress text considerably better, but they are orders of magnitude slower. Thus, the achieved results are currently more interesting from the research point of view than from the practical one, although with further progress this may change in the future.

Interestingly, the achieved entropy of 1.21 bits per character for English text (including spaces and end-of-line symbols) is already lower than Shannon's upper bound estimate of 1.3 bpc [66]. It can be expected that with even more data, the entropy would decrease considerably further.
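
The bits-per-character figure can be read as the cross-entropy of the model measured over the character stream. The sketch below shows this conversion under an assumed interface (a hypothetical list of per-symbol log-probabilities produced by a model); it is only an illustration, not the evaluation code used in these experiments.

    import math

    def bits_per_character(log_probs, text):
        """Cross-entropy of a model over `text`, in bits per character.

        log_probs -- natural-log probabilities the model assigned to each
                     predicted symbol of `text` (hypothetical input here).
        The total code length in bits is -sum(log2 p); dividing by the
        number of characters (spaces and end-of-line symbols counted,
        as in the experiments above) gives bits per character.
        """
        total_bits = -sum(lp / math.log(2) for lp in log_probs)
        return total_bits / len(text)

    # Toy check: probability 0.5 for each of 8 predictions costs 8 bits;
    # spread over a 10-character string this is 0.8 bpc.
    print(bits_per_character([math.log(0.5)] * 8, "0123456789"))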

7.3 Microsoft Sentence Completion Challenge

The motivation examples in the introduction of this thesis showed that a good statistical language model should assign high probability to sentences that are usual, correct or meaningful, and low probability to the others. It was also explained that n-gram models cannot represent patterns over longer contexts efficiently, due to the exponential increase in the number of parameters with the order of the model (for example, a 5-gram model over a 10,000-word vocabulary has up to 10,000^5 = 10^20 possible 5-grams). Thus, it is an interesting task to compare the developed RNN language models and n-gram models on a simple task where the language model is supposed to choose the most meaningful word among several options in a sentence with one missing word.

Such a task has recently been published in [83]. It consists of 1040 sentences in which a single informative word is removed and five possible options are given. An example:

I have seen it on him , and could write to it .
I have seen it on him , and could migrate to it .
I have seen it on him , and could climb to it .
I have seen it on him , and could swear to it .
I have seen it on him , and could contribute to it .

Thus, by computing the likelihood of each sentence and choosing the most likely one given a specific model, we can test the ability of language models to "understand" patterns in the sentence. Note that this task is similar to the usual quality measure of language models.
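
A minimal sketch of this selection procedure is given below. The scoring function is a placeholder (a table of made-up log-probabilities); in the actual experiments, the sentence score would instead be computed by the RNN or n-gram language model under test.

    def complete_sentence(candidates, sentence_log_prob):
        """Return the candidate sentence the language model scores as most likely.

        candidates        -- the five full sentences for one test item.
        sentence_log_prob -- function mapping a sentence to its total
                             log-probability under the model being tested.
        """
        return max(candidates, key=sentence_log_prob)

    candidates = [
        "I have seen it on him , and could write to it .",
        "I have seen it on him , and could migrate to it .",
        "I have seen it on him , and could climb to it .",
        "I have seen it on him , and could swear to it .",
        "I have seen it on him , and could contribute to it .",
    ]

    # Made-up log-probabilities standing in for a real model's scores;
    # an actual run would query the RNN or n-gram model here.
    toy_scores = {
        candidates[0]: -61.2,
        candidates[1]: -64.8,
        candidates[2]: -63.1,
        candidates[3]: -58.4,
        candidates[4]: -66.0,
    }
    print(complete_sentence(candidates, toy_scores.get))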
