
words, but also additional symbols such as brackets and quotation marks. These follow simple patterns that cannot be described efficiently with n-gram models (such as that after a left bracket, a right bracket should be expected etc.). Even with a limited RNNME-160 model, the entropy reduction over the KN5 model with no count cutoffs was about 7% - perplexity was reduced from 110 to 80. By combining the static and dynamic RNNME-160 models, perplexity decreased further to 58. It would not be surprising if a combination of several larger static and dynamic RNNME models trained on the MT corpora would provide a perplexity reduction of over 50% against the KN5 model with no count cutoffs - it would be interesting to investigate this task in the future.
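As a quick sanity check (my own arithmetic, not part of the original text): per-word entropy is the base-2 logarithm of perplexity, so the reported perplexities imply
\[
\frac{\log_2 110 - \log_2 80}{\log_2 110} \approx \frac{6.78 - 6.32}{6.78} \approx 7\%,
\]
which is consistent with the quoted entropy reduction.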

7.2 Data Compression

Compression of text files is a problem almost equivalent to statistical language modeling. The objective of state-of-the-art text compressors is to predict the next character or word with as high probability as possible, and encode it using arithmetic coding. Decompression is then a symmetric process, where models are built while processing the decoded data in the same way as they were built during the compression. Thus, the objective is simply to maximize P(w|h). An example of such a compression approach is the state-of-the-art compression program 'PAQ' from M. Mahoney [45, 46].
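To make the symmetry between compression and decompression concrete, the following is a minimal sketch (in Python, not the actual PAQ or RNNLM code) of an adaptive compressor: an exact arithmetic coder over rational intervals, paired with a simple add-one-smoothed order-0 model standing in for P(w|h). The essential point is that compress() and decompress() apply identical model updates to the symbols already coded, so both sides always agree on the distribution used for the next symbol. All class and function names here are illustrative.

from fractions import Fraction

class AdaptiveModel:
    # Adaptive order-0 model with add-one smoothing; a trivial stand-in for the
    # language model. Compressor and decompressor update it identically.
    def __init__(self, alphabet):
        self.alphabet = list(alphabet)
        self.counts = {s: 1 for s in self.alphabet}

    def interval(self, symbol):
        # Cumulative probability interval [lo, hi) of `symbol` under current counts.
        total = sum(self.counts.values())
        lo = Fraction(0)
        for s in self.alphabet:
            hi = lo + Fraction(self.counts[s], total)
            if s == symbol:
                return lo, hi
            lo = hi
        raise KeyError(symbol)

    def symbol_for(self, point):
        # Inverse of interval(): find the symbol whose interval contains `point`.
        total = sum(self.counts.values())
        lo = Fraction(0)
        for s in self.alphabet:
            hi = lo + Fraction(self.counts[s], total)
            if lo <= point < hi:
                return s, lo, hi
            lo = hi
        raise ValueError(point)

    def update(self, symbol):
        self.counts[symbol] += 1

def compress(text, alphabet):
    model = AdaptiveModel(alphabet)
    low, high = Fraction(0), Fraction(1)
    for c in text:
        s_lo, s_hi = model.interval(c)
        span = high - low
        low, high = low + span * s_lo, low + span * s_hi
        model.update(c)              # update only after coding the symbol
    return low, len(text)            # any point in [low, high) identifies the text

def decompress(code, length, alphabet):
    model = AdaptiveModel(alphabet)
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        point = (code - low) / (high - low)
        c, s_lo, s_hi = model.symbol_for(point)
        span = high - low
        low, high = low + span * s_lo, low + span * s_hi
        out.append(c)
        model.update(c)              # identical update as in compress()
    return "".join(out)

if __name__ == "__main__":
    alphabet = "abcdefghijklmnopqrstuvwxyz ()"
    msg = "after a left bracket (a right bracket) should be expected"
    code, n = compress(msg, alphabet)
    assert decompress(code, n, alphabet) == msg

A real compressor would of course use a fixed-precision coder with renormalization and a far stronger adaptive model, but the control flow is the same as described above.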

The difference between language modeling and data compression is that the data is not known in advance, and has to be processed on-line. As was described in the previous chapters, training of neural network models with a hidden layer requires several passes over the training data to reach top performance. Thus, it was not clear if RNN LMs can be practically applied to data compression problems.

As data compression is not the topic of this thesis, I will not describe the experiments in great detail. However, it is worth observing that it is possible to accurately estimate the compression ratio given by RNN LMs using the RNNLM toolkit. The size of a compressed text file is the size of the (possibly also compressed) vocabulary plus the entropy of the training data during the first training epoch (this is simply the average entropy per word times the number of words).
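Written as a formula (the notation is mine): if the compressed vocabulary takes $S_V$ bits, the file contains $N$ words, and $\bar{H}$ is the average per-word entropy in bits observed during the first epoch, then
\[
\text{compressed size} \approx S_V + N \cdot \bar{H} \ \text{bits},
\]
and dividing by 8 gives the estimate in bytes.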

For the further experiments, I have implemented arithmetic coding into a special version of the RNNLM toolkit, thus the following results were obtained with a real RNN-based data compressor. As the class-based RNN architecture was used, first the class of the predicted word is encoded using arithmetic coding, and then the specific word, as the text data are
