
confusion among researchers, and many new results are simply ignored as it is very time-consuming to verify them. To avoid these problems, the performance of the proposed techniques is studied on standard tasks, where it is possible to compare the achieved results to baselines that were previously reported by other researchers^5.

First, experiments will be shown on the well-known Penn Treebank Corpus, and the comparison will include a wide variety of models that were introduced in Section 2.3. A combination of the results given by the various techniques provides important information by showing the complementarity of the different language modeling techniques. The final combination of all techniques that were available to us results in a new state-of-the-art performance on this particular data set, which is significantly better than that of any individual technique.
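Such combinations are commonly realized as a linear interpolation of the word probabilities assigned by the individual models, with the interpolation weight tuned on held-out data. The following is a minimal sketch of that idea; the probability values and the weight are hypothetical placeholders, not results from this work.

    # Minimal sketch: linear interpolation of two language models (hypothetical numbers).
    # Each list holds the probability a model assigns to the i-th word of a test text.
    p_rnn = [0.20, 0.05, 0.10]    # hypothetical RNN LM probabilities
    p_ngram = [0.10, 0.08, 0.02]  # hypothetical n-gram LM probabilities
    lam = 0.6                     # interpolation weight, tuned on held-out data

    # Probability of each word under the weighted mixture of the two models.
    p_mix = [lam * a + (1.0 - lam) * b for a, b in zip(p_rnn, p_ngram)]
    print(p_mix)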

Second, experiments with increasing amounts of training data will be shown using Wall Street Journal training data (NYT section, the same data as used by [23], [79], [49]). This study will focus on both entropy and word error rate improvements. The conclusion seems to be that with an increasing amount of training data, the difference in performance between the RNN models and the backoff models grows larger, which is in contrast to what was found by Goodman [24] for other advanced LM techniques, such as class-based models. Experiments with adaptation of the RNN language models will be shown on this setup, and additional details and results will be provided for another WSJ setup that can be replicated much more easily, as it is based on a new open-source speech recognition toolkit, Kaldi [60].
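For clarity, entropy here refers to the cross-entropy of the test data in bits per word, and perplexity is two raised to that entropy. A minimal sketch with hypothetical per-word probabilities:

    import math

    # Minimal sketch: cross-entropy (bits per word) and perplexity of a test text,
    # computed from hypothetical per-word probabilities p(w_i | history).
    probs = [0.20, 0.05, 0.10, 0.01]
    entropy = -sum(math.log2(p) for p in probs) / len(probs)   # bits per word
    perplexity = 2.0 ** entropy                                # perplexity = 2^entropy
    print(entropy, perplexity)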

Third, results will be shown for the RNN model applied to the state-of-the-art speech recognition system developed by IBM [30] that was already briefly mentioned in Section 2.3.5, where we will compare the performance to the current state-of-the-art language model on that set (the so-called Model M). The language models for this task were trained on approximately 400M words. The achieved word error rate reductions over the best n-gram model are more than 10% relative, which demonstrates the usefulness of the techniques developed in this work.
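The reductions are quoted relative to the baseline error rate; a minimal sketch of the computation, with purely illustrative numbers that are not the actual results of the IBM system:

    # Minimal sketch: relative word error rate reduction (illustrative numbers only).
    baseline_wer = 12.0   # hypothetical WER of the best n-gram model, in %
    rnn_wer = 10.5        # hypothetical WER after adding the RNN model, in %
    relative_reduction = (baseline_wer - rnn_wer) / baseline_wer * 100.0
    print(relative_reduction)   # 12.5, i.e. a 12.5% relative reduction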

Lastly, a comparison of the performance of RNN and n-gram models will be provided on a novel task, the Microsoft Research Sentence Completion Challenge [83], which focuses on the ability of artificial language models to appropriately complete a sentence where a single informative word is missing.
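In this task a language model can be used directly as a scorer: each candidate word is inserted into the blank, the full sentence is scored by the model, and the highest-scoring candidate is selected. The sketch below uses a toy unigram scorer only so that it runs end to end; a real submission would score the sentences with a trained RNN or n-gram model instead.

    import math

    # Toy stand-in for a real language model score: sum of unigram log-counts
    # (add-one smoothed), built from hypothetical counts so the example runs.
    toy_counts = {"the": 500, "boy": 20, "threw": 50, "ate": 30, "sang": 10, "ball": 40}

    def score_sentence(sentence):
        return sum(math.log(toy_counts.get(w, 0) + 1) for w in sentence.lower().split())

    def complete(template, candidates, score):
        # Insert each candidate into the blank and keep the best-scoring sentence.
        return max(candidates, key=lambda w: score(template.replace("___", w)))

    print(complete("The boy ___ the ball", ["threw", "ate", "sang"], score_sentence))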

^5 Many of the experiments described in this work can be reproduced by using a toolkit for training recurrent neural network (RNN) language models, which can be found at http://www.fit.vutbr.cz/~imikolov/rnnlm/.

