Statistical Language Models based on Neural Networks - Faculty of ...
confusion among researchers, and many new results are simply ignored, as it is very time-consuming to verify them. To avoid these problems, the performance of the proposed techniques is studied on standard tasks, where it is possible to compare the achieved results to baselines previously reported by other researchers 5.
First, experiments will be shown on the well-known Penn Treebank corpus, and the comparison will include a wide variety of models that were introduced in Section 2.3. Combining the results given by the various techniques provides important information by showing the complementarity of the different language modeling techniques. The final combination of all techniques that were available to us yields new state-of-the-art performance on this particular data set, significantly better than that of any individual technique.
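A standard way to combine several language models (the specific scheme shown here, linear interpolation of per-word probabilities, is one common choice and an assumption on my part, not a claim about the exact combination used) can be sketched as follows; the component models are hypothetical stand-ins:

```python
# Minimal sketch of combining language models by linear interpolation.
# The two toy models below are hypothetical stand-ins for real
# techniques (e.g. a backoff n-gram model and an RNN model).

def interpolate(models, weights, word, history):
    """P(word | history) as a weighted sum of the models' predictions."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(w * m(word, history) for m, w in zip(models, weights))

# Hypothetical component models returning P(word | history).
model_a = lambda word, history: 0.20   # e.g. a backoff n-gram model
model_b = lambda word, history: 0.40   # e.g. an RNN model

p = interpolate([model_a, model_b], [0.5, 0.5], "bank", ("the", "river"))
print(round(p, 6))  # 0.3
```

The interpolation weights are typically tuned on held-out data; models whose errors are complementary contribute more than their individual performance would suggest.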
Second, experiments with increasing amounts of training data will be shown using Wall Street Journal training data (NYT Section, the same data as used by [23] [79] [49]). This study will focus on both entropy and word error rate improvements. The conclusion seems to be that with an increasing amount of training data, the difference in performance between the RNN models and the backoff models grows, which is in contrast to what Goodman [24] found for other advanced LM techniques, such as class-based models. Experiments with adaptation of the RNN language models will be shown on this setup, and additional details and results will be provided for another WSJ setup that can be much more easily replicated, as it is based on a new open-source speech recognition toolkit, Kaldi [60].
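The entropy figures in such comparisons are per-word cross-entropies, with perplexity as 2 raised to the entropy; as a reminder of that relationship, a small sketch with made-up probabilities:

```python
import math

def cross_entropy(word_probs):
    """Per-word cross-entropy in bits: H = -(1/N) * sum(log2 p_i)."""
    return -sum(math.log2(p) for p in word_probs) / len(word_probs)

def perplexity(word_probs):
    """Perplexity is 2 raised to the cross-entropy."""
    return 2 ** cross_entropy(word_probs)

# Hypothetical per-word probabilities assigned by some model to a test text.
probs = [0.1, 0.05, 0.2, 0.01]
print(round(cross_entropy(probs), 2))  # 4.15 (bits per word)
print(round(perplexity(probs), 2))     # 17.78
```

A drop of one bit in cross-entropy halves the perplexity, which is why small entropy differences correspond to sizable perplexity gaps.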
Third, results will be shown for the RNN model applied to the state-of-the-art speech recognition system developed by IBM [30] that was already briefly mentioned in Section 2.3.5, where we will compare its performance to the current state-of-the-art language model on that set (the so-called Model M). The language models for this task were trained on approximately 400M words. The achieved word error rate reductions over the best n-gram model are more than 10% relative, which demonstrates the usefulness of the techniques developed in this work.
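The "more than 10%" figure is a relative, not absolute, word error rate reduction; with hypothetical numbers (not taken from the source), the arithmetic is:

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction = (baseline - new) / baseline."""
    return (baseline_wer - new_wer) / baseline_wer

# Hypothetical example: an n-gram baseline at 15.0% WER improved to 13.3%.
print(round(relative_wer_reduction(15.0, 13.3), 4))  # 0.1133, i.e. ~11.3% relative
```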
Lastly, a comparison of the performance of RNN and n-gram models will be provided on a novel task, "The Microsoft Research Sentence Completion Challenge" [83], which focuses on the ability of artificial language models to appropriately complete a sentence in which a single informative word is missing.
5 Many of the experiments described in this work can be reproduced by using a toolkit for training recurrent neural network (RNN) language models, which can be found at http://www.fit.vutbr.cz/~imikolov/rnnlm/.