Statistical Language Models based on Neural Networks - Faculty of ...
Chapter 6

Strategies for Training Large Scale Neural Network Language Models
The experiments on the Penn Treebank Corpus have shown that mixtures of recurrent neural networks trained by backpropagation through time provide state-of-the-art results in the field of statistical language modeling. However, a remaining question is whether the performance would be equally good with a much larger amount of training data - the PTB corpus, with about 1M training tokens, can be considered very small, because language models are typically trained on corpora with orders of magnitude more data. It is not unusual to work with huge training corpora that consist of much more than a billion words. While application to low-resource domains (especially for new languages and for domains where only a small amount of relevant data exists) is also a very interesting research problem, the most convincing results are those obtained with well-tuned state-of-the-art systems that are trained on large amounts of data.
The experiments in the previous chapter focused on obtaining the largest possible improvement; however, some of the approaches would become computationally difficult to apply to large data sets. In this chapter, we briefly mention existing approaches for reducing the computational complexity of neural network language models (most of these approaches are also applicable to maximum entropy language models). We propose two new simple techniques that can be used to reduce the computational complexity of the training and test phases. We show that these new techniques are complementary to existing approaches.
Most interestingly, we show that a standard neural network language model can be