
Chapter 6

Strategies for Training Large Scale Neural Network Language Models

The experiments on the Penn Treebank Corpus have shown that mixtures of recurrent neural networks trained by backpropagation through time provide state-of-the-art results in the field of statistical language modeling. However, a remaining question is whether the performance would remain as good with a much larger amount of training data: the PTB corpus, with about 1M training tokens, can be considered very small, as language models are typically trained on corpora with orders of magnitude more data. It is not unusual to work with huge training corpora that consist of much more than a billion words. While application to low-resource domains (especially new languages and domains where only a small amount of relevant data exists) is also a very interesting research problem, the most convincing results are those obtained with well-tuned state-of-the-art systems trained on large amounts of data.

The experiments in the previous chapter focused on obtaining the largest possible improvement; however, some of the approaches would become computationally difficult to apply to large data sets. In this chapter, we briefly mention existing approaches for reducing the computational complexity of neural network language models (most of these approaches are also applicable to maximum entropy language models). We propose two new simple techniques that can be used to reduce the computational complexity of the training and test phases. We show that these new techniques are complementary to existing approaches.
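To make the source of this computational cost concrete, the sketch below counts the multiplications performed per processed word by a simple recurrent network language model with a full softmax output layer, and compares it against a class-based factorization of the output layer, one widely used existing way to reduce the cost. The sizes H, V, and C below are illustrative assumptions, not values taken from the experiments in this work.

```python
# Rough per-word multiply counts for a simple recurrent NN language model.
# With hidden size H and vocabulary size V, the full-softmax output layer
# costs about H*V multiplications, which dominates the H*H recurrent part
# once V is large. A class-based factorization of the output layer replaces
# H*V with roughly H*C + H*(V/C) for C word classes.
# H, V, and C are example values chosen only for illustration.

H = 200          # hidden layer size (example value)
V = 100_000      # vocabulary size (example value)
C = 300          # number of word classes (example value)

full_softmax = H * H + H * V                 # recurrent part + full output layer
class_based  = H * H + H * C + H * (V // C)  # recurrent part + class layer + within-class layer

print(f"full softmax : {full_softmax:,} multiplies per word")
print(f"class-based  : {class_based:,} multiplies per word")
print(f"speed-up     : {full_softmax / class_based:.1f}x")
```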

Most interestingly, we show that a standard neural network language model can be<br />

