
correlated) to the ability to predict words in a given context, then we can formally measure the quality of our artificial models of natural languages. This AI test has been proposed for example in [44], and more discussion is given in [42].
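As a concrete illustration of measuring a model by its ability to predict words, the following minimal sketch (not from the thesis; the toy corpus, the add-alpha smoothing, and all names are illustrative assumptions) trains a smoothed bigram model and evaluates its perplexity, a standard intrinsic measure where lower values indicate better prediction:

    import math
    from collections import Counter

    def train_bigram(tokens, alpha=1.0):
        # Illustrative bigram model with add-alpha smoothing (an
        # assumption for this sketch, not the thesis's method).
        vocab = set(tokens)
        unigrams = Counter(tokens)
        bigrams = Counter(zip(tokens, tokens[1:]))
        def prob(word, history):
            # Smoothed estimate of P(word | history).
            return (bigrams[(history, word)] + alpha) / \
                   (unigrams[history] + alpha * len(vocab))
        return prob

    def perplexity(prob, tokens):
        # Perplexity = exp of the average negative log-probability
        # of each word given the preceding word.
        log_prob = sum(math.log(prob(w, h))
                       for h, w in zip(tokens, tokens[1:]))
        return math.exp(-log_prob / (len(tokens) - 1))

    corpus = "the cat sat on the mat and the dog sat on the rug".split()
    model = train_bigram(corpus)
    print(f"perplexity on training data: {perplexity(model, corpus):.2f}")

A model that predicts the next word better assigns it higher probability, which directly lowers the perplexity; this is the sense in which predictive ability gives a formal quality measure.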

While building artificial language models that can understand text the same way humans do, just by reading huge quantities of text data, is likely unrealistically hard (humans themselves would probably fail at such a task), language models estimated from huge amounts of data are very interesting due to their practical use in a wide variety of commercially successful applications. Among the most widely known are statistical machine translation (for example, the popular Google Translate) and automatic speech recognition.

The goal of this thesis is to describe new techniques that have been developed to overcome the simple n-gram models that basically still remain the state of the art today. To prove the usefulness of the new approaches, empirical results on several standard data sets will be described extensively. Finally, approaches and techniques that could possibly lead to automatic language learning by computers will be discussed, together with a simple plan for how this could be achieved.

1.2 Structure of the Thesis

Chapter 2 introduces statistical language modeling and mathematically defines the problem. Simple and advanced language modeling techniques are discussed. Also, the most important data sets that are used later in the thesis are introduced.

Chapter 3 introduces neural network language models and the recurrent architecture, as well as extensions of the basic model. The training algorithm is described in detail.

Chapter 4 provides an extensive empirical comparison of results obtained with various advanced language modeling techniques on the Penn Treebank setup, as well as results after combining these techniques.

Chapter 5 focuses on the results after applying the RNN language model to a standard speech recognition setup, the Wall Street Journal task. Results and comparisons are provided on two different setups: one is from Johns Hopkins University and allows comparison with competitive techniques such as discriminatively trained LMs and structured LMs; the other setup was obtained with an open-source ASR toolkit, Kaldi.

