Statistical Language Models based on Neural Networks

recognition). Maybe even more importantly, several researchers have already pointed out that building large look-up tables from huge amounts of training data (which is equivalent to standard n-gram modeling) is not going to provide the ultimate answer to the language modeling problem: because of the curse of dimensionality, we will never have that much data [5].
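To make the scale of the problem concrete, the following back-of-the-envelope sketch (the vocabulary size, n-gram order, and corpus size are illustrative assumptions, not figures from the thesis) shows how sparsely even an enormous corpus populates the space of possible n-grams:

```python
# Illustrative sketch: why full n-gram look-up tables can never be
# saturated with data. The numbers below are assumptions for the example.

V = 100_000           # assumed vocabulary size
n = 4                 # n-gram order
corpus_tokens = 1e11  # a very large corpus, ~100 billion tokens

possible_ngrams = float(V) ** n   # |V|^n distinct 4-grams
observed_at_most = corpus_tokens  # each token starts at most one n-gram

print(f"possible 4-grams:   {possible_ngrams:.1e}")   # 1.0e+20
print(f"observable 4-grams: {observed_at_most:.1e}")  # 1.0e+11
print(f"coverable fraction: {observed_at_most / possible_ngrams:.1e}")  # 1.0e-09
```

Even under these generous assumptions, the corpus can cover at most one billionth of the space of possible 4-grams.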

Conversely, building language models from huge amounts of data (hundreds of billions of words or more) is also a very challenging task, and it has recently received a lot of attention [26]. The problems that arise include smoothing, as well as compression techniques, because it is practically impossible to store full n-gram models estimated from such amounts of data in computer memory. While the amount of text available on the Internet is ever-increasing, and computers are getting faster and their memory bigger, we cannot hope to build a database of all possible sentences that can ever be said.
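As a minimal illustration of the smoothing problem, the sketch below implements a linearly interpolated bigram model: unseen bigrams still receive probability mass from the unigram distribution. Real systems use far more refined methods (e.g., modified Kneser-Ney), and the mixture weight `lam` here is an arbitrary assumption:

```python
from collections import Counter

def train_interpolated_bigram(tokens, lam=0.7):
    """Minimal interpolated bigram model (illustrative sketch):
    P(w2|w1) = lam * c(w1 w2)/c(w1) + (1-lam) * c(w2)/N
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)

    def prob(w1, w2):
        p_bi = bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0
        p_uni = unigrams[w2] / total
        return lam * p_bi + (1.0 - lam) * p_uni

    return prob

prob = train_interpolated_bigram("the cat sat on the mat".split())
print(prob("the", "cat"))  # 0.4  -- seen bigram
print(prob("cat", "the"))  # 0.1  -- unseen bigram still gets unigram mass
```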

In this thesis, the recurrent neural network language model (RNN LM), which I have recently proposed in [49, 50], is described and compared to other successful language modeling techniques. Several standard text corpora are used, which allows a detailed and fair comparison to other advanced language modeling techniques. The aim is to obtain the best achievable results by combining all studied models, which leads to new state-of-the-art performance on the standard setup involving part of the Penn Treebank Corpus.
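For orientation, the following NumPy sketch shows the forward pass of a simple Elman-style recurrent network of the kind the RNN LM builds on: the hidden state is computed from the current word and the previous hidden state, and a softmax over the vocabulary gives the next-word distribution. The dimensions and random initialization are illustrative assumptions, not the thesis's actual configuration:

```python
import numpy as np

V_size, H = 10_000, 200              # assumed vocabulary and hidden sizes
rng = np.random.default_rng(0)
U  = rng.normal(0, 0.1, (H, V_size))  # input (word) -> hidden weights
W  = rng.normal(0, 0.1, (H, H))       # hidden -> hidden (recurrent) weights
Vo = rng.normal(0, 0.1, (V_size, H))  # hidden -> output weights

def step(word_id, s_prev):
    """One time step: returns P(next word) and the new hidden state."""
    # One-hot input: U @ one_hot(word_id) is just column word_id of U.
    s = 1.0 / (1.0 + np.exp(-(U[:, word_id] + W @ s_prev)))  # sigmoid
    z = Vo @ s
    z -= z.max()                       # numerical stability
    y = np.exp(z) / np.exp(z).sum()    # softmax over the vocabulary
    return y, s

s = np.zeros(H)
for w in [12, 7, 345]:                 # some word ids from a sentence
    y, s = step(w, s)
print(y.shape, y.sum())                # (10000,) ~1.0
```

Unlike an n-gram model, the hidden state s is not limited to a fixed-length history, which is what lets the recurrent model represent longer context.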

Next, it is shown that the RNN-based language model can be applied to a large-scale, well-tuned system, and that it provides significant improvements in speech recognition accuracy. The baseline system for these experiments, from IBM (RT04 Broadcast News speech recognition), has recently been used in the 2010 Summer Workshop at Johns Hopkins University [82]. This system was also used as a baseline for a number of papers concerning a novel type of maximum entropy language model, the so-called Model M [30], which is also included in the performance comparison, as it was previously the state-of-the-art language model on the given task.
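When several language models are compared and combined in such experiments, a common technique (an assumption here about the exact setup, though standard in the field) is linear interpolation of their per-word probabilities, with mixture weights tuned on held-out data:

```python
def interpolate(p_models, weights):
    """Linear interpolation of word probabilities from several LMs.
    p_models: P(w | history) from each model for the same word;
    weights:  mixture weights summing to 1, tuned on held-out data.
    """
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * p for w, p in zip(weights, p_models))

# E.g. combining an RNN LM with an n-gram model (numbers are made up):
p = interpolate([0.012, 0.004], [0.6, 0.4])
print(p)  # 0.0088
```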

Finally, I try to answer some fundamental questions of language modeling: namely, whether the progress in the field is illusory, as is sometimes suggested, and ultimately, why the new techniques have not yet reached human performance, and what the missing parts and most promising areas for future research might be.

