Statistical Language Models based on Neural Networks - Faculty of ...
recognition). Maybe even more importantly, several researchers have already pointed out that building large look-up tables from huge amounts of training data (which is what standard n-gram modeling amounts to) is not going to provide the ultimate answer to the language modeling problem: because of the curse of dimensionality, we will never have that much data [5].

Conversely, building language models from huge amounts of data (hundreds of billions of words or more) is also a very challenging task, and has received a lot of attention recently [26]. The problems that arise include smoothing as well as compression techniques, because it is practically impossible to store full n-gram models estimated from such amounts of data in computer memory. While the amount of text available on the Internet is ever-increasing, and computers are getting faster and memories larger, we cannot hope to build a database of all possible sentences that can ever be said.
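The scale argument above can be made concrete with a back-of-the-envelope sketch. This is only an illustration, not a computation from the thesis: the vocabulary size is an assumed typical value, and the corpus size echoes the "hundreds of billions of words" figure mentioned in the text.

```python
# Illustration of the curse of dimensionality for n-gram look-up tables:
# the space of possible n-grams grows exponentially with n, while a corpus
# of T tokens can contain at most T distinct n-grams, so coverage collapses.

vocab_size = 100_000      # assumed large-vocabulary setup (hypothetical)
corpus_tokens = 10**11    # roughly "hundreds of billions of words"

for n in range(1, 6):
    possible = vocab_size ** n        # all distinct n-grams over the vocabulary
    coverage = min(1.0, corpus_tokens / possible)
    print(f"{n}-grams: {possible:.1e} possible, coverage at most {coverage:.1e}")
```

Already for 3-grams, at most one in ten thousand of the possible word triples can occur in such a corpus, and for 5-grams the fraction is vanishingly small, which is why smoothing and compact storage become unavoidable.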
In this thesis, the recurrent neural network language model (RNN LM), which I have recently proposed in [49, 50], is described and compared to other successful language modeling techniques. Several standard text corpora are used, which allows a detailed and fair comparison with other advanced language modeling techniques. The aim is to obtain the best achievable results by combining all studied models, which leads to new state-of-the-art performance on the standard setup involving part of the Penn Treebank Corpus.
Next, it is shown that the RNN based language model can be applied to a large-scale, well-tuned system, and that it provides significant improvements in speech recognition accuracy. The baseline system for these experiments, from IBM (RT04 Broadcast News speech recognition), was recently used in the 2010 Summer Workshop at Johns Hopkins University [82]. This system also served as a baseline for a number of papers concerning a novel type of maximum entropy language model, the so-called Model M [30], which is included in the performance comparison as it was previously the state-of-the-art language model on the given task.
Finally, I try to answer some fundamental questions of language modeling: namely, whether the progress in the field is illusory, as is sometimes suggested, and ultimately, why the new techniques have not yet reached human performance, and what might be the missing pieces and the most promising areas for future research.