Statistical Language Models based on Neural Networks - Faculty of ...
Chapter 3

Neural Network Language Models

The use of artificial neural networks for sequence prediction is as old as the neural network techniques themselves. One of the first widely known attempts to describe language using neural networks was made by Jeff Elman [17], who used a recurrent neural network to model sentences of words generated by an artificial grammar. The first serious attempt to build a statistical neural network based language model of real natural language, together with an empirical comparison of its performance to standard techniques (n-gram models and class based models), was probably made by Yoshua Bengio in [5]. Bengio's work was followed by Holger Schwenk, who showed that NNLMs work very well in state-of-the-art speech recognition systems and are complementary to standard n-gram models [68].
However, although many scientific papers were published after Bengio's original work, no techniques or modifications of the original model that would significantly improve its ability to capture patterns in the language were published, at least to my knowledge¹. Integration of additional features into the NNLM framework (such as part-of-speech tags or morphology information) has been investigated in [19] [1]. Still, the accuracy of the neural net models remained basically the same, until I recently showed that a recurrent neural network architecture can actually work better than the feedforward one [49] [50].
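The Elman-style recurrent architecture referred to above can be illustrated with a minimal forward pass: each word is presented in 1-of-N encoding, the hidden layer also receives its own previous state, and the output layer gives a probability distribution over the next word. This is only a toy sketch under assumed conventions, not the thesis implementation; the layer sizes, random weights, and word-id sequence are invented for illustration.

```python
# Toy sketch of an Elman-style recurrent LM forward step (illustrative only;
# sizes, weights, and the input sequence are made up, not from the thesis).
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 10, 8

# Weights: input->hidden (U), recurrent hidden->hidden (W), hidden->output (V)
U = rng.standard_normal((hidden_size, vocab_size)) * 0.1
W = rng.standard_normal((hidden_size, hidden_size)) * 0.1
V = rng.standard_normal((vocab_size, hidden_size)) * 0.1

def step(word_id, h_prev):
    """One recurrent step: 1-of-N encoded word plus previous hidden state
    -> new hidden state and a probability distribution over the next word."""
    x = np.zeros(vocab_size)
    x[word_id] = 1.0                                  # 1-of-N (one-hot) input
    h = 1.0 / (1.0 + np.exp(-(U @ x + W @ h_prev)))   # sigmoid hidden layer
    z = V @ h
    p = np.exp(z - z.max())
    p /= p.sum()                                      # softmax over next word
    return h, p

h = np.zeros(hidden_size)
for w in [3, 1, 4]:          # feed a toy word-id sequence through the network
    h, p = step(w, h)
```

Because the hidden state is carried across steps, the prediction at each position depends on the whole preceding sequence, not on a fixed-length context as in a feedforward NNLM.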
Most of the research work focused on overcoming practical problems in using these attractive models: the computational complexity was originally too high for real-world tasks. It was reported by Bengio in 2001 that training of the original neural net
¹ With the exception of Schwenk, who reported better results by using linear interpolation of several neural net models trained on the same data with different random initialization of the weights; we further denote this approach as a combination of NNLMs.