
Chapter 3

Neural Network Language Models

The use of artificial neural networks for sequence prediction is as old as the neural network techniques themselves. One of the first widely known attempts to describe language using neural networks was performed by Jeff Elman [17], who used a recurrent neural network for modeling sentences of words generated by an artificial grammar. The first serious attempt to build a statistical neural network based language model of real natural language, together with an empirical comparison of performance to standard techniques (n-gram models and class based models), was probably done by Yoshua Bengio in [5]. Bengio's work was followed by Holger Schwenk, who showed that NNLMs work very well in state-of-the-art speech recognition systems and are complementary to standard n-gram models [68].

However, although many scientific papers were published after Bengio's original work, no techniques or modifications of the original model that would significantly improve the ability of the model to capture patterns in the language were published, at least to my knowledge¹. Integration of additional features into the NNLM framework (such as part of speech tags or morphology information) has been investigated in [19] [1]. Still, the accuracy of the neural net models remained basically the same, until I recently showed that the recurrent neural network architecture can actually work better than the feedforward one [49] [50].

Most of the research work focused on overcoming practical problems when using these attractive models: the computational complexity was originally too high for real world tasks. It was reported by Bengio in 2001 that training of the original neural net

¹ With the exception of Schwenk, who reported better results by using linear interpolation of several neural net models trained on the same data, with different random initialization of the weights; we denote this approach further as a combination of NNLMs.
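As a sketch of what such a combination computes (the notation here is mine, not taken from [68]), the simplest variant averages the word probabilities produced by $K$ models that differ only in their random weight initialization:

$$
P(w_t \mid h_t) = \sum_{k=1}^{K} \lambda_k \, P_k(w_t \mid h_t), \qquad \sum_{k=1}^{K} \lambda_k = 1,
$$

where $P_k$ is the distribution of the $k$-th NNLM given the history $h_t$; with uniform weights $\lambda_k = 1/K$ this reduces to plain averaging. Since each $P_k$ is a valid probability distribution, the interpolated model is one as well.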

