
4.6 Conclusion of the Model Combination Experiments

We have achieved new state-of-the-art results on the well-known Penn Treebank Corpus: we reduced the perplexity from the baseline 141.2 to 83.5 by combining many advanced language modeling techniques. Perplexity was further reduced to 79.4 by using adaptive linear interpolation of the models and by using a larger learning rate for the dynamic RNN models. These experiments were already described in [51].
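As an illustration, the following is a minimal sketch of how such a combination is evaluated: per-word probabilities from two models are linearly interpolated and the perplexity of the mixture is computed on the test data. The function name, probability streams, and weight value are hypothetical placeholders, not the exact setup of these experiments (which used adaptively estimated weights).

```python
import math

def interpolated_perplexity(rnn_probs, kn5_probs, lam=0.6):
    """Perplexity of a linear interpolation of two language models.

    rnn_probs, kn5_probs: per-word probabilities P(w | history) that each
    model assigns to the same test-word sequence (hypothetical inputs here).
    lam: weight of the RNN model; 1 - lam goes to the KN5 n-gram model.
    """
    log_prob = 0.0
    for p_rnn, p_kn5 in zip(rnn_probs, kn5_probs):
        p = lam * p_rnn + (1.0 - lam) * p_kn5  # combined word probability
        log_prob += math.log(p)
    # perplexity = exp(-average log probability per word)
    return math.exp(-log_prob / len(rnn_probs))

# Toy usage with made-up probabilities for a 4-word test sequence:
rnn = [0.02, 0.10, 0.05, 0.20]
kn5 = [0.01, 0.08, 0.07, 0.15]
print(interpolated_perplexity(rnn, kn5, lam=0.6))
```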

In subsequent experiments, we were able to obtain perplexity 78.8 by also including in the model combination the RNNME models that will be described in Chapter 6. This corresponds to an 11.8% reduction of entropy over a 5-gram model with modified Kneser-Ney smoothing and no count cutoffs, which is more than twice the entropy reduction achieved by the best previously published result on the Penn Treebank data set.
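For clarity, the entropy reduction follows directly from the word-level perplexities, since entropy is the base-2 logarithm of perplexity; with the baseline perplexity 141.2 and the final perplexity 78.8, the relative reduction is

\[
\frac{\log_2 141.2 - \log_2 78.8}{\log_2 141.2} \approx \frac{7.14 - 6.30}{7.14} \approx 0.118 = 11.8\%.
\]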

It is quite important and interesting to realize that we can rely on just a few techniques to reach near-optimal performance. The combination of RNNLMs and a KN5 model with a cache is very simple and straightforward. All these techniques are purely data-driven, with no need for extra domain knowledge. This is in contrast to techniques that rely, for example, on syntactic parsers, which require human-annotated data. Thus, my conclusion from the experiments with the Penn Treebank corpus is that techniques that focus on modeling outperform techniques that focus on features and attempt to incorporate knowledge provided by human experts. This might suggest that the task of learning language should focus more on the learning itself than on features and complex models hand-designed by linguists. I believe that systems relying on extra information provided by humans may be useful in the short-term perspective, but in the long term, machine learning algorithms will improve and overcome rule-based systems, as there is a great availability of unstructured data. Just by looking at the evolution of the speech recognition field, it is possible to observe this drift towards statistical learning. Interestingly, research scientists from big companies such as Google also claim that systems without special linguistic features work as well as, or even better than, those with them [58].
