Statistical Language Models based on Neural Networks - Faculty of ...
4.6 Conclusion of the Model Combination Experiments
We have achieved new state-of-the-art results on the well-known Penn Treebank Corpus, reducing the perplexity from the baseline 141.2 to 83.5 by combining many advanced language modeling techniques. Perplexity was further reduced to 79.4 by using adaptive linear interpolation of models and by using a larger learning rate for dynamic RNN models. These experiments were already described in [51].
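The linear interpolation mentioned above combines the probability estimates of several models into a single estimate using non-negative weights that sum to one; a minimal sketch follows, where the model names, probability values, and weights are illustrative placeholders rather than values from these experiments.

```python
# Hedged sketch: linear interpolation of several language model
# probabilities for a single predicted word. In practice the weights
# are tuned on held-out data (and, in the adaptive variant, updated
# as text is processed); here they are fixed illustrative values.

def interpolate(probs, weights):
    """Combine per-model probabilities P_m(w | h) into one estimate.

    probs and weights are parallel lists; weights must sum to 1 so
    that the result remains a valid probability.
    """
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * p for w, p in zip(weights, probs))

# Illustrative probabilities assigned to the next word by three models
# (an RNN LM, a KN5 n-gram model, and a cache model).
p_rnn, p_kn5, p_cache = 0.12, 0.05, 0.30
p = interpolate([p_rnn, p_kn5, p_cache], [0.5, 0.3, 0.2])
print(round(p, 4))  # 0.5*0.12 + 0.3*0.05 + 0.2*0.30 = 0.135
```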
In subsequent experiments, we were able to obtain perplexity 78.8 by also including in the model combination the RNNME models that will be described in Chapter 6. This corresponds to an 11.8% reduction of entropy over a 5-gram model with modified Kneser-Ney smoothing and no count cutoffs, which is more than twice the entropy reduction achieved by the best previously published result on the Penn Treebank data set.
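The entropy figure follows directly from the reported perplexities: per-word entropy is the base-2 logarithm of perplexity, and the relative reduction compares the combined model (perplexity 78.8) against the KN5 baseline (perplexity 141.2). A short sketch of that arithmetic:

```python
import math

def entropy_reduction(ppl_baseline, ppl_model):
    """Relative entropy reduction between two models, where per-word
    entropy H = log2(perplexity)."""
    h_base = math.log2(ppl_baseline)
    h_model = math.log2(ppl_model)
    return (h_base - h_model) / h_base

# KN5 baseline perplexity 141.2 vs. combined model perplexity 78.8,
# as reported in the text above.
r = entropy_reduction(141.2, 78.8)
print(f"{100 * r:.1f}%")  # 11.8%
```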
It is quite important and interesting to realize that we can rely on just a few techniques to reach near-optimal performance. The combination of RNNLMs and a KN5 model with a cache is very simple and straightforward. All these techniques are purely data driven, with no need for extra domain knowledge. This is in contrast to techniques that rely, for example, on syntactic parsers, which require human-annotated data. Thus, my conclusion from the experiments with the Penn Treebank corpus is that techniques that focus on the modeling outperform techniques that focus on the features and attempt to incorporate knowledge provided by human experts. This might suggest that the task of learning language should focus more on the learning itself than on hand-designing features and complex models by linguists. I believe that systems that rely on extra information provided by humans may be useful in the short term, but in the long term, machine learning algorithms will improve and overtake rule-based systems, as there is great availability of unstructured data. Just by looking at the evolution of the speech recognition field, it is possible to observe this drift towards statistical learning. Interestingly, research scientists from big companies such as Google also claim that systems without special linguistic features work as well as, if not better than, those with them [58].