Statistical Language Models based on Neural Networks - Faculty of ...
While for any of the previous points I would be able to provide at least several references, it would be better to define how the new techniques should be evaluated, so that scientific progress would be measurable:
• experiments should be repeatable: public data sets should be used, or data that are easily accessible to the scientific community
• techniques that aim to become a new state of the art should be compared not against the weakest possible baseline, but against the strongest one, such as a combination of all known techniques
• to improve repeatability, the code needed for reproducing the experiments should be released
• the review process for papers that propose new techniques should be at least partially automated, when it comes to verification of the results
While in some cases it might be difficult to satisfy all these points, it is foolish to claim that a new state of the art has been reached after the perplexity drops by 2% against a 3-gram baseline; yet such results are still being published at top-level conferences (and sometimes even win best-paper awards).
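As a reminder of the metric behind such claims, a minimal sketch of computing perplexity and a relative improvement; the per-word probabilities below are hypothetical, not results from any actual model:

```python
import math

def perplexity(word_probs):
    """Perplexity = exp of the average negative log-probability per word."""
    n = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / n)

# Hypothetical per-word probabilities from a 3-gram baseline and a "new" model.
baseline = [0.05, 0.10, 0.02, 0.08]
new_model = [0.055, 0.10, 0.022, 0.085]

ppl_base = perplexity(baseline)
ppl_new = perplexity(new_model)

# A few percent relative drop like this is the kind of result criticized above.
relative_drop = (ppl_base - ppl_new) / ppl_base
```

A uniform model over a vocabulary of size V has perplexity exactly V, which gives an intuitive upper reference point for the numbers reported in the literature.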
For these reasons, I have decided to release a toolkit that can be used to train RNN based language models, so that the following experiments can be easily repeated. This toolkit is introduced and described in Appendix A. Moreover, the following experiments are performed on well known setups, with direct comparison to competitive techniques.
4.1 Comparison of Different Types of Language Models
It is very difficult to objectively compare different language modeling techniques: in practical applications, accuracy is sometimes as important as low memory usage and low computational complexity. Also, the comparisons that can be found in scientific papers are in some cases unfair, as models that aim to find different types of regularities are sometimes compared against each other. The most obvious example would be a comparison of a long context and a short context model, such as comparing an n-gram model to a cache-like model.
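To make the contrast concrete, here is a minimal sketch (not the thesis toolkit) of the two model types just mentioned: a smoothed bigram model, which conditions only on the single preceding word, and a unigram cache model, which conditions on a window of recent history. The class names, smoothing scheme, and parameters are illustrative assumptions:

```python
from collections import Counter, defaultdict

class BigramModel:
    """Short-span model: probability depends only on the preceding word."""
    def __init__(self, corpus, vocab_size, alpha=1.0):
        self.alpha = alpha            # add-alpha smoothing constant
        self.vocab_size = vocab_size
        self.bigrams = defaultdict(Counter)
        for prev, word in zip(corpus, corpus[1:]):
            self.bigrams[prev][word] += 1

    def prob(self, word, prev):
        counts = self.bigrams[prev]
        total = sum(counts.values())
        return (counts[word] + self.alpha) / (total + self.alpha * self.vocab_size)

class UnigramCache:
    """Long-span model: probability depends on the recent history as a whole."""
    def __init__(self, cache_size=100):
        self.cache = []
        self.cache_size = cache_size

    def prob(self, word, vocab_size, beta=0.1):
        if not self.cache:
            return 1.0 / vocab_size
        cache_p = self.cache.count(word) / len(self.cache)
        # Interpolate the cache estimate with a uniform distribution.
        return (1 - beta) * cache_p + beta / vocab_size

    def update(self, word):
        self.cache.append(word)
        if len(self.cache) > self.cache_size:
            self.cache.pop(0)
```

Because the two models exploit different regularities, reporting that one "beats" the other in isolation says little; in practice they are typically interpolated, and a fair comparison is against such a combination.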
A model that has the potential to discover information only in a few preceding words (like an n-gram model or a class based model) will be further denoted as a "short-span model", while a model that has the ability to represent regularities over a long range of words (more