While for any of the previous points I would be able to provide at least several references, it would be better to define how the new techniques should be evaluated, so that scientific progress would be measurable:

• experiments should be repeatable: public data sets should be used, or data that are easily accessible to the scientific community
• techniques that aim to become a new state of the art should be compared not against the weakest possible baseline, but against the strongest baseline, such as a combination of all known techniques
• to improve repeatability, the code needed for reproducing the experiments should be released
• the review process for accepting papers that propose new techniques should be at least partially automated when it comes to verification of the results

While in some cases it might be difficult to satisfy all these points, it is foolish to claim that a new state of the art has been reached after the perplexity drops by 2% relative to a 3-gram baseline; nevertheless, such results are still being published at top-level conferences (and sometimes even win best paper awards).

For these reasons, I have decided to release a toolkit that can be used to train RNN-based language models, so that the following experiments can be easily repeated. This toolkit is introduced and described in Appendix A. Moreover, the following experiments are performed on well-known setups, with direct comparison to competitive techniques.

4.1 Comparison of Different Types of Language Models

It is very difficult to objectively compare different language modeling techniques: in practical applications, accuracy is sometimes as important as low memory usage and low computational complexity. Also, the comparisons that can be found in scientific papers are in some cases unfair, as models that aim to find different types of regularities are sometimes compared. The most obvious example would be a comparison of a long-context and a short-context model, such as comparing an n-gram model to a cache-like model.

A model that has the potential to discover information only in a few preceding words (like an n-gram model or a class-based model) will be further denoted as a "short-span model", while a model that has the ability to represent regularities over a long range of words (more than just a few preceding words) will be denoted as a "long-span model".
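
To make the distinction concrete, the following is a minimal, hypothetical sketch (plain Python, not the toolkit from Appendix A): a smoothed bigram model whose prediction depends only on the single preceding word (a short-span model), and a variant that interpolates it with a unigram cache over the recent history (a simple long-span component). The class names, smoothing, and interpolation weight are illustrative assumptions, not the exact models evaluated later.

    from collections import Counter

    class BigramModel:
        """Short-span model: the prediction depends only on the previous word."""
        def __init__(self, words, alpha=0.1):
            self.alpha = alpha
            self.vocab = set(words)
            self.bigrams = Counter(zip(words, words[1:]))
            self.contexts = Counter(words[:-1])

        def prob(self, word, prev):
            # Add-alpha smoothed bigram probability P(word | prev).
            return ((self.bigrams[(prev, word)] + self.alpha) /
                    (self.contexts[prev] + self.alpha * len(self.vocab)))

    class CacheBigramModel(BigramModel):
        """Adds a long-span component: a unigram cache over the last
        cache_size processed words, interpolated with the bigram model."""
        def __init__(self, words, alpha=0.1, cache_size=200, lam=0.2):
            super().__init__(words, alpha)
            self.cache_size = cache_size
            self.lam = lam
            self.history = []

        def prob(self, word, prev):
            p_static = super().prob(word, prev)
            recent = self.history[-self.cache_size:]
            p_cache = recent.count(word) / len(recent) if recent else 0.0
            self.history.append(word)   # the cache grows as text is processed
            return (1.0 - self.lam) * p_static + self.lam * p_cache

Comparing these two models directly would be misleading in the sense described above, since the cache variant can exploit recently seen words that are simply invisible to the plain bigram model.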
