02.04.2013 Views

Statistical Language Models based on Neural Networks - Faculty of ...

Statistical Language Models based on Neural Networks - Faculty of ...

Statistical Language Models based on Neural Networks - Faculty of ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Chapter 4<br />

Evaluati<strong>on</strong> and Combinati<strong>on</strong><br />

<strong>of</strong> <str<strong>on</strong>g>Language</str<strong>on</strong>g> Modeling Techniques<br />

It is very difficult, if not impossible, to compare different machine learning techniques<br />

just by following their theoretical descripti<strong>on</strong>. The same holds for the numerous language<br />

modeling techniques: almost every <strong>on</strong>e <strong>of</strong> them is well-motivated, and some <strong>of</strong> them even<br />

have theoretical explanati<strong>on</strong> why a given technique is optimal, under certain assumpti<strong>on</strong>s.<br />

The problem is that many <strong>of</strong> such assumpti<strong>on</strong>s are not satisfied in practice, when real<br />

data are used.<br />

Comparis<strong>on</strong> <strong>of</strong> advanced language modeling techniques is usually limited by some <strong>of</strong><br />

these factors:<br />

• private data sets that do not allow experiments to be repeated are used<br />

• ad hoc preprocessing is used that favours the proposed technique, or completely<br />

artificial data sets are used<br />

• comparis<strong>on</strong> to proper baseline is completely missing<br />

• baseline technique is not tuned for the best performance<br />

• in comparis<strong>on</strong>, it is falsely claimed that technique X is the state <strong>of</strong> the art (where X<br />

is usually n-gram model)<br />

• possible comparis<strong>on</strong> to other advanced techniques is d<strong>on</strong>e poorly, by citing results<br />

achieved <strong>on</strong> different data sets, or simply by falsely claiming that the other techniques<br />

are too complex<br />

44

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!