Statistical Language Models based on Neural Networks - Faculty of ...
• Good theoretical motivation
• Simplicity of evaluation
• Good correlation with system performance
Disadvantages of perplexity are:
• It is hard to check that a reported value is correct (mostly because of normalization and
"looking into the future" problems)
• Perplexity is often measured assuming a perfect history, which is certainly not the case
for ASR systems: the poor performance of models that rely on long context information
(such as cache models) is a source of confusion and of claims that perplexity is not well
correlated with WER
• Most research papers compare perplexity values incorrectly: the baseline is
often suboptimal, to "make the results look better"
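The normalization problem in the first point can be illustrated with a minimal sketch. It assumes the usual definition of perplexity as the exponential of the average negative log-probability per word; the toy vocabulary, the probability values and the helper names (perplexity, is_normalized) are hypothetical and chosen only for illustration. A model whose probabilities sum to more than one reports a lower perplexity without being any better.

```python
import math

def perplexity(word_probs):
    """Perplexity from the probability the model assigned to each test word."""
    n = len(word_probs)
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / n)

def is_normalized(distribution, tol=1e-9):
    """Check that a next-word distribution sums to one."""
    return abs(sum(distribution.values()) - 1.0) <= tol

# Hypothetical next-word distribution over a toy 4-word vocabulary.
proper = {"the": 0.4, "cat": 0.3, "sat": 0.2, "mat": 0.1}
# The same scores scaled by 1.5: no longer a valid distribution (sums to 1.5).
inflated = {w: 1.5 * p for w, p in proper.items()}

test_words = ["the", "cat", "sat", "mat"]
ppl_proper = perplexity([proper[w] for w in test_words])
ppl_inflated = perplexity([inflated[w] for w in test_words])

print(is_normalized(proper), ppl_proper)
print(is_normalized(inflated), ppl_inflated)
```

The second value is lower only because the scores are not normalized, which is why a reported perplexity is hard to trust without checking that the model's probabilities sum to one.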
Advantages of WER:
• Often the final metric we want to optimize; quality of systems is usually measured
by some variation of WER (such as NIST WER)
• Easy to evaluate, as long as we have reference transcriptions
Disadvantages of WER:
• Results are often noisy; for small data sets, WER results can vary by
0.5% absolute
• Overemphasis on frequent, uninformative words
• Reference transcriptions can include errors and spelling mistakes
• Substitutions with the same or similar meaning count as heavily as substitutions
that have the opposite meaning
• A full speech recognition system is needed
• Improvements are often task-specific
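The point about substitutions can be seen in a small sketch of WER, assuming the common definition: the word-level edit distance (substitutions, deletions and insertions) divided by the number of reference words. The function name and the example sentences are hypothetical, and the text normalization that scoring tools usually apply is left out.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

reference = "the movie was good"
wer_similar = word_error_rate(reference, "the movie was great")
wer_opposite = word_error_rate(reference, "the movie was bad")
print(wer_similar, wer_opposite)
```

Both hypotheses contain exactly one substitution, so they receive the same score even though only the second one reverses the meaning of the sentence.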