
Chapter 5

Wall Street Journal Experiments

Another important data set frequently used by the speech recognition community for research purposes is the Wall Street Journal speech recognition task. In the following experiments, we aim to:

• show the full potential of RNN LMs on a moderately sized task, where speech recognition errors are caused mainly by the language model (as opposed to acoustically noisy tasks, where it would be more important to work on the acoustic models)

• show the performance of RNN LMs with an increasing amount of training data

• provide a comparison to other advanced language modeling techniques in terms of word error rate

• describe experiments with the open-source speech recognition toolkit Kaldi that can be reproduced

5.1 WSJ-JHU Setup Description

The experiments in this section were performed with a data set that was kindly shared with us by researchers from Johns Hopkins University. We report results after rescoring 100-best lists from the DARPA WSJ’92 and WSJ’93 data sets - the same data sets were used by Xu [79], Filimonov [23], and in my previous work [49]. The oracle WER of the 100-best lists is 6.1% for the development set and 9.5% for the evaluation set. The training data for the language model are the same as used by Xu [79]. The training corpus consists of 37M words from the NYT section of English Gigaword. The hyper-parameters for all RNN models

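To make the rescoring procedure described above concrete, the following is a minimal sketch of 100-best rescoring with an RNN LM linearly interpolated with a baseline n-gram LM. The Hypothesis class, the rnn_word_logprobs and ngram_word_logprobs scorers, and the default weights are illustrative assumptions only; they are not the exact interface or settings used in these experiments.

```python
# Minimal sketch of N-best rescoring with an RNN LM interpolated with an
# n-gram LM.  All names and default weights are hypothetical placeholders.
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    words: List[str]        # recognized word sequence
    acoustic_score: float   # log-domain acoustic score from the decoder


def rescore_nbest(
    nbest: List[Hypothesis],
    rnn_word_logprobs: Callable[[List[str]], List[float]],    # log P_RNN(w_i | history)
    ngram_word_logprobs: Callable[[List[str]], List[float]],  # log P_ngram(w_i | history)
    lam: float = 0.5,        # interpolation weight of the RNN LM
    lm_scale: float = 12.0,  # language model scale (tuned on the development set)
    wip: float = 0.0,        # word insertion penalty
) -> Hypothesis:
    """Return the hypothesis with the highest combined score."""

    def combined_score(hyp: Hypothesis) -> float:
        lm_logprob = 0.0
        for lp_rnn, lp_ng in zip(rnn_word_logprobs(hyp.words),
                                 ngram_word_logprobs(hyp.words)):
            # Per-word linear interpolation of the two models in the
            # probability domain, accumulated in the log domain.
            lm_logprob += math.log(lam * math.exp(lp_rnn)
                                   + (1.0 - lam) * math.exp(lp_ng))
        return hyp.acoustic_score + lm_scale * lm_logprob - wip * len(hyp.words)

    return max(nbest, key=combined_score)
```

The word error rate is then obtained by scoring the selected hypotheses against the reference transcriptions; the oracle WER quoted above corresponds to always picking, from each 100-best list, the hypothesis closest to the reference.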
