
[Figure 5.2: WER [%] on Eval 92 after rescoring with an increasing size N of the N-best list (N shown on a logarithmic scale, 10^0 to 10^3); the baseline is obtained with a 5-gram model.]

probably more useful to either produce wider lattices to allow more diverse paths to be encoded in the lattice, or to use neural net language models directly during decoding.
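As an illustration of the rescoring setup behind Figure 5.2, the following sketch re-ranks an N-best list by combining an acoustic score with a linear interpolation of RNN and n-gram language model probabilities. The entry format, the per-word log-probability lists, and the weights `lam` and `lm_scale` are illustrative assumptions, not the exact configuration used in these experiments.

```python
import math

def interpolate_logprobs(rnn_lp, ngram_lp, lam=0.5):
    # Per-word linear interpolation of two LMs, performed in the
    # probability domain and summed back in log space.
    return sum(math.log(lam * math.exp(r) + (1.0 - lam) * math.exp(n))
               for r, n in zip(rnn_lp, ngram_lp))

def rescore_nbest(nbest, lam=0.5, lm_scale=12.0):
    # nbest: list of (words, am_logprob, rnn_word_logprobs,
    # ngram_word_logprobs) tuples -- an assumed format.
    # Returns the hypothesis with the highest combined score.
    best_words, best_score = None, float("-inf")
    for words, am, rnn_lp, ngram_lp in nbest:
        score = am + lm_scale * interpolate_logprobs(rnn_lp, ngram_lp, lam)
        if score > best_score:
            best_words, best_score = words, score
    return best_words
```

With this scheme, enlarging the N-best list only adds candidates to the loop; the gains saturate once the list already contains the paths a stronger language model would prefer, which is the behavior visible in Figure 5.2.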

5.2.1 Approximation of RNNME using n-gram models

As was described in Section 3.4.3, it is possible to approximate complex generative language models by sampling a huge amount of data from them and building usual n-gram models based on the generated data. This can be seen as an attempt to precompute a large lookup table for the likely n-gram entries. To demonstrate the potential of this technique, which allows trivial integration of RNN and RNNME models directly into decoders, 15 billion words were generated from the RNNME-480 model. A Good-Turing smoothed 5-gram model was built on top of these data, and various results are reported in Table 5.5. It should be noted that while modified Kneser-Ney smoothing provides slightly better results for standalone models based on the generated data, the results after interpolation with the baseline 5-gram model are worse than if Good-Turing smoothing is used.
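The data generation step amounts to ancestral sampling from the model's next-word distribution. The sketch below assumes a hypothetical model object exposing a next_word_distribution(history) method that returns probabilities aligned with a vocabulary list; the actual experiments sampled from the trained RNNME-480 model.

```python
import random

def sample_text(model, vocab, n_words, eos="</s>"):
    # Generate n_words tokens by repeatedly sampling from the RNN LM's
    # next-word distribution; the history is reset at sentence ends.
    # model.next_word_distribution(history) is an assumed interface.
    history, output = [eos], []
    while len(output) < n_words:
        probs = model.next_word_distribution(history)
        word = random.choices(vocab, weights=probs, k=1)[0]
        output.append(word)
        history = [eos] if word == eos else history + [word]
    return output
```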

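The sampled text is then reduced to ordinary n-gram statistics, which is what makes the "precomputed lookup table" view concrete. A minimal sketch of the counting step follows, with Counter standing in for a real LM toolkit's count collection; smoothing (Good-Turing in the experiments above) would be applied when estimating the model from these counts.

```python
from collections import Counter

def collect_ngram_counts(tokens, order=5):
    # Accumulate counts for all n-grams up to the given order; a
    # smoothed 5-gram model is then estimated from these counts.
    counts = Counter()
    for n in range(1, order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts
```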

Based on the results reported in Table 5.5, it is possible to obtain around 0.6% WER reduction by rescoring lattices using models that were trained on additional data generated from the RNNME-480 model. However, the n-gram model based on the
