- Page 1: VYSOKÉ UČENÍ TECHNICKÉ V BRNĚ
- Page 5 and 6: Contents 1 Introduction 4 1.1 Motiv
- Page 7 and 8: 6.2.3 Reduction of Vocabulary Size
- Page 9 and 10: Maybe the most popular vision of fu
- Page 11 and 12: Chapter 6 presents further extensio
- Page 13 and 14: Chapter 2 Overview of Stati
- Page 15 and 16: 2.1 Evaluation 2.1.1 Perplexity Eva
- Page 17 and 18: ALP can be used to obtain prior pro
- Page 19 and 20: • Good theoretical motivation •
- Page 21 and 22: abilities of n-grams are stored in
- Page 23 and 24: main (static) n-gram model. As the
- Page 25 and 26: There are many popular examples sho
- Page 27 and 28: y Chen et al., who proposed a so-ca
- Page 29 and 30: confusion among researchers, and ma
- Page 31 and 32: language model took almost a week u
- Page 33 and 34: w(t) s(t-1) s(t) U V W y(t) Figure
- Page 35 and 36: ate is halved at start of every new
- Page 37 and 38: or using matrix-vector notation as
- Page 39 and 40: information for more than 5 time st
- Page 41 and 42: A simple solution to the exploding
- Page 43 and 44: output layer changes to computation
- Page 45 and 46: While RNN models can overcome this
- Page 47 and 48: complex or random architectures (su
- Page 49 and 50: While for any of the previous point
- Page 51 and 52: where λ is the interpolation weigh
- Page 53 and 54:
model with default SRILM cutoffs pr
- Page 55 and 56:
experiments, we have used the one i
- Page 57 and 58:
Perplexity (Penn corpus) 145 140 13
- Page 59 and 60:
with syntactical NNLMs would be pre
- Page 61 and 62:
Table 4.3: Combination of individua
- Page 63 and 64:
Table 4.6: Results on Penn Treebank
- Page 65 and 66:
4.6 Conclusion of the Model Combina
- Page 67 and 68:
were: 400 classes, hidden layer siz
- Page 69 and 70:
Entropy per word on the WSJ test da
- Page 71 and 72:
Table 5.3: Results on the WSJ setup
- Page 73 and 74:
Table 5.5: Results for models <stro
- Page 75 and 76:
trained together with a maximum ent
- Page 77 and 78:
wt-3 wt-2 wt-1 D D D P(wt|context)
- Page 79 and 80:
n-gram probabilities. However, it w
- Page 81 and 82:
Table 6.1: Training corpora for NIS
- Page 83 and 84:
Perplexity 360 340 320 300 280 260
- Page 85 and 86:
Entropy per word 9 8.5 8 7.5 7 6.5
- Page 87 and 88:
1 a a a 1 2 3 P(w(t)|*) ONE TWO THR
- Page 89 and 90:
Table 6.4: Perplexity on the evalua
- Page 91 and 92:
Entropy reduction per word over KN4
- Page 93 and 94:
Table 6.6: Perplexity with the new
- Page 95 and 96:
Entropy reduction over KN5 -0.04 -0
- Page 97 and 98:
as a baseline, and 12.3% after resc
- Page 99 and 100:
Table 7.1: BLEU on IWSLT 2005 Machi
- Page 101 and 102:
Table 7.3: Size of compressed text
- Page 103 and 104:
Table 7.4: Accuracy of different la
- Page 105 and 106:
Table 7.6: Entropy on PTB with n-gr
- Page 107 and 108:
8.1 Machine Learning One possible d
- Page 109 and 110:
that almost every non-trivial compu
- Page 111 and 112:
supervision such as one digit at a
- Page 113 and 114:
Chapter 9 Conclusion and Future Wor
- Page 115 and 116:
from the expensive part of the mode
- Page 117 and 118:
Bibliography [1] A. Alexandrescu, K
- Page 119 and 120:
[23] D. Filimonov, M. Harper. A joi
- Page 121 and 122:
[50] T. Mikolov, S. Kombrink, L. Bu
- Page 123 and 124:
[77] W. Wang, M. Harper. The SuperA
- Page 125 and 126:
Test Phase After the model is train
- Page 127 and 128:
• compute sentence-level scores g
- Page 129 and 130:
Appendix B: Data generated from mod
- Page 131 and 132:
Appendix C: Example of decoded utte
- Page 133:
AND SAYS THIS SWING IS WITHIN A REA