Statistical Language Models based on Neural Networks - Faculty of ...
abilities of n-grams are stored in precomputed tables), reliability coming from simplicity, and generality (the models can be applied to any domain or language effortlessly, as long as some training data exists). N-gram models are still considered state of the art today, not because no better techniques exist, but because those better techniques are computationally much more complex and provide only marginal improvements, which are not critical for the success of a given application. Thus, a large part of this thesis deals with computational efficiency and speed-up tricks based on simple, reliable algorithms.
The weak part of n-grams is their slow adaptation rate when only a limited amount of in-domain data is available. The most important weakness is that the number of possible n-grams increases exponentially with the length of the context, preventing these models from effectively capturing longer context patterns. This is especially painful when large amounts of training data are available, as many of the patterns in the training data cannot be represented effectively by n-grams and thus cannot be discovered during training. The idea of using neural network based LMs builds on this observation, and tries to overcome the exponential increase of parameters by sharing parameters among similar events, no longer requiring an exact match of the history H.
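The sparsity that motivates this can be made concrete with a short sketch (a toy illustration, not code from this thesis): counting distinct n-grams of increasing order over a tiny corpus shows how quickly longer contexts become unique, so an exact-match model sees most long histories only once and cannot generalize from them.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Tiny toy corpus; real LM training data would be vastly larger.
corpus = ("the sky above our heads is blue "
          "the sky this morning was blue "
          "the sky above the hill was blue").split()

for n in range(1, 5):
    counts = ngram_counts(corpus, n)
    singletons = sum(1 for c in counts.values() if c == 1)
    print(f"order {n}: {len(counts)} distinct n-grams, {singletons} seen only once")
```

Even on these three short sentences, nearly every 4-gram occurs exactly once; a model that requires an exact match of the history has no statistics to share between such events, which is precisely what parameter sharing in neural LMs addresses.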
2.3 Advanced Language Modeling Techniques
Despite the indisputable success of basic n-gram models, it has always been obvious that these models are not powerful enough to describe language at a sufficient level. As an introduction to the advanced techniques, simple examples will be given first to show what n-grams cannot do. For example, the representation of long-context patterns is very inefficient; consider the following example:
THE SKY ABOVE OUR HEADS IS BLUE<br />
In such a sentence, the word BLUE directly depends on the previous word SKY. There is a huge number of possible variations of words between these two that would not break this relationship - for example, THE SKY THIS MORNING WAS BLUE, etc. We can even see that the number of variations can practically increase exponentially with the increasing distance of the two words from each other in the sentence - we can create many similar sentences, for example, by adding all days of the week to the sentence, such as: