
probabilities of n-grams are stored in precomputed tables), reliability coming from simplicity, and generality (models can be applied to any domain or language effortlessly, as long as there exists some training data). N-gram models are still considered state of the art today not because there are no better techniques, but because those better techniques are computationally much more complex and provide only marginal improvements that are not critical for the success of a given application. Thus, a large part of this thesis deals with computational efficiency and speed-up tricks based on simple, reliable algorithms.

A weak point of n-grams is their slow adaptation rate when only a limited amount of in-domain data is available. The most important weakness is that the number of possible n-grams increases exponentially with the length of the context, preventing these models from effectively capturing longer context patterns. This is especially painful when large amounts of training data are available, as many of the patterns in the training data cannot be effectively represented by n-grams and thus cannot be discovered during training. The idea of using neural network based LMs builds on this observation and tries to overcome the exponential increase in the number of parameters by sharing parameters among similar events, no longer requiring an exact match of the history H.
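To make the exponential growth concrete, here is a small back-of-the-envelope calculation (the vocabulary size is chosen purely for illustration and is not a figure from this thesis): with a vocabulary V, the number of distinct n-grams that could in principle occur is |V|^n, so

\[
|V| = 10^{5}: \qquad |V|^{2} = 10^{10} \ \text{possible bigrams}, \qquad |V|^{3} = 10^{15} \ \text{possible trigrams}, \qquad |V|^{n} = 10^{5n} \ \text{in general}.
\]

Only a vanishing fraction of these combinations can ever appear in a realistic training corpus, which is why simply extending the context length of an n-gram model quickly stops being useful.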

2.3 Advanced Language Modeling Techniques

Despite the indisputable success of basic n-gram models, it has always been obvious that these models are not powerful enough to describe language at a sufficient level. As an introduction to the advanced techniques, simple examples will be given first to show what n-grams cannot do. For example, the representation of long-context patterns is very inefficient; consider the following example:

THE SKY ABOVE OUR HEADS IS BLUE<br />

In such a sentence, the word BLUE directly depends on the earlier word SKY. There is a huge number of possible variations of the words between these two that would not break this relationship - for example, THE SKY THIS MORNING WAS BLUE, etc. We can even see that the number of variations can increase practically exponentially with the increasing distance between the two words in the sentence - we can create many similar sentences, for example by also mentioning a day of the week in the sentence.
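The following minimal sketch (hypothetical code, not taken from the thesis) makes the limitation explicit for a trigram model: no matter which words are inserted between SKY and BLUE, only the two most recent words remain visible when BLUE is predicted, so the dependency on SKY is lost.

def trigram_context(history):
    """Return the part of the history a trigram model conditions on: the last two words."""
    return tuple(history[-2:])

for sentence in ("THE SKY ABOVE OUR HEADS IS BLUE",
                 "THE SKY THIS MORNING WAS BLUE"):
    words = sentence.split()
    # Context available when predicting the final word BLUE:
    print(trigram_context(words[:-1]))

# Prints ('HEADS', 'IS') and ('MORNING', 'WAS') - SKY has fallen out of the context in both cases.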
