
THE SKY THIS ... WAS BLUE

N-gram models with N = 4 are unable to efficiently model such common patterns in the language. With N = 10, the number of variations is so large that we cannot realistically hope to have enough training data for n-gram models to capture such long-context patterns - we would basically have to see each specific variation in the training data, which is infeasible in practical situations.
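To make the data requirement concrete, here is a purely illustrative calculation (the vocabulary size $V = 10^5$ is an assumed round number, not taken from any particular corpus): the number of distinct word sequences of length $N$ grows as $V^N$, so for $N = 10$

$$V^{N} = (10^{5})^{10} = 10^{50},$$

which exceeds any realistic training corpus of perhaps $10^9$ to $10^{11}$ tokens by many orders of magnitude; the overwhelming majority of 10-word contexts encountered at test time will therefore never have occurred in the training data.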

Another type of pattern that n-gram models are unable to model efficiently is the similarity of individual words. A popular example is:

PARTY WILL BE ON ...

Considering that only two or three variations of this sentence are present in the training data, such as PARTY WILL BE ON MONDAY and PARTY WILL BE ON TUESDAY, the n-gram models will not be able to assign meaningful probability to a novel (but similar) sequence such as PARTY WILL BE ON FRIDAY, even if days of the week appear in the training data frequently enough to discover that there is some similarity among them.
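To illustrate this failure mode, consider the following minimal sketch of a count-based 4-gram model trained on just the two sentences mentioned above; the toy corpus and the helper names (ngram_counts, mle_probability) are hypothetical and exist only for this example. Because the maximum-likelihood estimate relies purely on exact context counts, the unseen variation PARTY WILL BE ON FRIDAY receives zero probability, even though FRIDAY behaves just like MONDAY and TUESDAY.

```python
from collections import defaultdict

# Hypothetical toy training data; a real model would use a large text corpus.
training_sentences = [
    "PARTY WILL BE ON MONDAY".split(),
    "PARTY WILL BE ON TUESDAY".split(),
]

# Collect 4-gram counts together with counts of the corresponding 3-word histories.
ngram_counts = defaultdict(int)
history_counts = defaultdict(int)
for sentence in training_sentences:
    for i in range(len(sentence) - 3):
        history = tuple(sentence[i:i + 3])
        ngram_counts[(history, sentence[i + 3])] += 1
        history_counts[history] += 1

def mle_probability(history, word):
    """Maximum-likelihood 4-gram estimate of P(word | history)."""
    if history_counts[history] == 0:
        return 0.0
    return ngram_counts[(history, word)] / history_counts[history]

history = ("WILL", "BE", "ON")
print(mle_probability(history, "MONDAY"))  # 0.5 - this continuation was seen in training
print(mle_probability(history, "FRIDAY"))  # 0.0 - unseen, despite being a day of the week
```

Smoothing techniques can move some probability mass to FRIDAY, but only by backing off to shorter contexts; the counts themselves never encode that the days of the week are similar to each other.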

As language modeling is closely related to artificial intelligence and language learning, it is possible to find a great number of different language modeling techniques and a large number of their variations across the research literature published in the past thirty years. While it is out of the scope of this work to describe all of these techniques in detail, we will at least give a short introduction to the important techniques and provide references for further details.

2.3.1 Cache Language Models

As stated previously, one of the most obvious drawbacks of n-gram models is their inability to represent longer-term patterns. It has been empirically observed that many words, especially the rare ones, have a significantly higher chance of occurring again if they occurred in the recent history. Cache models [32] are designed to deal with this regularity, and are often represented as another n-gram model that is estimated dynamically from the recent history (usually a few hundred words are considered) and interpolated with the static n-gram model.
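As a rough sketch of this idea (not the exact formulation from [32]), the following Python class keeps a unigram cache over the most recent words and linearly interpolates it with an arbitrary static model; the class name, the cache size of 500 words, and the interpolation weight of 0.9 are illustrative assumptions rather than values from the literature.

```python
from collections import Counter, deque

class CachedLanguageModel:
    """Unigram cache interpolated with a static language model (minimal sketch)."""

    def __init__(self, static_model, cache_size=500, lam=0.9):
        self.static_model = static_model  # callable returning P(word | history)
        self.cache = deque(maxlen=cache_size)  # the most recently seen words
        self.lam = lam  # weight of the static model in the interpolation

    def probability(self, word, history):
        # Unigram estimate from the cache; 0 if nothing has been observed yet.
        counts = Counter(self.cache)
        cache_prob = counts[word] / len(self.cache) if self.cache else 0.0
        # Linear interpolation of the static model and the cache distribution.
        return self.lam * self.static_model(word, history) + (1.0 - self.lam) * cache_prob

    def observe(self, word):
        # Record the word that actually occurred so it is boosted in the near future.
        self.cache.append(word)

# Example usage with a trivial uniform static model over a 10,000-word vocabulary.
def uniform(word, history):
    return 1.0 / 10_000

lm = CachedLanguageModel(uniform)
lm.observe("NEURON")
print(lm.probability("NEURON", ("THE",)))  # boosted far above the static 1/10,000
```

In practice the interpolation weight would be tuned on held-out data, and the cache itself can be a higher-order n-gram model rather than a simple unigram distribution.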
