Statistical Language Models based on Neural Networks - Faculty of ...

supervision such as one digit at a time, we would need on average only (10+1)/2 × 6 = 33 guesses.

The situation might be just like this when we learn a language: if we first see a new word in various simple contexts, it is easier to guess its meaning than if it appears just once in a completely novel situation.
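The arithmetic above can be checked directly. Assuming each digit is uniform over ten values and wrong guesses are never repeated, the expected number of guesses for a single unknown value among n candidates is (n + 1)/2; the sketch below (all names are illustrative) compares guessing a full 6-digit number at once with guessing one digit at a time:

```python
def expected_guesses(n):
    # Uniform target among n equally likely values, guessing
    # without repetition: expected number of guesses is (n + 1) / 2.
    return (n + 1) / 2

# Guessing the whole 6-digit number at once: a million candidates.
whole = expected_guesses(10**6)       # 500000.5 guesses on average

# With supervision after each digit: six independent single-digit guesses.
per_digit = 6 * expected_guesses(10)  # 6 * 5.5 = 33 guesses on average
```

The gap between the two quantities is the point of the example: per-digit supervision turns an exponentially large search into a short sequence of trivial ones.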

Humans do learn incrementally, and that is not a coincidence. It seems that learning complex functions that are compositions of other functions is a highly non-linear problem, where SGD will not work and random search would take too long. Thus, part of the solution seems to lie in using training data that allow simple functions to be learned first, and in using machine learning techniques that can grow with the complexity of the problem.

8.4 Proposal for Future Research<br />

Studying the problems stated above in the context of statistical language modeling has several advantages: the amount of data involved can be quite small compared to machine vision problems, and it can be easier to see what is going wrong when the machine is unable to learn some basic pattern in the language. A motivating example of a pattern that n-gram models cannot represent efficiently is basic memory; consider the following sentences:

APPLE IS APPLE
BALL IS BALL
SUN IS SUN
RED IS RED
THIS IS THIS

It is easy for a human to see that the pattern in this text is of the form X Y X, where Y = IS and X is some string of characters that is repeated after the occurrence of Y. It is also simple for a human to predict the next characters in the sequence NOVEL IS ...; however, neither n-gram models nor finite state machines can be used for such a task. Interestingly, many simple patterns cannot be represented efficiently by the usual models, including neural networks and context-free grammars.
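This failure mode can be sketched with a toy trigram count table (the data and names below are illustrative, not part of the thesis): the n-gram model has no counts for a context containing an unseen word, while the copy rule X IS X generalizes to any word at all.

```python
from collections import defaultdict

# Training sentences exhibiting the X IS X pattern.
train = [
    "APPLE IS APPLE", "BALL IS BALL", "SUN IS SUN",
    "RED IS RED", "THIS IS THIS",
]

# Toy trigram "model": counts of the next word given the two previous words.
counts = defaultdict(lambda: defaultdict(int))
for sent in train:
    words = ["<s>", "<s>"] + sent.split()
    for a, b, c in zip(words, words[1:], words[2:]):
        counts[(a, b)][c] += 1

# The trigram model has never seen the context ("NOVEL", "IS"),
# so it has no counts and no basis for a prediction.
ngram_prediction = counts[("NOVEL", "IS")]  # empty: no observed continuations

# The copy rule "X IS X" generalizes trivially to unseen words.
def copy_rule(context):
    x, y = context
    return x if y == "IS" else None

rule_prediction = copy_rule(("NOVEL", "IS"))  # predicts "NOVEL"
```

The n-gram model can only memorize each X it has seen; the rule needs one parameter-free operation (copy) that no finite table over surface words can express compactly.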

Thus, the simplest proposal for research that aims to get closer to the human level of language understanding by a computer would be to first define incrementally more complex tasks involving basic patterns that humans

