Statistical Language Models based on Neural Networks - Faculty of ...
supervision such as one digit at a time, we would need on average only (10+1)/2 × 6 guesses.
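The arithmetic above can be checked with a short sketch (an illustration, not part of the thesis): if we guess a single unknown digit by trying the candidates 0 through 9 in order, the expected number of guesses is (1 + 2 + ... + 10)/10 = (10+1)/2 = 5.5, so revealing a 6-digit number one digit at a time takes 5.5 × 6 = 33 guesses on average, versus up to 10^6 attempts when guessing the whole number at once.

```python
def guesses_for_digit(secret: int) -> int:
    """Count sequential guesses 0, 1, 2, ... until `secret` is hit."""
    for guess in range(10):
        if guess == secret:
            return guess + 1
    raise ValueError("secret must be a digit 0-9")

# Average over all ten possible digits equals (10 + 1) / 2 = 5.5.
average = sum(guesses_for_digit(d) for d in range(10)) / 10
print(average)      # 5.5
print(average * 6)  # 33.0 expected guesses for a 6-digit number
```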
The situation might be just like this when we learn the language; if we first see a new word in various simple contexts, it is easier to guess its meaning than if it appears just once in a completely novel situation.

Humans do learn incrementally, and that is not a coincidence. It seems that learning complex functions that are compositions of other functions is a highly non-linear problem, where SGD will not work and random search would take too long. Thus, part of the solution seems to be in using training data that would allow simple functions to be learned first, and also using machine learning techniques that can grow with the complexity of the problem.
8.4 Proposal for Future Research<br />
Studying the problems stated above in the context of statistical language modeling has several advantages: the amount of data involved can actually be quite low compared to machine vision problems, and it can be easier to see what is going wrong when the machine is unable to learn some basic pattern in the language. A motivating example of a pattern that n-gram models cannot represent efficiently is basic memory; consider the following sentences:
APPLE IS APPLE
BALL IS BALL
SUN IS SUN
RED IS RED
THIS IS THIS
It is easy for a human to see that the pattern in such text data is actually of the form X Y X, where Y = IS and X is some string of characters that is repeated after the occurrence of Y. It is simple for a human to predict the next characters in a sequence NOVEL IS ...; however, n-gram models, as well as finite state machines, cannot be used for such a task. Interestingly, many simple patterns cannot be represented efficiently by the usual models, including neural networks and context-free grammars.
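The limitation can be made concrete with a small sketch (an illustration, not from the thesis): a word-level bigram model trained on the X IS X sentences above can only ever predict words it has already observed after IS, so it necessarily fails on a novel X, whereas the underlying rule "repeat the word that precedes IS" generalizes trivially.

```python
from collections import defaultdict

training = ["APPLE IS APPLE", "BALL IS BALL", "SUN IS SUN",
            "RED IS RED", "THIS IS THIS"]

# Bigram table: the set of next-word candidates for each preceding word.
bigrams = defaultdict(set)
for sentence in training:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        bigrams[prev].add(nxt)

def bigram_predict(context: str) -> set:
    """Words the bigram model allows after the last word of `context`."""
    return bigrams.get(context.split()[-1], set())

def copy_rule_predict(context: str) -> str:
    """The true X IS X pattern: repeat the word that preceded IS."""
    words = context.split()
    return words[words.index("IS") - 1]

print(bigram_predict("NOVEL IS"))    # only words seen in training; NOVEL is impossible
print(copy_rule_predict("NOVEL IS")) # NOVEL
```

No finite n-gram order fixes this: for any n, an unseen word X outside the training vocabulary gets zero (or only smoothed, uniform) probability after IS, because the model has no mechanism to copy from its own recent context.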
Thus, the simplest proposal for research that would aim to get closer to the human level of language understanding by a computer would be to first define incrementally more complex tasks that would involve basic patterns that humans