02.04.2013 Views

Statistical Language Models based on Neural Networks - Faculty of ...

Perhaps the most popular vision of future AI in science fiction movies is that of robots and computers communicating with humans in natural language. Turing himself proposed a test of intelligence based on the ability of a machine to communicate with humans using natural language [76]. This choice has several advantages. First, the amount of data that has to be processed can be very small compared to a machine that recognizes images or sounds. Next, a machine that understands just the basic patterns in the language can be developed first, and scaled up subsequently. The basic level of understanding can be that of a child, or of a person learning a new language; even such a low level of understanding is sufficient to be tested, so it would be possible to measure progress in the ability of the machine to understand the language.

Assuming that we want to build such a machine that can communicate in natural language, the question is how to do it. A reasonable way would be to mimic the learning processes of humans. A language is learned by observing the real world, recognizing its regularities, and mapping acoustic and visual signals to higher-level representations in the brain and back: the acoustic and visual signals are predicted using the higher-level representations. The motivation for learning the language is to improve the success of humans in the real world.

The whole learning problem might be too difficult to be solved at once. There are many open questions regarding the importance of individual factors, such as how much data has to be processed during training of the machine, how important it is to learn the language jointly with observing real-world situations, how important innate knowledge is, what the best formal representation of the language is, etc. It might be too ambitious to attempt to solve all these problems together, and to expect too much from models or techniques that do not even allow the existence of a solution (an example might be the well-known limitations of finite state machines in efficiently representing longer-term patterns).

An important work that has to be mentioned here is the information theory of Claude Shannon. In his famous paper Prediction and Entropy of Printed English [66], Shannon tries to estimate the entropy of English text using simple experiments involving humans and frequency-based models of the language (n-grams based on a history of several preceding characters). The conclusion was that humans are far better at predicting natural text than n-grams, especially as the length of the context is increased. This so-called "Shannon game" can be effectively used to develop a more precise test of intelligence than the one defined by Turing. If we assume that the ability to understand the language is equal (or at least highly
