
By considering popular machine learning techniques from the computational point of view, we can see that principal component analysis, logistic regression and neural networks use small, fixed algorithms with a huge number of learnable parameters. Thus, these techniques are still somewhere between general learning systems and simple databases.
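To make this concrete, the following minimal Python sketch (with hypothetical layer sizes chosen only for illustration) shows how little fixed "algorithm" these models contain compared to the number of learnable parameters:

```python
# A minimal sketch with hypothetical dimensions: the code that defines each
# model fits in a few lines, while virtually all of its capacity lives in
# the learned weight matrices.
import numpy as np

d_in, d_hidden, d_out = 10000, 500, 10   # hypothetical sizes

# Logistic regression: one weight matrix plus a bias vector.
W = np.zeros((d_out, d_in))
b = np.zeros(d_out)
print("logistic regression parameters:", W.size + b.size)   # 100010

# One-hidden-layer network: two weight matrices, two bias vectors.
W1, b1 = np.zeros((d_hidden, d_in)), np.zeros(d_hidden)
W2, b2 = np.zeros((d_out, d_hidden)), np.zeros(d_out)
print("one-hidden-layer parameters:",
      W1.size + b1.size + W2.size + b2.size)                # 5005510
```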

Based on the experimental results described in the previous chapters, we can observe that by allowing the function f to learn more complex patterns, the generalization ability increases. Thus, neural networks with one hidden layer (with a non-linear activation function) have the potential to work better than logistic regression. From the computational point of view, we can see the non-linearity in neural networks as the IF command: we cannot express a regularity in the data such as IF OBJECT IS SMALL AND RED, THE OBJECT IS AN APPLE without using the IF command, as a weighted combination such as SMALL IS APPLE, RED IS APPLE would classify any small object as an apple.
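The following minimal sketch (with hypothetical binary features) illustrates the argument: a pure weighted combination assigns a partial score to any small object, and only the threshold non-linearity, playing the role of the IF command, yields the conjunction:

```python
# A minimal sketch of the "non-linearity as IF" argument on hypothetical
# binary features. A plain weighted sum gives partial credit to any small
# object; thresholding the sum implements IF SMALL AND RED THEN APPLE.
import numpy as np

# columns: [is_small, is_red]; an apple only when both are 1
objects = np.array([[1, 1],   # small, red   -> apple
                    [1, 0],   # small, green -> not an apple
                    [0, 1],   # large, red   -> not an apple
                    [0, 0]])  # large, green -> not an apple

w = np.array([1.0, 1.0])           # SMALL IS APPLE, RED IS APPLE
scores = objects @ w               # pure weighted combination
print(scores)                      # [2. 1. 1. 0.], small-only already scores 1

is_apple = (scores > 1.5).astype(int)   # the threshold is the IF command
print(is_apple)                         # [1 0 0 0], the correct conjunction
```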

We do not have to stop with just a single non-linearity; it can be seen that some other patterns require two subsequent IF commands to be expressed efficiently. It is often mentioned that a single non-linearity in computational models is sufficient to approximate any function. While this is true, such an approximation can collapse to the basic database approach: the logic in the function f is then expressed in the form of a large number of rules such as IF (X=1.05 AND Z=2.11) THEN Y=1.14. In contrast, a deeper architecture can represent some patterns using far fewer rules, which leads to better generalization when new data are presented as inputs.
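A classic illustration of this point is the parity function; the sketch below (a hypothetical example built from threshold units, not taken from the experiments above) contrasts a deep circuit that composes n-1 small XOR blocks with a shallow layer that effectively memorizes one rule per odd-parity input pattern, i.e. 2^(n-1) of them:

```python
# A minimal sketch: deep composition of small rules vs. a shallow lookup
# table, on the parity of n bits. Both are built from the same threshold
# non-linearity (the IF command).
import itertools

step = lambda x: 1.0 if x > 0 else 0.0   # threshold non-linearity

def xor_unit(a, b):
    # XOR from two threshold layers: fires for OR(a, b) but not AND(a, b)
    h_or = step(a + b - 0.5)
    h_and = step(a + b - 1.5)
    return step(h_or - h_and - 0.5)

def deep_parity(bits):
    # chain of len(bits) - 1 constant-size XOR blocks: a compact deep circuit
    acc = bits[0]
    for b in bits[1:]:
        acc = xor_unit(acc, b)
    return acc

def shallow_parity(bits):
    # shallow "database": one hidden unit memorizes each odd-parity pattern,
    # so the hidden layer needs 2**(n-1) rules instead of O(n) blocks
    n = len(bits)
    odd = [p for p in itertools.product([0, 1], repeat=n) if sum(p) % 2 == 1]
    hidden = [step(sum((2 * q - 1) * b for q, b in zip(p, bits))
                   - (sum(p) - 0.5))
              for p in odd]
    return step(sum(hidden) - 0.5)

for bits in itertools.product([0, 1], repeat=4):
    assert deep_parity(bits) == shallow_parity(bits) == sum(bits) % 2
```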

A possible direction for future work in machine learning is to use models with more non-linearities, such as deep neural networks that are composed of several layers of neurons with non-linear activation functions [6]. This way, more patterns can be expressed efficiently. Recurrent neural networks are another example of a deep architecture. On the other hand, while the increased depth of a computational model increases the number of patterns that can be expressed efficiently, it also becomes more difficult to find good solutions using techniques such as simple gradient descent.
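A minimal sketch of such a deep feedforward architecture (hypothetical sizes, untrained random weights) stacks several non-linear layers, each adding one more level of IF-like rules that can be expressed:

```python
# A minimal sketch of a deep feedforward network: each hidden layer applies
# its own non-linearity, so depth multiplies the levels of "IF"-like rules.
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [20, 50, 50, 50, 10]          # input, three hidden, output
weights = [rng.normal(0, 0.1, (m, n))
           for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    h = x
    for W in weights[:-1]:
        h = np.tanh(W @ h)                  # one non-linearity per layer
    return weights[-1] @ h                  # linear output layer

x = rng.normal(size=20)
print(forward(x).shape)                     # (10,)
```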

Even the deep neural network architectures are severely limited, as the number of hidden layers (computational steps) is defined by the programmer in advance, as a hyper-parameter of the model. However, certain patterns cannot be described efficiently using a constant number of computational steps (for example, any algorithm that involves a loop with a variable number of steps, or recursive functions). Additionally, feedforward neural networks do not have the ability to represent patterns such as memory; however, practice shows
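A recurrent network sidesteps this fixed-depth limitation; a minimal sketch (hypothetical sizes, untrained random weights) shows that the same weights are applied once per input token, so the number of computational steps follows the sequence length, while the hidden state serves as a memory carried across steps:

```python
# A minimal sketch of a recurrent network: the loop body reuses the same
# weights at every step, so the depth of the computation is not a fixed
# hyper-parameter, and the hidden state h persists as memory.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden = 10, 32
W_in = rng.normal(0, 0.1, (d_hidden, d_in))
W_rec = rng.normal(0, 0.1, (d_hidden, d_hidden))

def run_rnn(sequence):
    h = np.zeros(d_hidden)                 # memory, carried across steps
    for x in sequence:                     # variable number of steps
        h = np.tanh(W_in @ x + W_rec @ h)
    return h

short = [rng.normal(size=d_in) for _ in range(3)]
long = [rng.normal(size=d_in) for _ in range(50)]
print(run_rnn(short).shape, run_rnn(long).shape)   # same weights, 3 vs 50 steps
```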

