
There is great variability in how one can choose the size of the population, the probabilities with which particular actions are taken, the way the crossover and mutation operators affect the individuals, how the population of a new epoch is formed, etc. It is even possible to apply a genetic algorithm to this optimization process itself, to avoid the need to tune these constants; this is called meta-genetic programming.
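
To make the preceding description concrete, the following sketch shows a generic genetic-algorithm loop in which these tunable constants (population size, crossover and mutation probabilities, selection scheme) appear explicitly; the fitness function is a hypothetical placeholder, and none of the values correspond to an actual experiment reported here.

import random

POP_SIZE   = 100    # size of the population
P_CROSS    = 0.7    # probability of applying crossover
P_MUT      = 0.01   # per-gene mutation probability
GENOME_LEN = 32
EPOCHS     = 50

def fitness(genome):
    # hypothetical placeholder objective: number of 1-bits in the genome
    return sum(genome)

def tournament(pop, k=3):
    # select the fittest of k randomly chosen individuals
    return max(random.sample(pop, k), key=fitness)

def crossover(a, b):
    # single-point crossover of two parent genomes
    point = random.randrange(1, GENOME_LEN)
    return a[:point] + b[point:]

def mutate(genome):
    # flip each gene independently with probability P_MUT
    return [1 - g if random.random() < P_MUT else g for g in genome]

pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
for epoch in range(EPOCHS):
    new_pop = []
    while len(new_pop) < POP_SIZE:          # form the population of the new epoch
        a, b = tournament(pop), tournament(pop)
        child = crossover(a, b) if random.random() < P_CROSS else a[:]
        new_pop.append(mutate(child))
    pop = new_pop

best = max(pop, key=fitness)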

Genetic programming can be successfully applied to small problems, and it has been reported that many novel algorithms were found using these or similar approaches. However, for large problems, GP seems to be very inefficient; the hope is that through the recombination of solutions using the crossover operator, GP will first find basic atomic functions, and these will later be used to compose more complex functions. However, it is sometimes mentioned that GP itself is not really guaranteed to work better than other simple search techniques, and clearly for difficult large problems, random search is very inefficient, as the space of possible solutions increases exponentially with the length of their description.

My own attempts to train recurrent neural network language models using genetic algorithms were not very promising; even on small problems, stochastic gradient descent is orders of magnitude faster. However, certain problems cannot be learned efficiently by SGD, such as storing information over long time periods. On toy problems, RNNs trained by genetic algorithms were able to easily learn patterns that otherwise cannot be learned by SGD (such as storing a single bit of information for 100 time steps). Thus, an interesting direction for future research might be to combine GP and SGD.
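
As a rough illustration of this kind of experiment, the sketch below evolves the weights of a tiny numpy RNN so that it reproduces a single input bit after 100 time steps; the architecture, population size and mutation-only selection are illustrative assumptions rather than the exact setup used for the results mentioned above.

import numpy as np

HIDDEN, STEPS, POP, GENS = 4, 100, 50, 200
N_PARAMS = HIDDEN + HIDDEN * HIDDEN + HIDDEN
rng = np.random.default_rng(0)

def unpack(theta):
    # split the flat parameter vector into input, recurrent and output weights
    w_in  = theta[:HIDDEN].reshape(HIDDEN, 1)
    w_rec = theta[HIDDEN:HIDDEN + HIDDEN * HIDDEN].reshape(HIDDEN, HIDDEN)
    w_out = theta[HIDDEN + HIDDEN * HIDDEN:].reshape(1, HIDDEN)
    return w_in, w_rec, w_out

def fitness(theta):
    # the bit is presented only at t = 0 and must be read out after STEPS steps
    w_in, w_rec, w_out = unpack(theta)
    score = 0.0
    for bit in (0.0, 1.0):
        h = np.zeros((HIDDEN, 1))
        for t in range(STEPS):
            x = np.array([[bit if t == 0 else 0.0]])
            h = np.tanh(w_in @ x + w_rec @ h)
        y = 1.0 / (1.0 + np.exp(-(w_out @ h)[0, 0]))   # sigmoid readout
        score -= (y - bit) ** 2                        # negative squared error
    return score

pop = [rng.normal(0.0, 1.0, N_PARAMS) for _ in range(POP)]
for gen in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 5]                           # truncation selection
    children = [parents[rng.integers(len(parents))] + rng.normal(0.0, 0.1, N_PARAMS)
                for _ in range(POP - len(parents))]    # mutated copies of parents
    pop = parents + children

best = max(pop, key=fitness)
print("fitness of best individual:", fitness(best))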

8.3 Incremental Learning<br />

The way humans naturally approach a complex problem is through its decomposition into smaller problems that are solved separately. Such an incremental approach can also be used for solving many machine learning problems; moreover, it might be crucial for learning complex algorithms that are represented either by deep architectures or by complex computer programs. It can be shown that certain complex problems can be solved exponentially faster if additional supervision in the form of simpler examples is provided.

Assume the task is to find a six-digit number, with supervision that only tells us whether a candidate number is correct or not. On average, we would need 10^6/2 attempts to find the number. However, if we can search for the number incrementally, with additional supervision that tells us whether each individual digit is correct, we need on average only on the order of tens of attempts (roughly 6 · 10/2 = 30), which is exponentially less.
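
This back-of-the-envelope argument can be checked with a short simulation; the per-digit feedback protocol and the fixed scanning order below are illustrative assumptions.

import random

def brute_force_attempts(digits):
    # try 000000, 000001, ... in order; attempts = position of the target
    return int("".join(map(str, digits))) + 1

def incremental_attempts(digits):
    # with per-digit feedback, try 0..9 for each position in turn
    return sum(d + 1 for d in digits)

trials  = 100_000
targets = [[random.randint(0, 9) for _ in range(6)] for _ in range(trials)]
print(sum(map(brute_force_attempts, targets)) / trials)  # close to 10**6 / 2
print(sum(map(incremental_attempts, targets)) / trials)  # around 33 with this counting, i.e. tens of attempts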
