
5.2 Artificial Neural Networks

In this configuration the neurons are grouped in layers: an input layer, one or more hidden layers and an output layer. Each neuron is connected with all the neurons of the adjacent layers, but no back-coupling links or links within the same layer are allowed (see fig. 5.5). In principle, one hidden layer is sufficient to fit any continuous function, but it may well be more practical to use a network with more than one hidden layer.
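As a concrete illustration of such a layered, fully connected, feed-forward topology, the following sketch (plain Python/NumPy, not the Fortran tooling used in this work; layer sizes and weights are arbitrary assumptions) propagates one input pattern through a net with a single hidden layer:

import numpy as np

def sigmoid(x):
    # Logistic activation: maps any real input into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, weights, biases):
    # Propagate one input pattern layer by layer. Every neuron receives the
    # outputs of all neurons of the previous layer (fully connected), and no
    # links within a layer or back to earlier layers exist.
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# Illustrative sizes: 4 inputs, one hidden layer of 6 neurons, 1 output.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(6, 4)), rng.normal(size=(1, 6))]
biases = [np.zeros(6), np.zeros(1)]
print(forward(rng.normal(size=4), weights, biases))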

Once the weights are set in the training process, the network architecture is completely determined. There is a variety of training methods; in the present case I have used a method known as back-propagation. A set of Monte Carlo simulated data is presented to the net together with the corresponding desired output. For each input pattern the energy function is evaluated as

\[
E_p \;=\; \frac{1}{2} \sum_i \left( O_i - T_i \right)^2 ,
\]

where $O_i$ is the computed network output for the given pattern and $T_i$ is the desired value (target) for the same pattern. The energy function is computed by the net and its value is propagated backwards to modify each weight. The updating function is

\[
\Delta w_{ij}^{(n+1)} \;=\; -\,\eta\,\frac{\partial E_p}{\partial w_{ij}} \;+\; \alpha\,\Delta w_{ij}^{(n)} ,
\]

where $\Delta w_{ij}^{(n+1)}$ is the $(n+1)$-st update of the weight $w_{ij}$ and $\eta$ is the learning parameter, representing the step length on the error surface, i.e. the surface defined by the value of the error as a function of all weights. The last term is a momentum term which stabilizes the training by avoiding oscillations; the strength of the damping is set by the momentum parameter $\alpha$, which should be between 0 and 1. The updating of the weights can be done after each pattern, after a group of patterns, or just once after the complete training set, called one epoch. The smaller the number of patterns the energy function is averaged over, the faster, but also the more unstable, the training process can be. The training starts with randomly selected weights and relatively high values of the learning rate and the momentum; both factors are changed during the training over several thousands of epochs, until the error function reaches a minimum on a validation set, a Monte Carlo generated set different from the learning set.
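The following sketch makes the procedure concrete. It is plain Python/NumPy rather than the JetNet routines used for this analysis, restricted to one hidden layer and a single output neuron; the learning rate, momentum value, number of epochs and the simple validation check are illustrative assumptions.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(patterns, targets, val_patterns, val_targets,
          n_hidden=8, eta=0.1, alpha=0.5, epochs=2000, seed=0):
    # Back-propagation with a momentum term, updating the weights after each
    # pattern; biases are omitted for brevity.
    rng = np.random.default_rng(seed)
    n_in = patterns.shape[1]
    # Random starting weights: hidden layer and a single output neuron.
    W1 = rng.normal(scale=0.5, size=(n_hidden, n_in))
    W2 = rng.normal(scale=0.5, size=(1, n_hidden))
    dW1 = np.zeros_like(W1)     # previous updates, kept for the momentum term
    dW2 = np.zeros_like(W2)
    best_err, best_weights = np.inf, (W1.copy(), W2.copy())
    for epoch in range(epochs):
        for i in rng.permutation(len(patterns)):      # random pattern order
            x, t = patterns[i], targets[i]
            h = sigmoid(W1 @ x)                       # hidden-layer output
            o = sigmoid(W2 @ h)                       # network output O
            # E_p = 1/2 (O - T)^2; the gradients follow from the chain rule.
            delta_o = (o - t) * o * (1.0 - o)
            delta_h = (W2.T @ delta_o) * h * (1.0 - h)
            # Delta w^(n+1) = -eta * dE_p/dw + alpha * Delta w^(n)
            dW2 = -eta * np.outer(delta_o, h) + alpha * dW2
            dW1 = -eta * np.outer(delta_h, x) + alpha * dW1
            W2 += dW2
            W1 += dW1
        # Monitor the error on an independent validation set and keep the
        # weights that minimise it (the stopping criterion described above).
        val_out = sigmoid(W2 @ sigmoid(W1 @ val_patterns.T))
        val_err = 0.5 * np.sum((val_out - np.atleast_2d(val_targets)) ** 2)
        if val_err < best_err:
            best_err, best_weights = val_err, (W1.copy(), W2.copy())
    return best_weights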

It is important that the input variables are of the same order of magnitude and range in order to obtain stable convergence towards the minimum; the simplest method to achieve this is to renormalize the input variables to a fixed interval. Furthermore, in order to avoid local minima, it is recommended to pick the input events at random from the simulated set and, to prevent overfitting, to use a number of simulated learning patterns at least one order of magnitude higher than the number of weights in the net. The problem of overfitting arises when there are too many weights in the net with respect to the number of learning patterns, so that the net becomes very specialized in fitting only those patterns and loses its generalization capability.
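A minimal sketch of these two precautions, assuming a rescaling to [0, 1] (the particular interval and the helper names are illustrative, not taken from the thesis):

import numpy as np

def renormalize(patterns):
    # Map every input variable linearly onto a common interval ([0, 1] here),
    # so that all inputs are of comparable magnitude and range.
    lo = patterns.min(axis=0)
    hi = patterns.max(axis=0)
    return (patterns - lo) / np.where(hi > lo, hi - lo, 1.0)

def enough_patterns(n_patterns, n_inputs, n_hidden, n_outputs=1, factor=10):
    # Rule of thumb quoted above: at least one order of magnitude more
    # learning patterns than weights in the net, to limit overfitting.
    n_weights = n_inputs * n_hidden + n_hidden * n_outputs
    return n_patterns >= factor * n_weights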

For the NN computations the Fortran package JetNet 3.4 [Pet93] was used. A three-layer NN was chosen, with a sigmoidal activation function whose output lies in the range $(0,1)$. The training was performed with the back-propagation algorithm and was stopped when the behaviour of the error function became flat.

