
An Introduction to Statistical Machine Learning
- Introduction -

Samy Bengio
bengio@idiap.ch

Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP)
CP 592, rue du Simplon 4
1920 Martigny, Switzerland
http://www.idiap.ch/~bengio

IDIAP - January 30, 2003


Introduction to Machine Learning

1. What is Machine Learning?
2. Why is it difficult?
3. Basic Principles
   (a) Occam’s Razor
   (b) Learning as a Search Problem
4. Types of Problems
   (a) Regression
   (b) Classification
   (c) Density Estimation
5. Applications
6. Documentation


What is Machine Learning? (Graphical View)


What is Machine Learning?

• Learning is an essential human property
• Learning means changing in order to perform better (according to a given criterion) when a similar situation arises
• Learning IS NOT learning by heart
• Any computer can learn by heart; the difficulty is to generalize a behavior to a novel situation
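To make the distinction concrete, here is a small sketch (my own toy example, not from the slides): a lookup table that learns the training pairs by heart has nothing to say about a novel input, while even a simple fitted model can generalize.

# Hypothetical sketch: memorization vs. generalization on a toy problem.
import numpy as np

train_x = np.array([1.0, 2.0, 3.0, 4.0])
train_y = 2.0 * train_x + 1.0                    # underlying relation: y = 2x + 1

# Learning by heart: store the exact (x, y) pairs seen during training.
lookup = dict(zip(train_x, train_y))

# Learning a model: fit a straight line to the same data.
slope, intercept = np.polyfit(train_x, train_y, deg=1)

novel_x = 5.0                                    # a situation never seen in training
print(lookup.get(novel_x, "no idea"))            # memorization cannot answer
print(slope * novel_x + intercept)               # the fitted model generalizes (~11.0)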


Why Learning is Difficult

• Given a finite amount of training data, you have to derive a relation for an infinite domain
• In fact, there are infinitely many such relations
• How should we draw the relation?


Why Learning is Difficult (2)

• Given a finite amount of training data, you have to derive a relation for an infinite domain
• In fact, there are infinitely many such relations
• Which relation is the most appropriate?


Why Learning is Difficult (3)

• Given a finite amount of training data, you have to derive a relation for an infinite domain
• In fact, there are infinitely many such relations
• ... and only the hidden test points can tell them apart
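A small sketch of this point (my own toy construction, assuming a one-dimensional problem): two very different relations can agree perfectly on a finite training set yet disagree on points that were held out.

# Hypothetical sketch: many relations fit the same finite training set,
# and only the hidden test points can tell them apart.
import numpy as np

train_x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
train_y = np.sin(2 * np.pi * train_x)

# Relation A: the degree-4 polynomial passing through all five points.
coeffs = np.polyfit(train_x, train_y, deg=4)
def relation_a(x):
    return np.polyval(coeffs, x)

# Relation B: a 1-nearest-neighbour lookup, also perfect on the training set.
def relation_b(x):
    return train_y[np.argmin(np.abs(train_x - x))]

for x in train_x:                                # identical on the training data
    assert abs(relation_a(x) - relation_b(x)) < 1e-8

print(relation_a(0.6), relation_b(0.6))          # but they disagree on an unseen point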


Occam’s Razor Principle

• William of Occam: monk living in the 14th century
• Principle of Parsimony:
  One should not increase, beyond what is necessary, the number of entities required to explain anything
• When many solutions are available for a given problem, we should select the simplest one
• But what do we mean by simple?
• We will use prior knowledge of the problem to be solved to define what a simple solution is
  Example of a prior: smoothness
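One standard way to turn a smoothness prior into something operational (my own sketch, not part of the original slides) is to penalize large coefficients: among the many curves that fit the data roughly equally well, the penalty favours the smoother, “simpler” one. The penalty strength lam below is an arbitrary illustrative value.

# Hypothetical sketch: a smoothness prior expressed as ridge (L2) regularization.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(20)

X = np.vander(x, 10)                             # degree-9 polynomial features

def ridge_fit(X, y, lam):
    # Minimize ||Xw - y||^2 + lam * ||w||^2 (closed-form solution).
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_free = np.linalg.lstsq(X, y, rcond=None)[0]    # no prior: large, wiggly coefficients
w_smooth = ridge_fit(X, y, lam=1e-3)             # smoothness prior: shrunken coefficients

print(np.linalg.norm(w_free), np.linalg.norm(w_smooth))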


<strong>Learning</strong> as a Search Problem<br />

Initial solution<br />

Set of solutions chosen a priori<br />

Optimal solution<br />

Set of solutions<br />

compatible with training set<br />

<strong>Introduction</strong> <strong>to</strong> <strong>Machine</strong> <strong>Learning</strong> 9
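As a minimal sketch of this view (assumed setup, not from the slides): the set of solutions chosen a priori is the family of straight lines y = w*x + b, the initial solution is (w, b) = (0, 0), and gradient descent searches for a solution compatible with the training set.

# Hypothetical sketch: learning as a search through a set of solutions chosen
# a priori (straight lines y = w*x + b), starting from an initial solution.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, size=50)
y = 3.0 * x - 0.5 + 0.1 * rng.standard_normal(50)   # noisy data from a line

w, b = 0.0, 0.0                                     # initial solution
lr = 0.1                                            # step size of the search

for _ in range(200):                                # gradient-descent search
    err = (w * x + b) - y                           # residuals on the training set
    w -= lr * 2.0 * np.mean(err * x)                # gradient of the mean squared error
    b -= lr * 2.0 * np.mean(err)

print(w, b)   # ends near (3.0, -0.5): a solution compatible with the training data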


Types of Problem

• There are 3 kinds of problems:
  ◦ regression


Types of Problem

• There are 3 kinds of problems:
  ◦ regression, classification


Types of Problem

• There are 3 kinds of problems:
  ◦ regression, classification, density estimation (estimating a density P(X) over the inputs X)
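As a compact illustration of the three problem types (my own toy example on hypothetical data): regression predicts a real value, classification predicts a discrete label, and density estimation models the distribution of the inputs themselves.

# Hypothetical sketch: the three kinds of problems on toy one-dimensional data.
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(200)

# Regression: predict a real-valued target y from x (here, fit a line).
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(200)
slope, intercept = np.polyfit(x, y, deg=1)

# Classification: predict a discrete label from x (here, nearest class mean).
labels = (x > 0).astype(int)                     # toy labels
mean0, mean1 = x[labels == 0].mean(), x[labels == 1].mean()
def classify(xi):
    return int(abs(xi - mean1) < abs(xi - mean0))

# Density estimation: model P(X) itself (here, a single Gaussian).
mu, sigma = x.mean(), x.std()
def log_density(xi):
    return -0.5 * ((xi - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

print(slope, intercept, classify(0.3), log_density(0.0))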


Applications

• Vision Processing
  ◦ Face detection/verification
  ◦ Handwriting recognition
• Speech Processing
  ◦ Phoneme/Word/Sentence/Person recognition
• Others
  ◦ Finance: asset prediction, portfolio and risk management
  ◦ Telecom: traffic prediction
  ◦ Data mining: make use of the huge datasets kept by large corporations
  ◦ Games: Backgammon, Go
  ◦ Control: robots
• ... and plenty of others, of course!


Documentation

• Machine learning library: www.Torch.ch
• Journals:
  ◦ Journal of Machine Learning Research
  ◦ Neural Computation
  ◦ IEEE Transactions on Neural Networks
• Conferences:
  ◦ NIPS: Neural Information Processing Systems
  ◦ COLT: Computational Learning Theory
  ◦ ICML: International Conference on Machine Learning
  ◦ ICANN & ESANN: 2 European conferences
• Books:
  ◦ Bishop, C. Neural Networks for Pattern Recognition, 1995.
  ◦ Vapnik, V. The Nature of Statistical Learning Theory, 1995.


Documentation (continued)

• Search engines:
  ◦ NIPS online: http://nips.djvuzone.org
  ◦ NEC: http://citeseer.nj.nec.com/cs
• Other lecture notes (some are in French):
  ◦ Bengio, Y.: http://www.iro.umontreal.ca/~bengioy/ift6266/
  ◦ Kegl, B.: http://www.iro.umontreal.ca/~kegl/ift6266/
  ◦ Jordan, M.: http://www.cs.berkeley.edu/~jordan/courses/294-fall98/
