
Grégoire de Lassence

Copyright © 2006, SAS Institute Inc. All rights reserved.


Grégoire de Lassence
Head of Education and Research
Academic Department
Tel: +33 1 60 62 12 19
gregoire.delassence@fra.sas.com
http://www.sas.com/france/academic


SAS worldwide
1976: founded in North Carolina
Privately held company
2006 revenue: $1.9 billion
10,100 employees
24% of revenue reinvested in R&D

SAS in France
280 employees


SAS Academic

The SAS Academic program builds strong partnerships with universities and grandes écoles.

Its objectives are to:
• train students in SAS solutions,
• provide professional skills recognized by companies.

Its motto: “Building the link between the academic world and the business world”.


SAS Academic Services

Courses
• Curriculum development, case studies, teaching materials
• SAS experts
• e-learning
• SAS certification

SAS Academic Club
• SAS Business Intelligence Universities (Easter, summer, Christmas)
• SAS during the internship: “CPQ”
• Free home-use license
• Internship and job offers from our customers

Other
• International newsletter
• Student Ambassador Competition / SFF papers
• Research & academic chairs
• Events & sponsorship
• …

academic@fra.sas.com
http://www.sas.com/offices/europe/france/academic/index.html


Prerequisites

The SAS, SQL, MDX and Java languages

Websites:
• http://www.sas.com/
• http://support.sas.com/onlinedoc/913/docMainpage.jsp
• http://www.eisti.fr/~dsi/
• http://www.eisti.fr/~info/
• www.bettermanagement.com
• http://decisio.info/
• http://www.sas.com/apps/whitepapers/whitepaper.jsp
• http://www.stat.ucl.ac.be/cours/stat2020/documents/manuels_logiciels/SASV9-Preudhomme.pdf
• http://data.mining.free.fr/
• http://www.lsp.ups-tlse.fr/Besse/


Bibliography


SAS Certification Programs

Tracks: Generalist, Data Warehouse Specialist, Application Development Specialist

Certifications
• O1 SAS Certified Base Programmer (Generalist)
• O2 SAS Certified Advanced Programmer (Generalist)
• O3 SAS Certified Warehouse Development (Data Warehouse Specialist)
• O4 SAS Certified Application Developer (Application Development Specialist)

Exams
• E1 SAS Base Programming*
• E2 SAS Advanced Programming**
• E3 SAS Applications Development Concepts
• E4 SAS Warehouse Technology
• E5 SAS Warehouse Development Specialist Concepts

*Prerequisite courses: BAS, AVC1
**Prerequisite courses: AVC2, EFFI, SQL, MAC


Business Intelligence Platform


What kind of decision-support project?

Descriptive: Business Intelligence


Predictive: Analytics


Decision support at the heart of business processes


Client Tier
• SAS ETL Studio
• SAS OLAP Cube Studio
• SAS Management Console
• SAS Information Map Studio
• SAS Enterprise Guide
• SAS Add-In for Microsoft Office
• SAS Web Report Studio
• SAS Information Delivery Portal

Middle Tier
• HTTP Server
• WebDAV Server
• SDK
• Java Servlet Container
• Web Infrastructure Kit

Server Tier
• SAS® 9 Foundation
• Workspace Server
• Metadata Server
• Stored Process Server
• SAS/CONNECT Server
• OLAP Server


Data Mining with Enterprise Miner


Data mining today

Not all of these techniques are recent: many date from the 1960s and 1970s.

What is new is mainly:
• the quantity of data available
• the computing power of machines
• the return on investment, which can be considerable


Definition

The two families of data mining (DM) techniques

• Descriptive techniques:
  » segmentation (clustering)
  » association discovery (sequences)
  » genetic algorithms (SAS/OR)
• Predictive techniques:
  » regression
  » decision trees
  » neural networks
  » case-based reasoning
  » support vector machines (SVM)
• Other aspects: processing large data volumes and integrating DM into production processes


The 10 steps of a project

1. Choice of topic and definition of objectives
2. Inventory of existing data
3. Collection, cleaning and formatting of the data
4. Construction of the analysis base
5. Implementation of the algorithms (segmentation, scoring…) and model building
6. Validation and choice of a model
7. Declaration to the CNIL (the French data protection authority)
8. Deployment of the model
9. User training
10. Analysis of campaign results and monitoring of the tools

Source: http://data.mining.free.fr/


Data used in data mining

From operational data:
• Where (geographic locations, Internet, etc.)
• When (frequency, recency, etc.)
• How (payment method, etc.)
• How much (number of TE, etc.)
• What (product, etc.)

Source: http://data.mining.free.fr/


RFM Segmentation

Each customer is coded with four digits, one per period from T – 1 (leftmost, most recent) to T – 4. A 1 means the customer ordered in that period. The grid groups the codes by number of orders (rows) and by recency, meaning the most recent period with an order (columns):

• 4 orders: 1111 (T – 1)
• 3 orders: 1110, 1101, 1011 (T – 1); 0111 (T – 2)
• 2 orders: 1100, 1010, 1001 (T – 1); 0110, 0101 (T – 2); 0011 (T – 3)
• 1 order: 1000 (T – 1); 0100 (T – 2); 0010 (T – 3); 0001 (T – 4)
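
The coding grid can be reproduced with a short Python sketch. It assumes that each of the four digits flags whether the customer ordered in one period, with the leftmost digit for T – 1 (most recent) and the rightmost for T – 4. The customers are hypothetical, and `rfm_code` and `frequency_and_recency` are illustrative names, not Enterprise Miner functions.

```python
from collections import defaultdict


def rfm_code(order_periods, n_periods=4):
    """Build the purchase code: digit i is '1' if the customer ordered in
    period T - i (assumption: leftmost digit = T - 1, the most recent)."""
    return "".join("1" if i in order_periods else "0"
                   for i in range(1, n_periods + 1))


def frequency_and_recency(code):
    """Number of orders = count of 1s; recency = index of the most recent
    period with an order (1 means T - 1)."""
    n_orders = code.count("1")
    recency = code.index("1") + 1 if n_orders else None
    return n_orders, recency


# Hypothetical customers: periods (1 = T - 1, ..., 4 = T - 4) with an order.
customers = {"c1": {1, 2, 3, 4}, "c2": {2, 3}, "c3": {4}, "c4": {1, 4}}

# Group customers into the cells of the grid (number of orders, recency).
segments = defaultdict(list)
for cust, periods in customers.items():
    code = rfm_code(periods)
    segments[frequency_and_recency(code)].append((cust, code))

for (n_orders, recency), members in sorted(segments.items()):
    print(f"{n_orders} order(s), last in T - {recency}: {members}")
```

The cells are keyed by the pair (number of orders, recency), so each key corresponds to one position in the grid. Adding a monetary amount per customer would give the full Recency, Frequency, Monetary triple.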


Distribution of the first name Charlotte

http://www.meilleursprenoms.com


Nuggets

“If you’ve got terabytes of data, and you’re relying on data mining to find interesting things in there for you, you’ve lost before you’ve even begun.” (Herb Edelstein)


Missing Value Imputation

[Figure: data table with cases as rows and inputs as columns]
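
Mean imputation is one simple way to fill missing input values. The slide does not say which method is used, so this Python sketch is only an illustration. The cases are hypothetical (AGE and INCOME are borrowed from the neural network example), and `impute_mean` is an illustrative name.

```python
from statistics import mean


def impute_mean(rows, inputs):
    """Replace None in each listed input by the mean of that input's
    observed values; return the completed rows and the fill values."""
    fill = {}
    for col in inputs:
        observed = [r[col] for r in rows if r[col] is not None]
        fill[col] = mean(observed)
    completed = [{**r, **{c: fill[c] for c in inputs if r[c] is None}}
                 for r in rows]
    return completed, fill


# Hypothetical cases (rows) with two interval inputs; None marks a missing value.
cases = [
    {"AGE": 34, "INCOME": 52000},
    {"AGE": None, "INCOME": 61000},
    {"AGE": 45, "INCOME": None},
    {"AGE": 29, "INCOME": 48000},
]
completed, fill_values = impute_mean(cases, ["AGE", "INCOME"])
print(fill_values)
print(completed[1], completed[2])
```

The function returns the fill values so they can be computed once on the training data and reused unchanged when new cases are scored. Median or model-based imputation would slot into the same structure.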


Model Complexity

[Figure: a model that is too flexible and a model that is not flexible enough]


Overfitting

Training set: 19 errors (90% accuracy)
Test set: 49 errors (75% accuracy)


Better Fitting

Training set: 34 errors (83% accuracy)
Test set: 43 errors (78% accuracy)
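
The gap between training and test accuracy can be reproduced in a few lines of Python. This sketch swaps in k-nearest neighbours instead of the trees on the slides, because its flexibility is controlled by a single number k. With k = 1 the model is maximally flexible and memorizes the training set. The data generator and all names are hypothetical.

```python
import random
from collections import Counter


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, data, k):
    """Share of cases in `data` classified correctly by k-NN fitted on `train`."""
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)


def make_data(n, rng, noise=0.2):
    """Hypothetical data: label 1 when x > 0.5, with a fraction `noise` flipped."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data


rng = random.Random(0)
train, test = make_data(200, rng), make_data(200, rng)
for k in (1, 25):
    print(f"k={k}: train accuracy {accuracy(train, train, k):.2f}, "
          f"test accuracy {accuracy(train, test, k):.2f}")
```

With k = 1 every training case is its own nearest neighbour, so training accuracy is perfect while the flipped labels hurt the test set. A larger k smooths the decision and usually narrows that gap.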


The Right-Sized Tree

• Stunting: stop growing the tree early, using stopping rules
• Pruning: grow a large tree, then cut it back


A Field Guide to Tree Algorithms

• AID
• THAID
• CHAID
• ID3
• C4.5
• C5.0
• CART
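
These algorithms differ mainly in how they score a candidate split. CART uses Gini impurity. ID3 uses entropy-based information gain, and C4.5 normalizes that gain into a gain ratio. CHAID uses chi-square tests instead. This Python sketch computes the first two measures for a hypothetical node and split; the function names are illustrative.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def impurity_reduction(parent, children, measure):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    weighted = sum(len(ch) / n * measure(ch) for ch in children)
    return measure(parent) - weighted


# Hypothetical node with 8 cases and one candidate binary split.
parent = ["yes"] * 4 + ["no"] * 4
left = ["yes", "yes", "yes", "no"]
right = ["yes", "no", "no", "no"]
print("Gini reduction:", impurity_reduction(parent, [left, right], gini))
print("Information gain:", impurity_reduction(parent, [left, right], entropy))
```

A tree-growing algorithm would evaluate every candidate split this way and keep the one with the largest reduction.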


Measurement levels:

• unary: one value; for example, a variable with a particular value that was used to create a data subset
• binary: two values; for example, the variable MARITAL that contains No or Yes
• nominal: more than two non-numeric values, with no implied order; for example, STATECOD that contains AK, AL, AR, AZ, etc.
• ordinal: more than two but not more than ten numeric values, with an implied order; for example, NUMCARS that contains values from 0 to 3
• interval: more than ten numeric values; for example, AMOUNT that contains many different dollar values
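
The thresholds in this list translate directly into a small rule-based function. This Python sketch counts distinct values and checks whether they are numeric. Two choices are assumptions not stated in the list: missing values (None) are ignored, and an all-missing variable gets no level. The example data are hypothetical values for the variables named above.

```python
def measurement_level(values):
    """Assign a measurement level from the number of distinct values and
    whether they are numeric, following the thresholds in the list."""
    distinct = {v for v in values if v is not None}  # assumption: None = missing
    if not distinct:
        return None  # assumption: no observed values, level undefined
    numeric = all(isinstance(v, (int, float)) for v in distinct)
    if len(distinct) == 1:
        return "unary"
    if len(distinct) == 2:
        return "binary"
    if not numeric:
        return "nominal"
    if len(distinct) <= 10:
        return "ordinal"
    return "interval"


# Hypothetical values for the example variables.
examples = {
    "MARITAL": ["No", "Yes", "Yes", "No"],
    "STATECOD": ["AK", "AL", "AR", "AZ", "AL"],
    "NUMCARS": [0, 1, 2, 3, 2, 1],
    "AMOUNT": [round(12.5 * k, 2) for k in range(1, 40)],
}
for name, values in examples.items():
    print(name, measurement_level(values))
```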


Artificial Neural Networks

[Figure: a neuron, or hidden unit]


Multilayer Perceptron

[Figure: input layer, hidden layers made of hidden units, output layer]


Input → Hidden → Output

Inputs: AGE and INCOME (INC).

Hidden layer: each hidden unit computes a linear combination of the inputs, then applies a tanh activation:

$$A = \tanh(\beta_1 + \beta_2\,\mathrm{AGE} + \beta_3\,\mathrm{INC})$$
$$B = \tanh(\beta_4 + \beta_5\,\mathrm{AGE} + \beta_6\,\mathrm{INC})$$
$$C = \tanh(\beta_7 + \beta_8\,\mathrm{AGE} + \beta_9\,\mathrm{INC})$$

Output layer: a combination of the hidden units, followed by an activation:

$$\beta_{10} + \beta_{11}A + \beta_{12}B + \beta_{13}C$$
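
The forward pass of this network can be written directly from the equations. In this Python sketch, the hidden-layer weights β1 to β9 and the standardized input values are hypothetical. The output weights (6, 1, −2, 3) come from the output 6 + A − 2B + 3C on the Universal Approximator slide. The sketch stops at the output combination, which assumes an identity output activation.

```python
from math import tanh


def hidden_units(age, inc, w):
    """Hidden units A, B, C: linear combination of the inputs, then tanh."""
    a = tanh(w[1] + w[2] * age + w[3] * inc)
    b = tanh(w[4] + w[5] * age + w[6] * inc)
    c = tanh(w[7] + w[8] * age + w[9] * inc)
    return a, b, c


def output_combination(a, b, c, w):
    """Output combination beta10 + beta11*A + beta12*B + beta13*C."""
    return w[10] + w[11] * a + w[12] * b + w[13] * c


# Hypothetical weights beta1..beta9; beta10..beta13 match 6 + A - 2B + 3C.
beta = {1: 0.1, 2: 0.8, 3: -0.5, 4: -0.3, 5: 0.2, 6: 0.9, 7: 0.0, 8: -0.7,
        9: 0.4, 10: 6.0, 11: 1.0, 12: -2.0, 13: 3.0}

# Hypothetical standardized inputs.
A, B, C = hidden_units(age=0.5, inc=-1.2, w=beta)
print(A, B, C, output_combination(A, B, C, beta))
```

Because tanh keeps each hidden unit between −1 and 1, the output weights control how much each unit's curve contributes to the final prediction.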


Activation Function

[Figure: activation function; panel label: Input Layer]


Universal Approximator

[Figure: hidden-unit curves A, B and C and their combination]

The output combines the three hidden units as $6 + A - 2B + 3C$.


Training
• Error function
• Iterative optimization algorithm

[Figure: error function plotted over Parameter 1 and Parameter 2]
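
This slide names two ingredients: an error function, and an iterative optimizer that moves the parameters across it. This Python sketch illustrates both with a two-parameter linear model (Parameter 1 = intercept, Parameter 2 = slope). It assumes mean squared error and plain gradient descent; the slide does not say which error function or optimizer is actually used. The data, learning rate and step count are hypothetical.

```python
def mse(p1, p2, data):
    """Error function: mean squared error of the model p1 + p2 * x."""
    return sum((p1 + p2 * x - y) ** 2 for x, y in data) / len(data)


def train(data, lr=0.1, steps=2000):
    """Iterative optimization: plain gradient descent on the mean squared error."""
    p1, p2 = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # Partial derivatives of the MSE with respect to each parameter.
        g1 = sum(2 * (p1 + p2 * x - y) for x, y in data) / n
        g2 = sum(2 * (p1 + p2 * x - y) * x for x, y in data) / n
        p1, p2 = p1 - lr * g1, p2 - lr * g2
    return p1, p2


# Hypothetical data generated exactly by y = 1 + 2x, so the minimum error is 0.
data = [(k / 10, 1 + 2 * k / 10) for k in range(11)]
p1, p2 = train(data)
print(round(p1, 3), round(p2, 3), mse(p1, p2, data))
```

A neural network is trained the same way, except that the error surface has many more dimensions and is not convex, so the starting point and the optimizer matter more.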


Association Rules

Transactions: {A, B, C}, {A, C, D}, {B, C, D}, {A, D, E}, {B, C, E}

• A ⇒ D: support 2/5, confidence 2/3
• C ⇒ A: support 2/5, confidence 2/4
• A ⇒ C: support 2/5, confidence 2/3
• B & C ⇒ D: support 1/5, confidence 1/3
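
The support and confidence values for these rules can be checked with a short Python sketch over the five transactions. Support is the share of transactions that contain every item of the rule. Confidence is that support divided by the support of the left-hand side. The function names are illustrative, and exact fractions are used so the results can be compared with the table.

```python
from fractions import Fraction

# The five transactions from the slide.
transactions = [{"A", "B", "C"}, {"A", "C", "D"}, {"B", "C", "D"},
                {"A", "D", "E"}, {"B", "C", "E"}]


def support(itemset, transactions):
    """Fraction of transactions containing every item of the itemset."""
    return Fraction(sum(itemset <= t for t in transactions), len(transactions))


def confidence(lhs, rhs, transactions):
    """support(lhs | rhs) / support(lhs): how often rhs occurs when lhs does."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)


rules = [({"A"}, {"D"}), ({"C"}, {"A"}), ({"A"}, {"C"}), ({"B", "C"}, {"D"})]
for lhs, rhs in rules:
    print(sorted(lhs), "=>", sorted(rhs),
          f"support={support(lhs | rhs, transactions)}",
          f"confidence={confidence(lhs, rhs, transactions)}")
```

`Fraction` reduces automatically, so the confidence of C ⇒ A is printed as 1/2 rather than 2/4. A ⇒ C and C ⇒ A have the same support but different confidence, because confidence depends on how often the left-hand side occurs.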

