Grégoire de Lassence - Free
Grégoire de Lassence
Copyright © 2006, SAS Institute Inc. All rights reserved.
Grégoire de Lassence
Head of Teaching and Research
Academic Department
Tel: +33 1 60 62 12 19
gregoire.delassence@fra.sas.com
http://www.sas.com/france/academic
SAS worldwide
1976: founded in North Carolina
Privately held company
2006 revenue: $1.9 billion
10,100 employees
24% of revenue reinvested in R&D
SAS in France
280 employees
SAS Academic
The SAS Academic program builds strong partnerships with universities and grandes écoles.
Its objectives are to:
• train students on SAS solutions,
• provide professional skills recognized by employers.
Its motto: “Building the link between the academic world and the business world”.
SAS Academic Services
Courses
• Curriculum development, case studies, teaching materials
• SAS experts
• E-learning
• SAS certification
SAS Academic Club
• SAS Business Intelligence Universities (Easter, summer, Christmas)
• SAS during the internship: "CPQ"
• Free home-use license
• Internship and job offers from our customers
Miscellaneous
• International newsletter
• Student Ambassador Competition / SFF papers
• Research & chairs
• Events & sponsoring
• …
academic@fra.sas.com
http://www.sas.com/offices/europe/france/academic/index.html
Prerequisites
The SAS, SQL, MDX and Java languages
Bibliography
Web sites:
• http://www.sas.com/
• http://support.sas.com/onlinedoc/913/docMainpage.jsp
• http://www.eisti.fr/~dsi/
• http://www.eisti.fr/~info/
• www.bettermanagement.com
• http://decisio.info/
• http://www.sas.com/apps/whitepapers/whitepaper.jsp
• http://www.stat.ucl.ac.be/cours/stat2020/documents/manuels_logiciels/SASV9-Preudhomme.pdf
• http://data.mining.free.fr/
• http://www.lsp.ups-tlse.fr/Besse/
SAS Certification Programs
Tracks: Generalist, Data Warehouse Specialist, Application Development Specialist
Credentials:
• O1: SAS Certified Base Programmer
• O2: SAS Certified Advanced Programmer
• O3: SAS Certified Warehouse Development
• O4: SAS Certified Application Developer
Exams:
• E1: SAS Base Programming*
• E2: SAS Advanced Programming**
• E3: SAS Applications Development Concepts
• E4: SAS Warehouse Technology
• E5: SAS Warehouse Development Specialist Concepts
*Prerequisite courses: BAS, AVC1
**Prerequisite courses: AVC2, EFFI, SQL, MAC
Decision-support platform
What kind of decision-support project?
Descriptive
Business Intelligence
Predictive
Analytics
Decision support at the heart of business processes
Client Tier
• SAS ETL Studio
• SAS OLAP Cube Studio
• SAS Management Console
• SAS Information Map Studio
• SAS Enterprise Guide
• SAS Add-In for Microsoft Office
• SAS Web Report Studio
• SAS Information Delivery Portal
Middle Tier
• HTTP Server
• WebDAV Server
• SDK
• Java Servlet Container
• Web Infrastructure Kit
Server Tier
• SAS®9 Foundation
• Workspace Server
• Metadata Server
• Stored Process Server
• SAS/CONNECT Server
• OLAP Server
Data Mining with Enterprise Miner
Data mining today
These techniques are not all recent (some date from the 1960s-70s).
What is new is mainly:
• the volume of data available
• the computing power of machines
• the return on investment, which can be considerable
Definition
The two families of data mining techniques
• Descriptive techniques:
» segmentation (clustering)
» association discovery (sequences)
» genetic algorithms (SAS/OR)
• Predictive techniques:
» regression
» decision trees
» neural networks
» case-based reasoning
» SVM
• Other aspects: processing large data volumes and integrating data mining into production processes
The 10 steps of a project
1. Choosing the subject and defining the objectives
2. Taking inventory of the existing data
3. Collecting, cleaning and formatting the data
4. Building the analysis base
5. Running the algorithms (segmentation, scoring…) and building the models
6. Validating and selecting a model
7. Filing the declaration with the CNIL
8. Deploying the model
9. Training the users
10. Analyzing the results of the action and monitoring the tools
Source: http://data.mining.free.fr/
The data used in data mining
Derived from operational data:
• Where (geographic locations, Internet, …)
• When (frequency, recency, …)
• How (payment method, …)
• How much (number of TE, …)
• What (product, …)
Source: http://data.mining.free.fr/
RFM segmentation
[Figure: grid of four-digit purchase-history codes, with number of orders (4 to 1) on one axis and recency (T-1 to T-4) on the other]
4 orders: 1111
3 orders: 1110, 1101, 1011, 0111
2 orders: 1100, 1010, 1001, 0110, 0101, 0011
1 order: 1000, 0100, 0010, 0001
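The grid can be read as a simple encoding. The sketch below is only an illustration under one assumption that the slide does not state explicitly: each digit of a code flags whether an order was placed in periods T-1 to T-4, with the leftmost digit being the most recent period. The helper name `rfm_cell` is hypothetical.

```python
def rfm_cell(code: str) -> tuple[int, int]:
    """Return (number_of_orders, recency) for a purchase-history code.

    Assumption: code[0] is period T-1 (most recent), code[3] is T-4,
    and '1' means an order was placed in that period.
    """
    if len(code) != 4 or set(code) - {"0", "1"}:
        raise ValueError(f"expected four binary digits, got {code!r}")
    if "1" not in code:
        raise ValueError("code contains no order in any period")
    orders = code.count("1")        # frequency: how many periods had an order
    recency = code.index("1") + 1   # 1 means T-1, 4 means T-4
    return orders, recency


for code in ("1111", "0111", "0011", "0001"):
    orders, recency = rfm_cell(code)
    print(f"{code}: {orders} order(s), most recent in T-{recency}")
```

Under this reading, the code 0111 lands in the row for 3 orders and the column for T-2.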
Distribution of the first name Charlotte
http://www.meilleursprenoms.com
Nuggets
“If you’ve got terabytes of data, and you’re relying on data mining to find interesting things in there for you, you’ve lost before you’ve even begun.”
Herb Edelstein
Missing Value Imputation
[Figure: data matrix of cases by inputs]
Model Complexity
Too flexible
Not flexible enough
Overfitting
Training Set: 19 e = 90%
Test Set: 49 e = 75%
Better Fitting
Training Set: 34 e = 83%
Test Set: 43 e = 78%
The Right-Sized Tree
Stunting
Pruning
A Field Guide to Tree Algorithms
AID
THAID
CHAID
ID3
C4.5
C5.0
CART
Measurement:
• unary - one value
for example, a variable with a particular value that was used to create a data subset
• binary - two values
for example, the variable MARITAL that contains No or Yes
• nominal - more than two non-numeric values, but no implied order
for example, STATECOD that contains AK, AL, AR, AZ, etc.
• ordinal - more than two but not more than ten numeric values, with implied order
for example, NUMCARS that contains values from 0 to 3
• interval - more than ten numeric values
for example, AMOUNT that contains many different dollar values
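The thresholds above can be written as a small decision function. This is a minimal sketch of the rules as listed on the slide, not the actual Enterprise Miner implementation. The function name `measurement_level`, the choice to ignore `None` values, and the treatment of mixed numeric and text values as nominal are all assumptions.

```python
from numbers import Number


def measurement_level(values) -> str:
    """Classify a variable by its distinct values, following the slide's rules."""
    distinct = {v for v in values if v is not None}  # assumption: None = missing
    if not distinct:
        raise ValueError("no non-missing values")
    count = len(distinct)
    if count == 1:
        return "unary"
    if count == 2:
        return "binary"
    numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in distinct
    )
    if not numeric:
        return "nominal"  # more than two non-numeric values
    return "ordinal" if count <= 10 else "interval"


print(measurement_level(["No", "Yes", "No"]))        # binary
print(measurement_level(["AK", "AL", "AR", "AZ"]))   # nominal
print(measurement_level([0, 1, 2, 3]))               # ordinal
print(measurement_level(range(11)))                  # interval
```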
Artificial Neural Networks
Neuron
Hidden Unit
Multilayer Perceptron
Input Layer
Hidden Layers
Hidden Unit
Output Layer
INPUT: AGE, INCOME
HIDDEN (combination, then activation):
$A = \tanh(\beta_1 + \beta_2\,\mathrm{AGE} + \beta_3\,\mathrm{INC})$
$B = \tanh(\beta_4 + \beta_5\,\mathrm{AGE} + \beta_6\,\mathrm{INC})$
$C = \tanh(\beta_7 + \beta_8\,\mathrm{AGE} + \beta_9\,\mathrm{INC})$
OUTPUT (combination):
$\beta_{10} + \beta_{11} A + \beta_{12} B + \beta_{13} C$
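The combination and activation steps of this small network can be traced in code. The sketch below is only an illustration: the weight values are hypothetical and chosen to make the example runnable, the inputs are assumed to be already standardized, and the function name `forward` is not from the slides.

```python
import math


def forward(age: float, inc: float, beta: list[float]) -> float:
    """Forward pass of a network with 2 inputs, 3 tanh hidden units, 1 output.

    beta[0] is unused so that beta[k] matches the subscript k on the slide.
    """
    a = math.tanh(beta[1] + beta[2] * age + beta[3] * inc)  # hidden unit A
    b = math.tanh(beta[4] + beta[5] * age + beta[6] * inc)  # hidden unit B
    c = math.tanh(beta[7] + beta[8] * age + beta[9] * inc)  # hidden unit C
    # Output unit: linear combination of the hidden-unit activations.
    return beta[10] + beta[11] * a + beta[12] * b + beta[13] * c


# Hypothetical weights (index 0 is a placeholder).
beta = [0.0, 0.1, 0.02, -0.5, -0.3, 0.01, 0.8, 0.2, -0.04, 0.3,
        6.0, 1.0, -2.0, 3.0]
print(forward(age=0.4, inc=0.7, beta=beta))
```

Because each tanh output lies between -1 and 1, the output with these weights always stays within 6 of the bias value 6.0.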
Activation Function
Input Layer
Universal Approximator
$6 + A - 2B + 3C$
[Figure: panels A, B and C]
Training
• Error Function
• Iterative Optimization Algorithm
[Figure: plot with axes Parameter 1 and Parameter 2]
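The slide names only the two ingredients of training. As a minimal sketch, assume a sum-of-squared-errors function and plain gradient descent on a two-parameter linear model. Neither choice is specified by the slides, and the data, learning rate and function names are hypothetical.

```python
def sse(p1: float, p2: float, data) -> float:
    """Error function: sum of squared errors of the model y = p1 + p2 * x."""
    return sum((y - (p1 + p2 * x)) ** 2 for x, y in data)


def gradient_descent(data, rate: float = 0.05, steps: int = 2000):
    """Iteratively move (p1, p2) against the gradient of the error function."""
    p1, p2 = 0.0, 0.0
    for _ in range(steps):
        g1 = sum(-2 * (y - (p1 + p2 * x)) for x, y in data)      # d(SSE)/d(p1)
        g2 = sum(-2 * x * (y - (p1 + p2 * x)) for x, y in data)  # d(SSE)/d(p2)
        p1 -= rate * g1
        p2 -= rate * g2
    return p1, p2


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # generated from y = 1 + 2x
p1, p2 = gradient_descent(data)
print(round(p1, 3), round(p2, 3), sse(p1, p2, data))
```

Each iteration is one step downhill on the error surface drawn over Parameter 1 and Parameter 2.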
Association Rules
Transactions: {A, B, C}, {A, C, D}, {B, C, D}, {A, D, E}, {B, C, E}
Rule          Support   Confidence
A ⇒ D         2/5       2/3
C ⇒ A         2/5       2/4
A ⇒ C         2/5       2/3
B & C ⇒ D     1/5       1/3
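The support and confidence figures can be recomputed from the five transactions. The sketch below assumes the usual definitions: support is the share of transactions containing both sides of the rule, and confidence is that count divided by the number of transactions containing the left-hand side. The function names are illustrative.

```python
from fractions import Fraction

# The five transactions shown on the slide.
TRANSACTIONS = [set("ABC"), set("ACD"), set("BCD"), set("ADE"), set("BCE")]


def support(lhs: set, rhs: set) -> Fraction:
    """Share of all transactions that contain every item of lhs and rhs."""
    hits = sum(1 for t in TRANSACTIONS if (lhs | rhs) <= t)
    return Fraction(hits, len(TRANSACTIONS))


def confidence(lhs: set, rhs: set) -> Fraction:
    """Among transactions containing lhs, the share that also contain rhs."""
    lhs_hits = sum(1 for t in TRANSACTIONS if lhs <= t)
    both_hits = sum(1 for t in TRANSACTIONS if (lhs | rhs) <= t)
    return Fraction(both_hits, lhs_hits)


rules = [({"A"}, {"D"}), ({"C"}, {"A"}), ({"A"}, {"C"}), ({"B", "C"}, {"D"})]
for lhs, rhs in rules:
    print(sorted(lhs), "=>", sorted(rhs), support(lhs, rhs), confidence(lhs, rhs))
```

`Fraction` reduces its results, so the confidence of C ⇒ A prints as 1/2 rather than the unreduced 2/4 shown on the slide.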