
Outline

Screening

Alexis Boukouvalas

Neural Computing Research Group, Aston University

June 29th, 2007

Alexis Boukouvalas, NCRG Seminar 29/06/2007 Screening

Outline

1 Theory

Introduction and Motivation

Overview of methods

Principal Variables

Least Angle Regression

Morris Method

ARD

2 Experimental Results

Morris Method and Noise

Full Method Comparison

Number of Observations Required

3 Conclusions - Open Questions

4 Reference Slides


Introduction and Motivation

Project is part of the Managing Uncertainty in Complex Models (MUCM) project.

Goal is to examine and extend screening techniques for complex emulator models to high-dimensional input/output spaces.

Emulator setting: a deterministic piece of code is approximated by an emulator (e.g. a Gaussian Process). The code output is assumed noise-free.

Eliminating unneeded inputs offers both interpretability and model efficiency.


Definitions


Experimental Design: the use of mathematical and statistical methods to select the minimum number of experiments for optimal coverage of descriptor or variable space.

Latin Hypercube (uniform sampling): a square grid containing sample positions is a Latin square if (and only if) there is only one sample in each row and each column. A Latin hypercube is the generalisation of this concept to an arbitrary number of dimensions, whereby each sample is the only one in each axis-aligned hyperplane containing it.
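The one-sample-per-hyperplane construction can be sketched numerically. This is an illustrative sketch, not code from the talk; the function name and numpy usage are my own:

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, seed=None):
    """Sample n_samples points in [0, 1]^n_dims so that every
    axis-aligned slab of width 1/n_samples contains exactly one point."""
    rng = np.random.default_rng(seed)
    # Independent random permutation of the bin indices for each dimension...
    bins = np.argsort(rng.random((n_samples, n_dims)), axis=0)
    # ...then jitter each point uniformly inside its own bin.
    offsets = rng.random((n_samples, n_dims))
    return (bins + offsets) / n_samples

X = latin_hypercube(10, 3, seed=0)
for d in range(3):
    # Each of the 10 bins along every axis holds exactly one sample.
    assert sorted(np.floor(X[:, d] * 10).astype(int)) == list(range(10))
```

The per-dimension permutations are what distinguish this from plain uniform sampling: projections onto any single axis are guaranteed to be evenly spread.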

Simulator: a simulation is an imitation of some real thing, state of affairs, or process. The act of simulating something generally entails representing certain key characteristics or behaviours of a selected physical or abstract system.


Screening Overview


Variable selection methods have been broadly categorised into three categories [Guyon 2003]: variable ranking, wrapper methods, and embedded methods.


Principal Variables


Unsupervised method originally proposed by [McCabe, 1984] and extended by [Cumming, 2007].

Of the four optimality criteria, [Cumming, 2007] uses

min ‖Σ22.1‖² = min ∑i λi²

where the λi are the eigenvalues of Σ22.1.

Σ22.1 is the conditional covariance matrix of the variables not selected given those selected; it represents the information left in the variables that are not selected.

Basic algorithm uses forward selection.

Algorithm


1 Initialise S22.1 = R, the full correlation matrix.

2 For each variable j not yet selected compute h_j = ∑_{i=1}^{p} (λi αji)², where λi is the i-th eigenvalue and αji the j-th element of the i-th eigenvector of the conditional correlation matrix S22.1.

3 Select the variable with the largest h_j.

4 Obtain the unscaled partial correlation matrix S22.1 = R22 − R21 R11⁻¹ R12, where R11 is the correlation of the variables in the selected set, R22 that of the outstanding variables, and R12, R21 the cross-correlations between the two sets.

5 Loop until the desired subset size is reached.
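Assuming the steps are as listed above, a minimal numpy sketch of the forward selection (function and variable names are mine, not from the talk):

```python
import numpy as np

def principal_variables(X, k):
    """Forward-select k principal variables from data matrix X (n x p).

    At each step, eigendecompose the conditional correlation matrix
    S22.1 of the not-yet-selected variables, score candidate j by
    h_j = sum_i (lambda_i * alpha_ji)**2, pick the largest, then
    recondition: S22.1 = R22 - R21 R11^{-1} R12.
    """
    R = np.corrcoef(X, rowvar=False)
    remaining = list(range(R.shape[0]))
    selected = []
    S = R.copy()                        # S22.1 starts as the full correlation matrix
    for _ in range(k):
        lam, A = np.linalg.eigh(S)      # columns of A are eigenvectors
        h = ((A * lam) ** 2).sum(axis=1)  # h_j = sum_i (lam_i * A[j, i])**2
        j = int(np.argmax(h))
        selected.append(remaining.pop(j))
        # Partial correlation of the remaining variables given those selected.
        R11 = R[np.ix_(selected, selected)]
        R22 = R[np.ix_(remaining, remaining)]
        R12 = R[np.ix_(selected, remaining)]
        S = R22 - R12.T @ np.linalg.solve(R11, R12)
    return selected

rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=(2, 500))
# x0 and x2 are near-duplicates; x1 carries independent information.
X = np.column_stack([z1, z2, z1 + 0.01 * rng.normal(size=500)])
selected = principal_variables(X, 2)
```

After one of the near-duplicate pair is chosen, the conditioning step leaves almost no residual correlation in the other, so the second pick is the independent variable.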


Results


For the simple regression model y = x1 + x2 with no noise added, on domain [0,4], with a Morris design (4 levels, r=10), 4 extra variables, and 70 observations:

Var  h
X5   1.1529
X2   1.0358
X1   0.9920
X3   0.9678
X6   0.8886
X4   0.8589

Principal Variables is applicable when we have a good estimate of the input distribution or have real observations.



Least Angle Regression (LARS)


Less greedy version of traditional forward selection.

Based on a linear model µ = ∑_{j=1}^{m} xj βj.

Algorithm takes m (number of variables) steps to complete.

At each step a factor is added to the model: µA+ = µA + γ uA.

uA is a constrained linear combination of variables.

The input variables are assumed to be linearly independent.


LASSO


As described in [Efron, Hastie 2002], the LASSO and stagewise forward selection are special cases of the LARS method. LASSO is defined through the total squared error and the absolute norm:

S(β) = ∑_{i=1}^{n} (yi − µi)²   (1)

T(β) = ∑_{j=1}^{m} |βj|   (2)

The algorithm aims to minimize S(β) subject to T(β) ≤ t for a bound t. The authors proved this is equivalent to the LARS algorithm with an added sign restriction: any non-zero βj must agree in sign with the current correlation cj.
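The constrained problem above can equivalently be solved in penalised form. As a hedged illustration (coordinate descent is my choice here, not a method from the talk, and all names are mine), note how an irrelevant input is driven to exactly zero:

```python
import numpy as np

def soft_threshold(rho, thresh):
    """Shrink rho towards zero by thresh; the source of exact zeros in LASSO."""
    return np.sign(rho) * max(abs(rho) - thresh, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for min ||y - X beta||^2 + lam * sum_j |beta_j|.

    This is the Lagrangian form of the constrained problem
    min S(beta) s.t. T(beta) <= t; each lam corresponds to some bound t.
    """
    n, m = X.shape
    beta = np.zeros(m)
    for _ in range(n_iter):
        for j in range(m):
            # Partial residual with variable j removed from the current fit.
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            beta[j] = soft_threshold(rho, lam / 2) / (X[:, j] @ X[:, j])
    return beta

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 2 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=200)
beta = lasso_cd(X, y, lam=20.0)   # beta[2] is shrunk to (near) zero
```

This screening-by-sparsity behaviour is what makes the LASSO/LARS family useful here: inactive variables get coefficient exactly zero rather than merely small.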


Morris Method


Generates r trajectory designs and computes elementary effects.

Each point in the trajectory differs from the previous point in only one variable by a fixed ∆. For k variables, a trajectory has k + 1 points, changing each variable exactly once. The start point is random.

Compute the elementary effect for each variable: di(xl) = (y(xl+1) − y(xl))/∆.

Compute moments of the distribution for each variable:

µ∗ = (1/r) ∑_{i=1}^{r} |di|    µ = (1/r) ∑_{i=1}^{r} di    σ = √((1/r) ∑_{i=1}^{r} (di − µ)²)

µ∗ is the total effect measure, a high value indicating large influence. σ indicates non-linear and interaction effects.
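A simplified numeric sketch of the procedure (one-directional +∆ steps only, which is a simplification of the full Morris design; function and parameter names are mine), applied to the deck's 3x1 + x2² + x2 sin(x3) example with three inert extra inputs:

```python
import numpy as np

def morris_trajectory(k, levels, delta_bins, rng):
    """One trajectory of k+1 grid points in [0, 1]^k; each step increases
    exactly one variable by delta (real Morris steps may go either way)."""
    delta = delta_bins / (levels - 1)
    # Random grid start, low enough that x + delta stays inside [0, 1].
    x = rng.integers(0, levels - delta_bins, size=k) / (levels - 1)
    order = rng.permutation(k)
    points = [x.copy()]
    for j in order:
        x = x.copy()
        x[j] += delta
        points.append(x)
    return np.array(points), order, delta

def morris_screen(f, k, r=10, levels=4, delta_bins=2, seed=None):
    """mu* (mean |elementary effect|) and sigma per variable over r trajectories."""
    rng = np.random.default_rng(seed)
    effects = np.empty((r, k))
    for t in range(r):
        pts, order, delta = morris_trajectory(k, levels, delta_bins, rng)
        ys = np.array([f(p) for p in pts])
        effects[t, order] = (ys[1:] - ys[:-1]) / delta  # map steps back to variables
    mu = effects.mean(axis=0)
    mu_star = np.abs(effects).mean(axis=0)
    sigma = np.sqrt(((effects - mu) ** 2).mean(axis=0))
    return mu_star, sigma

f = lambda x: 3 * x[0] + x[1] ** 2 + x[1] * np.sin(x[2])
mu_star, sigma = morris_screen(f, k=6, seed=0)
# x1 enters linearly, so its sigma is ~0; x4..x6 never move the output.
```

The cost pattern matches the slide: each trajectory needs k + 1 runs, so the total of r(k + 1) model evaluations grows linearly in the number of variables.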



f(x) = 3x1 + x2² + x2 sin(x3), r = 10, levels = 4, ∆ = 2

(Figure: scatter of σ against µ∗ for each input; X1, X2 and X3 stand apart from the inert X4-X15.)

Var      µ∗     σ
X1       0.409  6.409e-17
X2       0.657  1.407e-01
X3       0.179  1.085e-01
X4...15  0      0


f(x)′ = f(x) + N(0, σn), Signal Variance : Noise Variance = 10:1

(Figure: scatter of σ against µ∗ under noise; the inert variables no longer separate cleanly from X1 and X3.)

Var  µ∗     σ
X2   0.768  0.256
X1   0.346  0.207
X6   0.304  0.318
X3   0.234  0.264
X14  0.213  0.262
X7   0.212  0.286
X9   0.211  0.254
X15  0.201  0.275
X11  0.183  0.233
X5   0.155  0.182
X4   0.154  0.204
X13  0.146  0.178
X8   0.140  0.184
X10  0.136  0.167
X12  0.122  0.131

Embedded Methods


The only embedded method examined was Automatic Relevance Determination (ARD), where the characteristic length scales λi determine the input relevance.

Two covariance functions were examined:

Squared Exponential (SE):
k(xp, xq) = σf² exp(−(xp − xq)′ P⁻¹ (xp − xq)/2) + σn² δpq

Rational Quadratic (RQ):
k(xp, xq) = σf² [1 + (xp − xq)′ P⁻¹ (xp − xq)/(2α)]^(−α)

where σn² is the variance of the assumed i.i.d. Gaussian noise, σf² the signal variance, and P = diag(λ)².
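The two kernels can be written down directly in numpy; a sketch with my own function names, using the hyperparameters as defined above (the σn² δpq noise term is left to be added to the training covariance separately):

```python
import numpy as np

def sq_dist(Xp, Xq, ell):
    """Pairwise scaled squared distances (xp - xq)' P^{-1} (xp - xq), P = diag(ell)^2."""
    D = Xp[:, None, :] / ell - Xq[None, :, :] / ell
    return (D ** 2).sum(axis=-1)

def se_kernel(Xp, Xq, sigma_f, ell):
    """Squared Exponential ARD kernel (noise-free part)."""
    return sigma_f ** 2 * np.exp(-0.5 * sq_dist(Xp, Xq, ell))

def rq_kernel(Xp, Xq, sigma_f, ell, alpha):
    """Rational Quadratic ARD kernel; recovers the SE kernel as alpha -> inf."""
    return sigma_f ** 2 * (1 + sq_dist(Xp, Xq, ell) / (2 * alpha)) ** (-alpha)

X = np.random.default_rng(2).normal(size=(5, 3))
ell = np.array([1.0, 2.0, 0.5])   # a large ell_i marks input i as irrelevant
K_se = se_kernel(X, X, 1.0, ell)
K_rq = rq_kernel(X, X, 1.0, ell, alpha=1e6)
```

The ARD screening signal is in `ell`: after maximising the marginal likelihood, inputs whose fitted length scale is very large contribute almost nothing to the distance, i.e. they are effectively screened out.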



The results presented use the RQ covariance function rather than the SE since:

the SE proved less accurate, performing similarly to the linear methods examined;

the SE covariance required the addition of the noise hyperparameter σn, without which it proved numerically unstable.

Thus although the functions examined are infinitely differentiable, making the SE function theoretically appropriate, in preliminary results the RQ performed better.



ARD or Morris?

f(x)′ = 3x1 + x2² + x2 sin(x3) + N(0, var(f(x)) × Noise level)


Noise lvl  Morris  ARD (Morris design)  ARD (LHU design)
0          0       0                    0
0.1        1.3     0.7                  0
0.3        2.4     2.0                  0
0.7        2.3     2.1                  0
1.5        2.6     2.4                  0
3.1        2.5     2.4                  0
6.3        2.7     2.6                  1
12.7       2.7     2.4                  1
25.5       2.6     2.7                  1
51.1       2.6     2.5                  2

Table: Average Variable Mistakes


Morris and noise? f(x)′ = f(x) + N(0, var(f(x)) × 0.3)

Can we fix Morris for the noisy case by changing R or ∆?


Experimental setup


No noise was added to the input.

All variables (incl. augmented) were sampled uniformly from a Latin hypercube or from the Morris trajectory design.

Results have been averaged over 5 realizations.

Ratio of truly relevant to irrelevant variables is constant at 1:4.

Intrinsic dimensionality is assumed to be known and is fixed.


Methods tested


CorrCoef: Pearson Correlation Coefficient (variable ranking).

LinFS: a forward selection subset strategy using a multivariate linear regression model.

LARS - LASSO.

ARD: the ARD method with an RQ covariance used to rank the input variables.

Morris with R=10, jump=1/4, levels=4.



f(x) = ∑_{i=1}^{D} sin(xi), D = 2/8 to 11/44, r = 10, ∆ = 2



f(x) = ∑_{i=2}^{D} sin(xi xi−1), D = 2/8 to 16/64, r = 10, ∆ = 2



f(x) = ∑_{i=1}^{D} −xi sin(√|xi|), D = 2/8 to 19/76, r = 10, ∆ = 2


Elapsed Time


Optimization time for the ARD kernel grows exponentially.



Observations required for 90% accuracy, f(x) = ∑_{i=1}^{D} sin(xi), noise lvl = 10⁻⁴

Figure: SQ Exponential Kernel
Figure: RQ Kernel

Decision Flow


Open questions


Hierarchical models, i.e. grouping variables.

Incorporating expert knowledge of relevant inputs.

Sequential methods + design.

Sparse Sequential Gaussian Processes for sequential screening and emulation of large models (where large training datasets will be required even when the experimental design is optimal).

Model selection using screening techniques for models linear in parameters.

Projective methods in conjunction with screening (e.g. PCA for an upper bound on dimensionality).

Multiple outputs.

Morris method and weak-effect variables: are they screened out? (Gradient not zero; choice of ∆ important?)


Summary


If the model is available and deterministic, the Morris method can be very effective and efficient since the number of model evaluations grows linearly with the number of variables.

ARD is relatively accurate and widely applicable but does not scale well. Furthermore, choice of kernel and sufficient optimization are critical.

Linear methods are not as robust.


Screening Overview


Variable selection methods have been broadly categorised in three categories [Guyon 2003]:

Variable ranking: input variables are ranked according to the prediction accuracy of each input calculated against the model output.

Wrapper methods: the emulator is used to assess the predictive power of subsets of variables.

Embedded methods: for both variable ranking and wrapper methods, the emulator is considered a perfect black box. In embedded methods, the variable selection is done as part of the training of the emulator.


Wrapper Methods


Forward selection: variables are progressively incorporated into larger and larger subsets.

Backward elimination: proceeds in the opposite direction.

Efroymson's algorithm, aka stepwise selection: proceed as in forward selection, but after each variable is added, check whether any of the selected variables can be deleted without significantly increasing the residual sum of squares.

Exhaustive search: all possible subsets are considered.

Branch and Bound: eliminate subset choices as early as possible. E.g. with variables A-Z, if the RSS of the {A, B} subset is 100, the C-Z subset branch need not be followed if the RSS of all the C-Z variables is > 100.
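The first of these, forward selection with a linear least-squares fit as the wrapped model, can be sketched as follows (a numpy-based illustration; the wrapped model choice and all names are mine):

```python
import numpy as np

def forward_select(X, y, k):
    """Greedy wrapper: at each step add the variable that most reduces
    the residual sum of squares (RSS) of a linear least-squares fit."""
    selected, remaining = [], list(range(X.shape[1]))

    def rss(cols):
        # Fit y ~ intercept + X[:, cols] and return the residual sum of squares.
        A = np.column_stack([np.ones(len(y)), X[:, cols]])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        r = y - A @ beta
        return r @ r

    for _ in range(k):
        j = min(remaining, key=lambda j: rss(selected + [j]))
        selected.append(j)
        remaining.remove(j)
    return selected

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = X[:, 1] + 3 * X[:, 3] + 0.1 * rng.normal(size=100)
sel = forward_select(X, y, 2)   # picks the strong variable first, then the weaker one
```

Swapping the `rss` helper for an emulator's predictive score turns this into the wrapper scheme the slide describes; the greedy structure is identical.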



Least Angle Regression (LARS)

Less greedy version of traditional forward selection.

Based on a linear model µ = ∑_{j=1}^{m} xj βj.

Algorithm takes m (number of variables) steps to complete.

Construct uA, the equiangular bisector of the active set A.

Find the step size γ = min⁺_{j∉A} { (C − cj)/(AA − αj), (C + cj)/(AA + αj) }, where C is the maximum current correlation, AA the normalising constant of the bisector, and αj the inner product x′j uA.

Update µA+ = µA + γ uA.

At each step the maximal current correlations c = X′(y − µ) decline equally.

LASSO equivalency: constrain LARS so that the sign of any non-zero βj agrees with the sign of the current correlation cj.


Sequential screening


The authors of LARS propose measuring main effects first, then second-order effects only on variables with strong main effects, etc.


Morris Method


Most recommended method from the sensitivity analysis area [Saltelli, 2000].

Generates r trajectory designs and computes elementary effects.

Each point in the trajectory differs from the previous point in only one variable by a fixed ∆. For k variables, a trajectory has k + 1 points, changing each variable exactly once. The start point is random.

Compute the elementary effect for each variable: if xl is increased by ∆, di(xl) = (y(xl+1) − y(xl))/∆.

Compute moments of the distribution for each variable:

µ∗ = (1/r) ∑_{i=1}^{r} |di|    σ = √((1/r) ∑_{i=1}^{r} (di − µ)²)

where µ is computed as µ∗ without using absolute values.

µ∗ is the total effect measure, a large value indicating large influence. σ indicates non-linear and interaction effects.

