LISA: Using JMP to design experiments and analyze the results

Liaosa Xu 

Sept, 2012 


Course Outline 

Why We Need Custom Design 

The General Approach 

JMP Examples 

Potential Collinearity Issues 

Prior Design Evaluations 

Augmented Design 

Design from Candidate Set 


Why Custom Design 

Standard designs sometimes do not fit the problem. Computer-generated (custom/optimal) designs are an alternative when the experiment involves:

An irregular experimental region

Both categorical and continuous variables

A nonstandard model

Unusual sample size requirements


An irregular experimental region (Montgomery 2009) 

 

If the region of interest for the experiment is not a cube or a sphere, standard designs may not be possible.

An experimenter is investigating the properties of a particular adhesive: $x_1$ is the amount of adhesive and $x_2$ is the cure temperature. The prior knowledge is:

a) If too little adhesive is used and the cure temperature is too low, the parts will not bond.

b) If both factors are at high levels, the parts will either be damaged by heat stress or an inadequate bond will result.


Categorical Variables 

Custom design can accommodate a model that contains categorical variables with multiple levels.

Examples of categorical factors are machine, operator, solvent, and catalyst.


A nonstandard model 

Sometimes the experimenter has special knowledge or insight about the process being studied that suggests a nonstandard model (with specific interaction terms and specific quadratic terms).

For example, the model proposed from prior knowledge 

is 

 

 

Note: this is not a full response surface model.


Unusual sample size requirements 

Occasionally we need fewer runs than a standard design requires.

For example, suppose we intend to fit a second-order model in four variables. The model has 15 terms to estimate, and a central composite design (CCD) requires 26-30 runs. If the runs are expensive or time-consuming and we can afford fewer than 20, a computer-generated design can reduce the number of runs.
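The run counts quoted above can be checked with a short calculation. The sketch below is only illustrative: the function names are made up here, and it assumes the 26-30 range corresponds to a full CCD with 2 to 6 center points.

```python
from math import comb

def second_order_terms(k: int) -> int:
    """Number of terms in a full second-order model in k factors."""
    # intercept + k main effects + C(k, 2) two-factor interactions + k pure quadratics
    return 1 + k + comb(k, 2) + k

def ccd_runs(k: int, center_points: int) -> int:
    """Runs in a full CCD: 2^k factorial points + 2k axial points + center points."""
    return 2**k + 2 * k + center_points

k = 4
print("model terms:", second_order_terms(k))
# Assumption: 2 to 6 center points
print("CCD runs:", [ccd_runs(k, nc) for nc in (2, 6)])
```

Any design used to fit this model needs at least as many runs as there are terms, which is why a budget just under 20 runs is tight but workable.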


The General Approach of Custom Design 

The usual approach for Custom Design is: 

1) Specify a model 

2) Determine the region of interest, including any linear constraints

3) Select the number of runs

4) Specify the optimality criterion

5) Create the design; consider adding some center-point runs.

Model Specification in Custom Design 

Model specification:

All designs are model dependent.

By default, JMP includes all main effects as model terms.

Consider adding the two-way interactions between each pair of factors.

For prediction purposes, consider a Response Surface Model with the I-optimal criterion.

Use an educated guess to specify the model terms.

Optimality Criterion in Custom Design 

Custom design is also called optimal design because the design is the best with respect to some criterion.

A popular choice is the D-optimal design, which gives the most precise joint estimate of the effects.

D-optimal designs are most useful for determining the important factors in the model (most appropriate for screening experiments).

D-optimal designs are not preferred when the primary goal is prediction.
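To make the D criterion concrete, the sketch below compares det(X'X) for two small two-factor designs in coded units. It is an illustration only: the designs, the model (intercept, main effects, one interaction) and the function names are assumptions introduced here, not JMP output.

```python
import numpy as np

def model_matrix(design: np.ndarray) -> np.ndarray:
    """Intercept, two main effects and their interaction for a two-factor design."""
    x1, x2 = design[:, 0], design[:, 1]
    return np.column_stack([np.ones(len(design)), x1, x2, x1 * x2])

def d_criterion(design: np.ndarray) -> float:
    """det(X'X); a larger value means a smaller joint confidence region."""
    X = model_matrix(design)
    return float(np.linalg.det(X.T @ X))

# Four runs at the corners of the coded region [-1, 1]^2
corners = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
# The same four runs pulled halfway toward the center
shrunk = 0.5 * corners

print("corners:", d_criterion(corners))
print("shrunk: ", d_criterion(shrunk))
```

Spreading the runs to the edge of the region gives the larger determinant, which is the behavior a D-optimal search exploits.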


Optimality Criterion in Custom Design 

Another choice in JMP is the I-optimal design, which minimizes the average prediction variance over the design space.

When the predictive ability of the model is the major concern, the I-optimal design is preferred.

JMP selects the I-optimal criterion by default for response surface designs.
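The quantity an I-optimal search minimizes can be approximated numerically. The following sketch assumes a one-factor quadratic model on the coded interval [-1, 1] and averages the relative prediction variance over an even grid; the two 4-run designs and all names are hypothetical.

```python
import numpy as np

def quad_terms(x: np.ndarray) -> np.ndarray:
    """Model terms for a one-factor quadratic model: 1, x, x^2."""
    return np.column_stack([np.ones_like(x), x, x**2])

def average_relative_variance(design: np.ndarray, n_grid: int = 2001) -> float:
    """Average of f(x)' (X'X)^-1 f(x) over an even grid on [-1, 1]."""
    X = quad_terms(design)
    xtx_inv = np.linalg.inv(X.T @ X)
    F = quad_terms(np.linspace(-1.0, 1.0, n_grid))
    # Row-wise quadratic form f(x)' (X'X)^-1 f(x)
    rel_var = np.einsum("ij,jk,ik->i", F, xtx_inv, F)
    return float(rel_var.mean())

design_a = np.array([-1.0, 0.0, 0.0, 1.0])   # extra run at the center
design_b = np.array([-1.0, -1.0, 0.0, 1.0])  # extra run at one end

print("design A:", average_relative_variance(design_a))
print("design B:", average_relative_variance(design_b))
```

The design with the smaller average would be preferred under the I criterion for this model.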


JMP Example of Custom Design 

A three-factor design (two numerical factors plus one categorical factor) was used to determine the operating conditions for modeling the amount of extraction.

$x_1$: centrifuge inlet temperature, [40, 80]

$x_2$: extraction temperature, [40, 60]

$x_3$: solvent A, B, or C

Constraint: centrifuge inlet temperature minus extraction temperature is nonnegative, i.e., $x_1 - x_2 \ge 0$.

We use indicator variables $z_1$ and $z_2$ to denote the discrete levels of $x_3$:

$z_1 = 1$ if solvent A is assigned, and $z_1 = 0$ otherwise

$z_2 = 1$ if solvent B is assigned, and $z_2 = 0$ otherwise

The response surface model can be written as

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \gamma_1 z_1 + \gamma_2 z_2 + \delta_{11} z_1 x_1 + \delta_{21} z_2 x_1 + \delta_{12} z_1 x_2 + \delta_{22} z_2 x_2 + \varepsilon$$


Design space for x 1 and x 2 

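[Figure: feasible region in the ($x_1$, $x_2$) plane. Horizontal axis $x_1$ from 40 to 80, vertical axis $x_2$ from 40 to 60; the feasible region is the part of the rectangle where $x_1 - x_2 \ge 0$.]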
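The constrained region can also be explored outside JMP. The sketch below enumerates a grid over the two temperature ranges and the three solvents and keeps only points that satisfy the constraint; the 5-degree grid spacing and the names are arbitrary choices made for illustration.

```python
from itertools import product

# Factor ranges from the example; the 5-degree spacing is an assumption
x1_levels = range(40, 81, 5)   # centrifuge inlet temperature
x2_levels = range(40, 61, 5)   # extraction temperature
solvents = ("A", "B", "C")

def is_feasible(x1: float, x2: float) -> bool:
    """Linear constraint from the example: x1 - x2 >= 0."""
    return x1 - x2 >= 0

all_points = list(product(x1_levels, x2_levels, solvents))
feasible = [(x1, x2, s) for x1, x2, s in all_points if is_feasible(x1, x2)]

print(len(feasible), "feasible points out of", len(all_points))
```

A custom design would place its runs only inside this feasible set.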


JMP → DOE → Custom Design

Panels: Factors, Constraints, Model Specification

Set Random Seed to 1000 (for illustration only, not in practice!).

Simulate Responses (used for collinearity detection later).

Optimality Criterion: let's change to D-optimality.

Set Number of Starts to 1000.

Specify the Number of Runs.

Click Make Design to generate the design.


Design Output 

After you create the design, get some rough evaluations of it before you run it!

Evaluation information (more details later)

Randomize your design when you make the table!

The table contains the simulated response; you can enter your own data in this column.

Collinearity Problems 

Because custom designs are generally not orthogonal, multicollinearity is possible.

Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response.

It is a potential problem for three reasons:

a) Large variances and covariances of the estimated regression coefficients.

b) Unstable regression coefficients, possibly with the wrong sign. A small perturbation of the response can lead to large changes in the estimated effects, or even to opposite signs.

c) Results that are often confusing and misleading.


Detecting multicollinearity 

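Calculate the variance inflation factor (VIF) for each predictor $x_j$: $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ is the coefficient of determination from regressing $x_j$ on the other predictors.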
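As a rough illustration of the calculation, the sketch below computes the VIF in its standard form, 1 / (1 - R_j^2), by regressing each predictor on the others. The two example matrices are hypothetical, and the function is a plain NumPy sketch rather than what JMP runs internally.

```python
import numpy as np

def vif(X: np.ndarray) -> np.ndarray:
    """VIF for each column of X (predictor columns only, no intercept column)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress predictor j on an intercept plus all other predictors
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Orthogonal 2^2 factorial: the predictors are uncorrelated
orthogonal = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)

# Two nearly collinear predictors (simulated, for illustration)
rng = np.random.default_rng(0)
a = rng.normal(size=50)
collinear = np.column_stack([a, a + 0.05 * rng.normal(size=50)])

print("orthogonal:", vif(orthogonal))
print("collinear: ", vif(collinear))
```

An orthogonal design gives VIFs of 1, while strongly correlated predictors push the VIFs far above the usual threshold of 10.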



Detecting multicollinearity using JMP 

▼ Red triangle (next to “Model” in the upper-left panel of the data table) → Run Script

Click the Run Model button

Scroll to the Parameter Estimates section

Right-click on the table → Columns → VIF


Detecting multicollinearity using JMP 

No VIF is larger than 10, so there is no severe multicollinearity in this design.

Design Evaluations: 

The prediction variance at any factor setting, or over the whole design space, is the product of the error variance and a quantity that depends only on the design and the factor setting.

This quantity, the ratio of the prediction variance to the error variance, is called the relative variance of prediction. It can be calculated before acquiring the data.

Ideally, the prediction variance is small throughout the allowable region of the factors.
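In standard least-squares notation the statement above can be written out as follows. The symbols are introduced here for illustration and are not taken from the slides: $X$ is the model matrix of the design, $f(x)$ is the vector of model terms evaluated at the factor setting $x$, and $\sigma^2$ is the error variance.

```latex
\operatorname{Var}\!\left[\hat{y}(x)\right]
  = \sigma^2 \, f(x)^{\top} \left(X^{\top} X\right)^{-1} f(x),
\qquad
\frac{\operatorname{Var}\!\left[\hat{y}(x)\right]}{\sigma^2}
  = f(x)^{\top} \left(X^{\top} X\right)^{-1} f(x).
```

The right-hand ratio involves only the design and the factor setting, which is why it can be evaluated before any response is measured.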

Design Evaluations: Prediction Variance Profile

The prediction variance of 0.5 is relative to the error variance. For example, if the estimated (prior) variance of the experimental error (MSE) is 10, then the prediction variance of y at the center value of $x_1$ (= 0) is 10 × 0.5 = 5.

Control-click on a factor to set a factor level of your choice.

You can drag the vertical trace lines to change the factor settings to different points.


Prediction Variance Profile 

The Maximize Desirability command on the Prediction Variance Profile title bar identifies the maximum (worst-case) prediction variance for the model, here 1.321.

Comparing the prediction variance profilers side by side is one way to compare two designs.


Fraction of Design Space 

The Fraction of Design Space (FDS) plot shows how much of the design space has a prediction variance above (or below) a given value.

The X axis is the fraction of the design space, ranging from 0 to 100%, and the Y axis is the prediction variance.

Note: the 90th-percentile prediction variance is a useful summary in a variety of scenarios.

Using the crosshair tool shows that 90% of the possible factor settings have a relative prediction variance of less than 0.91.
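An FDS curve can be approximated by sampling. The sketch below draws random points in a coded two-factor square, computes the relative prediction variance at each point for a full quadratic model, and sorts the values; the 3×3 factorial design and all names are hypothetical, and the numbers it produces are unrelated to the 0.91 quoted above.

```python
import numpy as np

def terms(points: np.ndarray) -> np.ndarray:
    """Full quadratic model in two factors: 1, x1, x2, x1*x2, x1^2, x2^2."""
    x1, x2 = points[:, 0], points[:, 1]
    return np.column_stack([np.ones(len(points)), x1, x2, x1 * x2, x1**2, x2**2])

def fds_curve(design: np.ndarray, n_samples: int = 20000, seed: int = 0) -> np.ndarray:
    """Sorted relative prediction variances at random points in [-1, 1]^2."""
    X = terms(design)
    xtx_inv = np.linalg.inv(X.T @ X)
    rng = np.random.default_rng(seed)
    F = terms(rng.uniform(-1.0, 1.0, size=(n_samples, 2)))
    rel_var = np.einsum("ij,jk,ik->i", F, xtx_inv, F)
    return np.sort(rel_var)

# Hypothetical design: 3x3 full factorial in coded units
levels = np.array([-1.0, 0.0, 1.0])
design = np.array([[a, b] for a in levels for b in levels])

curve = fds_curve(design)
q90 = float(np.quantile(curve, 0.90))
print(f"90% of the region has relative prediction variance below {q90:.3f}")
```

Plotting `curve` against the fractions 0 to 1 gives the FDS plot; a flatter, lower curve indicates a better design for prediction.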

Design Evaluations: Design Diagnostics

These efficiency measures are single numbers, each attempting to quantify one design characteristic. While the maximum efficiency is 100% for any criterion, an efficiency of 100% is impossible for many design problems.

It is best to use these measures to compare two competing designs with the same model and number of runs, rather than as absolute measures of design quality.

D: minimize the joint confidence region for the regression coefficients.

G: minimize the maximum scaled prediction variance.

A: minimize the sum of the variances of all regression coefficients.
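For a feel of how such efficiencies behave, the sketch below uses common textbook definitions of D- and A-efficiency for factors coded to [-1, 1]. This is an assumption: JMP's exact formulas may differ, and the two 4-run designs are hypothetical.

```python
import numpy as np

def d_efficiency(X: np.ndarray) -> float:
    """100 * |X'X|^(1/p) / N for a model matrix with N runs and p terms."""
    n, p = X.shape
    return float(100.0 * np.linalg.det(X.T @ X) ** (1.0 / p) / n)

def a_efficiency(X: np.ndarray) -> float:
    """100 * p / (N * trace((X'X)^-1))."""
    n, p = X.shape
    return float(100.0 * p / (n * np.trace(np.linalg.inv(X.T @ X))))

def main_effects_matrix(design: np.ndarray) -> np.ndarray:
    """Intercept plus main effects."""
    return np.column_stack([np.ones(len(design)), design])

# Orthogonal 2^2 factorial versus a design with one corner moved to the center
factorial = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
lopsided = np.array([[-1, -1], [-1, 1], [1, -1], [0, 0]], dtype=float)

for name, d in (("factorial", factorial), ("lopsided", lopsided)):
    X = main_effects_matrix(d)
    print(name, round(d_efficiency(X), 1), round(a_efficiency(X), 1))
```

The orthogonal factorial reaches 100% on both measures under these definitions, while the lopsided design falls short, which is the kind of side-by-side comparison the diagnostics are meant for.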

Why Augmented Design? 

Experimentation is an iterative process; we cannot assume that one successful screening experiment has optimized our process.

Four common reasons for an unsatisfactory experiment:

The specified model is inadequate.

The results predicted from the experiment are not reproducible.

Many trials failed.

Important conditions, often an optimum, lie outside the experimental region.

Motivation for Augmented Design: Model Inadequacy

The inadequacies of the model may be revealed during the analysis of the data.

The investigated relationship may be more complicated than expected:

Inclusion of higher-order polynomial terms

Transformations of factors or response

Detection of ‘lurking’ variables

Ranges of some factors are wrong*


Motivation for Augmented Design: Failing Trials

If many individual trials fail, there may not be sufficient data to estimate the parameters of the model.

It is important to find out whether there was a technical mishap or whether something more fundamental is amiss.


Motivation for Augmented Design: Optimum Issues

If an optimum of the response, or of some performance characteristic, seems to lie appreciably outside the present experimental region, experimental confirmation of this prediction will be necessary.


We now consider the augmentation of a design by the addition of a specified number of new trials.

Augmentation may be motivated by the need for a higher-order model, a different design region, the introduction of a new factor, or the removal of unimportant factors.

The new design depends on the trials for which the response is known, although usually not on the values of the responses.
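The idea that augmentation depends on the existing runs but not on their responses can be sketched in a few lines. The code below greedily adds candidate points that maximize det(X'X) of the combined design for a two-factor quadratic model. It is a simplified stand-in, not the search JMP performs, and the designs and names are hypothetical.

```python
import numpy as np
from itertools import product

def quadratic_terms(points: np.ndarray) -> np.ndarray:
    """Full quadratic model in two factors."""
    x1, x2 = points[:, 0], points[:, 1]
    return np.column_stack([np.ones(len(points)), x1, x2, x1 * x2, x1**2, x2**2])

def d_value(points: np.ndarray) -> float:
    X = quadratic_terms(points)
    return float(np.linalg.det(X.T @ X))

def augment(existing: np.ndarray, candidates: np.ndarray, n_new: int) -> np.ndarray:
    """Greedily append n_new candidates, each maximizing det(X'X) of the combined design."""
    design = existing.copy()
    for _ in range(n_new):
        scores = [d_value(np.vstack([design, c])) for c in candidates]
        design = np.vstack([design, candidates[int(np.argmax(scores))]])
    return design

# Original runs: 2^2 factorial plus a center point; the two quadratic terms are aliased
original = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1], [0, 0]], dtype=float)
# Candidate points for the new runs: a 3x3 grid in coded units
grid = np.array(list(product([-1.0, 0.0, 1.0], repeat=2)))

augmented = augment(original, grid, n_new=4)
print(augmented[len(original):])  # the added runs
```

Only the factor settings of the original runs enter the calculation; no response values are needed to choose the new points.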


Examples of Augmented Design 

Example: A chemical engineer investigates the effects of six factors on the percent reaction yield of a chemical process. A $2^{6-2}$ fractional factorial design (a screening design) is run with 2 additional center runs.

The data show significant curvature for some factors, but collinearity prevents us from identifying the responsible factor(s).

The engineers would also like to know whether increasing the amount of catalyst beyond the current concentration of 0.2 M gives a higher yield.

With three selected factors plus catalyst concentration, an augmented design with 8 additional runs is used to fit the response surface model for the four factors.

Augmented Design in JMP 

Open Augmented design 18 Runs.JMP in JMP.

Right-click the red triangle next to Screening → Run Script


The screening analysis indicates a curvature effect for X3, but it is completely aliased with the other quadratic effects, X1² and X4².


Right-click the red triangle next to Full Factorial Model with X1, X3 and X4 → Run Script

The lack-of-fit test indicates that the full factorial model is not adequate.


Augmented design with different model specification 

DOE → Augment Design

Click OK.


Augmentation Choices → Augment

You can change the upper and lower limits for the new runs, which gives a different prediction region. Note that catalyst now has an upper bound of 0.8.

The additional runs are usually performed on a different day, so we may treat day as a blocking effect in the model and check this box.


Specify RSM 

Again, the design is model dependent, and specifying the correct model is crucial for a satisfactory experiment. In an augmented design you would usually change some model terms, such as interactions and quadratic effects.


Choose I-optimal as the optimality criterion.

Set Number of Starts to 1000.

Set Random Seed to 1000.

Specify the total number of runs, including the original runs (18 + 8 = 26).

Click Make Design.

Click Make Table.


New runs are grouped in block 2.


[Figure legend: ○ additional runs; ∆ original design points]

For X1, X3 and X4, the original runs lie on corner and center points, and the added runs lie on axial and face points.


Open Augmented design 26 Runs.JMP in JMP.

Right-click the red triangle next to Model → Run Script

The lack-of-fit test is not significant for the RS model.

From the effect tests, the X1² and X3² quadratic effects are significant, and the catalyst effect is significant at alpha = 0.1.


Right-click the red triangle next to Reduced Model → Run Script

The lack-of-fit test is not significant for the RS model.

All model effects are significant at alpha = 0.05.


Right-click the red triangle next to Prediction Profile → Maximize Desirability

The predicted yield is maximized at X1 = 1, X3 = 1, X4 = -1 and Catalyst = 0.8, with a predicted value of 27.25. Confirm it with an additional run.

Design from a Candidate Set 

Motivations for Designs Based on a Candidate Set 

What to consider when using Candidate Set Design

Candidate Set Design in JMP

What is a Candidate Set? 

The candidate set of design points is the full set of possible data points from which the actual design points can be chosen.

For example, to construct quantitative structure–activity relationship (QSAR) models, which summarize a supposed relationship between chemical structure and the biological activity of chemicals, a chemist may search a chemical compound database for candidate compounds.

A QSAR model has the form:

Why Design Based on a Candidate Set 

We may not have full control of the experimental factors and may be limited to certain factor combinations.

The design space may be complex, with irregular factor settings and complicated nonlinear factor constraints.

As the term suggests, Candidate Set Design helps us pick the best design points from the candidate set with respect to some criterion.


Example (Mitchell 1974a): An animal scientist wants to compare wildlife densities in four different habitats over a year. However, due to the cost of experimentation, only 12 observations can be made. The following model is postulated for the density in a given habitat during a given month:

This model includes the habitat as a classification variable, the effect of time as an overall linear drift term, and cyclic behavior in the form of a Fourier series. There is no intercept term in the model.

Note: with 12 months and 4 habitats, we can create a candidate set of 48 points.


Open Mitchell.csv in JMP 

The data set contains the 48 candidate points and includes four cosine variables (c1, c2, c3, and c4) and three sine variables (s1, s2, and s3).


JMP does not randomize a Candidate Set Design; do it manually!

Because the design space is limited, we may end up with a design that has severe collinearity.

Look at the Variance Inflation Factors (VIF) to assess the lack of orthogonality.

Be sure to assess the quality of the design (e.g., FDS plots and statistical power, i.e., the relative variance of the coefficients).

Check your final design space to diagnose possible problems.

Remember that each design point in the candidate set can be selected only once.


Change the data type of some factors in the JMP file

Right-click on the Habitat column → Column Information → Data Type → Character → OK

You need to specify the correct data type in this step because it cannot be changed in Custom Design.


DOE → Custom Design → Add Factor → Covariate

In JMP 9, you have to select one factor at a time.

JMP treats Habitat as categorical, using the data type it has in the candidate set file.


Model Specification


Custom Design ▼ → Optimality Criterion → Make D-Optimal Design

Set Random Seed to 193030034.

Set Number of Starts to 100.

Specify 12 in Number of Runs.

Click Make Design.

Click Make Table.


Candidate set design and evaluations 

The design table is not randomized.

Note that for this design we cannot actually randomize the run order because of the nature of the Month factor. Let's assume it can be randomized, for illustration purposes.


To randomize and calculate VIFs manually in JMP, you need response data:

Right-click on the Y column → Formula

Scroll in the “Functions” box, choose Random → Random Uniform, then click the OK button

Right-click on the Y column → Sort

▼ Red triangle (next to “Model” in the upper-left panel of the data table) → Run Script

Check No Intercept

Click the Run Model button

Scroll to the Parameter Estimates section

Right-click on the table → Columns → VIF
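The manual randomization above (fill a column with uniform random numbers, then sort by it) has a direct analogue in code. The sketch below is only an illustration: the 12-run list of (habitat, month) pairs is hypothetical, as are the function name and the seed.

```python
import random

def randomize_run_order(runs, seed=None):
    """Attach a uniform random number to each run and sort by it."""
    rng = random.Random(seed)
    keyed = [(rng.random(), run) for run in runs]
    keyed.sort(key=lambda pair: pair[0])
    return [run for _, run in keyed]

# Hypothetical 12-run design: (habitat, month) pairs chosen only for illustration
design = [(habitat, month) for habitat in "ABCD" for month in (1, 5, 9)]

run_order = randomize_run_order(design, seed=2012)
for i, run in enumerate(run_order, start=1):
    print(i, run)
```

Fixing the seed makes the order reproducible for documentation; in a real experiment the seed would not be chosen by hand.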


Run the experiments in this randomized order.

VIFs

More capabilities of JMP 

1. Split-Plot/Split-Split-Plot Design 

2. Saturated and Supersaturated Design 

3. Mixture Design 

4. Choice Design Space 

5. Non-linear Design 

6. Space Filling Design / Design for Computer Experiments

Workshop 

Scenario 1: Suppose an experimenter is investigating the properties of a particular adhesive. $x_1$ is the amount of adhesive and $x_2$ is the cure temperature.

The prior knowledge is:

If too little adhesive is used and the cure temperature is too low, the parts will not bond.

If both factors are at high levels, the parts will either be damaged by heat stress or an inadequate bond will result.

 

The model of interest is a non-standard model, i.e., 

 

Also, due to the budget constraint, only 30 runs can be afforded.

Workshop 

Scenario 2: Meyer et al. (1996) demonstrate how to use the augment designer in JMP to resolve ambiguities left by a screening design. In this study, a chemical engineer investigates the effects of five factors on the percent reaction of a chemical process.

To begin, open Reactor 8 Runs.jmp and augment this design with 8 additional runs to incorporate All Two-Factor Interactions.

Set the seed to 12834729.

After you create the design, open Reactor Augment Data.jmp to do the variable selection using stepwise regression.

Note: Choose P-value Threshold from the Stopping Rule menu and Mixed from the Direction menu, and make sure Prob to Enter is 0.050 and Prob to Leave is 0.100. These are not the default values.

Workshop 

Scenario 3: An automotive engineer wants to fit a quadratic (Response Surface) model to fuel consumption data in order to find the values of the control variables that minimize fuel consumption (refer to Vance 1986). The three control variables, AFR (air fuel ratio), EGR (exhaust gas recirculation), and SA (spark advance), and their possible settings are shown in the following table:

Variable   Values
AFR        15, 16, 17, 18
EGR        0.020, 0.177, 0.377, 0.566, 0.921, 1.117
SA         10, 16, 22, 28, 34, 40, 46, 52

Rather than run all 192 (4×6×8) combinations of these factors (saved as candidate set workshop 3.csv), the engineer would like to see whether the total number of runs can be reduced to 50 in an optimal fashion.
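The candidate set for this scenario is simply the full grid of the listed settings. A minimal sketch of building it, with variable names chosen here for illustration, is:

```python
from itertools import product
from math import comb

afr = [15, 16, 17, 18]
egr = [0.020, 0.177, 0.377, 0.566, 0.921, 1.117]
sa = [10, 16, 22, 28, 34, 40, 46, 52]

# Every combination of the three control variables
candidates = list(product(afr, egr, sa))

# Terms in a full quadratic model in 3 factors:
# intercept + main effects + two-factor interactions + pure quadratics
n_terms = 1 + 3 + comb(3, 2) + 3

print(len(candidates), "candidate points;", n_terms, "model terms")
```

A 50-run design chosen from these candidates leaves ample degrees of freedom beyond the terms of the quadratic model.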


The Signal to Noise Ratio (often abbreviated SNR or S/N) is a measure used in science and engineering to quantify how much a signal has been corrupted by noise.

It is defined as the ratio of the signal power to the power of the noise corrupting the signal. A ratio higher than 1:1 indicates more signal than noise.

In less technical terms, the signal-to-noise ratio compares the level of a desired signal (such as music) to the level of background noise. The higher the ratio, the less obtrusive the background noise is.

In statistical analysis, the signal in SNR is the regression coefficient of a model term, and the noise is the experimental (model) error, expressed as a standard deviation.


The Power column shows the power of the design, as specified, to detect effects of a certain size (SNR) at a given significance level.

Here, assume the model error standard deviation is $\sigma = 2.5$ and the true value of the coefficient $\beta_2$ for X2 is 2.5. Then SNR = $\beta_2/\sigma$ = 2.5/2.5 = 1.0, and the probability (power) of detecting such an effect is 0.402 at significance level 0.05.

In JMP, you can change the SNR setting (e.g., Signal to Noise = 1, i.e., $\beta_2/\sigma = 1$) and the significance level (0.05).


Design: Power of the Design

For example, if the coefficient of Extraction Temp is large relative to the random noise, say $\beta_2$ is twice as large as the noise ($\beta_2 = 5.0$), i.e., SNR = $\beta_2/\sigma$ = 2, we have more power to detect it (0.909 vs. 0.402).


As the significance level increases, the power increases.

As the signal-to-noise ratio increases, the power increases.

Note: If your design turns out to have very low power even for large signal-to-noise ratio settings, you need to question whether the experiment is worth running!
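Both trends can be reproduced with a small power calculation based on the noncentral F distribution. The sketch below assumes a hypothetical 2^3 factorial with a main-effects model and uses SciPy; it illustrates the general calculation and does not reproduce the 0.402 and 0.909 values above, which depend on the specific design, nor is it claimed to match JMP's internal computation.

```python
import numpy as np
from itertools import product
from scipy import stats

def coefficient_power(X: np.ndarray, j: int, snr: float, alpha: float = 0.05) -> float:
    """Power of the test of coefficient j when beta_j / sigma equals snr."""
    n, p = X.shape
    c_jj = np.linalg.inv(X.T @ X)[j, j]   # Var(beta_j hat) / sigma^2
    noncentrality = snr**2 / c_jj         # (beta_j / se(beta_j))^2
    df_error = n - p
    f_crit = stats.f.ppf(1.0 - alpha, 1, df_error)
    return float(stats.ncf.sf(f_crit, 1, df_error, noncentrality))

# Hypothetical design: 2^3 factorial, intercept plus three main effects
runs = np.array(list(product([-1.0, 1.0], repeat=3)))
X = np.column_stack([np.ones(len(runs)), runs])

for snr in (1.0, 2.0):
    for alpha in (0.05, 0.10):
        print(f"SNR={snr}, alpha={alpha}: power={coefficient_power(X, 1, snr, alpha):.3f}")
```

Raising either the signal-to-noise ratio or the significance level raises the computed power, in line with the two statements above.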
