Instructor's Solutions Manual for Learning SAS in the Computer Lab

Instructor Solutions Manual 

for 

Learning SAS in the Computer Lab 

3rd EDITION

Rebecca J. Elliott 

Statistically Significant 

Christopher H. Morrell 

Loyola University Maryland 

Prepared by 

Christopher H. Morrell 

Loyola University Maryland 

Australia • Brazil • Japan • Korea • Mexico • Singapore • Spain • United Kingdom • United States

© 2010 Brooks/Cole, Cengage Learning 

ALL RIGHTS RESERVED. No part of this work covered by the copyright herein may be reproduced, transmitted, stored, or used in any form or by any means graphic, electronic, or mechanical, including but not limited to photocopying, recording, scanning, digitizing, taping, Web distribution, information networks, or information storage and retrieval systems, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the publisher except as may be permitted by the license terms below.

For product information and technology assistance, contact us at Cengage Learning Customer & Sales Support, 1-800-354-9706

For permission to use material from this text or product, submit all requests online at www.cengage.com/permissions

Further permissions questions can be emailed to permissionrequest@cengage.com

Printed in the United States of America 

1 2 3 4 5 6 7 11 10 09 08 07 

ISBN-13: 978-0-495-82797-9 

ISBN-10: 0-495-82797-5 

Brooks/Cole 

20 Channel Center Street 

Boston, MA 02210 

USA 

Cengage Learning is a leading provider of customized learning solutions with office locations around the globe, including Singapore, the United Kingdom, Australia, Mexico, Brazil, and Japan. Locate your local office at: international.cengage.com/region

Cengage Learning products are represented in Canada by Nelson Education, Ltd.

For your course and learning solutions, visit academic.cengage.com

Purchase any of our products at your local college store or at our preferred online store www.ichapters.com

NOTE: UNDER NO CIRCUMSTANCES MAY THIS MATERIAL OR ANY PORTION THEREOF BE SOLD, LICENSED, AUCTIONED, OR OTHERWISE REDISTRIBUTED EXCEPT AS MAY BE PERMITTED BY THE LICENSE TERMS HEREIN.

READ IMPORTANT LICENSE INFORMATION

Dear Professor or Other Supplement Recipient:

Cengage Learning has provided you with this product (the “Supplement”) for your review and, to the extent that you adopt the associated textbook for use in connection with your course (the “Course”), you and your students who purchase the textbook may use the Supplement as described below. Cengage Learning has established these use limitations in response to concerns raised by authors, professors, and other users regarding the pedagogical problems stemming from unlimited distribution of Supplements.

Cengage Learning hereby grants you a nontransferable license to use the Supplement in connection with the Course, subject to the following conditions. The Supplement is for your personal, noncommercial use only and may not be reproduced, posted electronically or distributed, except that portions of the Supplement may be provided to your students IN PRINT FORM ONLY in connection with your instruction of the Course, so long as such students are advised that they may not copy or distribute any portion of the Supplement to any third party. You may not sell, license, auction, or otherwise redistribute the Supplement in any form. We ask that you take reasonable steps to protect the Supplement from unauthorized use, reproduction, or distribution. Your use of the Supplement indicates your acceptance of the conditions set forth in this Agreement. If you do not accept these conditions, you must return the Supplement unused within 30 days of receipt.

All rights (including without limitation, copyrights, patents, and trade secrets) in the Supplement are and will remain the sole and exclusive property of Cengage Learning and/or its licensors. The Supplement is furnished by Cengage Learning on an “as is” basis without any warranties, express or implied. This Agreement will be governed by and construed pursuant to the laws of the State of New York, without regard to such State’s conflict of law rules.

Thank you for your assistance in helping to safeguard the integrity of the content contained in this Supplement. We trust you find the Supplement a useful teaching tool.

CONTENTS

PREFACE

MODULE 1: THE BASICS

MODULE 2: MORE SAS BASICS

MODULE 3: DATA MANAGEMENT

MODULE 4: SAS FUNCTIONS

MODULE 5: DESCRIPTIVE STATISTICS I

MODULE 6: PROC GCHART

MODULE 7: DESCRIPTIVE STATISTICS II

MODULE 8: GENERATING RANDOM OBSERVATIONS

MODULE 9: X-Y PLOTS

MODULE 10: ONE SAMPLE TESTS FOR μ, p

MODULE 11: TWO SAMPLE T-TESTS

MODULE 12: ONE-WAY ANOVA

MODULE 13: TWO-WAY ANOVA AND MORE

MODULE 14: MODEL CHECKING IN ANOVA

MODULE 15: CORRELATIONS

MODULE 16: SIMPLE LINEAR REGRESSION

MODULE 17: MODEL CHECKING IN REGRESSION

MODULE 18: MULTIPLE LINEAR REGRESSION

MODULE 19: MULTIPLE REGRESSION-CHOOSING THE BEST MODEL

MODULE 20: TESTS FOR CATEGORICAL DATA

MODULE 21: NON-PARAMETRIC TESTS

MODULE 22: ANALYSIS OF COVARIANCE

MODULE 23: LOGISTIC REGRESSION

MODULE 24: MATRIX COMPUTATIONS

MODULE 25: MACRO VARIABLES AND PROGRAMS

PREFACE 

This solutions manual provides the SAS code needed for the problems in Learning SAS in the Computer Lab, 3rd Edition. There are many possible ways to write programs that will run and generate the desired output; this manual provides one set of solutions. In this manual, SAS code is displayed in a Courier font.

Parts of problems (a, b, c, and so on) are often related and should be incorporated in one SAS program. The solution may list program code common to all parts of the problem first, followed by code for particular parts under a, b, c, and so on. In some cases, more common code follows the code for the parts.

Problems in the early chapters call for label and title statements as well as the use of PROC FORMAT. Solutions for later chapters do not include these statements, although I recommend that they be assigned. Students should also be required, or at least strongly encouraged, to document their SAS programs with comments.

There are many different ways to read the data sets included with the manual, and I have used different input styles throughout the solutions as examples. Instructors may also wish to provide some data sets as Microsoft Excel files so that students gain experience reading data in this common format.

In Learning SAS in the Computer Lab, 3rd Edition, I recommend that SAS code be formatted in ways that make it easy to read and debug. To save space, I have not included such formatting in the solutions.

For some problems, answers to the statistical questions are provided. This may help instructors decide which problems to assign.


MODULE 1: THE BASICS

1.1 data one;

input pH time temp;

datalines;

4.5 20 125

4.1 22 133

4.8 18 149

4.0 26 120

5.0 25 120

6.0 21 138

;

run;

proc print; run;

1.2 Use the same data step as in 1.1 and then

proc print; var temp pH; run;

1.3 data sizes; 

input size $ color $ price shipcost;

datalines;
large red 18.97 0.25 

medium blue 24.68 1.10 

x-large black 29.99 1.75 

small orange 15.89 0.90 

; 

run; 

proc print; 

var size color price shipcost; run;

1.4 Use the same data step as in 1.3 and then

proc print; var color size price; run;

1.5 data schools; 

input school $ no_teach no_stud;

datalines;
granite 5829 200486 

jordan 12433 318992 

davis 2358 126331 

; 

run;

proc print;

var school no_teach no_stud; run;


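1.6 The input statement in 1.1 changes to

input pH 1-3 time 5-6 temp 8-10; datalines;

1.7 The input statement in 1.1 changes to

input @1 pH @5 time @8 temp;

1.8 The input statement in 1.3 changes to

input size $ 1-7 color $ 9-14 price 16-20 shipcost 23-26;

1.9 The input statement in 1.3 changes to

input @1 size $7. @9 color $6. @16 price 5.2 @23 shipcost 4.2;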
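Column and formatted input both read fixed positions, and SAS columns a-b are 1-based and inclusive. As a quick cross-check outside SAS, here is a minimal Python sketch that emulates the column input of 1.6. It assumes the raw records are column-aligned exactly as that INPUT statement expects; `read_columns` is a hypothetical helper, not part of SAS.

```python
# Minimal sketch: emulate SAS column input such as
#     input pH 1-3 time 5-6 temp 8-10;
# SAS columns are 1-based and inclusive, so columns a-b become the
# Python slice [a-1:b].

def read_columns(record, layout):
    """Parse one fixed-column record into a dict.

    layout is a list of (name, first_col, last_col, convert) tuples;
    convert turns the stripped field text into a value.
    """
    values = {}
    for name, first, last, convert in layout:
        text = record[first - 1:last].strip()
        # A blank field becomes None, loosely like a SAS missing value.
        values[name] = convert(text) if text else None
    return values


# Column layout from problem 1.6.
LAYOUT_1_6 = [("pH", 1, 3, float), ("time", 5, 6, int), ("temp", 8, 10, int)]

print(read_columns("4.5 20 125", LAYOUT_1_6))  # {'pH': 4.5, 'time': 20, 'temp': 125}
```

The same slicing idea covers the pointer and informat versions: `@16 price 5.2` starts at column 16 and reads five characters.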

1.10 data appoint; 

input time $ 1-5 person $ 8-12 where $ 15-27 

subject $ 29-44 length 48-49;

datalines;

11:00 Sally room 30 personnel review 45

1:00 Jim Jim's office brake design 30

3:00 Nancy lab test results 30

;

run;

proc print;

var time person where subject length; run;

1.11 The input statement in 1.10 changes to

input @1 time $5. @8 person $5. @15 where $12.

@29 subject $16. @48 length 2.0; 


1.12 data popcorn; 

input @1 brand $20. @22 time $4. @27 notpop 3.0;

datalines;

Orville Redenbacker 2:15 80 






Smith's 2:15 170 

Smith's 2:15 147 

Smith's 2:30 196 

Smith's 2:30 114 

Smith's 2:45 98 

Smith's 2:45 90 

Pop Secret 2:15 215 






; 

run; 



MODULE 2: MORE SAS BASICS 

2.1 a data one; infile 'utility.dat'; 

input @1 month $3. @5 year 2. phone 9-14 fuel 18-22 

elec 25-29; 

if month='Jan' then monthnum=1;

else if month='Feb' then monthnum=2;

else if month='Mar' then monthnum=3;

else if month='Apr' then monthnum=4;

else if month='May' then monthnum=5;

else if month='Jun' then monthnum=6;

else if month='Jul' then monthnum=7;

else if month='Aug' then monthnum=8;

else if month='Sep' then monthnum=9;

else if month='Oct' then monthnum=10;

else if month='Nov' then monthnum=11;

else if month='Dec' then monthnum=12; 

totalexp = phone + fuel + elec; 

run; 


b Use the same data step as in (a) and then 

proc sort; by year monthnum; run; 

proc print; by year; 

var month phone; run; 

c Use the same data step as in (a) and then 

proc sort; by monthnum year; run; 

proc print; by monthnum; 

var year phone; run; 

d Use the same data step as in (a) and then 

proc print; where year = 92; run; 

e Use the same data step as in (a) and then 

proc sort data = one; by year; 


where month = 'Jan' or month='Feb' or month='Mar'; 

by year; run; 

f Sort by year and then month to compare the months within each year (part b).

Sort by month and then year to compare the same month across years (part c).


2.2 a data one; infile 'china#1.dat';

input year total exports imports; 

deficit = exports - imports; 

run; 


b data two; set one; 

if 1955

2.5 a, b proc format; 

value $ktfmt 'o' = 'Overhand' 'f' = 'Figure8'; 

value rfmt 1 = 'Cotton' 2 = 'Twine' 3 = 'Nylon'; 

value kdfmt 1 = 'Parallel' 2 = 'Perpendicular'; 

run; 

data one; infile 'knots.dat'; 

input Knot_Type $ 4 Rope 7 Knot_Direction 10 Weight 13-15; 

Break_Weight=Weight-162; 

Brk_Wgt_kg=Break_Weight/2.2; 

format Knot_Type $ktfmt. Rope rfmt. Knot_Direction kdfmt.; 

run; 

proc sort; 

by descending Break_Weight; run; 


2.6 proc format; 

value htnfmt 1='Normotensive' 2='IDH' 3='ISH' 4='Hypertension'; 

run; 

data one; infile 'btt.dat'; 

input childid sex bweight gestage momage parity 

mdbp msbp momeduc mmedaid socio 

dbp5 sbp5 ht5 wt5 hdl5 ldl5 trig5 smoke5 medaid5 socio5; 

bmi5 = wt5/(ht5*ht5); 

/* Test for missing values first: SAS treats a missing value as smaller than any number */

if msbp = . or mdbp = . then htn = .;

else if msbp >= 140 and mdbp >= 90 then htn = 4;

else if msbp >= 140 and mdbp < 90 then htn = 3;

else if msbp < 140 and mdbp >= 90 then htn = 2;

else htn = 1;

format htn htnfmt.; run; 
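The order of the tests in the htn data step matters because SAS treats a missing numeric value as smaller than any number, so `msbp < 140` is true when msbp is missing. A minimal Python sketch of the same classification, with `None` standing in for a SAS missing value (an assumption of the sketch) and a hypothetical function name `classify_htn`, makes the intended logic explicit:

```python
# Sketch of the htn classification in problem 2.6. None stands in for a
# SAS missing value, so the missing check must come before any comparison.

def classify_htn(msbp, mdbp):
    """Return 1-4 (Normotensive, IDH, ISH, Hypertension) or None if missing."""
    if msbp is None or mdbp is None:
        return None
    high_sys = msbp >= 140
    high_dia = mdbp >= 90
    if high_sys and high_dia:
        return 4  # Hypertension
    if high_sys:
        return 3  # ISH: isolated systolic hypertension
    if high_dia:
        return 2  # IDH: isolated diastolic hypertension
    return 1      # Normotensive


print(classify_htn(150, 80))   # 3
print(classify_htn(None, 80))  # None
```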

a data one10; set one; if _n_

MODULE 3: DATA MANAGEMENT 

3.1 data one; infile 'china#1.dat';

input year 1-4 total 6-10 exports 12-16 imports 18-22;

run;

/* It is first necessary to put data in year order before

computing the change in exports or imports */

a proc sort; by year; run;

data two; set one;

/* The next two lines compute change in exports */

lastyrex = lag(exports);

changeex = exports - lastyrex;

b /* The next two lines compute change in imports */

lastyrim = lag(imports);

changeim = imports - lastyrim; run;

proc print;

var year exports lastyrex changeex imports lastyrim changeim;

run; 
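LAG(x) returns the value of x from the previous time that LAG call executed. On data sorted by year, it therefore yields the previous row's value, with a missing value on the first row. This is also why 3.2(b) places the subsetting IF before LAG: the lag then refers to the previous January rather than the previous month. A minimal Python sketch of the unconditional case, with `None` as the missing value and made-up export figures:

```python
# Sketch of LAG() applied unconditionally to data already sorted by year:
# each row receives the previous row's value; the first row gets None,
# standing in for a SAS missing value.

def lag(values):
    """Return values shifted down one position, with None first."""
    if not values:
        return []
    return [None] + list(values[:-1])


def change(values):
    """Current value minus lagged value (None where the lag is missing)."""
    return [None if prev is None else cur - prev
            for cur, prev in zip(values, lag(values))]


exports = [100, 120, 115]  # made-up figures, not from the China trade data
print(lag(exports))     # [None, 100, 120]
print(change(exports))  # [None, 20, -5]
```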

3.2 data utils; infile 'utility.dat'; 

input @1 month $3. @5 year 2.0 phone 9-14 fuel 18-22 elec 25-29;

if month = 'Jan' then monthnum =1;

else if month = 'Feb' then monthnum =2; 

else if month = 'Mar' then monthnum =3; 

else if month = 'Apr' then monthnum =4; 

else if month = 'May' then monthnum =5; 

else if month = 'Jun' then monthnum =6; 

else if month = 'Jul' then monthnum =7; 

else if month = 'Aug' then monthnum =8; 

else if month = 'Sep' then monthnum =9; 

else if month = 'Oct' then monthnum =10;

else if month = 'Nov' then monthnum =11; 

else if month = 'Dec' then monthnum =12; 

run; 

/* Put data in year month order */

proc sort; by year monthnum; run; 

a data year90; set utils; if year = 90; 

lastmonth = lag(phone); 

change = phone - lastmonth; run; 

proc print; var year month phone lastmonth change; run; 

b data winter; set utils; if month = 'Jan'; 

lastyr = lag(fuel);

change = fuel - lastyr; run; 

proc print; var month year fuel lastyr change; run; 
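The twelve-branch IF/ELSE IF chains in 2.1 and 3.2 are a table lookup from a three-letter abbreviation to a month number. In SAS itself, an informat defined with an INVALUE statement in PROC FORMAT could play the same role. As a sketch outside SAS, a Python dictionary performs the lookup, and a (year, month number) key reproduces `proc sort; by year monthnum;`. The records below are hypothetical.

```python
# Sketch: the IF/ELSE IF chain as a dictionary lookup.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
MONTH_NUM = {abbr: num for num, abbr in enumerate(MONTHS, start=1)}


def month_number(month):
    """Return 1-12 for 'Jan'..'Dec', or None (like a missing value) otherwise."""
    return MONTH_NUM.get(month)


# Sorting on (year, month number) mirrors:  proc sort; by year monthnum;
records = [(90, "Mar"), (89, "Dec"), (90, "Jan")]  # hypothetical (year, month) pairs
records.sort(key=lambda r: (r[0], month_number(r[1])))
print(records)  # [(89, 'Dec'), (90, 'Jan'), (90, 'Mar')]
```

A lookup table also removes the risk of typos such as `monthnum=l`, where a mistyped constant silently becomes a reference to an uninitialized variable.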


3.4 data DH; 

input flavor $ 1-10 height; brand = 'DH';

datalines;
DevilsFood 39.0 


White 30.5 

White 34.5 

Yellow 37.0 

Yellow 35.0 

; 

run; 

data BC; 

input flavor $ 1-10 height; brand = 'BC';

datalines;
Yellow 35.5 

Yellow 36.0 



White 32.5 

White 32.5 

; 

run; 

a * Concatenate the two data sets ; 

data Cake; set DH BC; 

file 'Module3-4a.dat'; 

put flavor $ 1-10 brand $ 12-13 height 15-18 .1; run; 

b * Reformulate Duncan Hines data for match merging ; 

data DH1; set dh; 

dhht = height; keep flavor dhht; run; 

proc sort; by flavor; run; 

* Reformulate Betty Crocker data for match merging; 

data BC1; set BC; 

bcht = height; keep flavor bcht; run; 

proc sort; by flavor; run; 

data Cake1; merge dh1 bc1; by flavor; 

file 'Module3-4b.dat'; 

put flavor $ 1-10 dhht 12-16 .1 bcht 18-22 .1; run; 
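Problem 3.4 contrasts two ways of combining data sets. SET stacks the observations, while MERGE with BY pairs them within each BY group. The Python sketch below mimics both for lists of dictionaries sorted by the key. It assumes SAS's usual match-merge behavior: when one data set has fewer observations in a BY group, its last values carry forward for the remaining rows. It also assumes the two tables share only the BY variable. The function names are hypothetical, and the heights are a few values taken from the listings above.

```python
# Sketch of SET (concatenation) versus MERGE ... BY (match merging)
# for tables stored as lists of dicts.

def concatenate(*tables):
    """Like  data c; set a b;  : rows of a followed by rows of b."""
    out = []
    for table in tables:
        out.extend(dict(row) for row in table)
    return out


def match_merge(left, right, key):
    """Rough analogue of  data c; merge left right; by key;

    Rows are paired in order within each BY group. If one side has fewer
    rows in a group, its last row is carried forward; if it has none,
    its variables are simply absent (missing). Assumes the tables share
    only the BY variable.
    """
    keys = sorted({row[key] for row in left} | {row[key] for row in right})
    out = []
    for k in keys:
        lrows = [row for row in left if row[key] == k]
        rrows = [row for row in right if row[key] == k]
        for i in range(max(len(lrows), len(rrows))):
            merged = {}
            if lrows:
                merged.update(lrows[min(i, len(lrows) - 1)])
            if rrows:
                merged.update(rrows[min(i, len(rrows) - 1)])
            out.append(merged)
    return out


dh = [{"flavor": "White", "dhht": 30.5}, {"flavor": "White", "dhht": 34.5},
      {"flavor": "Yellow", "dhht": 37.0}]
bc = [{"flavor": "White", "bcht": 32.5}]
for row in match_merge(dh, bc, "flavor"):
    print(row)
```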

3.5 data ml_first25; infile 'moonlake.dat' obs=25; 

input propane 1 naturalgas 2 eeproducts 3 sshacwhs 4 ewrs 5 

remr 6 garbage 7 tagto 8 internet 9 hss 10 New 12 OneBill 14 

NG 15 Elec 16 PG 17 FuelOil 18 Wood 19 Coal 20 Solar 21 Source 22 

AgeHeat 24 TypeWater 25 Agewater 26 HowLng 27 PCHome 34 PCPlan 35 

Internet 36 Provider 37 Age 40 Educ 41 Income 42 sex 43; run; 



3.6 data ml_26_50; 

infile 'moonlake.dat' firstobs=26 obs=50; 

Input statement as in 3.5. 

data ml_251_300; 

infile 'moonlake.dat' firstobs=251 obs = 300; 

Input statement as in 3.5. 

data ml2; 

set ML_26_50 ML_251_300; run; 

proc print data = ml2; run; 


MODULE 4: SAS FUNCTIONS 

4.1 data well; infile 'well#1.dat';

input @1 date $8. nitrate zinc TDS;

month = substr(date,1,3);

day = substr(date,4,2); 

run; 


4.2 data one; input value; 

posval = abs(value); 

root = sqrt(posval);

newval = sqrt(abs(value));

datalines;
2.7 

-6.9 

3.4 

0.5 

1.3 

; run; 


4.3 data one; input x; 

a cumprob = probbnml(0.23,13,x);

b greater = 1 - cumprob;

c if x = 0 then lessprob = 0; /* P(X < 0) = 0 */

else lessprob = probbnml(.23,13,x-1);

datalines;
0 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

; run; 
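PROBBNML(p, n, m) returns P(X ≤ m) for X ~ Binomial(n, p). Parts a through c are therefore P(X ≤ x), P(X > x), and P(X < x) = P(X ≤ x - 1), where the last is 0 at x = 0. A short Python cross-check using only the standard library (Python 3.8+ for `math.comb`); `binom_cdf` is a hypothetical helper:

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), the quantity PROBBNML(p, n, x) returns."""
    if x < 0:
        return 0.0  # no mass below 0, so P(X < 0) = 0
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(min(x, n) + 1))


n, p = 13, 0.23
for x in range(n + 1):
    cumprob = binom_cdf(x, n, p)       # part a: P(X <= x)
    greater = 1 - cumprob              # part b: P(X > x)
    lessprob = binom_cdf(x - 1, n, p)  # part c: P(X < x)
    print(f"{x:2d} {cumprob:.4f} {greater:.4f} {lessprob:.4f}")
```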



4.4 data binomial; input x; n = 5; p = 0.40; 

a cdf=probbnml(p, n, x); 

b pdf=cdf-lag(cdf); 

if x = 0 then pdf = cdf;

datalines;
0 

1 

2 

3 

4 

5 

; 

run; 
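Differencing a CDF recovers the probability mass function, which is what `pdf = cdf - lag(cdf)` does. At x = 0, LAG returns a missing value, hence the IF statement. A Python sketch that checks the differenced values against the binomial formula for n = 5 and p = 0.40:

```python
from math import comb

n, p = 5, 0.40
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Cumulative sums give the CDF, the quantity PROBBNML returns.
cdf = []
running = 0.0
for prob in pmf:
    running += prob
    cdf.append(running)

# Difference the CDF as pdf = cdf - lag(cdf) does; at x = 0 there is no
# previous value, so pdf = cdf there (the IF statement in 4.4).
pdf = [cdf[0]] + [cdf[x] - cdf[x - 1] for x in range(1, n + 1)]

for x in range(n + 1):
    print(x, round(cdf[x], 5), round(pdf[x], 5))
```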


4.5 data norm; mu=12.6; sigma=2.3; 

x = 10; 

z=(x-mu)/sigma; 

x1 = 15; 

z1=(x1-mu)/sigma; 

x2 = 7.6; 

z2=(x2-mu)/sigma; 

a prob_a=probnorm(z); 

b prob_b=probnorm(z1)-probnorm(z2); 

run; 
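PROBNORM(z) is the standard normal CDF. After standardizing, prob_a = P(X < 10) and prob_b = P(7.6 < X < 15) for X ~ N(12.6, 2.3²). A Python cross-check using only the standard library (Python 3.8+ for `statistics.NormalDist`); the `probnorm` helper is a hypothetical stand-in for the SAS function:

```python
from math import erf, sqrt
from statistics import NormalDist


def probnorm(z):
    """Standard normal CDF, the quantity SAS PROBNORM(z) returns."""
    return 0.5 * (1 + erf(z / sqrt(2)))


mu, sigma = 12.6, 2.3
z = (10 - mu) / sigma
z1 = (15 - mu) / sigma
z2 = (7.6 - mu) / sigma

prob_a = probnorm(z)                  # P(X < 10)
prob_b = probnorm(z1) - probnorm(z2)  # P(7.6 < X < 15)

# Cross-check without standardizing, using the standard library's NormalDist.
dist = NormalDist(mu, sigma)
check_a = dist.cdf(10)
check_b = dist.cdf(15) - dist.cdf(7.6)
print(round(prob_a, 4), round(check_a, 4))
print(round(prob_b, 4), round(check_b, 4))
```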



MODULE 5: DESCRIPTIVE STATISTICS I

5.1 data one; infile 'utility.dat';

input @1 date $6. @5 year 2.0 phone 9-14 fuel 17-22 elec 25-29;

total = phone + fuel + elec; 

label phone = 'phone costs' 

fuel = 'fuel costs' 

elec = 'electricity costs' 

total = 'total utility costs'; run; 

proc univariate plot; var phone fuel elec total; 

id date; title 'Descriptive Stats for utility Costs'; 

run; 

Extreme phone costs: Low--Jan92, Jan89, Dec91, Oct92, Jan93. 

High--May90, Jan91, Apr90, Jan90, Jun90. No outliers.

Extreme fuel costs: Low--Jul92, Jul90, Aug90, Jul89, Aug89.

High--Jan92, Feb89, Jan89, Feb93, Jan92. No outliers. 

Extreme elec costs: Low--Jun92, Sep91, Mar92, Apr90, Mar90. 

High--Jun89, Nov88, Jan89, Oct88, Dec88. Dec88 is an outlier. 

Extreme total costs: Low--Sep92, Aug92, Oct92, Aug91, May92. 

High--Jan89, Feb91, Dec88, Jan90, Jan91. No outliers. 
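The outliers noted above come from the PROC UNIVARIATE box plot. That plot is commonly described as flagging observations more than 1.5 interquartile ranges beyond the quartiles. A rough Python sketch of that rule, assuming SAS's default quartile definition (PCTLDEF=5) and using hypothetical data; other quartile definitions can shift the fences slightly:

```python
def sas_percentile_def5(sorted_x, p):
    """Percentile under the assumed default PCTLDEF=5 definition.

    Write n*p = j + g with j an integer and 0 <= g < 1. If g == 0,
    average x(j) and x(j+1); otherwise take x(j+1). x(i) is 1-based.
    """
    n = len(sorted_x)
    j = int(n * p)
    g = n * p - j
    if g == 0:
        return (sorted_x[j - 1] + sorted_x[j]) / 2
    return sorted_x[j]


def flag_outliers(data, k=1.5):
    """Return values more than k*IQR below Q1 or above Q3."""
    xs = sorted(data)
    q1 = sas_percentile_def5(xs, 0.25)
    q3 = sas_percentile_def5(xs, 0.75)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < low or x > high]


costs = [31, 33, 34, 35, 36, 38, 39, 40, 41, 95]  # hypothetical monthly costs
print(flag_outliers(costs))  # [95]
```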

5.2 Use data step as in 5.1 and then 

proc sort; by year; run; 

proc univariate; by year; var total; id date; 

title 'Total utility Costs for each Year'; run; 

5.3 proc format; value lsfmt 1 = "Athletic" 2 = "Sedentary"; run; 

data athlete; infile 'athlete.dat'; 

input sbp 1-3 dbp 6-7 sex $ 10 ls 13; 

label sbp = 'Systolic Blood Pressure' 

dbp = 'Diastolic Blood Pressure' ls = 'Lifestyle'; 

format ls lsfmt.; run; 

proc sort data = athlete; by sex ls; run; 

* Compare blood pressures among the four sex-by-lifestyle groups ;

a proc univariate plots; var dbp; by sex ls; 

title 'Description of diastolic bp by sex and lifestyle'; 

run; 

b proc univariate plots normal; var sbp; 

probplot sbp / normal(mu = est sigma = est); 

title 'Checking whether sbp is normal'; run; 


5.4 data one; infile 'china#1.dat';

input year 1-4 total 6-10 exports 12-16 imports 18-22; 

deficit = imports - exports; run; 

proc univariate plot; 

var imports exports deficit; 

id year; title 'Statistics on China''s Trade'; run; 


5.5 proc format;

value bfmt 1 = 'Duracell' 2 = 'Energizer' 3 = 'Rayovac'

4 = 'Radio Shack'; run; 

data one; infile 'battery.dat'; 

input brand 1 load 4-6 time 9-11; 

label brand = 'Battery Brand' 

time = 'Time to discharge'; 

format brand bfmt.; run; 

proc boxplot; 

plot time*brand / boxstyle=schematic cboxes = black; 

title 'Comparing discharge times among battery brands'; run; 

5.6 data park; infile 'parking.dat'; 

input id miles; if miles = 99 then miles = .; 

label miles = 'Distance live from campus'; run; 

proc univariate plot; var miles; id id; 

title 'Descriptive statistics of distance live from campus'; run; 

5.7 data quarterback; infile 'quarterback.dat'; 

input player $ 5-22 rating 101-105; 

label rating = 'Quarterback rating'; run; 

proc univariate plot; var rating; id player; 

title 'Descriptive statistics of quarterback ratings'; run; 


proc format;

value sfmt 1 = 'Male' 2 = 'Female'; run;

data btt; infile 'btt.dat'; 

input childid 1-4 sex 6 bweight 8-11 gestage 13-14; 

label bweight = 'Birth weight' 

gestage = 'Gestational age'; 

format sex sfmt.; run; 

proc sort; by sex; run; 

proc univariate plot; 

var bweight gestage; 

id childid; 

by sex; 

title 'Statistics for birth weight and gestational age by sex';

run; 


MODULE 6: PROC GCHART 

6.1 data one; infile 'utility.dat'; 

input @1 date $char6. @5 year 2.0 phone fuel elec; 

total = phone + fuel + elec; 

label phone = 'phone costs' 

fuel = 'fuel costs' 

elec = 'electricity costs' 

total = 'total utility costs'; run; 

proc gchart; 

vbar phone fuel elec total / space = 0; 

title 'Histograms of utility costs'; run; 

The distributions are right skewed. 


data two; set one; if 90


proc format;

value $sexfmt 'F'='Female' 'M'='Male'; run;

data run; 

infile 'running.dat'; 

input class sex $ @5 minute1 1.0 @7 second1 2.0 

@10 minute2 1.0 @12 second2 2.0; 

time1 = minute1*60 + second1; 

time2 = minute2*60 + second2; 

label class = 'Grade in School' 

time1 = 'Running Time for First Race' 

time2 = 'Running Time for Second Race'; 

format sex sexfmt.; run; 

goptions htext = 2; 

proc gchart data = run; 

vbar time1 / space = 0 width = 10 midpoints = 70 to 140 by 10; 

vbar time2 / space = 0 width = 10 midpoints = 70 to 130 by 10; 

run; 


proc format;

value sfmt 1 = 'Natural Gas' 2 = 'Electricity' 3 = 'Propane Gas'

4 = ' ' 5 = 'Wood' 6 = 'Coal'; 

value incfmt 1='=$75,000' 6='Refuse'; run; 

data ml; 

infile 'moonlake.dat'; 

input propane 1 Source 22 Income 42; 

label propane = "Interest in purchasing propane (1=Not, 5=Very)" 

Source = "Primary Energy Source for Heat" 

Income = "Annual Household Income"; 

format Source sfmt. Income incfmt.; run; 

proc gchart data = ml; 

* The bars for source ordered from highest to lowest; 

hbar source / midpoints = 1 3 2 5 6 ; 

hbar propane / midpoints = 1 to 6 by 1; 

hbar income / midpoints = 1 to 6 by 1; run; 



proc format;

value fsfmt 0 = 'Student' 1 = 'Faculty/Staff';

value usrn 1='Usually' 2='Sometimes' 3='Rarely' 4='Never'; 

run; 

data park; 

infile 'parking.dat'; 

input id miles bus_convenient carpool years status bus Monday 

Tuesday Wednesday Thursday Friday drive permit meters lots; 

if id = 400 then fac_staff = 0; 

if years = 99 then years = .; if bus = 99 then bus = .; 

if Monday = 99 then Monday = .; 

if Tuesday = 99 then Tuesday = .; 

if Wednesday = 99 then Wednesday = .; 

if Thursday = 99 then Thursday = .; 

if Friday = 99 then Friday = .; 

busdays = Monday + Tuesday + Wednesday + Thursday + Friday; 

if bus = 2 then busdays = 0; 

if lots = 99 then lots = .; 

format fac_staff fsfmt. bus yn. lots usrn.; run; 

proc gchart data = park; 

a hbar years / space = 0 width = 6 midpoints = 1 to 7 by 1; 

c vbar busdays / 

space = 0 width = 10 midpoints = 0 to 5 by 1; run; 

b proc sort data = park; by fac_staff bus; run; 

proc gchart data = park; 

hbar lots / midpoints = 1 to 4 by 1; 

by fac_staff bus; run; 


proc format;

value mefmt 1 = '= HS'; run;

data btt; infile 'btt.dat';

input childid 1-4 momeduc 29 socio 33 socio5 73;

format momeduc mefmt.; run; 


a proc gchart data = btt; 

vbar momeduc / midpoints = 1 to 4 by 1; run; 


b proc gchart data = btt; 

hbar socio socio5 / midpoints = 0 to 4 by 1; run; 


MODULE 7: DESCRIPTIVE STATISTICS II 

7.1 data ml; infile 'moonlake.dat'; 

input propane 1 naturalgas 2 eeproducts 3 sshacwhs 4 ewrs 5 

remr 6 garbage 7 tagto 8 internet 9 hss 10; run; 

data omitmissing; set ml; 

if propane = 6 then propane = .; 

if naturalgas = 6 then naturalgas = .; 

if eeproducts = 6 then eeproducts = .; 

if sshacwhs = 6 then sshacwhs = .; 

if ewrs = 6 then ewrs = .; 

if remr = 6 then remr = .; 

if garbage = 6 then garbage = .; 

if tagto = 6 then tagto = .; 

if internet = 6 then internet = .; 

if hss = 6 then hss = .; run; 

proc means data = omitmissing; 

var propane naturalgas eeproducts sshacwhs ewrs remr garbage 

tagto internet hss; run; 



7.2 data running; infile 'running.dat';

input class 1 sex $ 3 min1 5 sec1 7-8 min2 10 sec2 12-13;

time1=min1*60+sec1;

time2=min2*60+sec2;

label time1 = 'Time for Race 1'

time2 = 'Time for Race 2'; run;


proc means data = running; 

class sex class; var time1; run; 


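7.3 proc format;

value lsfmt 1 = "Athletic" 2 = "Sedentary"; run;

data athlete; infile 'athlete.dat';

input sbp 1-3 dbp 6-7 sex $ 10 ls 13;

label sbp = 'Systolic Blood Pressure'

dbp = 'Diastolic Blood Pressure'

ls = 'Lifestyle';

format ls lsfmt.; run;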


proc means data = athlete; 

class ls; var sbp dbp; run; 

7.4 data golf; infile 'golf.dat'; 

input Golfer 1 Compression 3-5 Material 8 Distance ; run; 

proc means data = golf; 

class Golfer; var distance; run; 



proc format;

value contfmt 0 = 'Not Contaminated' 1 = 'Contaminated'; run;

data well; infile 'well#1.dat'; 

input date $ 1-5 month $ 1-3 day 4-5 year 7-8 nitrate 11-15 .3 

zinc 18-22 .3 TDS 25-27; 

if (nitrate > 0.12) or (zinc > 0.02) or (TDS > 516) then 

contaminate = 1; 

else contaminate = 0; 

format contaminate contfmt.; run; 

a proc freq; 

table contaminate; 

b table contaminate*year; run; 

All of the data in 1990 is contaminated. In 1991, half is contaminated. 
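The contamination flag is an OR of three threshold tests, and the year percentages in the crosstabulation are the per-year proportions of flagged samples. A Python sketch of both steps; the sample readings are invented for illustration and are not from the well data, and the function names are hypothetical:

```python
from collections import Counter


def contaminated(nitrate, zinc, tds):
    """Return 1 if any reading exceeds its limit (the rule used above), else 0."""
    return int(nitrate > 0.12 or zinc > 0.02 or tds > 516)


def rate_by_year(samples):
    """Proportion contaminated per year from (year, nitrate, zinc, tds) tuples."""
    totals, flagged = Counter(), Counter()
    for year, nitrate, zinc, tds in samples:
        totals[year] += 1
        flagged[year] += contaminated(nitrate, zinc, tds)
    return {year: flagged[year] / totals[year] for year in sorted(totals)}


samples = [  # invented readings for illustration only
    (90, 0.20, 0.010, 400),
    (90, 0.10, 0.030, 400),
    (91, 0.10, 0.010, 400),
    (91, 0.10, 0.010, 600),
]
print(rate_by_year(samples))  # {90: 1.0, 91: 0.5}
```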


proc format;

value outfmt 0 = 'Failure' 1 = 'Success'; run;

data one; 

infile 'survresp.dat'; 

input incentive n_cont n_treat r_cont r_treat; 

label n_cont = 'Sample size for control group' 

n_treat = 'Sample size for treatment group' 

r_cont = 'Response rate for control group' 

r_treat = 'Response rate for treatment group'; 

if r_cont < r_treat then outcome = 1; 

else outcome = 0; 

format outcome outfmt.; run; 

proc freq; 

tables outcome outcome*incentive; run; 



proc format;

value bgfmt 0 = 'Bad' 1 = 'Good'; run;

data skin; infile 'sclero.dat'; 

input clinic id drug thick1 thick2 mobil1 mobil2 assess1 assess2;

if thick1 > thick2 then r_thick = 1; else r_thick = 0;

if mobil1 < mobil2 then r_mobil = 1; else r_mobil = 0;

if assess1 > assess2 then r_assess = 1; else r_assess = 0;

label r_thick = 'Skin thickening improvement' 

r_mobil = 'Skin mobility improvement' 

r_assess = 'Patient assessment improvement'; 

format r_thick bgfmt. r_mobil bgfmt. r_assess bgfmt. ; run; 

a proc freq; tables clinic; run; 

Clinics #46 and #49 had the largest number of patients in the study. 

b proc freq; 

tables drug*clinic; run; 

c proc freq data = skin; 

where clinic = 46 or clinic = 48 or clinic = 49; 

tables clinic*drug*(r_thick r_mobil r_assess); run; 

d proc freq data = skin; 

where drug = 1; 

tables r_thick*r_assess; run; 

34.38%, 21.88% 



proc format;

value sfmt 1 = 'Natural Gas' 2 = 'Electricity' 3 = 'Propane Gas'

4 = ' ' 5 = 'Wood' 6 = 'Coal'; 

value agefmt 1='18-34' 2='35-49' 3='50-64' 4='>=65' 5='Refuse'; 

run; 

data moonlake; infile 'moonlake.dat'; 

input propane 1 internet 9 NG 15 Elec 16 PG 17 FuelOil 18 Wood 19

Coal 20 Solar 21 Source 22 PCHome 34 Internet 36 Age 40;

label propane = "Interest in purchasing propane (1=Not, 5=Very)"

Source = "Primary Energy Source for Heat";

format Solar availfmt. Source sfmt. age agefmt.; run; 

proc freq data = moonlake; 

a table Source ; 

b table NG Elec PG FuelOil Wood Coal Solar; 

c table NG*Propane; 

d table Internet*age; run; 

e proc freq data = moonlake; 

where PCHome = 1; 

table Internet*age; run; 


proc format;

value mefmt 1 = '= HS'; run;

data btt; infile 'btt.dat';

input momeduc 29 socio 33 socio5 73;

format momeduc mefmt.; run; 

proc freq data = btt; 

table socio*socio5 socio*momeduc; run; 


MODULE 8: GENERATING RANDOM OBSERVATIONS 

8.1 data one; do i=1 to 1000; obs = rannor(4241)*20 + 50; 

output; end; run; 

proc gchart data = one; 

vbar obs / space = 0; 

title 'Random samples from N(50,400)'; run; 

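8.2 a data one; do i=1 to 50; obs = rannor(70776)*10 + 10;

output; end; run;

proc gchart;

vbar obs / space = 0;

title 'Random sample of 50 obs of N(10,100)'; run;

b data two; do i=1 to 500; obs = rannor(70776)*10 + 10;

output; end; run;

proc gchart;

vbar obs / space = 0;

title 'Random sample of 500 obs of N(10,100)'; run;

c data three; do i=1 to 5000; obs = rannor(70776)*10 + 10;

output; end; run;

proc gchart;

vbar obs / space = 0;

title 'Random sample of 5000 obs of N(10,100)'; run;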

8.3 data exp; do i = 1 to 1000; x = ranexp(6664)/7;

output; end; run;
proc gchart; 

vbar x / space = 0 width = 6; 

title 'An exponential distribution with lambda=7'; run; 

8.4 data poisson; do i = 1 to 700; y = ranpoi(9001, 5);

output; end; run;
proc gchart; 

vbar y / space = 0; 

title 'A Poisson distribution with mean=5'; run; 

8.5 data bin; do i = 1 to 500; xval = ranbin(2721, 40, 0.2);

output; end; run;
proc gchart; 

vbar xval / space = 0 midpoints = 0 to 20 by 1; 

title 'A Binomial distribution with n=40 and p=0.2'; run; 


8.6 data new; do i = 1 to 1000; 

x1 = ranexp(434911)/7; 

x2 = ranexp(434911)/7; 

x3 = ranexp(434911)/7; 

x4 = ranexp(434911)/7; 

x5 = ranexp(434911)/7; 

x6 = ranexp(434911)/7; 

x7 = ranexp(434911)/7; 

x8 = ranexp(434911)/7; 

x9 = ranexp(434911)/7; 

x10= ranexp(434911)/7; 

average = (x1+x2+x3+x4+x5+x6+x7+x8+x9+x10)/10; output; end; run; 

proc gchart; 

vbar average / space = 0 width = 6; 

title 'Distribution of average of exponential r.v.''s '; run; 
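Problem 8.6 demonstrates the central limit theorem. Each average of 10 exponentials with rate 7 should have mean 1/7 and standard deviation (1/7)/√10 ≈ 0.045, and its histogram should look closer to normal than the exponential histogram in 8.3. A Python sketch of the same simulation follows. Python's random stream differs from RANEXP, so individual values will not match the SAS output even with the same seed; `simulate_averages` is a hypothetical name.

```python
import random
import statistics


def simulate_averages(reps=1000, size=10, rate=7.0, seed=434911):
    """Averages of `size` exponential draws with the given rate, repeated `reps` times."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(rate) for _ in range(size))
            for _ in range(reps)]


averages = simulate_averages()
print(round(statistics.fmean(averages), 4))  # close to 1/7 = 0.1429
print(round(statistics.stdev(averages), 4))  # close to (1/7)/sqrt(10) = 0.0452
```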

8.7 data uniform; do i = 1 to 1000; 

val1 = ranuni(887890)*10 + 10;

val2 = ranuni(887890)*10 + 10;

val3 = ranuni(887890)*10 + 10;

val4 = ranuni(887890)*10 + 10;

val5 = ranuni(887890)*10 + 10;

val6 = ranuni(887890)*10 + 10;

val7 = ranuni(887890)*10 + 10;

val8 = ranuni(887890)*10 + 10;

val9 = ranuni(887890)*10 + 10;

val10 = ranuni(887890)*10 + 10;

ave = (val1+val2+val3+val4+val5+val6+val7+val8+val9+val10)/10;

output; end; run;

proc gchart; 

vbar ave / space = 0 width = 6; 

title 'Distribution of average of a Uniform r.v. on (10,20)'; 

run; 


MODULE 9: X-Y PLOTS 

9.1 data utility; infile 'utility.dat'; 

input month $ 1-3 year 5-6 phone 9-15 fuel 17-22 elec 25-29; 

total=phone + fuel + elec; 

if month = 'Jan' then mnth = 1; 

else if month = 'Feb' then mnth = 2; 

else if month = 'Mar' then mnth = 3; 

else if month = 'Apr' then mnth = 4; 

else if month = 'May' then mnth = 5; 

else if month = 'Jun' then mnth = 6; 

else if month = 'Jul' then mnth = 7; 

else if month = 'Aug' then mnth = 8; 

else if month = 'Sep' then mnth = 9; 

else if month = 'Oct' then mnth = 10; 

else if month = 'Nov' then mnth = 11; 

else mnth = 12; 

if 89

9.3 data well; infile 'well#8.dat'; 

input @1 month $3. @4 day 2. @7 year 2. zinc; 

if month = 'Jan' then mo = 1; 

else if month = 'Feb' then mo = 2; 

else if month = 'Mar' then mo = 3; 

else if month = 'Apr' then mo = 4; 

else if month = 'May' then mo = 5; 

else if month = 'Jun' then mo = 6; 

else if month = 'Jul' then mo = 7; 

else if month = 'Aug' then mo = 8; 

else if month = 'Sep' then mo = 9; 

else if month = 'Oct' then mo = 10; 

else if month = 'Nov' then mo = 11; 

else if month = 'Dec' then mo = 12; 

format date date7. ; 

date = mdy (mo, day, year) ; run; 

proc sort; by year mo day; run; 

goptions csymbol = black; 

symbol1 value = dot i = join; 

proc gplot; by year; 

plot zinc*date; 

title 'Zinc concentrations over time'; run; 

9.4 data one; infile 'handinj.dat'; 

input id $ type $ dayslost cost; 

label dayslost = 'Days of work lost' 

cost = 'Cost in Irish pounds'; run; 

goptions csymbol = black htext = 2; 

symbol1 value = dot; 

proc gplot; 

plot dayslost*cost; 

title 'Lost work days vs. cost'; run; 

9.5 data two; infile 'survresp.dat'; 

input incentive n_cont n_treat r_cont r_treat; 

improve =(r_treat - r_cont)/r_cont; 

label n_cont = 'Sample size for control group' 

n_treat = 'Sample size for treatment group' 

r_cont = 'Response for control group' 

r_treat = 'Response for treatment group' 

improve = 'Improvement in response rate'; run; 



proc gplot; 

plot improve*incentive; 

title 'Improvement in response vs. types of incentive'; 

run; 


9.6 data athlete; infile 'athlete.dat';

input sbp 1-3 dbp 6-7 sex $ 10 ls 13;

label sbp = 'Systolic Blood Pressure'

dbp = 'Diastolic Blood Pressure';

run; 



proc gplot; 

plot sbp*dbp; 

title 'Plot of systolic vs. diastolic blood pressure'; run; 

9.7 data injury; infile 'injury.dat'; 

input year 1-4 burns 6-10 amputations 12-16; run; 

goptions csymbol = black; 

symbol1 value = dot i = join; 

symbol2 value = star i = join line = 2; 

axis1 label = ('Injuries'); 

legend1 label = (H = 1.5 cell) value = (H = 1.5 cell); 

proc gplot; 

plot burns*year=1 amputations*year=2 / 

overlay vaxis=axis1 legend=legend1; 

title 'Plot of burns and amputations by year'; run; 


proc format;

value $efmt 's' = 'Southern' 'n' = 'Northern'; run;

data trees; infile 'trees.dat'; 

input location $ 1 elevation 3-6 damage 8-9; 

format location $efmt.; run; 


symbol1 value = 'n'; 

symbol2 value = 's'; 

proc gplot data = trees; 

plot damage*elevation = location; run; 


data quarterback; infile 'quarterback.dat'; 

input YdsPerGame 63-67 TD 70-71 Int 74-75 Rating 101-105; run; 



proc gplot; 

plot Rating*(YdsPerGame TD Int); run; 


MODULE 10: ONE SAMPLE TESTS FOR μ, p 

10.1 data well; infile 'well#1.dat'; 

input @11 nitrate 4. @18 zinc 5. @25 tds 3.; 

testnitr = nitrate - 0.1; testzinc = zinc - 0.01; 

testtds = tds - 475; run; 

proc means n mean std t probt; 

var testnitr testzinc testtds; run; 

a p-value = 0.22725 

b p-value = 0.05735 

c p-value = 0.0001 
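Subtracting the hypothesized value before calling proc means works because the t and probt statistics in proc means test whether the mean of each analysis variable equals zero. For a shifted variable d_i = x_i - μ0 (for example, testnitr = nitrate - 0.1), the shift leaves the standard deviation unchanged, so the statistic is the usual one-sample t statistic and probt is its two-sided p-value. The one-sided relation in the last line is the standard one, offered as a reading aid; the direction of each alternative comes from the exercise itself.

```latex
t = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{\bar{x} - \mu_0}{s_x/\sqrt{n}}, \qquad \text{df} = n - 1,
\qquad
p_{\text{one-sided}} = \tfrac{1}{2}\, p_{\text{two-sided}}
\ \text{ when } \bar{x} - \mu_0 \text{ has the sign stated in } H_a .
```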


input @9 phone 6. @17 fuel 6. @25 elec 5.; 

testphone = phone - 50; 

testelec = elec - 30; run; 

proc means n mean std t probt; 

var testphone testelec; run; 

a p-value < 0.00005 

b p-value = 0.0498 

10.3 data running; infile 'running.dat'; 




testt1_78=time1-78; 

testt2_95=time2-95; 



run; 

proc means data = running n mean std t probt; 

where sex = 'F'; 

var testt1_78 testt2_95; run; 

a p-value = 0.0217 

b p-value = 0.03435 


10.4 data debate; infile 'debate.dat'; 

input id school gender compare argue research reason speak; 

if compare = 1 then debate_more =1; 

else debate_more =0; 

if compare = . then debate_more = .; 

if argue = 1 then argue_very =1; 

else argue_very =0; 

if argue = . then argue_very = .; 

if research = 1 then research_very =1; 

else research_very =0; 

if research = . then research_very = .; 

if reason = 1 then reason_very =1; 

else reason_very =0; 

if reason = . then reason_very = .; 

if speak = 1 then speak_very =1; 

else speak_very =0; 

if speak = . then speak_very = .; run; 

proc freq; 

a tables debate_more / chisq testp = (0.25, 0.75); 

c tables argue_very / chisq testp = (0.2, 0.8); 

e tables research_very / chisq testp = (0.25, 0.75); 

f tables reason_very / chisq testp = (0.05, 0.95); run; 

a) p̂ = 0.771, p-value = 0.3857/2 = 0.19285. 

c) p̂ = 0.853, p-value = 0.0187. 

e) p̂ = 0.722, p-value = 0.2564/2 = 0.1282. 

f) p̂ = 0.893, p-value < 0.0001. 

b proc freq; where school = 8; 

tables debate_more / chisq testp = (0.25, 0.75); run; 

d proc freq; where gender = 1; 

tables argue_very / chisq testp = (0.2, 0.8); run; 

g proc freq; where gender=2 and school=9; 

tables speak_very / chisq testp = (0.25, 0.75); run; 

b) p̂ = 0.887, p-value = 0.0127/2 = 0.00635. 

d) p̂ = 0.881, p-value = 0.0058. 

g) p̂ = 0.708, p-value = 0.6374/2 = 0.3187. 
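Why some of the p-values above are halved: with two categories, the chi-square goodness-of-fit statistic that testp= produces equals the square of the one-proportion z statistic, so its p-value is two-sided. In this sketch O_1 = n p̂ is the count of 1s, O_0 = n(1 - p̂) the count of 0s, and p_0 the hypothesized proportion of 1s (0.75 in part a):

```latex
X^2 = \frac{(O_1 - n p_0)^2}{n p_0} + \frac{\bigl(O_0 - n(1 - p_0)\bigr)^2}{n(1 - p_0)}
    = \frac{n(\hat{p} - p_0)^2}{p_0(1 - p_0)}
    = \left( \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} \right)^2 = z^2 .
```

When p̂ lies on the side of p_0 named in a one-sided alternative, the one-sided p-value is half the reported chi-square p-value, as in part (a): 0.3857/2 = 0.19285.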


10.5 data src; infile 'src.dat'; 

input @8 environ 2. @18 plant_an 2. @30 employ 2. @55 libcon 1.; 

if environ in (8, 9, 10) then env_strong = 1; 

else if 1

10.7 data bball; 

input baskets; 

datalines; 
12 

8 

11 

10 

12 

6 

10 

14 

12 

8 

12 

12 

6 

8 

12 

15 

13 

9 

11 

10 

; 

run; 

proc means data = bball n mean std clm; 

var baskets; run; 

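Note: The clm option tells proc means to compute a two-sided confidence interval for the mean. The default level is 95%; the alpha= option changes it (for example, alpha=0.10 gives a 90% interval). 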
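For reference, the interval that clm reports is the standard t interval for a mean. For the 20 baskets values above, n - 1 = 19 degrees of freedom:

```latex
\bar{x} \pm t_{1-\alpha/2,\; n-1}\, \frac{s}{\sqrt{n}}, \qquad \alpha = 0.05 \text{ by default}.
```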





data btt; infile 'btt.dat'; 

input childid 1-4 sex 6 bweight 8-11 gestage 13-14; 

testgage=gestage - 266/7; 

if bweight < 2500 then low_wt = 1; 

else low_wt = 0; 

if bweight = . then low_wt = .; 

testbwgt = bweight - 3332; run; 

proc freq data = btt; 



a tables sex / chisq testp = (0.5, 0.5); 

c tables low_wt / chisq testp = (0.918, 0.082); run; 

b, d proc means data = btt n mean std t probt; 

var testgage testbwgt; run; 

a) p̂ (girls) = 0.475, p-value = 0.4552. 

b) t = 1.96, p-value = 0.0518. 

c) p̂ = 0.055, p-value = 0.1517. 

d) t = −5.29, p-value is

MODULE 11: TWO SAMPLE T-TESTS 

11.1 data lens; infile 'cataract.dat'; 

input type $ astig; run; 

proc ttest; class type; var astig; run; 

Variances unequal: t=−2.00, p-value=0.0724. 

11.2 data gas; infile 'gas.dat'; 

input @43 trans $1. @45 mileage 4.; run; 

proc ttest; class trans; var mileage; run; 

Variances unequal: t=4.03, p-value=0.0051/2 = 0.00255. 

11.3 data grades; infile 'grades.dat'; 

input @5 gender $1. @25 final 3.0; run; 

proc ttest; class gender; var final; run; 

Variances equal: t=1.00, p-value=0.3229. 

11.4 data hands; infile 'handinj.dat'; 

input @7 type $5. @13 days 2.0 @16 cost 4.0; run; 

proc ttest; class type; var days cost; run; 

a Variances unequal: t=−1.08, p-value=0.2904. 

b Variances unequal: t=−0.68, p-value=0.5039. 

11.5 data src; infile 'src.dat'; 

input @6 gender $1. @8 environ 2.0; run; 

proc ttest; class gender; var environ; run; 

Variances equal: t=−0.33, p-value=0.7391. 

11.6 data robots; infile 'robot.dat'; 

input put_humn put_robt qul_humn qul_robt; 

put_diff = put_humn - put_robt; 

qul_diff = qul_humn - qul_robt; run; 

proc means n mean std t prt; 

var put_diff qul_diff; run; 

a Paired t-test: t=−2.63, p-value=0.0340. 

b Paired t-test: t=−1.96, p-value=0.0914. 
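The same paired analysis can be sketched with the paired statement of proc ttest instead of computing the differences by hand. This assumes the robots data set from 11.6. Proc ttest forms each difference as the first variable minus the second, which matches put_diff and qul_diff, so the t statistics and p-values should agree with the proc means results above.

```sas
/* Paired t-tests for the robot data (data set robots from 11.6). */
/* Each pair a*b is analyzed as the difference a - b.             */
proc ttest data = robots;
   paired put_humn*put_robt qul_humn*qul_robt;
run;
```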











proc ttest data = running; 

class sex; var time1 time2; run; 

Time1: Variances unequal: t=2.31, p-value=0.0411. 

Time2: Variances equal: t=2.33, p-value=0.0336. 


proc format; value lsfmt 1 = "Athletic" 2 = "Sedentary"; run; 







proc ttest data = athlete; 

class ls; var dbp sbp; run; 

DBP: Variances equal: t=−2.02, p-value=0.0503. 

SBP: Variances unequal: t=−5.75, p-value

MODULE 12: ONE-WAY ANOVA 

12.1 data one; infile 'taillite.dat'; 

input @13 zone 2. @4 truck 1. @17 response 3. @7 group 1.; run ; 

a data zone30; set one; if zone = 30 and group = 1; run; 

proc glm; class truck; 

model response = truck; 

means truck / tukey lines; run; 

F=16.4, p-value

12.4 data airplanes; infile 'airplanes.dat' delimiter = ','; 

input design $ paper $ hang_time; run; 

a proc anova data = airplanes; class design; 

model hang_time = design; 

means design / snk lines ; run; 

F = 9.91, p-value=0.0004, design groups: glide vs. dart, sonic. 

b proc anova data = airplanes; class paper; 

model hang_time = paper; 

means paper / snk lines ; run; 

F = 1.29, p-value=0.2954. 

12.5 data popcorn; input @1 brand $20. @22 time $4. @27 notpop 3.0; 








Smith's 2:15 170 

Smith's 2:15 147 

Smith's 2:30 196 

Smith's 2:30 114 

Smith's 2:45 98 

Smith's 2:45 90 







; 

run; 

proc anova data = popcorn; 

class brand; 

model notpop = brand; 

means brand / tukey; run; 

F = 4.30, p-value=0.0334. Tukey groupings: Smith's and Pop Secret do not differ significantly, and Pop Secret and Orville Redenbacher do not differ significantly. 



proc format; value bfmt 1 = 'Duracell' 2 = 'Energizer' 

3 = 'Rayovac' 4 = 'Radio Shack'; run; 

data battery; infile 'battery.dat'; 

input Brand 1 load 4-6 time 9-11; 

format Brand bfmt.; run; 

a proc anova data = battery; 

class brand; 

model time = brand; 

means brand; run; 

F = 0.02, p-value=0.9950. 

b proc anova data = battery; 

class load; 

model time = load; 

means load / snk; run; 

F=996.24, p-value

MODULE 13: TWO-WAY ANOVA AND MORE 


data one; infile 'taillite.dat'; 

input @4 type 1. @7 group 1. @10 position 1. @13 zone 2. 

@17 response 3. @23 follow 2.; run; 

a proc glm; class group type; 

model response = group type group*type; 

means group type / tukey lines; run; 

Group and type are significant. The interaction is not. 

Type groupings: 4 vs. 3, 1, 2. Group: F=4.63, p-value=0.0317. 

Type: F=9.38, p-value

13.4 data calls; infile 'calls.dat'; 

input week shift day $ number; run; 

proc glm; class shift day; 

model number = shift day shift*day; 

means shift day; run; 

Model: F=0.95, p-value=0.5087. 

Shift: F=2.26, p-value=0.1080; 

Day: F=1.00, p-value=0.4119. 

SxD: F=0.60, p-value=0.7788. 
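The model statement in 13.4 (number = shift day shift*day) fits the usual two-way ANOVA model with interaction, where α_i is the effect of shift i, β_j the effect of day j, and (αβ)_ij their interaction (with the usual sum-to-zero constraints). Each F test above compares one set of these effects against zero:

```latex
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \overset{\text{iid}}{\sim} N(0, \sigma^2),
\\[4pt]
H_0^{S \times D}: (\alpha\beta)_{ij} = 0 \ \text{for all } i, j; \qquad
H_0^{S}: \alpha_i = 0 \ \text{for all } i; \qquad
H_0^{D}: \beta_j = 0 \ \text{for all } j .
```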







proc glm data = battery; class load brand; 

model time = load brand load*brand; run; 

No significant interaction: p-value = 0.8208; 

No significant effect of brand on time: p-value = 0.1117; 

Significant effect of load on time: p-value < 0.0001. 

13.6 data airplanes; infile 'airplanes.dat' delimiter = ','; 

input design $ paper $ hang_time; run; 

proc glm data = airplanes; class paper design; 

model hang_time = paper design paper*design; run; 

Significant interaction: p = 0.0149. 

13.10 data btt; infile 'btt.dat'; 

input childid 1-4 sex 6 msbp 25-27 mmedaid 31 socio 33; run; 

proc glm data = btt; class mmedaid socio; 

model msbp = socio mmedaid socio*mmedaid; means socio*mmedaid; 

run; 

socio: F = 2.28, p-value=0.0623. 

mmedaid: F = 6.05, p-value=0.0147. 

socio*mmedaid: F = 5.94, p-value=0.0156. 


MODULE 14: MODEL CHECKING IN ANOVA 


data one; infile 'taillite.dat'; 

input @4 type 1. @7 group 1. @10 position 1. @13 zone 2. 

@17 response 3. @23 follow 2.; run; 

a proc glm; class group type; 

model response = group type group*type; 

output out=new p=yhat student = sresid; run; 

goptions csymbol = black htext = 2; symbol1 value = dot; 

proc gplot data = new; 

plot sresid*yhat / vref = 0; plot sresid*group / vref = 0; 

plot sresid*type / vref = 0; run; 

proc univariate data = new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; 

run; 

b proc glm; class group zone; 

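model response = group zone group*zone; 

output out=new p=yhat student = sresid; run; 

proc gplot data = new; 

plot sresid*yhat / vref = 0; plot sresid*group / vref = 0; 

plot sresid*zone / vref = 0; run; 

proc univariate data = new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; 

run; 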

c proc glm; class group position; 

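model response = group position group*position; 

output out=new p=yhat student = sresid; run; 

proc gplot data = new; 

plot sresid*yhat / vref = 0; plot sresid*group / vref = 0; 

plot sresid*position / vref = 0; run; 

proc univariate data = new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; 

run; 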

14.2 data one; infile 'brownie.dat'; input day pan $ mix $ width; run; 

proc glm; class pan mix; 

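model width = pan mix pan*mix; 

output out=new p=yhat student = sresid; run; 

proc gplot data = new; 

plot sresid*yhat / vref = 0; 

plot sresid*pan / vref = 0; 

plot sresid*mix / vref = 0; run; 

proc univariate data = new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; run; 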


14.3 data wear; infile 'wear.dat'; 

input grit $ 1-5 cut wear; run; 

proc glm; 

class grit cut; 

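model wear = grit cut grit*cut; 

output out=new p=yhat student = sresid; run; 

proc gplot data = new; 

plot sresid*yhat / vref = 0; 

plot sresid*grit / vref = 0; 

plot sresid*cut / vref = 0; run; 

proc univariate data = new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; run; 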



14.4 data calls; infile 'calls.dat'; 

input week shift day $ number; run; 

proc glm; 

class shift day; 

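model number = shift day shift*day; 

output out=new p=yhat student = sresid; run; 

proc gplot data = new; 

plot sresid*yhat / vref = 0; 

plot sresid*shift / vref = 0; 

plot sresid*day / vref = 0; run; 

proc univariate data = new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; run; 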









proc glm data = battery; 

class load brand; 

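model time = load brand load*brand; 

output out=new p=yhat student = sresid; run; 

proc gplot data = new; 

plot sresid*yhat / vref = 0; 

plot sresid*load / vref = 0; 

plot sresid*brand / vref = 0; run; 

proc univariate data = new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; run; 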





data btt; infile 'btt.dat'; 

input childid 1-4 sex 6 msbp 25-27 mmedaid 31 socio 33; run; 

proc glm data = btt; 

class mmedaid socio; 

model msbp = socio mmedaid socio*mmedaid; 

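means socio*mmedaid; 

output out=new p=yhat student = sresid; run; 

proc gplot data = new; 

plot sresid*yhat / vref = 0; 

plot sresid*socio / vref = 0; 

plot sresid*mmedaid / vref = 0; run; 

proc univariate data = new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; run; 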



14.11 proc format; value sfmt 1 = 'Male' 2 = 'Female'; run; 


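data btt; infile 'btt.dat'; 

input childid 1-4 sex 6 bweight 8-11 momeduc 29; 

format sex sfmt.; run; 

proc glm data = btt; 

class momeduc; 

model bweight = momeduc; 

means momeduc / hovtest = levene; 

output out=new p=yhat student = sresid; run; 

proc gplot data = new; 

plot sresid*yhat / vref = 0; 

plot sresid*momeduc / vref = 0; run; 

proc univariate data = new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; run; 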



Constant variance assumption OK: p-value = 0.3738. 

Normality of residuals OK: p-value = 0.5631. 

Plots all look OK. 
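A note on what hovtest = levene computes in proc glm: Levene's test runs a one-way ANOVA on a measure of spread within each group. As the PROC GLM documentation describes it, the default form (type=square) uses squared deviations from the group means, while levene(type=abs) uses absolute deviations. Writing y_ij for observation j in momeduc group i, with k groups and N observations in total:

```latex
z_{ij} = \bigl(y_{ij} - \bar{y}_{i\cdot}\bigr)^2, \qquad
F = \frac{\sum_i n_i (\bar{z}_{i\cdot} - \bar{z}_{\cdot\cdot})^2 / (k - 1)}
         {\sum_i \sum_j (z_{ij} - \bar{z}_{i\cdot})^2 / (N - k)} .
```

A large p-value, such as 0.3738 here, gives no evidence against equal variances across the groups.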


MODULE 15: CORRELATIONS 

15.1 data one; infile 'electric.dat'; 

input house income air index number load; run; 

proc corr; 

var house number index income; run; 

a No, p-value


input @9 phone 6. @17 fuel 6. @25 elec 5.; run; 

proc corr; 

var phone fuel elec; run; 

Fuel and electricity. 

15.6 proc format; value $sexfmt 'F'='Female' 'M'='Male'; run; 








proc corr data = running; 

var time1 time2; run; 

proc sort data = running; 

by sex class; run; 

proc corr data = running; 

var time1 time2; by sex class; run; 


data quarterback; infile 'quarterback.dat'; 

input Rank 1-2 Player $ 5-22 Team $ 25-27 Comp 30-32 Att 35-37 

Pct 40-43 AttPerGame 46-49 Yds 52-55 Avg 58-60 YdsPerGame 63-67 

TD 70-71 Int 74-75 FirstDown 77-80 FirstDownPct 83-86 

Over20 89-90 Over40 93-94 Sack 97-98 Rating 101-105; run; 

proc corr data = quarterback; 

var rating comp pct yds int sack; run; 

Completion percentage (Pct) has the highest correlation with quarterback rating, followed by total yards (Yds) and number of completions (Comp). 
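Proc corr reports Pearson correlations by default, together with p-values for H0: ρ = 0. For n pairs these come from the standard statistics

```latex
r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}, \qquad
t = r \sqrt{\frac{n - 2}{1 - r^2}} \sim t_{n-2} \ \text{under } H_0 .
```

Ranking variables by r, as in the answer above, compares the strength of each variable's linear association with Rating; it does not account for the correlations among the other variables themselves.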


MODULE 16: SIMPLE LINEAR REGRESSION 

16.1 goptions csymbol = black htext = 2; symbol1 value = dot; 

data one; infile 'bonescor.dat'; 

input index ccratio csi width score pct; run; 

proc reg; 

model score = pct; 

plot score*pct; run; 

a Yhat = 4.845 + 0.0253x, R² = 0.0864. 

c Bone score and % young normal do not appear to be linearly related. 
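The fitted lines reported in this module are least-squares estimates. With S_xx = Σ(x_i - x̄)² and S_xy = Σ(x_i - x̄)(y_i - ȳ):

```latex
b_1 = \frac{S_{xy}}{S_{xx}}, \qquad b_0 = \bar{y} - b_1 \bar{x}, \qquad
R^2 = 1 - \frac{SSE}{SSTO} = r^2 \ \text{(simple linear regression)} .
```

For example, R² = 0.0864 in 16.1 means that % young normal explains only about 8.6% of the variation in bone score, which is consistent with answer (c).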


data one; infile 'electric.dat'; 

input house income air applindx number peakload; run; 

a proc reg; 

model peakload = air; 

plot peakload*air; run; 

Yhat = 2.265 + 0.742x, R² = 0.8598. 

b proc reg; 

model peakload = applindx; 

plot peakload*applindx; run; 

Yhat = -0.729 + 0.947x, R² = 0.7851. 

c proc reg; 

model peakload = number; 

plot peakload*number; run; 

Yhat = 4.809 - 0.0581x, R² = 0.0045. 


16.3 data one; infile 'gas.dat'; 

input disp power torque ratio axle barrel speed clen 

cwid cwt trans mileage; run; 

a proc reg; 

model power = disp; 

plot power*disp ; run; 

Yhat = 33.5 + 0.362x, R² = 0.8848. 

b proc reg; 

model torque = disp; 

plot torque*disp ; run; 

Yhat = 15.48 + 0.7085x, R² = 0.9793. 

c proc reg; 

model torque= power; 

plot torque*power ; run; 

Yhat = -27.835 + 1.794x, R² = 0.9300. 

d proc reg; 

model mileage = disp; 

plot mileage*disp ; run; 

Yhat = 33.49 - 0.0471x, R² = 0.7601. 

e proc reg; 

model mileage = torque; 

plot mileage*torque ; run; 

Yhat = 33.996 - 0.064x, R² = 0.7214. 

f proc reg; 

model mileage = power; 

plot mileage*power; run; 

Yhat = 35.35 - 0.112x, R² = 0.6345. 

16.4 data one; infile 'electric.dat'; 

input house income air applindx number peakload; run; 


a proc reg; 

model peakload = air / clm; run; 

b proc reg; 

model peakload = applindx / cli; run; 
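The two model options request different intervals at each x value x_h: clm gives a confidence interval for the mean response, and cli gives a prediction interval for a single new observation, which is wider because of the extra 1 under the square root. Here s = √MSE and S_xx = Σ(x_i - x̄)²:

```latex
\text{clm: } \hat{y}_h \pm t_{1-\alpha/2,\,n-2}\; s\sqrt{\frac{1}{n} + \frac{(x_h - \bar{x})^2}{S_{xx}}}
\qquad
\text{cli: } \hat{y}_h \pm t_{1-\alpha/2,\,n-2}\; s\sqrt{1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{S_{xx}}}
```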





a proc reg; 

model power = disp / clm; run; 

b proc reg; 

model torque = disp / cli; run; 

c proc reg; 

model torque= power / clm; run; 

d proc reg; 

model mileage = disp / cli; run; 

e proc reg; 

model mileage = torque / clm; run; 

f proc reg; 

model mileage = power / cli; run; 


data quarterback; infile 'quarterback.dat'; 

input Rank Player $ 5-22 Team $ 25-27 Comp Att Pct AttPerGame Yds 

Avg YdsPerGame TD Int FirstD FirstDP P20 P40 Sck Rate; run; 

proc reg data = quarterback; 

model rate = pct / clm cli; 

plot rate*pct; 

title 'Regression model for Quarterback Rating'; run; 

a Yhat = -60.92505 + 2.340x, R² = 0.6331. 

c The line appears to fit the trend in the data very well. 


MODULE 17: MODEL CHECKING IN REGRESSION 

17.1 goptions csymbol = black htext = 2; 


data one; infile 'bonescor.dat' ; 

input index ccratio csi width score pct; run; 

proc reg; 

model score = pct; 

plot score*p.; 

plot student.*p.; 

plot student.*pct; 

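output out = new p = yhat r = resid student = sresid; run; 

proc univariate data = new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; run; 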



17.2 data one; infile 'electric.dat'; 

input house income air applindx number peakload; run; 


a proc reg data = one; 

model peakload = air; 

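plot peakload*p.; plot student.*p.; plot student.*air; 

output out = new p = yhat r = resid student = sresid; run; 

proc univariate data = new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; 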




run; 

There may be some curvature in the residual plots. A linear model may not be 

appropriate. 

b proc reg data = one; 

model peakload = applindx; 

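plot peakload*p.; plot student.*p.; plot student.*applindx; 

output out = new p = yhat r = resid student = sresid; run; 

proc univariate data = new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; 

run; 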

Assumptions appear to be satisfied. 

c proc reg data = one; 

model peakload = number; 

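plot peakload*p.; plot student.*p.; plot student.*number; 

output out = new p = yhat r = resid student = sresid; run; 

proc univariate data = new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; 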




run; 






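data one; infile 'gas.dat'; 

input disp power torque ratio axle barrel speed clen 

cwid cwt trans mileage; run; 

a proc reg data = one; model power = disp; 

plot power*p.; plot student.*p.; plot student.*disp; 

output out = new p = yhat r = resid student = sresid; run; 

proc univariate data = new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; 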




run; 


b proc reg data = one; model torque = disp; 

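plot torque*p.; plot student.*p.; plot student.*disp; 

output out = new p = yhat r = resid student = sresid; run; 

proc univariate data = new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; 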




run; 

There may be increasing variation. 

c proc reg data = one; model torque= power; 

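plot torque*p.; plot student.*p.; plot student.*power; 

output out = new p = yhat r = resid student = sresid; run; 

proc univariate data = new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; 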




run; 

There may be some curvature in the residuals. 

d proc reg data = one; model mileage = disp; 

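plot mileage*p.; plot student.*p.; plot student.*disp; 

output out = new p = yhat r = resid student = sresid; run; 

proc univariate data = new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; 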




run; 


e proc reg data = one; model mileage = torque; 

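plot mileage*p.; plot student.*p.; plot student.*torque; 

output out = new p = yhat r = resid student = sresid; run; 

proc univariate data = new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; 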




run; 



f proc reg data = one; 

model mileage = power; 

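plot mileage*p.; plot student.*p.; plot student.*power; 

output out = new p = yhat r = resid student = sresid; run; 

proc univariate data = new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; 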




run; 


17.4 data one; infile 'grades.dat'; 

input id $ sex $ class $ quiz exam1 exam2 lab finalexam; run; 

a proc reg data=one; 

model finalexam = exam1; 

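plot finalexam*p.; plot student.*p.; plot student.*exam1; 

output out = new p = yhat r = resid student = sresid; run; 

proc univariate data = new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; 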




run; 

Data contains an outlier. 

b proc reg data=one; 

model finalexam = exam2; 

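plot finalexam*p.; plot student.*p.; plot student.*exam2; 

output out = new p = yhat r = resid student = sresid; run; 

proc univariate data = new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; 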




run; 

Data contains an outlier. 

c proc reg data=one; 

model finalexam = quiz; 

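plot finalexam*p.; plot student.*p.; plot student.*quiz; 

output out = new p = yhat r = resid student = sresid; run; 

proc univariate data = new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; 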




run; 




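data quarterback; infile 'quarterback.dat'; 

input Rank Player $ 5-22 Team $ 25-27 Comp Att Pct AttPerGame Yds 

Avg YdsPerGame TD Int FirstD FirstDP P20 P40 Sck Rate; run; 

proc reg data = quarterback; 

model rate = pct; 

plot rate*p.; plot student.*p.; plot student.*pct; 

output out = new p = yhat r = resid student = sresid; run; 

proc univariate data = new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; run; 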






There may be increasing variation. 


MODULE 18: MULTIPLE LINEAR REGRESSION 

18.1 goptions csymbol = black htext = 2; symbol1 value = dot; 

data one; infile 'gas.dat'; 

input @45 mileage 4. @7 power 3. @38 car_wt 4. @11 torque 3. @1 

disp 5.; run; 

a proc reg; model mileage = power car_wt torque; 

output out=new1 p=yhat student=resid; run; 

proc gplot data=new1; 

plot mileage*yhat; title 'Mileage vs. yhat'; 

plot resid*yhat / vref = 0; title 'Resid vs. yhat'; 

plot resid*power / vref = 0; title 'Resid vs. x1'; 

plot resid*car_wt / vref = 0; title 'Resid vs. x2'; 

plot resid*torque / vref = 0; title 'Resid vs. x3'; run; 

proc univariate data=new1 normal; var resid; 

probplot resid / normal (mu = est sigma = est) square; 

title 'Probability plot of residuals'; run ; 

b proc reg data=one; model mileage = power disp torque; 

output out=new2 p=yhat2 student=resid2; run; 

proc gplot data=new2; 

plot mileage*yhat2; title 'Mileage vs. yhat'; 

plot resid2*yhat2 / vref = 0; title 'Resid vs. yhat'; 

plot resid2*power / vref = 0; title 'Resid vs. x1'; 

plot resid2*disp / vref = 0; title 'Resid vs. x2'; 

plot resid2*torque / vref = 0; title 'Resid vs. x3'; run; 

proc univariate data = new2 normal; var resid2; 

probplot resid2 / normal (mu = est sigma = est) square; 

title 'Probability plot of residuals'; run ; 

18.2 data one; infile 'grades.dat'; 

input id $ sex $ class $ quiz exam1 exam2 lab final; run; 

proc reg; model final = quiz exam1 exam2 lab; 

plot final*p.; plot student.*p.; 

plot student.*quiz; plot student.*exam1; 

plot student.*exam2; plot student.*lab; 

output out=new p=yhat student=sresid; 

title 'Multiple Regression Model and Model Checking Plots'; run; 

There appears to be a low outlier. 

proc univariate data=new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; 


title 'Probability plot of residuals'; run; 

With a p-value of 0.0239 for the normality test, the residuals may be nonnormal. 



data one; infile 'electric.dat'; 

input house income air_cond index fam_num peakload; run; 

proc reg; model peakload = house income air_cond fam_num; 

plot peakload*p.; plot student.*p.; 

plot student.*house; plot student.*income; 

plot student.*air_cond; plot student.*fam_num; 

output out = new p=yhat student=sresid; 

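title 'Multiple Regression Model and Model Checking Plots'; run; 

proc univariate data=new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; 

title 'Probability plot of residuals'; run; 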




Number in family: t=1.401, p-value=0.1668. Family number is not needed in the model. 

The model assumptions appear valid. 

18.4 data prod; input prod temp light; 

templight=temp*light; 

datalines; 


45 64 60 

49 64 65 

47 66 60 

57 66 65 

48 68 60 

53 68 65 

51 70 60 

54 70 65 

56 72 60 

64 72 65 

; 

run; 

proc reg data = prod; 

model prod = temp light templight; run; 

The interaction is not significant, so it is eliminated from the model. 

proc reg data = prod; model prod = temp light; 

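plot prod*p.; 

plot student.*p.; 

plot student.*temp; 

plot student.*light; 

output out = new p=yhat student=sresid; run; 

proc univariate data=new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; run; 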




R² = 79.92%, model p-value = 0.0163; 

Residuals are normal: p-value = 0.8184; 

Plots: may be increasing variation, curvature vs. temp, and increasing variability with 

light. 




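data quarterback; infile 'quarterback.dat'; 

input Rank 1-2 Player $ 5-22 Team $ 25-27 Comp 30-32 Att 35-37 

Pct 40-43 AttPerGame 46-49 Yds 52-55 Avg 58-60 YdsPerGame 63-67 

TD 70-71 Int 74-75 FirstDown 77-80 FirstDownPct 83-86 

Over20 89-90 Over40 93-94 Sack 97-98 Rating 101-105; run; 

proc reg data = quarterback; 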






model rating = comp pct yds int sack; 

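plot rating*p.; 

plot student.*p.; 

plot student.*comp; 

plot student.*pct; 

plot student.*yds; 

plot student.*int; 

plot student.*sack; 

output out=new p=yhat student=sresid; run; 

proc univariate data=new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; run; 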



R² = 0.9433, model p-value < 0.0001. 

Sack is not needed in the model: p-value = 0.4234. 

Residuals not quite normal: p-value = 0.0137. 

Otherwise residual plots look OK. 


MODULE 19: MULTIPLE REGRESSION 

CHOOSING THE BEST MODEL 


data one; infile 'gas.dat'; 

input disp power torque ratio axle barrels speeds car_ln car_wd 

car_wt trans mileage; run; 

proc reg; 

model mileage = disp power torque speeds car_wt car_ln / 

influence collin spec ; run; 


data one; infile 'gas.dat'; 

input disp power torque ratio axle barrels speeds car_ln car_wd 

car_wt trans mileage; run; 

a proc reg; 

model mileage = disp power torque speeds car_wt car_ln 

/ selection=stepwise; run; 

b proc reg; 

model mileage = disp power torque speeds car_wt car_ln 

/ selection=backward; run; 

c proc reg; 

model mileage = disp power torque speeds car_wt car_ln 

/ selection=forward; run; 

d proc reg; 

model mileage = disp power torque speeds car_wt car_ln 

/ selection=rsquare cp adjrsq mse; run; 

19.5 data grades; infile 'grades.dat'; 

input quiz 9-10 exam1 12-14 exam2 16-18 lab 20-22 final 25-27; 

run; 

proc reg; 

model final = exam1 exam2 quiz lab / influence collin spec ; run; 


19.6 data grades; infile 'grades.dat'; 

input quiz 9-10 exam1 12-14 exam2 16-18 lab 20-22 final 25-27; 

run; 

a proc reg; model final = exam1 exam2 quiz lab 

/ selection = stepwise; run; 

b proc reg; model final = exam1 exam2 quiz lab 

/ selection = backward; run; 

c proc reg; model final = exam1 exam2 quiz lab 

/ selection = forward; run; 

d proc reg; model final = exam1 exam2 quiz lab 

/ selection = rsquare cp adjrsq mse; run; 
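The selection = rsquare cp adjrsq mse option in part (d) ranks subsets of predictors by several criteria. For a subset model with p parameters (including the intercept) fitted to n observations, with MSE_full taken from the model containing all candidate predictors, the standard definitions are

```latex
C_p = \frac{SSE_p}{MSE_{\text{full}}} - (n - 2p), \qquad
R^2_{\text{adj}} = 1 - \frac{n - 1}{n - p}\,\frac{SSE_p}{SSTO}, \qquad
MSE_p = \frac{SSE_p}{n - p} .
```

Subsets with C_p close to p and a high adjusted R² (equivalently, a low MSE) are the usual candidates.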

19.8 data pharmacy; infile 'pharmacy.dat'; 

input pharmacy 1-2 volume 5-6 floor_space 9-12 rx_space 15-16 

parking 19-20 shop_center 23 income 26-27; 

format shop_center scfmt.; 

sc_fs=shop_center*floor_space; sc_rxs=shop_center*rx_space; 

sc_p=shop_center*parking; sc_i=shop_center*income; run; 

* Model allowing for interactions between shopping center 

and other variables; 

proc reg data = pharmacy; 

model volume = floor_space rx_space parking shop_center income 

sc_fs sc_rxs sc_p sc_i; run; 

None of the interactions is significant. 

After sequentially eliminating non-significant terms we obtain the final model. 


symbol1 value = dot; symbol2 value = square; 

* Final model; 

proc reg data = pharmacy; 

model volume = floor_space rx_space; plot volume*p.; 

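output out=new p=yhat student=sresid; run; 

proc univariate data = new normal; var sresid; 

probplot sresid / normal (mu = est sigma = est) square; run; 

Normality of residuals OK. 

proc gplot data = new; 

plot sresid*yhat=shop_center / vref = 0; 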

plot sresid*floor_space=shop_center / vref = 0; 

plot sresid*rx_space=shop_center / vref = 0; run; 

Residual plots all look OK. 







a proc reg data = quarterback; 

model rating = Comp Att Pct AttPerGame Yds Avg YdsPerGame 

TD Int FirstDown FirstDownPct Over20 Over40 Sack 

/ selection = stepwise; run; 

Using defaults, variables in final model: Avg, TD, Int, Pct, Att, Comp. 

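b proc reg data = quarterback; 

model rating = Comp Att Pct AttPerGame Yds Avg YdsPerGame 

TD Int FirstDown FirstDownPct Over20 Over40 Sack 

/ selection = backward; run; 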

Using defaults, variables in final model: Pct, AttPerGame, Yds, Avg, YdsPerGame, TD, 

Int, FirstDown, FirstDownPct, Over40. 

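c proc reg data = quarterback; 

model rating = Comp Att Pct AttPerGame Yds Avg YdsPerGame 

TD Int FirstDown FirstDownPct Over20 Over40 Sack 

/ selection = forward; run; 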

Using defaults, variables in final model: Avg, TD, Int, Pct, Over40, Att, Comp, 

YdsPerGame, AttPerGame, Yds. 


MODULE 20: TESTS FOR CATEGORICAL DATA 

20.1 data debate; infile 'debate.dat'; 

input id school gender compare argue research reason speak ; 

if school = 3 or school = 5 or school = 6 or school = 8; 

if research = 2 or research = 3 then research = 4; run; 

proc freq data=debate; 

tables (research reason speak argue)*school / chisq expected; 

run; 

20.2 Use data step from 20.1 and then 

data skyline; set debate; if school = 8; run; 

proc freq data=skyline; 

tables gender*(compare argue research reason speak) 

/ chisq expected; run; 

20.3 proc format; value gfmt 1 = 'Female' 5 = 'Male'; run; 

data src; infile 'src.dat'; 

input id gender environ quality air health plants 

jobslost pop jobs hours income age party libcon; 

if environ =1 or environ =2 or environ =3 then env =1; 

else if 4



proc format; value yn 1 = 'Yes' 2 = 'No'; 

value yndk 1 = 'Yes' 2 = 'No' 3 = 'Dont Know'; 

value statfmt 1='OnCampus' 2='OffCampus' 3='On+Off' 4='DontWork'; 

value perfmt 1 = 'No' 2 = 'Yearly' 3 = 'Quarterly'; 

value usrn 1='Usually' 2='Sometimes' 3='Rarely' 4='Never'; run; 

data park; infile 'parking.dat'; 

input id miles bus_convenient carpool years status bus Monday 

Tuesday Wednesday Thursday Friday drive permit meters lots; 


if bus_convenient = 99 then bus_convenient = .; 

if id = . then fac_staff = .; 

if miles = 99 then miles = .; 

if carpool = 99 then carpool = .; 

if years = 99 then years = .; 

if status = 99 then status = .; 

if bus = 99 then bus = .; 

if Monday = 99 then Monday = .; 

if Tuesday = 99 then Tuesday = .; 

if Wednesday=99 then Wednesday = .; 

if Thursday = 99 then Thursday = .; 

if Friday = 99 then Friday = .; 

if drive = 99 then drive = .; 

if permit = 99 then permit = .; 

if meters = 99 then meters = .; 

if lots = 99 then lots = .; 

format fac_staff fsfmt. bus yn. bus_convenient yndk. 

status statfmt. permit perfmt. meters usrn. lots usrn.; run; 

proc freq data = park; 

b table fac_staff*bus_convenient/chisq expected cellchi2; 

d table fac_staff*meters/chisq expected cellchi2; 

f table bus*permit/chisq expected cellchi2; run; 

proc sort data =park; by fac_staff; run; 

proc freq data =park; 

table bus*permit/chisq expected cellchi2; 

by fac_staff; run; 



proc format; value mfmt 1 = 'Never' 2 = 'Occasional' 3 = 'Regular'; 

value pfmt 1 = 'Neither' 2 = 'One' 3 = 'Both'; run; 

data a; 

input s_marijuana p_alc_drug count; 

datalines; 


1 1 141 

1 2 68 

1 3 17 

2 1 54 

2 2 44 

2 3 11 

3 1 40 

3 2 51 

3 3 19 

; run; 

proc freq data = a; 

table p_alc_drug*s_marijuana / chisq expected cellchi2; 

weight count; 

format s_marijuana mfmt. p_alc_drug pfmt.; run; 
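The expected and cellchi2 options print, for each cell, the expected count under independence and that cell's contribution to the Pearson chi-square statistic. With row totals R_i, column totals C_j, and n observations (here the total of the weight variable count):

```latex
E_{ij} = \frac{R_i\, C_j}{n}, \qquad
X^2 = \sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\text{df} = (r - 1)(c - 1) .
```

For this 3 × 3 table, df = 4. As a check, the Neither row totals 141 + 54 + 40 = 235, the Never column totals 141 + 68 + 17 = 226, and n = 445, so the expected count for that cell is 235 × 226 / 445 ≈ 119.35.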


proc format; value agefmt 1 = '15-54' 2 = '55-64' 3 = '65-74' 4 = 'Over 74'; 

value locfmt 1 = 'Home' 2 = 'Acute-Care' 3 = 'Chronic-Care'; 

run; 

data a; 

input Age Location Count; 

datalines; 


1 1 94 

1 2 418 

1 3 23 

2 1 116 

2 2 524 

2 3 34 

3 1 156 

3 2 581 

3 3 109 

4 1 138 

4 2 558 

4 3 238 

; run; 

proc freq data = a; 

table age*location / chisq expected cellchi2; 

weight count; 

format age agefmt. location locfmt.; run; 





data btt; infile 'btt.dat'; 

input childid 1-4 sex 6 momeduc 29 mmedaid 31 socio 33 smoke5 69 

medaid5 71 socio5 73; run; 

proc freq data = btt; 



a tables sex / chisq testp = (0.5, 0.5); 

b tables momeduc*socio / chisq expected; 

c tables smoke5*socio5 / chisq expected; 

d tables medaid5*socio5 / chisq expected; run; 

Note: In (b), (c), and (d) there are many low expected frequencies. Combining the 

socioeconomic categories 3 and 4 into a single category may help. Fisher’s exact test is a 

better solution but the computations can take a long time if the sample size is large. 
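A sketch of both remedies for part (c), assuming socio5 is coded 1 to 4 and that categories 3 and 4 are the ones to merge, as the note suggests. The data set name btt2 and the seed value are illustrative choices. The exact statement with the mc option asks proc freq for a Monte Carlo estimate of the Fisher exact p-value, which is usually much faster than the full exact computation for a large sample.

```sas
/* Merge socioeconomic categories 3 and 4 (assumed coding 1-4). */
data btt2; set btt;
   if socio5 = 4 then socio5 = 3;
run;

proc freq data = btt2;
   tables smoke5*socio5 / chisq expected;
   /* Fisher's exact test, estimated by Monte Carlo sampling */
   exact fisher / mc seed = 20100;
run;
```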


MODULE 21: NON-PARAMETRIC TESTS 


data one; infile 'taillite.dat'; 

input id type group position zone resptime folltime; run; 

a proc npar1way wilcoxon data = one; 

where zone = 30; 

class type; var resptime; run; 

p-value

21.4 proc format; value lsfmt 1 = 'Athletic' 2 = 'Sedentary'; run; 

data athlete; 

infile 'athlete.dat'; input sbp 1-3 dbp 6-7 sex $ 10 ls 13; 





proc npar1way wilcoxon data = athlete; 

class sex; var sbp dbp; run; 

SBP: p-value = 0.0366. SBP differs significantly between males and females. 

DBP: p-value < 0.0001. DBP differs significantly between males and females. 

proc sort data = athlete; 

by sex; 

run; 

* Check normality assumption that would be needed for t-test; 

proc univariate normal data = athlete; 

var sbp dbp; by sex; run; 

SBP: Shapiro-Wilk p-values: Female = 0.0455, Male = 0.0170. SBP is not quite normal. 

DBP: Shapiro-Wilk p-values: Female = 0.5903, Male = 0.4913. DBP appears normal. 


data btt; infile 'btt.dat'; 

input childid 1-4 bweight 8-11 momeduc 29; run; 

proc npar1way wilcoxon anova data = btt; 

class momeduc; var bweight; run; 

Kruskal-Wallis p-value = 0.0881. 

ANOVA p-value = 0.0931. 
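The Kruskal-Wallis test that the wilcoxon option produces for more than two groups is a rank-based analogue of one-way ANOVA. Ranking all N birth weights together, with R_i the sum of the ranks in group i of size n_i, the statistic in its form without a tie correction is

```latex
H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)
\ \overset{\text{approx.}}{\sim}\ \chi^2_{k-1} \ \text{under } H_0 .
```

Proc npar1way accounts for tied ranks, so its statistic can differ slightly from this formula when ties are present.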


MODULE 22: ANALYSIS OF COVARIANCE 


data one; infile 'gas.dat'; 

input @45 mileage 4. @43 trans $1. @25 speeds $1. @38 car_wt 4. 

@11 torque 3.; run; 

a proc glm; class trans speeds; 

model mileage = trans speeds car_wt / solution; run; 

b proc glm; class trans speeds; 

model mileage = trans speeds torque / solution; run; 

22.2 data two; infile 'dummy.dat'; 

input species $ 1 impactor $ 3-5 stiff1 stiff2 calcium magnesium; 

run; 

a proc glm; class species impactor; 

model stiff1 = species impactor calcium / solution; run ; 

b proc glm; class species impactor; 

model stiff1 = species impactor magnesium; run; 

22.4 proc format; value sfmt 1 = 'Male' 2 = 'Female'; run; 


data btt; infile 'btt.dat'; 

input childid 1-4 sex 6 bweight 8-11 gestage 13-14 mmedaid 31; 

format sex sfmt.; run; 

proc glm data = btt; 

class sex mmedaid; 

model bweight = sex mmedaid sex*mmedaid gestage; run; 
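Analysis of covariance adds a continuous covariate to an ANOVA model. In the one-factor case, with group effects τ_i and a common slope β for the covariate x (car_wt, torque, calcium, magnesium, or gestage in this module), the model is

```latex
y_{ij} = \mu + \tau_i + \beta\,(x_{ij} - \bar{x}_{\cdot\cdot}) + \varepsilon_{ij},
\qquad \varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2).
```

The solution option on the model statement prints the parameter estimates, including the estimated slope; the factor tests then compare groups after adjusting for the covariate. The models above include more than one factor, but the covariate enters in the same way.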


MODULE 23: LOGISTIC REGRESSION 



proc format; value yn 1= 'Yes' 2 = 'No'; 

value yndk 1= 'Yes' 2 = 'No' 3 = 'Dont Know'; run; 

data park; infile 'parking.dat'; 

input id miles bus_convenient carpool years status bus; 


if carpool = 99 then carpool = .; 

if years = 99 then years = .; 

if bus = 99 then bus = .; 

if bus_convenient = 99 then bus_convenient = .; 

format fac_staff fsfmt. bus yn. bus_convenient yndk.; run; 

a proc logistic data = park; 

model bus_convenient = fac_staff years; run; 

b proc logistic data = park; 

model bus = fac_staff years; run; 

c proc logistic data = park; 

model carpool = fac_staff years; run; 




data btt; infile 'btt.dat'; 

input childid 1-4 sex 6 bweight 8-11 gestage 13-14 momage 16-17 

parity 19 mdbp 21-23 msbp 25-27 mmedaid 31; 


a proc logistic data = btt; 

model sex = bweight gestage parity; run; 

proc logistic data = btt; 

model sex = parity; run; 

b proc logistic data = btt; 

model mmedaid = bweight gestage momage parity mdbp msbp; 

run; 

proc logistic data = btt; 

model mmedaid = momage; run; 
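Proc logistic fits log(p / (1 - p)) = β0 + β1 x1 + ... for the probability p of one response level. Which level counts as the event depends on the ordering of the response values, and the "Probability modeled is" line in the output reports it. A sketch that names the event explicitly, assuming the park data set and the yn. format from 23.1:

```sas
/* Model P(bus = 'Yes') explicitly; the event= value refers to */
/* the formatted value of the response variable.               */
proc logistic data = park;
   model bus(event = 'Yes') = fac_staff years;
run;
```

Without event= (or the descending option), the estimated coefficients can have the opposite signs from the ones the question has in mind, so it is worth checking that line before interpreting them.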


MODULE 24: MATRIX COMPUTATIONS 

24.1 to 24.3 require the following initial creation of matrices A, B, and C. 

proc iml; 

A = { 2 1 0 3, -1 0 2 4, 4 -2 7 0}; 

B = {-4 3 5 1, 2 2 1 -1, 3 2 -4 5}; 

C = {5, 4, 8}; 

print A B C; 

24.1 a D = A+B; 

b E = A-B; 

c F = A#B; 

d G = A/B; 

print D E F G; 

24.2 a H = A//B; 

b I = A||B; 

c J = A(|,3|); 

d K = B(|2,|); 

e L = B(|1:2,3:4|); 

print H I J K L; 

24.3 a M=T(B); 

b D=A*t(B); 

c N=det(D); 

d O=trace(D); 

e P = diag(D); * Note diag produces a diagonal matrix; 

Q = vecdiag(D); 

f R = solve(D, C); 

print M D, N O, P Q R; 

quit; 


24.4 data a; input x1 x2 x3; 

datalines; 


1 4 0.2 

1 5 0.2 

1 6 0.2 

1 7 0.2 

1 4 0.3 

1 5 0.3 

1 6 0.3 

1 7 0.3 

1 4 0.4 

1 5 0.4 

1 6 0.4 

1 7 0.4 

; run; 

proc iml; 

use A; * To make data set A available within proc iml; 

read all var {x1 x2 x3} into X; 

Y = {4.3, 5.5, 6.8, 8.0, 4.0, 5.2, 6.6, 7.5, 2.0, 4.0, 5.7, 6.5}; 

I12=I(12); 

J12=J(12, 12, 1); 

print X Y I12 J12; 

a B=inv(X`*X)*X`*Y; 

b A=X*B; 

c C=Y`*Y-Y`*J12*Y/12; 

d D=Y`*Y-B`*X`*Y; 

e E=Y-X*B; 

f F=C-D; 

g G=D/9; 

h H=X*inv(X`*X)*X`; 

k K=Y`*(I12-H)*Y; 

l L=Y`*(H-J12/12)*Y; 

m M=G*inv(X`*X); 

n N=sqrt(diag(M)); 

o O=(I12-H)*Y; 

print B A C E, D F G, H, K L M N O; 

* Create a SAS data set containing a matrix for use in 24.5; 

create ydata from y[colname={y}]; append from y; 

quit; 
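The quantities computed in 24.4 are the standard matrix forms of least squares, with n = 12 observations and p = 3 columns in X (the constant x1, then x2 and x3); the divisor 9 in part (g) is n - p:

```latex
\begin{aligned}
\mathbf{b} &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}, &
\mathbf{H} &= \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}', \\
\mathbf{e} &= \mathbf{Y} - \mathbf{X}\mathbf{b} = (\mathbf{I} - \mathbf{H})\mathbf{Y}, &
SSTO &= \mathbf{Y}'\mathbf{Y} - \tfrac{1}{n}\mathbf{Y}'\mathbf{J}\mathbf{Y}, \\
SSE &= \mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y} = \mathbf{Y}'(\mathbf{I} - \mathbf{H})\mathbf{Y}, &
SSR &= SSTO - SSE = \mathbf{Y}'\bigl(\mathbf{H} - \tfrac{1}{n}\mathbf{J}\bigr)\mathbf{Y}, \\
MSE &= \frac{SSE}{n - p}, &
\widehat{\operatorname{Var}}(\mathbf{b}) &= MSE\,(\mathbf{X}'\mathbf{X})^{-1}.
\end{aligned}
```

In the program these correspond to B, H, E (and O), C, D (and K), F (and L), G, and M; the standard errors of the coefficients are the square roots of the diagonal elements of M.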

24.5 data reg; 

merge a ydata; run; 

proc reg data = reg; 

model y = x2 x3 / p r influence; run; 


MODULE 25: MACRO VARIABLES AND PROGRAMS 


proc format; 

value lsfmt 1 = "Athletic" 2 = "Sedentary"; 

value $sfmt 'M' = 'Male' 'F'= 'Female'; run; 






data athlete; infile 'athlete.dat'; 

input sbp 1-3 dbp 6-7 sex $ 10 ls 13; 

format ls lsfmt. sex $sfmt.; run; 

%macro boxt(data, y, x); 

proc sort data = &data; by &x; run; 

(i) proc boxplot data = &data; 

plot &y*&x / boxstyle=schematic; run; 

(ii) proc ttest data = &data; 

class &x; var &y; run; 

%mend boxt; 

a %boxt(athlete, sbp, sex); 

b %boxt(athlete, dbp, sex); 

c %boxt(athlete, sbp, ls); 

d %boxt(athlete, dbp, ls); 
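When a macro call such as %boxt(athlete, sbp, sex) does not behave as expected, the mprint system option writes the SAS statements the macro generates to the log. A sketch, assuming %boxt has been compiled from the code above with the (i) and (ii) part labels removed (they mark exercise parts and are not SAS syntax):

```sas
/* Show the code generated by the macro in the SAS log. */
options mprint;
%boxt(athlete, sbp, sex);
options nomprint;
```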




symbol2 value = square; 

data elec; infile 'electric.dat'; 

input hs 1-3 fi 6-11 acc 14-16 ai 19-23 fm 26-28 phl 31-35; 

label hs = 'House Size' 

fi = 'Family Income' 

acc = 'Air Conditioning Capacity' 

phl = 'Peak Hour Load'; run; 

%MACRO simplereg(data, yvar, xvar); 

(i) proc gplot data = &data; 

plot &yvar * &xvar; 

title "Plot of &yvar vs. &xvar"; 

run; 

(ii) proc corr data = &data; 

var &yvar &xvar; 

title "Correlation of &yvar vs. &xvar"; 

run; 

(iii) proc reg data = &data; 

model &yvar = &xvar; 

(iv) plot &yvar * p. p.*p. / overlay; 

plot student.*p.; plot student.* &xvar; 

title "Regression of &yvar vs. &xvar and model-checking plots"; 

run; 

%MEND simplereg; 

a %simplereg(elec, phl, hs); 

b %simplereg(elec, phl, fi); 

c %simplereg(elec, phl, acc); 
