04.01.2013 Views

Instructor's Solutions Manual for Learning SAS in the Computer Lab


Instructor Solutions Manual
for
Learning SAS in the Computer Lab
3rd EDITION

Rebecca J. Elliott
Statistically Significant

Christopher H. Morrell
Loyola University Maryland

Prepared by
Christopher H. Morrell
Loyola University Maryland

Australia • Brazil • Japan • Korea • Mexico • Singapore • Spain • United Kingdom • United States


© 2010 Brooks/Cole, Cengage Learning

ALL RIGHTS RESERVED. No part of this work covered by the copyright herein may be reproduced, transmitted, stored, or used in any form or by any means graphic, electronic, or mechanical, including but not limited to photocopying, recording, scanning, digitizing, taping, Web distribution, information networks, or information storage and retrieval systems, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the publisher except as may be permitted by the license terms below.

For product information and technology assistance, contact us at Cengage Learning Customer & Sales Support, 1-800-354-9706

For permission to use material from this text or product, submit all requests online at www.cengage.com/permissions
Further permissions questions can be emailed to permissionrequest@cengage.com

Printed in the United States of America
1 2 3 4 5 6 7 11 10 09 08 07

ISBN-13: 978-0-495-82797-9
ISBN-10: 0-495-82797-5

Brooks/Cole
20 Channel Center Street
Boston, MA 02210
USA

Cengage Learning is a leading provider of customized learning solutions with office locations around the globe, including Singapore, the United Kingdom, Australia, Mexico, Brazil, and Japan. Locate your local office at: international.cengage.com/region

Cengage Learning products are represented in Canada by Nelson Education, Ltd.

For your course and learning solutions, visit academic.cengage.com

Purchase any of our products at your local college store or at our preferred online store www.ichapters.com

NOTE: UNDER NO CIRCUMSTANCES MAY THIS MATERIAL OR ANY PORTION THEREOF BE SOLD, LICENSED, AUCTIONED, OR OTHERWISE REDISTRIBUTED EXCEPT AS MAY BE PERMITTED BY THE LICENSE TERMS HEREIN.

READ IMPORTANT LICENSE INFORMATION

Dear Professor or Other Supplement Recipient:

Cengage Learning has provided you with this product (the “Supplement”) for your review and, to the extent that you adopt the associated textbook for use in connection with your course (the “Course”), you and your students who purchase the textbook may use the Supplement as described below. Cengage Learning has established these use limitations in response to concerns raised by authors, professors, and other users regarding the pedagogical problems stemming from unlimited distribution of Supplements.

Cengage Learning hereby grants you a nontransferable license to use the Supplement in connection with the Course, subject to the following conditions. The Supplement is for your personal, noncommercial use only and may not be reproduced, posted electronically or distributed, except that portions of the Supplement may be provided to your students IN PRINT FORM ONLY in connection with your instruction of the Course, so long as such students are advised that they may not copy or distribute any portion of the Supplement to any third party. You may not sell, license, auction, or otherwise redistribute the Supplement in any form. We ask that you take reasonable steps to protect the Supplement from unauthorized use, reproduction, or distribution. Your use of the Supplement indicates your acceptance of the conditions set forth in this Agreement. If you do not accept these conditions, you must return the Supplement unused within 30 days of receipt.

All rights (including without limitation, copyrights, patents, and trade secrets) in the Supplement are and will remain the sole and exclusive property of Cengage Learning and/or its licensors. The Supplement is furnished by Cengage Learning on an “as is” basis without any warranties, express or implied. This Agreement will be governed by and construed pursuant to the laws of the State of New York, without regard to such State’s conflict of law rules.

Thank you for your assistance in helping to safeguard the integrity of the content contained in this Supplement. We trust you find the Supplement a useful teaching tool.


CONTENTS

PREFACE
MODULE 1: THE BASICS
MODULE 2: MORE SAS BASICS
MODULE 3: DATA MANAGEMENT
MODULE 4: SAS FUNCTIONS
MODULE 5: DESCRIPTIVE STATISTICS I
MODULE 6: PROC GCHART
MODULE 7: DESCRIPTIVE STATISTICS II
MODULE 8: GENERATING RANDOM OBSERVATIONS
MODULE 9: X-Y PLOTS
MODULE 10: ONE SAMPLE TESTS FOR µ, p
MODULE 11: TWO SAMPLE T-TESTS
MODULE 12: ONE-WAY ANOVA
MODULE 13: TWO-WAY ANOVA AND MORE
MODULE 14: MODEL CHECKING IN ANOVA
MODULE 15: CORRELATIONS
MODULE 16: SIMPLE LINEAR REGRESSION
MODULE 17: MODEL CHECKING IN REGRESSION
MODULE 18: MULTIPLE LINEAR REGRESSION
MODULE 19: MULTIPLE REGRESSION-CHOOSING THE BEST MODEL
MODULE 20: TESTS FOR CATEGORICAL DATA
MODULE 21: NON-PARAMETRIC TESTS
MODULE 22: ANALYSIS OF COVARIANCE
MODULE 23: LOGISTIC REGRESSION
MODULE 24: MATRIX COMPUTATIONS
MODULE 25: MACRO VARIABLES AND PROGRAMS


PREFACE

This solutions manual provides the SAS code needed for problems in Learning SAS in the Computer Lab, 3rd Edition. There are many possible ways to write programs that will run and generate the desired output. This manual provides one set of solutions. In this manual, SAS code will be displayed in a Courier font.

Parts of problems (a, b, c, and so on) are often related and should be incorporated in one SAS program. The solution may have program code common to all parts of the problem listed first, followed by code for particular parts listed under a, b, c, and so on. In some cases, more common code follows the code for the parts.

Problems in the early chapters call for label and title statements as well as the use of PROC FORMAT. Solutions for later chapters do not include these statements, although I recommend they be assigned. Students should also be required, or at least strongly encouraged, to document their SAS programs properly with comments.

There are many different ways to read the data sets included with the manual. I have used different formats throughout the solutions manual as examples. Instructors may also wish to include some data sets as Microsoft Excel files so that students can gain experience reading data in this common format.

In Learning SAS in the Computer Lab, 3rd Edition, I recommend that SAS code be formatted in ways that make the code easy to read and debug. In order to save space, I have not included such formatting in the solutions.

For some problems, answers to the statistical questions are provided. These answers may help instructors decide which problems to assign.


MODULE 1: THE BASICS

1.1 data one;
input pH time temp;
datalines;
4.5 20 125
4.1 22 133
4.8 18 149
4.0 26 120
5.0 25 120
6.0 21 138
;
run;
proc print; run;

1.2 Use the same data step as in 1.1 and then
proc print; var temp pH; run;

1.3 data sizes;
input size $ color $ price shipcost;
datalines;
large red 18.97 0.25
medium blue 24.68 1.10
x-large black 29.99 1.75
small orange 15.89 0.90
;
run;
proc print;
var size color price shipcost; run;

1.4 Use the same data step as in 1.3 and then
proc print;
var color size price; run;

1.5 data schools;
input school $ no_teach no_stud;
datalines;
granite 5829 200486
jordan 12433 318992
davis 2358 126331
;
run;
proc print;
var school no_teach no_stud; run;


1.6 The input statement in 1.1 changes to
input pH 1-3 time 5-6 temp 8-10; datalines;

1.7 The input statement in 1.1 changes to
input @1 pH @5 time @8 temp;

1.8 The input statement in 1.3 changes to
input size $ 1-7 color $ 9-14 price 16-20 shipcost 23-26;

1.9 The input statement in 1.3 changes to
input @1 size $7. @9 color $6. @16 price 5.2 @23 shipcost 4.2;

1.10 data appoint;
input time $ 1-5 person $ 8-12 where $ 15-27
subject $ 29-44 length 48-49;
datalines;
11:00  Sally  room 30       personnel review   45
1:00   Jim    Jim's office  brake design       30
3:00   Nancy  lab           test results       30
;
run;
proc print;
var time person where subject length; run;

1.11 The input statement in 1.10 changes to
input @1 time $5. @8 person $5. @15 where $12.
@29 subject $16. @48 length 2.0;


1.12 data popcorn;
input @1 brand $20. @22 time $4. @27 notpop 3.0;
datalines;
Orville Redenbacker  2:15 80
Orville Redenbacker  2:15 89
Orville Redenbacker  2:30 57
Orville Redenbacker  2:30 60
Orville Redenbacker  2:45 60
Orville Redenbacker  2:45 46
Smith's              2:15 170
Smith's              2:15 147
Smith's              2:30 196
Smith's              2:30 114
Smith's              2:45 98
Smith's              2:45 90
Pop Secret           2:15 215
Pop Secret           2:15 78
Pop Secret           2:30 98
Pop Secret           2:30 83
Pop Secret           2:45 75
Pop Secret           2:45 65
;
run;
proc print; run;


MODULE 2: MORE SAS BASICS

2.1 a data one; infile 'utility.dat';
input @1 month $3. @5 year 2. phone 9-14 fuel 18-22
elec 25-29;
if month='Jan' then monthnum=1;
else if month='Feb' then monthnum=2;
else if month='Mar' then monthnum=3;
else if month='Apr' then monthnum=4;
else if month='May' then monthnum=5;
else if month='Jun' then monthnum=6;
else if month='Jul' then monthnum=7;
else if month='Aug' then monthnum=8;
else if month='Sep' then monthnum=9;
else if month='Oct' then monthnum=10;
else if month='Nov' then monthnum=11;
else if month='Dec' then monthnum=12;
totalexp = phone + fuel + elec;
run;
proc print; run;

b Use the same data step as in (a) and then
proc sort; by year monthnum; run;
proc print; by year;
var month phone; run;

c Use the same data step as in (a) and then
proc sort; by monthnum year; run;
proc print; by monthnum;
var year phone; run;

d Use the same data step as in (a) and then
proc print; where year = 92; run;

e Use the same data step as in (a) and then
proc sort data = one; by year;
proc print;
where month = 'Jan' or month='Feb' or month='Mar';
by year; run;

f Sort by year and month to compare years across months.
Sort by month and year to compare months across years.


2.2 a data one; infile 'china#1.dat';
input year total exports imports;
deficit = exports - imports;
run;
proc print; run;

b data two; set one;

if 1955


2.5 a, b proc format;
value $ktfmt 'o' = 'Overhand' 'f' = 'Figure8';
value rfmt 1 = 'Cotton' 2 = 'Twine' 3 = 'Nylon';
value kdfmt 1 = 'Parallel' 2 = 'Perpendicular';
run;
data one; infile 'knots.dat';
input Knot_Type $ 4 Rope 7 Knot_Direction 10 Weight 13-15;
Break_Weight=Weight-162;
Brk_Wgt_kg=Break_Weight/2.2;
format Knot_Type $ktfmt. Rope rfmt. Knot_Direction kdfmt.;
run;
proc sort;
by descending Break_Weight; run;
proc print; run;

2.6 proc format;
value htnfmt 1='Normotensive' 2='IDH' 3='ISH' 4='Hypertension';
run;
data one; infile 'btt.dat';
input childid sex bweight gestage momage parity
mdbp msbp momeduc mmedaid socio
dbp5 sbp5 ht5 wt5 hdl5 ldl5 trig5 smoke5 medaid5 socio5;
bmi5 = wt5/(ht5*ht5);
* Test for missing values first: SAS treats a missing value as smaller than any number;
if msbp = . or mdbp = . then htn = .;
else if msbp >= 140 and mdbp >= 90 then htn = 4;
else if msbp >= 140 and mdbp < 90 then htn = 3;
else if msbp < 140 and mdbp >= 90 then htn = 2;
else if msbp < 140 and mdbp < 90 then htn = 1;
format htn htnfmt.; run;

a data one10; set one; if _n_


MODULE 3: DATA MANAGEMENT

3.1 data one; infile 'china#1.dat';
input year 1-4 total 6-10 exports 12-16 imports 18-22;
run;
/* It is first necessary to put data in year order before
computing the change in exports or imports */
a proc sort; by year; run;
data two; set one;
/* The next two lines compute change in exports */
lastyrex = lag(exports);
changeex = exports - lastyrex;
b /* The next two lines compute change in imports */
lastyrim = lag(imports);
changeim = imports - lastyrim; run;
proc print;
var year exports lastyrex changeex imports lastyrim changeim;
run;

3.2 data utils; infile 'utility.dat';
input @1 month $3. @5 year 2.0 phone 9-14 fuel 18-22 elec 25-29;
if month = 'Jan' then monthnum =1;
else if month = 'Feb' then monthnum =2;
else if month = 'Mar' then monthnum =3;
else if month = 'Apr' then monthnum =4;
else if month = 'May' then monthnum =5;
else if month = 'Jun' then monthnum =6;
else if month = 'Jul' then monthnum =7;
else if month = 'Aug' then monthnum =8;
else if month = 'Sep' then monthnum =9;
else if month = 'Oct' then monthnum =10;
else if month = 'Nov' then monthnum =11;
else if month = 'Dec' then monthnum =12;
run;
/* Put data in year month order */
proc sort; by year monthnum; run;

a data year90; set utils; if year = 90;
lastmonth = lag(phone);
change = phone - lastmonth; run;
proc print; var year month phone lastmonth change; run;

b data winter; set utils; if month = 'Jan';
lastyr = lag(fuel);
change = fuel - lastyr; run;
proc print; var month year fuel lastyr change; run;


3.4 data DH;
input flavor $ 1-10 height; brand = 'DH';
datalines;
DevilsFood 39.0
DevilsFood 36.5
White      30.5
White      34.5
Yellow     37.0
Yellow     35.0
;
run;
data BC;
input flavor $ 1-10 height; brand = 'BC';
datalines;
Yellow     35.5
Yellow     36.0
DevilsFood 35.5
DevilsFood 37.5
White      32.5
White      32.5
;
run;

a * Concatenate the two data sets ;
data Cake; set DH BC;
file 'Module3-4a.dat';
put flavor $ 1-10 brand $ 12-13 height 15-18 .1; run;

b * Reformulate Duncan Hines data for match merging ;
data DH1; set dh;
dhht = height; keep flavor dhht; run;
proc sort; by flavor; run;
* Reformulate Betty Crocker data for match merging;
data BC1; set BC;
bcht = height; keep flavor bcht; run;
proc sort; by flavor; run;
data Cake1; merge dh1 bc1; by flavor;
file 'Module3-4b.dat';
put flavor $ 1-10 dhht 12-16 .1 bcht 18-22 .1; run;

3.5 data ml_first25; infile 'moonlake.dat' obs=25;
input propane 1 naturalgas 2 eeproducts 3 sshacwhs 4 ewrs 5
remr 6 garbage 7 tagto 8 internet 9 hss 10 New 12 OneBill 14
NG 15 Elec 16 PG 17 FuelOil 18 Wood 19 Coal 20 Solar 21 Source 22
AgeHeat 24 TypeWater 25 Agewater 26 HowLng 27 PCHome 34 PCPlan 35
Internet 36 Provider 37 Age 40 Educ 41 Income 42 sex 43; run;
proc print; run;


3.6 data ml_26_50;
infile 'moonlake.dat' firstobs=26 obs=50;
Input statement as in 3.5.
data ml_251_300;
infile 'moonlake.dat' firstobs=251 obs = 300;
Input statement as in 3.5.
data ml2;
set ML_26_50 ML_251_300; run;
proc print data = ml2; run;


MODULE 4: SAS FUNCTIONS

4.1 data well; infile 'well#1.dat';
input @1 date $8. nitrate zinc TDS;
month = substr(date,1,3);
day = substr(date,4,2);
run;
proc print; run;

4.2 data one; input value;
posval = abs(value);
root = sqrt(posval);
newval = sqrt(abs(value));
datalines;
2.7
-6.9
3.4
0.5
1.3
; run;
proc print; run;

4.3 data one; input x;
a cumprob = probbnml(0.23,13,x);
b greater = 1 - cumprob;
c if x = 0 then lessprob = .;
else lessprob = probbnml(.23,13,x-1);
datalines;
0
1
2
3
4
5
6
7
8
9
10
11
12
13
; run;
proc print; run;
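As a reading aid for 4.3, the three computed quantities can be written out as below. This is only a sketch of what the code evaluates, assuming X denotes a binomial count with n = 13 trials and success probability p = 0.23 (the arguments passed to the binomial CDF function); the symbol F is introduced here for convenience and is not notation from the text.

```latex
\begin{align*}
\text{cumprob}  &= F(x) = P(X \le x)
                 = \sum_{k=0}^{x} \binom{13}{k} (0.23)^{k} (0.77)^{13-k} \\
\text{greater}  &= P(X > x) = 1 - F(x) \\
\text{lessprob} &= P(X < x) = P(X \le x-1) = F(x-1), \qquad x \ge 1
\end{align*}
```

At x = 0 the code leaves lessprob missing because the CDF cannot be evaluated at x - 1 = -1; the probability itself is 0.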


4.4 data binomial; input x; n = 5; p = 0.40;
a cdf=probbnml(p, n, x);
b pdf=cdf-lag(cdf);
if x = 0 then pdf = cdf;
datalines;
0
1
2
3
4
5
;
run;
proc print; run;
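The differencing step in 4.4(b) can be read as follows. This is a sketch under the assumption that the data lines are in increasing order of x, so that lag(cdf) holds the cumulative probability of the previous value; F again denotes the binomial CDF with n = 5 and p = 0.40 and is notation introduced here.

```latex
\begin{align*}
P(X = x) &= F(x) - F(x-1), \qquad x = 1, \dots, 5 \\
P(X = 0) &= F(0) = (1 - 0.40)^{5} = 0.07776
\end{align*}
```

For the first observation lag(cdf) is missing, which is why the solution resets pdf to cdf when x = 0.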

4.5 data norm; mu=12.6; sigma=2.3;
x = 10;
z=(x-mu)/sigma;
x1 = 15;
z1=(x1-mu)/sigma;
x2 = 7.6;
z2=(x2-mu)/sigma;
a prob_a=probnorm(z);
b prob_b=probnorm(z1)-probnorm(z2);
run;
proc print; run;
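The standardization used in 4.5 can be sketched as below, assuming X is normal with mean 12.6 and standard deviation 2.3 (the values of mu and sigma in the code) and writing Φ for the standard normal CDF that probnorm evaluates. The probability statements are inferred from the code, not quoted from the problem.

```latex
\begin{align*}
z   &= \frac{10 - 12.6}{2.3} \approx -1.130, &
z_1 &= \frac{15 - 12.6}{2.3} \approx 1.043, &
z_2 &= \frac{7.6 - 12.6}{2.3} \approx -2.174 \\
\text{prob\_a} &= P(X < 10) = \Phi(z), &
\text{prob\_b} &= P(7.6 < X < 15) = \Phi(z_1) - \Phi(z_2)
\end{align*}
```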


MODULE 5: DESCRIPTIVE STATISTICS I

5.1 data utils; infile 'utility.dat';
input @1 date $6. @5 year 2.0 phone 9-14 fuel 17-22 elec 25-29;
total = phone + fuel + elec;
label phone = 'phone costs'
fuel = 'fuel costs'
elec = 'electricity costs'
total = 'total utility costs'; run;
proc univariate plot; var phone fuel elec total;
id date; title 'Descriptive Stats for utility Costs';
run;

Extreme phone costs: Low--Jan92, Jan89, Dec91, Oct92, Jan93.
High--May90, Jan91, Apr90, Jan90, Jun90. No outliers.
Extreme fuel costs: Low--Jul92, Jul90, Aug90, Jul89, Aug89.
High--Jan92, Feb89, Jan89, Feb93, Jan92. No outliers.
Extreme elec costs: Low--Jun92, Sep91, Mar92, Apr90, Mar90.
High--Jun89, Nov88, Jan89, Oct88, Dec88. Dec88 is an outlier.
Extreme total costs: Low--Sep92, Aug92, Oct92, Aug91, May92.
High--Jan89, Feb91, Dec88, Jan90, Jan91. No outliers.

5.2 Use data step as in 5.1 and then
proc sort; by year; run;
proc univariate; by year; var total; id date;
title 'Total utility Costs for each Year'; run;

5.3 proc format; value lsfmt 1 = "Athletic" 2 = "Sedentary"; run;
data athlete; infile 'athlete.dat';
input sbp 1-3 dbp 6-7 sex $ 10 ls 13;
label sbp = 'Systolic Blood Pressure'
dbp = 'Diastolic Blood Pressure' ls = 'Lifestyle';
format ls lsfmt.; run;
proc sort data = athlete; by sex ls; run;
* Compare bp's among the 4 sex by lifestyle groups ;
a proc univariate plots; var dbp; by sex ls;
title 'Description of diastolic bp by sex and lifestyle';
run;
b proc univariate plots normal; var sbp;
probplot sbp / normal(mu = est sigma = est);
title 'Checking whether sbp is normal'; run;


5.4 data one; infile 'china#1.dat';
input year 1-4 total 6-10 exports 12-16 imports 18-22;
deficit = imports - exports; run;
proc univariate plot;
var imports exports deficit;
id year; title 'Statistics on China''s Trade'; run;

5.5 proc format;
value bfmt 1 = 'Duracell' 2 = 'Energizer' 3 = 'Rayovac'
4 = 'Radio Shack'; run;
data one; infile 'battery.dat';
input brand 1 load 4-6 time 9-11;
label brand = 'Battery Brand'
time = 'Time to discharge';
format brand bfmt.; run;
proc boxplot;
plot time*brand / boxstyle=schematic cboxes = black;
title 'Comparing discharge times among battery brands'; run;

5.6 data park; infile 'parking.dat';
input id miles; if miles = 99 then miles = .;
label miles = 'Distance live from campus'; run;
proc univariate plot; var miles; id id;
title 'Descriptive statistics of distance live from campus'; run;

5.7 data quarterback; infile 'quarterback.dat';
input player $ 5-22 rating 101-105;
label rating = 'Quarterback rating'; run;
proc univariate plot; var rating; id player;
title 'Descriptive statistics of quarterback ratings'; run;

5.8 proc format;
value sfmt 1 = 'Male' 2 = 'Female'; run;
data btt; infile 'btt.dat';
input childid 1-4 sex 6 bweight 8-11 gestage 13-14;
label bweight = 'Birth weight'
gestage = 'Gestational age';
format sex sfmt.; run;
proc sort; by sex; run;
proc univariate plot;
var bweight gestage;
id childid;
by sex;
title 'Statistics for birth weight and gestational age by sex';
run;


MODULE 6: PROC GCHART

6.1 data one; infile 'utility.dat';
input @1 date $char6. @5 year 2.0 phone fuel elec;
total = phone + fuel + elec;
label phone = 'phone costs'
fuel = 'fuel costs'
elec = 'electricity costs'
total = 'total utility costs'; run;
proc gchart;
vbar phone fuel elec total / space = 0;
title 'Histograms of utility costs'; run;

The distributions are right skewed.

6.2 Use the same data step as in 6.1 and then

data two; set one; if 90


6.5 proc format;
value $sexfmt 'F'='Female' 'M'='Male'; run;
data run;
infile 'running.dat';
input class sex $ @5 minute1 1.0 @7 second1 2.0
@10 minute2 1.0 @12 second2 2.0;
time1 = minute1*60 + second1;
time2 = minute2*60 + second2;
label class = 'Grade in School'
time1 = 'Running Time for First Race'
time2 = 'Running Time for Second Race';
format sex $sexfmt.; run;
goptions htext = 2;
proc gchart data = run;
vbar time1 / space = 0 width = 10 midpoints = 70 to 140 by 10;
vbar time2 / space = 0 width = 10 midpoints = 70 to 130 by 10;
run;

6.6 proc format;
value sfmt 1 = 'Natural Gas' 2 = 'Electricity' 3 = 'Propane Gas'
4 = ' ' 5 = 'Wood' 6 = 'Coal';
value incfmt 1='=$75,000' 6='Refuse'; run;
data ml;
infile 'moonlake.dat';
input propane 1 Source 22 Income 42;
label propane = "Interest in purchasing propane (1=Not, 5=Very)"
Source = "Primary Energy Source for Heat"
Income = "Annual Household Income";
format Source sfmt. Income incfmt.; run;
proc gchart data = ml;
* The bars for source ordered from highest to lowest;
hbar source / midpoints = 1 3 2 5 6 ;
hbar propane / midpoints = 1 to 6 by 1;
hbar income / midpoints = 1 to 6 by 1; run;



6.7 proc format;
value fsfmt 0 = 'Student' 1 = 'Faculty/Staff';
value usrn 1='Usually' 2='Sometimes' 3='Rarely' 4='Never';
run;
data park;
infile 'parking.dat';
input id miles bus_convenient carpool years status bus Monday
Tuesday Wednesday Thursday Friday drive permit meters lots;
if id = 400 then fac_staff = 0;
if years = 99 then years = .; if bus = 99 then bus = .;
if Monday = 99 then Monday = .;
if Tuesday = 99 then Tuesday = .;
if Wednesday = 99 then Wednesday = .;
if Thursday = 99 then Thursday = .;
if Friday = 99 then Friday = .;
busdays = Monday + Tuesday + Wednesday + Thursday + Friday;
if bus = 2 then busdays = 0;
if lots = 99 then lots = .;
format fac_staff fsfmt. bus yn. lots usrn.; run;
proc gchart data = park;
a hbar years / space = 0 width = 6 midpoints = 1 to 7 by 1;
c vbar busdays /
space = 0 width = 10 midpoints = 0 to 5 by 1; run;
b proc sort data = park; by fac_staff bus; run;
proc gchart data = park;
hbar lots / midpoints = 1 to 4 by 1;
by fac_staff bus; run;

6.8 proc format;
value mefmt 1 = '= HS'; run;
data btt; infile 'btt.dat';
input childid 1-4 momeduc 29 socio 33 socio5 73;
format momeduc mefmt.; run;
goptions htext = 1;
a proc gchart data = btt;
vbar momeduc / midpoints = 1 to 4 by 1; run;
goptions htext = 2;
b proc gchart data = btt;
hbar socio socio5 / midpoints = 0 to 4 by 1; run;



MODULE 7: DESCRIPTIVE STATISTICS II

7.1 data ml; infile 'moonlake.dat';
input propane 1 naturalgas 2 eeproducts 3 sshacwhs 4 ewrs 5
remr 6 garbage 7 tagto 8 internet 9 hss 10; run;
data omitmissing; set ml;
if propane = 6 then propane = .;
if naturalgas = 6 then naturalgas = .;
if eeproducts = 6 then eeproducts = .;
if sshacwhs = 6 then sshacwhs = .;
if ewrs = 6 then ewrs = .;
if remr = 6 then remr = .;
if garbage = 6 then garbage = .;
if tagto = 6 then tagto = .;
if internet = 6 then internet = .;
if hss = 6 then hss = .; run;
proc means data = omitmissing;
var propane naturalgas eeproducts sshacwhs ewrs remr garbage
tagto internet hss; run;

7.2 proc format;
value $sexfmt 'F'='Female' 'M'='Male'; run;
data running; infile 'running.dat';
input class 1 sex $ 3 min1 5 sec1 7-8 min2 10 sec2 12-13;
time1=min1*60+sec1;
time2=min2*60+sec2;
label time1 = 'Time for Race 1'
time2 = 'Time for Race 2';
format sex $sexfmt.; run;
proc means data = running;
class sex class; var time1; run;

7.3 proc format;
value lsfmt 1 = "Athletic" 2 = "Sedentary"; run;
data athlete; infile 'athlete.dat';
input sbp 1-3 dbp 6-7 sex $ 10 ls 13;
label sbp = 'Systolic Blood Pressure'
dbp = 'Diastolic Blood Pressure'
ls = 'Lifestyle';
format ls lsfmt.; run;
proc means data = athlete;
class ls; var sbp dbp; run;

7.4 data golf; infile 'golf.dat';
input Golfer 1 Compression 3-5 Material 8 Distance ; run;
proc means data = golf;
class Golfer; var distance; run;



7.5 proc format;
value contfmt 0 = 'Not Contaminated' 1 = 'Contaminated'; run;
data well; infile 'well#1.dat';
input date $ 1-5 month $ 1-3 day 4-5 year 7-8 nitrate 11-15 .3
zinc 18-22 .3 TDS 25-27;
if (nitrate > 0.12) or (zinc > 0.02) or (TDS > 516) then
contaminate = 1;
else contaminate = 0;
format contaminate contfmt.; run;
a proc freq;
table contaminate;
b table contaminate*year; run;
All of the data in 1990 is contaminated. In 1991, half is contaminated.

7.6 proc format;
value outfmt 0 = 'Failure' 1 = 'Success'; run;
data one;
infile 'survresp.dat';
input incentive n_cont n_treat r_cont r_treat;
label n_cont = 'Sample size for control group'
n_treat = 'Sample size for treatment group'
r_cont = 'Response rate for control group'
r_treat = 'Response rate for treatment group';
if r_cont < r_treat then outcome = 1;
else outcome = 0;
format outcome outfmt.; run;
proc freq;
tables outcome outcome*incentive; run;



7.7 proc format;
value bgfmt 0 = 'Bad' 1 = 'Good'; run;
data skin; infile 'sclero.dat';
input clinic id drug thick1 thick2 mobil1 mobil2 assess1 assess2;
if thick1 > thick2 then r_thick = 1; else r_thick = 0 ;
if mobil1 < mobil2 then r_mobil = 1; else r_mobil = 0;
if assess1 > assess2 then r_assess = 1; else r_assess = 0;
label r_thick = 'Skin thickening improvement'
r_mobil = 'Skin mobility improvement'
r_assess = 'Patient assessment improvement';
format r_thick bgfmt. r_mobil bgfmt. r_assess bgfmt. ; run;
a proc freq; tables clinic; run;
Clinics #46 and #49 had the largest number of patients in the study.
b proc freq;
tables drug*clinic; run;
c proc freq data = skin;
where clinic = 46 or clinic = 48 or clinic = 49;
tables clinic*drug*(r_thick r_mobil r_assess); run;
d proc freq data = skin;
where drug = 1;
tables r_thick*r_assess; run;
34.38%, 21.88%



7.8 proc format;
value sfmt 1 = 'Natural Gas' 2 = 'Electricity' 3 = 'Propane Gas'
4 = ' ' 5 = 'Wood' 6 = 'Coal';
value agefmt 1='18-34' 2='35-49' 3='50-64' 4='>=65' 5='Refuse';
run;
data moonlake; infile 'moonlake.dat';
input propane 1 internet 9 NG 15 Elec 16 PG 17 FuelOil 18 Wood 19
Coal 20 Solar 21 Source 22 Internet 36 Age 40;
label propane = "Interest in purchasing propane (1=Not, 5=Very)"
Source = "Primary Energy Source for Heat";
format Solar availfmt. Source sfmt. age agefmt.; run;
proc freq data = moonlake;
a table Source ;
b table NG Elec PG FuelOil Wood Coal Solar;
c table NG*Propane;
d table Internet*age; run;
e proc freq data = moonlake;
where PCHome = 1;
table Internet*age; run;

7.9 proc format;
value mefmt 1 = '= HS'; run;
data btt; infile 'btt.dat';
input momeduc 29 socio 33 socio5 73;
format momeduc mefmt.; run;
proc freq data = btt;
table socio*socio5 socio*momeduc; run;



MODULE 8: GENERATING RANDOM OBSERVATIONS

8.1 data one; do i=1 to 1000; obs = rannor(4241)*20 + 50;
output; end; run;
proc gchart data = one;
vbar obs / space = 0;
title 'Random samples from N(50,400)'; run;
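
Note: rannor returns a standard normal value, so multiplying by 20 and adding 50 gives mean 50 and variance 20*20 = 400. The following is a minimal sketch of an alternative, not part of the original answer, that passes the mean and standard deviation directly to the RAND function. The data set name one_rand is hypothetical, and the values generated will differ from those produced by rannor even with the same seed.

```sas
* Sketch only: same N(50, 400) sample drawn with the RAND function.
  The seed 4241 is reused from 8.1, but the stream differs from RANNOR,
  so the generated values will not match the original answer;
data one_rand;
   call streaminit(4241);
   do i = 1 to 1000;
      obs = rand('normal', 50, 20);   /* mean 50, standard deviation 20 */
      output;
   end;
run;

proc means data = one_rand n mean std;
   var obs;
run;
```

The sample mean and standard deviation reported by proc means should be close to 50 and 20.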

8.2 a data one; do i=1 to 50; obs = rannor(70776)*10 + 10;
output; end; run;
proc gchart;
vbar obs / space = 0;
title 'Random sample of 50 obs of N(10,100)'; run;
b data two; do i=1 to 500; obs = rannor(70776)*10 + 10;
output; end; run;
proc gchart;
vbar obs / space = 0;
title 'Random sample of 500 obs of N(10,100)'; run;
c data three; do i=1 to 5000; obs = rannor(70776)*10 + 10;
output; end; run;
proc gchart;
vbar obs / space = 0;
title 'Random sample of 5000 obs of N(10,100)'; run;

8.3 data exp; do i = 1 to 1000; x = ranexp(6664)/7;
output; end; run;
proc gchart;
vbar x / space = 0 width = 6;
title 'An exponential distribution with lambda=7'; run;

8.4 data poisson; do i = 1 to 700; y = ranpoi(9001, 5);
output; end; run;
proc gchart;
vbar y / space = 0;
title 'A Poisson distribution with mean=5'; run;

8.5 data bin; do i = 1 to 500; xval = ranbin(2721, 40, 0.2);
output; end; run;
proc gchart;
vbar xval / space = 0 midpoints = 0 to 20 by 1;
title 'A Binomial distribution with n=40 and p=0.2'; run;



8.6 data new; do i = 1 to 1000;
x1 = ranexp(434911)/7;
x2 = ranexp(434911)/7;
x3 = ranexp(434911)/7;
x4 = ranexp(434911)/7;
x5 = ranexp(434911)/7;
x6 = ranexp(434911)/7;
x7 = ranexp(434911)/7;
x8 = ranexp(434911)/7;
x9 = ranexp(434911)/7;
x10= ranexp(434911)/7;
average = (x1+x2+x3+x4+x5+x6+x7+x8+x9+x10)/10; output; end; run;
proc gchart;
vbar average / space = 0 width = 6;
title 'Distribution of average of exponential r.v.''s '; run;

8.7 data uniform; do i = 1 to 1000;
val1= ranuni(887890)*10 + 10;
val2= ranuni(887890)*10 + 10;
val3= ranuni(887890)*10 + 10;
val4= ranuni(887890)*10 + 10;
val5= ranuni(887890)*10 + 10;
val6= ranuni(887890)*10 + 10;
val7= ranuni(887890)*10 + 10;
val8= ranuni(887890)*10 + 10;
val9= ranuni(887890)*10 + 10;
val10=ranuni(887890)*10 + 10;
ave = (val1+val2+val3+val4+val5+val6+val7+val8+val9+val10)/10;
output; end; run;
proc gchart;
vbar ave / space = 0 width = 6;
title 'Distribution of average of a Uniform r.v. on (10,20)';
run;



MODULE 9: X-Y PLOTS

9.1 data utility; infile 'utility.dat';
input month $ 1-3 year 5-6 phone 9-15 fuel 17-22 elec 25-29;
total=phone + fuel + elec;
if month = 'Jan' then mnth = 1;
else if month = 'Feb' then mnth = 2;
else if month = 'Mar' then mnth = 3;
else if month = 'Apr' then mnth = 4;
else if month = 'May' then mnth = 5;
else if month = 'Jun' then mnth = 6;
else if month = 'Jul' then mnth = 7;
else if month = 'Aug' then mnth = 8;
else if month = 'Sep' then mnth = 9;
else if month = 'Oct' then mnth = 10;
else if month = 'Nov' then mnth = 11;
else mnth = 12;

if 89


9.3 data well; infile 'well#8.dat';
input @1 month $3. @4 day 2. @7 year 2. zinc;
if month = 'Jan' then mo = 1;
else if month = 'Feb' then mo = 2;
else if month = 'Mar' then mo = 3;
else if month = 'Apr' then mo = 4;
else if month = 'May' then mo = 5;
else if month = 'Jun' then mo = 6;
else if month = 'Jul' then mo = 7;
else if month = 'Aug' then mo = 8;
else if month = 'Sep' then mo = 9;
else if month = 'Oct' then mo = 10;
else if month = 'Nov' then mo = 11;
else if month = 'Dec' then mo = 12;
format date date7. ;
date = mdy (mo, day, year) ; run;
proc sort; by year mo day; run;
goptions csymbol = black;
symbol1 value = dot i = join;
proc gplot; by year;
plot zinc*date;
title 'Zinc concentrations over time'; run;

9.4 data one; infile 'handinj.dat';
input id $ type $ dayslost cost;
label dayslost = 'Days of work lost'
cost = 'Cost in Irish pounds'; run;
goptions csymbol = black htext = 2;
symbol1 value = dot;
proc gplot;
plot dayslost*cost;
title 'Lost work days vs. cost'; run;

9.5 data two; infile 'survresp.dat';
input incentive n_cont n_treat r_cont r_treat;
improve =(r_treat - r_cont)/r_cont;
label n_cont = 'Sample size for control group'
n_treat = 'Sample size for treatment group'
r_cont = 'Response for control group'
r_treat = 'Response for treatment group'
improve = 'Improvement in response rate'; run;
goptions csymbol = black htext = 2;
symbol1 value = dot;
proc gplot;
plot improve*incentive;
title 'Improvement in response vs. types of incentive';
run;



9.6 data athlete; infile 'athlete.dat';
input sbp 1-3 dbp 6-7 sex $ 10 ls 13;
label sbp = 'Systolic Blood Pressure'
dbp = 'Diastolic Blood Pressure';
run;
goptions csymbol = black htext = 2;
symbol1 value = dot;
proc gplot;
plot sbp*dbp;
title 'Plot of systolic vs. diastolic blood pressure'; run;

9.7 data injury; infile 'injury.dat';
input year 1-4 burns 6-10 amputations 12-16; run;
goptions csymbol = black;
symbol1 value = dot i = join;
symbol2 value = star i = join line = 2;
axis1 label = ('Injuries');
legend1 label = (H = 1.5 cell) value = (H = 1.5 cell);
proc gplot;
plot burns*year=1 amputations*year=2 /
overlay vaxis=axis1 legend=legend1;
title 'Plot of burns and amputations by year'; run;

9.8 proc format;
value $efmt 's' = 'Southern' 'n' = 'Northern'; run;
data trees; infile 'trees.dat';
input location $ 1 elevation 3-6 damage 8-9;
format location $efmt.; run;
goptions csymbol = black htext = 2;
symbol1 value = 'n';
symbol2 value = 's';
proc gplot data = trees;
plot damage*elevation = location; run;

9.9 data quarterback; infile 'quarterback.dat';
input YdsPerGame 63-67 TD 70-71 Int 74-75 Rating 101-105; run;
goptions csymbol = black htext = 2;
symbol1 value = dot;
proc gplot;
plot Rating*(YdsPerGame TD Int); run;



MODULE 10: ONE SAMPLE TESTS FOR μ, p

10.1 data well; infile 'well#1.dat';
input @11 nitrate 4. @18 zinc 5. @25 tds 3.;
testnitr = nitrate - 0.1; testzinc = zinc - 0.01;
testtds = tds - 475; run;
proc means n mean std t probt;
var testnitr testzinc testtds; run;
a p-value = 0.22725
b p-value = 0.05735
c p-value = 0.0001

10.2 data utils; infile 'utility.dat';
input @9 phone 6. @17 fuel 6. @25 elec 5.;
testphone = phone - 50;
testelec = elec - 30; run;
proc means n mean std t probt;
var testphone testelec; run;
a p-value < 0.00005
b p-value = 0.0498

10.3 data running; infile 'running.dat';
input class 1 sex $ 3 min1 5 sec1 7-8 min2 10 sec2 12-13;
time1=min1*60+sec1;
time2=min2*60+sec2;
testt1_78=time1-78;
testt2_95=time2-95;
label time1 = 'Time for Race 1'
time2 = 'Time for Race 2';
run;
proc means data = running n mean std t probt;
where sex = 'F';
var testt1_78 testt2_95; run;
a p-value = 0.0217
b p-value = 0.03435



10.4 data debate; infile 'debate.dat';
input id school gender compare argue research reason speak;
if compare = 1 then debate_more =1;
else debate_more =0;
if compare = . then debate_more = .;
if argue = 1 then argue_very =1;
else argue_very =0;
if argue = . then argue_very = .;
if research = 1 then research_very =1;
else research_very =0;
if research = . then research_very = .;
if reason = 1 then reason_very =1;
else reason_very =0;
if reason = . then reason_very = .;
if speak = 1 then speak_very =1;
else speak_very =0;
if speak = . then speak_very = .; run;
proc freq;
a tables debate_more / chisq testp = (0.25, 0.75);
c tables argue_very / chisq testp = (0.2, 0.8);
e tables research_very / chisq testp = (0.25, 0.75);
f tables reason_very / chisq testp = (0.05, 0.95); run;
a) p̂ = 0.771, p-value = 0.3857/2 = 0.19285.
c) p̂ = 0.853, p-value = 0.0187.
e) p̂ = 0.722, p-value = 0.2564/2 = 0.1282.
f) p̂ = 0.893, p-value < 0.0001.
b proc freq; where school = 8;
tables debate_more / chisq testp = (0.25, 0.75); run;
d proc freq; where gender = 1;
tables argue_very / chisq testp = (0.2, 0.8); run;
g proc freq; where gender=2 and school=9;
tables speak_very / chisq testp = (0.25, 0.75); run;
b) p̂ = 0.887, p-value= 0.0127/2 = 0.00635.
d) p̂ = 0.881, p-value=0.0058.
g) p̂ = 0.708, p-value = 0.6374/2 = 0.3187.
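
Note: The chi-square test produced by testp is two-sided, which is why several p-values above are divided by 2 to obtain the one-sided value (valid when the sample proportion falls on the side of the alternative). The following is a minimal sketch, not part of the original answer, of one way to have proc freq report the one-sided p-value directly, using the binomial option on the tables statement. It assumes data set debate from the data step above and that the proportion of interest is for debate_more = 1.

```sas
* Sketch only: one-sided test of H0: p = 0.75 for part a, assuming data
  set DEBATE from the data step above and that debate_more = 1 is the
  level of interest;
proc freq data = debate;
   tables debate_more / binomial(p = 0.75 level = '1');
run;
```

The output lists a one-sided and a two-sided p-value for the asymptotic Z test; the two-sided value should agree with the chi-square p-value from testp.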



10.5 data src; infile 'src.dat';
input @8 environ 2. @18 plant_an 2. @30 employ 2. @55 libcon 1.;
if environ in (8, 9, 10) then env_strong = 1;

else if 1


10.7 data bball;
input baskets;
datalines;
12
8
11
10
12
6
10
14
12
8
12
12
6
8
12
15
13
9
11
10
;
run;
proc means data = bball n mean std clm;
var baskets; run;
Note: The clm option tells proc means to compute a confidence interval for the mean.
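
Note: By default clm gives 95 percent limits. The following is a minimal sketch, not part of the original answer, showing how the alpha= option changes the confidence level; it assumes data set bball from the data step above, and the choice of 0.10 is only an illustration.

```sas
* Sketch only: 90 percent confidence limits instead of the default 95,
  assuming data set BBALL from the data step above;
proc means data = bball n mean std clm alpha = 0.10;
   var baskets;
run;
```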



10.8 proc format;
value sfmt 1 = 'Male' 2 = 'Female'; run;
data btt; infile 'btt.dat';
input childid 1-4 sex 6 bweight 8-11 gestage 13-14;
testgage=gestage - 266/7;
if bweight < 2500 then low_wt = 1;
else low_wt = 0;
if bweight = . then low_wt = .;
testbwgt = bweight - 3332;
format sex sfmt.; run;
proc freq data = btt;
a tables sex / chisq testp = (0.5, 0.5);
c tables low_wt / chisq testp = (0.918, 0.082); run;
b, d proc means data = btt n mean std t probt;
var testgage testbwgt; run;
a) Girls = 0.475, p-value = 0.4552.
b) t = 1.96, p-value = 0.0518.
c) = 0.055, p-value = 0.1517.

d) t = −5.29, p-value is


MODULE 11: TWO SAMPLE T-TESTS

11.1 data lens; infile 'cataract.dat';
input type $ astig; run;
proc ttest; class type; var astig; run;
Variances unequal: t=−2.00, p-value=0.0724.

11.2 data gas; infile 'gas.dat';
input @43 trans $1. @45 mileage 4.; run;
proc ttest; class trans; var mileage; run;
Variances unequal: t=4.03, p-value=0.0051/2 = 0.00255.

11.3 data grades; infile 'grades.dat';
input @5 gender $1. @25 final 3.0; run;
proc ttest; class gender; var final; run;
Variances equal: t=1.00, p-value=0.3229.

11.4 data hands; infile 'handinj.dat';
input @7 type $5. @13 days 2.0 @16 cost 4.0; run;
proc ttest; class type; var days cost; run;
a Variances unequal: t=−1.08, p-value=0.2904.
b Variances unequal: t=−0.68, p-value=0.5039.

11.5 data src; infile 'src.dat';
input @6 gender $1. @8 environ 2.0; run;
proc ttest; class gender; var environ; run;
Variances equal: t=−0.33, p-value=0.7391.

11.6 data robots; infile 'robot.dat';
input put_humn put_robt qul_humn qul_robt;
put_diff = put_humn - put_robt;
qul_diff = qul_humn - qul_robt; run;
proc means n mean std t prt;
var put_diff qul_diff; run;
a Paired t-test: t=−2.63, p-value=0.0340.
b Paired t-test: t=−1.96, p-value=0.0914.
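
Note: The answer above obtains the paired t-test by applying a one-sample t-test to the differences. The following is a minimal sketch of an alternative, not part of the original answer, that uses the paired statement of proc ttest on the original variables; it assumes data set robots from the data step above.

```sas
* Sketch only: the same paired tests via the PAIRED statement of PROC TTEST,
  assuming data set ROBOTS from the data step above;
proc ttest data = robots;
   paired put_humn*put_robt qul_humn*qul_robt;
run;
```

Each pair is analyzed as the first variable minus the second, matching the direction of put_diff and qul_diff.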



11.7 proc format;
value $sexfmt 'F'='Female' 'M'='Male'; run;
data running; infile 'running.dat';
input class 1 sex $ 3 min1 5 sec1 7-8 min2 10 sec2 12-13;
time1=min1*60+sec1;
time2=min2*60+sec2;
label time1 = 'Time for Race 1'
time2 = 'Time for Race 2';
format sex $sexfmt.; run;
proc ttest data = running;
class sex; var time1 time2; run;
Time1: Variances unequal: t=2.31, p-value=0.0411.
Time2: Variances equal: t=2.33, p-value=0.0336.

11.8 proc format;
value lsfmt 1 = "Athletic" 2 = "Sedentary"; run;
data athlete; infile 'athlete.dat';
input sbp 1-3 dbp 6-7 sex $ 10 ls 13;
label sbp = 'Systolic Blood Pressure'
dbp = 'Diastolic Blood Pressure'
ls = 'Lifestyle';
format ls lsfmt.; run;
proc ttest data = athlete;
class ls; var dbp sbp; run;
DBP: Variances equal: t=−2.02, p-value=0.0503.

SBP: Variances unequal: t=−5.75, p-value


MODULE 12: ONE-WAY ANOVA

12.1 data one; infile 'taillite.dat';
input @13 zone 2. @4 truck 1. @17 response 3. @7 group 1.; run ;
a data zone30; set one; if zone = 30 and group = 1; run;
proc glm; class truck;
model response = truck;
means truck / tukey lines; run;

F=16.4, p-value


12.4 data airplanes; infile 'airplanes.dat' delimiter = ',';
input design $ paper $ hang_time; run;
a proc anova data = airplanes; class design;
model hang_time = design;
means design / snk lines ; run;
F = 9.91, p-value=0.0004, design groups: glide vs. dart, sonic.
b proc anova data = airplanes; class paper;
model hang_time = paper;
means paper / snk lines ; run;
F = 1.29, p-value=0.2954.

12.5 data popcorn; input @1 brand $20. @22 time $4. @27 notpop 3.0;
datalines;
Orville Redenbacker  2:15 80
Orville Redenbacker  2:15 89
Orville Redenbacker  2:30 57
Orville Redenbacker  2:30 60
Orville Redenbacker  2:45 60
Orville Redenbacker  2:45 46
Smith's              2:15 170
Smith's              2:15 147
Smith's              2:30 196
Smith's              2:30 114
Smith's              2:45 98
Smith's              2:45 90
Pop Secret           2:15 215
Pop Secret           2:15 78
Pop Secret           2:30 98
Pop Secret           2:30 83
Pop Secret           2:45 75
Pop Secret           2:45 65
;
run;
proc anova data = popcorn;
class brand;
model notpop = brand;
means brand / tukey; run;
F = 4.30, p-value=0.0334, Smith’s = Pop Secret, Pop Secret = Orville Redenbacker



12.7 proc format;
value bfmt 1 = 'Duracell' 2 = 'Energizer'
3 = 'Rayovac' 4 = 'Radio Shack'; run;
data battery; infile 'battery.dat';
input Brand 1 load 4-6 time 9-11;
format Brand bfmt.; run;
a proc anova data = battery;
class brand;
model time = brand;
means brand; run;
F = 0.02, p-value=0.9950.
b proc anova data = battery;
class load;
model time = load;
means load / snk; run;

F=996.24, p-value


MODULE 13: TWO-WAY ANOVA AND MORE

13.1 data one; infile 'taillite.dat';
input @4 type 1. @7 group 1. @10 position 1. @13 zone 2.
@17 response 3. @23 follow 2.; run;
a proc glm; class group type;
model response = group type group*type;
means group type / tukey lines; run;
Group and type are significant. The interaction is not.
Type groupings: 4 vs. 3, 1, 2. Group: F=4.63, p-value=0.0317.

Type: F=9.38, p-value


13.4 data calls; infile 'calls.dat';
input week shift day $ number; run;
proc glm; class shift day;
model number = shift day shift*day;
means shift day; run;
Model: F=0.95, p-value=0.5087.
Shift: F=2.26, p-value=0.1080;
Day: F=1.00, p-value=0.4119.
SxD: F=0.60, p-value=0.7788.

13.5 proc format;
value bfmt 1 = 'Duracell' 2 = 'Energizer'
3 = 'Rayovac' 4 = 'Radio Shack'; run;
data battery; infile 'battery.dat';
input Brand 1 load 4-6 time 9-11;
format Brand bfmt.; run;
proc glm data = battery; class load brand;
model time = load brand load*brand; run;
No interaction: p = 0.8208;
No effect in Brand on time: p = 0.1117;
Significant effect of load on time: P < 0.0001;

13.6 data airplanes; infile 'airplanes.dat' delimiter = ',';
input design $ paper $ hang_time; run;
proc glm data = airplanes; class paper design;
model hang_time = paper design paper*design; run;
Significant interaction: p = 0.0149.

13.10 data btt; infile 'btt.dat';
input childid 1-4 sex 6 msbp 25-27 mmedaid 31 socio 33; run;
proc glm data = btt; class mmedaid socio;
model msbp = socio mmedaid socio*mmedaid; means socio*mmedaid;
run;
socio: F = 2.28, p-value=0.0623.
mmedaid: F = 6.05, p-value=0.0147.
socio*mmedaid: F = 5.94, p-value=0.0156.



MODULE 14: MODEL CHECKING IN ANOVA

14.1 data one; infile 'taillite.dat';
input @4 type 1. @7 group 1. @10 position 1. @13 zone 2.
@17 response 3. @23 follow 2.; run;
a proc glm; class group type;
model response = group type group*type;
output out=new p=yhat student = sresid; run;
goptions csymbol = black htext = 2; symbol1 value = dot;
proc gplot data = new;
plot sresid*yhat / vref = 0; plot sresid*group / vref = 0;
plot sresid*type / vref = 0; run;
proc univariate data = new normal; var sresid;
probplot sresid / normal (mu = est sigma = est) square;
run;
b proc glm; class group zone;
model response = group zone group*zone;
output out=new p=yhat student = sresid; run;
goptions csymbol = black htext = 2; symbol1 value = dot;
proc gplot data = new;
plot sresid*yhat / vref = 0; plot sresid*group / vref = 0;
plot sresid*zone / vref = 0; run;
proc univariate data = new normal; var sresid;
probplot sresid / normal (mu = est sigma = est) square;
run;
c proc glm; class group position;
model response = group position group*position;
output out=new p=yhat student = sresid; run;
goptions csymbol = black htext = 2; symbol1 value = dot;
proc gplot data = new;
plot sresid*yhat / vref = 0; plot sresid*group / vref = 0;
plot sresid*position / vref = 0; run;
proc univariate data = new normal; var sresid;
probplot sresid / normal (mu = est sigma = est) square;
run;

14.2 data one; infile 'brownie.dat'; input day pan $ mix $ width; run;<br />

proc glm; class pan mix;<br />

model width = pan mix pan*mix;<br />

output out=new p=yhat student = sresid; run;<br />

goptions csymbol = black htext = 2; symbol1 value = dot;<br />

proc gplot data = new;<br />

plot sresid*yhat / vref = 0;<br />

plot sresid*pan / vref = 0;<br />

plot sresid*mix / vref = 0; run;<br />

proc univariate data = new normal; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square; run;<br />



14.3 data wear; infile 'wear.dat';<br />

input grit $ 1-5 cut wear; run;<br />

proc glm;<br />

class grit cut;<br />

model wear = grit cut grit*cut;<br />

output out=new p=yhat student = sresid; run;<br />

goptions csymbol = black htext = 2;<br />

symbol1 value = dot;<br />

proc gplot data = new;<br />

plot sresid*yhat / vref = 0;<br />

plot sresid*grit / vref = 0;<br />

plot sresid*cut / vref = 0; run;<br />

proc univariate data = new normal; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square; run;<br />

14.4 data calls; infile 'calls.dat';<br />

input week shift day $ number; run;<br />

proc glm;<br />

class shift day;<br />

model number = shift day shift*day;<br />

output out=new p=yhat student = sresid; run;<br />

goptions csymbol = black htext = 2;<br />

symbol1 value = dot;<br />

proc gplot data = new;<br />

plot sresid*yhat / vref = 0;<br />

plot sresid*shift / vref = 0;<br />

plot sresid*day / vref = 0; run;<br />

proc univariate data = new normal; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square; run;<br />

14.5 proc format;<br />

value bfmt 1 = 'Duracell' 2 = 'Energizer'<br />

3 = 'Rayovac' 4 = 'Radio Shack'; run;<br />

data battery; infile 'battery.dat';<br />

input Brand 1 load 4-6 time 9-11;<br />

format Brand bfmt.; run;<br />

proc glm data = battery;<br />

class load brand;<br />

model time = load brand load*brand;<br />

output out=new p=yhat student = sresid; run;<br />

goptions csymbol = black htext = 2;<br />

symbol1 value = dot;<br />

proc gplot data = new;<br />

plot sresid*yhat / vref = 0;<br />

plot sresid*load / vref = 0;<br />

plot sresid*brand / vref = 0; run;<br />

proc univariate data = new normal; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square; run;<br />



14.10 data btt; infile 'btt.dat';<br />

input childid 1-4 sex 6 msbp 25-27 mmedaid 31 socio 33; run;<br />

proc glm data = btt;<br />

class mmedaid socio;<br />

model msbp = socio mmedaid socio*mmedaid;<br />

means socio*mmedaid;<br />

output out=new p=yhat student = sresid; run;<br />

goptions csymbol = black htext = 2;<br />

symbol1 value = dot;<br />

proc gplot data = new;<br />

plot sresid*yhat / vref = 0;<br />

plot sresid*socio / vref = 0;<br />

plot sresid*mmedaid / vref = 0; run;<br />

proc univariate data = new normal; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square; run;<br />

14.11 proc format; value sfmt 1 = 'Male' 2 = 'Female'; run;<br />

data btt; infile 'btt.dat';<br />

input childid 1-4 sex 6 bweight 8-11 momeduc 29;<br />

format sex sfmt.; run;<br />

proc glm data = btt;<br />

class momeduc;<br />

model bweight = momeduc;<br />

means momeduc / hovtest = levene;<br />

output out=new p=yhat student = sresid; run;<br />

goptions csymbol = black htext = 2;<br />

symbol1 value = dot;<br />

proc gplot data = new;<br />

plot sresid*yhat / vref = 0;<br />

plot sresid*momeduc / vref = 0; run;<br />

proc univariate data = new normal; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square; run;<br />

Constant variance assumption OK: p-value = 0.3738.<br />

Normality of residuals OK: p-value = 0.5631.<br />

Plots all look OK.<br />



MODULE 15: CORRELATIONS<br />

15.1 data one; infile 'electric.dat';<br />

input house income air index number load; run;<br />

proc corr;<br />

var house number index income; run;<br />

a No, p-value


15.5 data utils; infile 'utility.dat';<br />

input @9 phone 6. @17 fuel 6. @25 elec 5.; run;<br />

proc corr;<br />

var phone fuel elec; run;<br />

Fuel and electricity.<br />

15.6 proc format; value $sexfmt 'F'='Female' 'M'='Male'; run;<br />

data running; infile 'running.dat';<br />

input class 1 sex $ 3 min1 5 sec1 7-8 min2 10 sec2 12-13;<br />

time1=min1*60+sec1;<br />

time2=min2*60+sec2;<br />

label time1 = 'Time for Race 1'<br />

time2 = 'Time for Race 2';<br />

format sex $sexfmt.; run;<br />

proc corr data = running;<br />

var time1 time2; run;<br />

proc sort data = running;<br />

by sex class; run;<br />

proc corr data = running;<br />

var time1 time2; by sex class; run;<br />

15.8 data quarterback; infile 'quarterback.dat';<br />

input Rank 1-2 Player $ 5-22 Team $ 25-27 Comp 30-32 Att 35-37<br />

Pct 40-43 AttPerGame 46-49 Yds 52-55 Avg 58-60 YdsPerGame 63-67<br />

TD 70-71 Int 74-75 FirstDown 77-80 FirstDownPct 83-86<br />

Over20 89-90 Over40 93-94 Sack 97-98 Rating 101-105; run;<br />

proc corr data = quarterback;<br />

var rating comp pct yds int sack; run;<br />

Percent completed has the highest correlation with quarterback rating, followed by total<br />

yards and number of completions.<br />



MODULE 16: SIMPLE LINEAR REGRESSION<br />

16.1 goptions csymbol = black htext = 2; symbol1 value = dot;<br />

data one; infile 'bonescor.dat';<br />

input index ccratio csi width score pct; run;<br />

proc reg;<br />

model score = pct;<br />

plot score*pct; run;<br />

a Yhat = 4.845 + 0.0253x, R² = 0.0864.<br />

c Bone score and % young normal do not appear to be linearly related.<br />

16.2 data one; infile 'electric.dat';<br />

input house income air applindx number peakload; run;<br />

a proc reg;<br />

model peakload = air;<br />

plot peakload*air; run;<br />

Yhat = 2.265 + 0.742x, R² = 0.8598.<br />

b proc reg;<br />

model peakload = applindx;<br />

plot peakload*applindx; run;<br />

Yhat = -0.729 + 0.947x, R² = 0.7851.<br />

c proc reg;<br />

model peakload = number;<br />

plot peakload*number; run;<br />

Yhat = 4.809 - 0.0581x, R² = 0.0045.<br />



16.3 data one; infile 'gas.dat';<br />

input disp power torque ratio axle barrel speed clen<br />

cwid cwt trans mileage; run;<br />

a proc reg;<br />

model power = disp;<br />

plot power*disp ; run;<br />

Yhat = 33.5 + 0.362x, R² = 0.8848.<br />

b proc reg;<br />

model torque = disp;<br />

plot torque*disp ; run;<br />

Yhat = 15.48 + 0.7085x, R² = 0.9793.<br />

c proc reg;<br />

model torque= power;<br />

plot torque*power ; run;<br />

Yhat = -27.835 + 1.794x, R² = 0.9300.<br />

d proc reg;<br />

model mileage = disp;<br />

plot mileage*disp ; run;<br />

Yhat = 33.49 - 0.0471x, R² = 0.7601.<br />

e proc reg;<br />

model mileage = torque;<br />

plot mileage*torque ; run;<br />

Yhat = 33.996 - 0.064x, R² = 0.7214.<br />

f proc reg;<br />

model mileage = power;<br />

plot mileage*power; run;<br />

Yhat = 35.35 - 0.112x, R² = 0.6345.<br />

16.4 data one; infile 'electric.dat';<br />

input house income air applindx number peakload; run;<br />

a proc reg;<br />

model peakload = air / clm; run;<br />

b proc reg;<br />

model peakload = applindx / cli; run;<br />



16.5 data one; infile 'gas.dat';<br />

input disp power torque ratio axle barrel speed clen<br />

cwid cwt trans mileage; run;<br />

a proc reg;<br />

model power = disp / clm; run;<br />

b proc reg;<br />

model torque = disp / cli; run;<br />

c proc reg;<br />

model torque= power / clm; run;<br />

d proc reg;<br />

model mileage = disp / cli; run;<br />

e proc reg;<br />

model mileage = torque / clm; run;<br />

f proc reg;<br />

model mileage = power / cli; run;<br />

16.6 data quarterback; infile 'quarterback.dat';<br />

input Rank Player $ 5-22 Team $ 25-27 Comp Att Pct AttPerGame Yds<br />

Avg YdsPerGame TD Int FirstD FirstDP P20 P40 Sck Rate; run;<br />

proc reg data = quarterback;<br />

model rate = pct / clm cli;<br />

plot rate*pct;<br />

title 'Regression model for Quarterback Rating'; run;<br />

a Yhat = -60.92505 + 2.340x, R² = 0.6331.<br />

c The line appears to fit the trend in the data very well.<br />



MODULE 17: MODEL CHECKING IN REGRESSION<br />

17.1 goptions csymbol = black htext = 2;<br />

symbol1 value = dot;<br />

data one; infile 'bonescor.dat' ;<br />

input index ccratio csi width score pct; run;<br />

proc reg;<br />

model score = pct;<br />

plot score*p.;<br />

plot student.*p.;<br />

plot student.*pct;<br />

output out = new p = yhat r = resid student = sresid; run;<br />

proc univariate data = new normal; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square; run;<br />

17.2 data one; infile 'electric.dat' ;<br />

input house income air applindx number peakload; run;<br />

a proc reg data = one;<br />

model peakload = air;<br />

plot peakload*p.; plot student.*p.; plot student.*air;<br />

output out = new p = yhat r = resid student = sresid; run;<br />

proc univariate data = new normal; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square;<br />

run;<br />

There may be some curvature in the residual plots. A linear model may not be<br />

appropriate.<br />
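
One possible follow-up to the curvature noted above is to add a squared term and re-examine the residual plot. This is only a sketch and is not part of the original solution; the names two and air_sq are hypothetical, and it assumes the data set one read in 17.2.

```sas
/* Sketch: allow for curvature in peakload vs. air by adding a squared term.   */
/* two and air_sq are hypothetical names; this model is not fitted in the manual. */
data two; set one;
  air_sq = air*air;
run;

proc reg data = two;
  model peakload = air air_sq;
  plot student.*p.;   /* studentized residuals vs. predicted values */
run;
```

If the curved pattern in the studentized residuals disappears, the quadratic term is capturing the curvature.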

b proc reg data = one;<br />

model peakload = applindx;<br />

plot peakload*p.; plot student.*p.; plot student.*applindx;<br />

output out = new p = yhat r = resid student = sresid; run;<br />

proc univariate data = new normal ; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square;<br />

run;<br />

Assumptions appear to be satisfied.<br />

c proc reg data = one;<br />

model peakload = number;<br />

plot peakload*p.; plot student.*p.; plot student.*number;<br />

output out = new p = yhat r = resid student = sresid; run;<br />

proc univariate data = new normal; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square;<br />

run;<br />

Assumptions appear to be satisfied.<br />



17.3 data one; infile 'gas.dat';<br />

input disp power torque ratio axle barrel speed clen<br />

cwid cwt trans mileage; run;<br />

a proc reg data = one; model power = disp;<br />

plot power*p.; plot student.*p.; plot student.*disp;<br />

output out = new p = yhat r = resid student = sresid; run;<br />

proc univariate data = new normal; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square;<br />

run;<br />

Assumptions appear to be satisfied.<br />

b proc reg data = one; model torque = disp;<br />

plot torque*p.; plot student.*p.; plot student.*disp;<br />

output out = new p = yhat r = resid student = sresid; run;<br />

proc univariate data = new normal; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square;<br />

run;<br />

There may be increasing variation.<br />

c proc reg data = one; model torque= power;<br />

plot torque*p.; plot student.*p.; plot student.*power;<br />

output out = new p = yhat r = resid student = sresid; run;<br />

proc univariate data = new normal; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square;<br />

run;<br />

There may be some curvature in the residuals.<br />

d proc reg data = one; model mileage = disp;<br />

plot mileage*p.; plot student.*p.; plot student.*disp;<br />

output out = new p = yhat r = resid student = sresid; run;<br />

proc univariate data = new normal; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square;<br />

run;<br />

Assumptions appear to be satisfied.<br />

e proc reg data = one; model mileage = torque;<br />

plot mileage*p.; plot student.*p.; plot student.*torque;<br />

output out = new p = yhat r = resid student = sresid; run;<br />

proc univariate data = new normal; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square;<br />

run;<br />

Assumptions appear to be satisfied.<br />



f proc reg data = one;<br />

model mileage = power;<br />

plot mileage*p.; plot student.*p.; plot student.*power;<br />

output out = new p = yhat r = resid student = sresid; run;<br />

proc univariate data = new normal; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square;<br />

run;<br />

Assumptions appear to be satisfied.<br />

17.4 data one; infile 'grades.dat';<br />

input id $ sex $ class $ quiz exam1 exam2 lab finalexam; run;<br />

a proc reg data=one;<br />

model finalexam = exam1;<br />

plot finalexam*p.; plot student.*p.; plot student.*exam1;<br />

output out = new p = yhat r = resid student = sresid; run;<br />

proc univariate data = new normal; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square;<br />

run;<br />

Data contains an outlier.<br />

b proc reg data=one;<br />

model finalexam = exam2;<br />

plot finalexam*p.; plot student.*p.; plot student.*exam2;<br />

output out = new p = yhat r = resid student = sresid; run;<br />

proc univariate data = new normal; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square;<br />

run;<br />

Data contains an outlier.<br />

c proc reg data=one;<br />

model finalexam = quiz;<br />

plot finalexam*p.; plot student.*p.; plot student.*quiz;<br />

output out = new p = yhat r = resid student = sresid; run;<br />

proc univariate data = new normal; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square;<br />

run;<br />

Assumptions appear to be satisfied.<br />



17.5 data quarterback; infile 'quarterback.dat';<br />

input Rank Player $ 5-22 Team $ 25-27 Comp Att Pct AttPerGame Yds<br />

Avg YdsPerGame TD Int FirstD FirstDP P20 P40 Sck Rate; run;<br />

proc reg data = quarterback;<br />

model rate = pct;<br />

plot rate*p.;<br />

plot student.*p.;<br />

plot student.*pct;<br />

output out = new p = yhat r = resid student = sresid; run;<br />

proc univariate data = new normal; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square; run;<br />

There may be increasing variation.<br />



MODULE 18: MULTIPLE LINEAR REGRESSION<br />

18.1 goptions csymbol = black htext = 2; symbol1 value = dot;<br />

data one; infile 'gas.dat';<br />

input @45 mileage 4. @7 power 3. @38 car_wt 4. @11 torque 3. @1<br />

disp 5.; run;<br />

a proc reg; model mileage = power car_wt torque;<br />

output out=new1 p=yhat student=resid; run;<br />

proc gplot data=new1;<br />

plot mileage*yhat; title 'Mileage vs. yhat';<br />

plot resid*yhat / vref = 0; title 'Resid vs. yhat';<br />

plot resid*power / vref = 0; title 'Resid vs. x1';<br />

plot resid*car_wt / vref = 0; title 'Resid vs. x2';<br />

plot resid*torque / vref = 0; title 'Resid vs. x3'; run;<br />

proc univariate data=new1 normal; var resid;<br />

probplot resid / normal (mu = est sigma = est) square;<br />

title 'Probability plot of residuals'; run ;<br />

b proc reg data=one; model mileage = power disp torque;<br />

output out=new2 p=yhat2 student=resid2; run;<br />

proc gplot data=new2;<br />

plot mileage*yhat2; title 'Mileage vs. yhat';<br />

plot resid2*yhat2 / vref = 0; title 'Resid vs. yhat';<br />

plot resid2*power / vref = 0; title 'Resid vs. x1';<br />

plot resid2*disp / vref = 0; title 'Resid vs. x2';<br />

plot resid2*torque / vref = 0; title 'Resid vs. x3'; run;<br />

proc univariate data = new2 normal; var resid2;<br />

probplot resid2 / normal (mu = est sigma = est) square;<br />

title 'Probability plot of residuals'; run ;<br />

18.2 data one; infile 'grades.dat';<br />

input id $ sex $ class $ quiz exam1 exam2 lab final; run;<br />

proc reg; model final = quiz exam1 exam2 lab;<br />

plot final*p.; plot student.*p.;<br />

plot student.*quiz; plot student.*exam1;<br />

plot student.*exam2; plot student.*lab;<br />

output out=new p=yhat student=sresid;<br />

title 'Multiple Regression Model and Model Checking Plots'; run;<br />

There appears to be a low outlier.<br />

proc univariate data=new normal; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square;<br />

title 'Probability plot of residuals'; run;<br />

With a p-value of 0.0239 for the normality test, the residuals may be nonnormal.<br />
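
To locate the low outlier mentioned in 18.2, one option is to list the cases with large studentized residuals from the output data set new. This is a minimal sketch and not part of the original solution; the cutoff of 2 is an assumed rule of thumb.

```sas
/* Sketch: list observations whose studentized residual is large in absolute value. */
/* Assumes the data set new created by the output statement in 18.2.               */
/* The cutoff of 2 is an illustrative assumption, not a value from the manual.     */
proc print data = new;
  where abs(sresid) > 2;
  var id final yhat sresid;
run;
```

A large negative sresid would correspond to the low outlier seen in the plots.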



18.3 data one; infile 'electric.dat';<br />

input house income air_cond index fam_num peakload; run;<br />

proc reg; model peakload = house income air_cond fam_num;<br />

plot peakload*p.; plot student.*p.;<br />

plot student.*house; plot student.*income;<br />

plot student.*air_cond; plot student.*fam_num;<br />

output out = new p=yhat student=sresid;<br />

title 'Multiple Regression Model and Model Checking Plots'; run;<br />

proc univariate data=new normal; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square;<br />

title 'Probability plot of residuals'; run;<br />

Number in family: t = 1.401, p-value = 0.1668. Family number is not needed in the model.<br />

The model assumptions appear valid.<br />

18.4 data prod; input prod temp light;<br />

templight=temp*light;<br />

datalines;<br />

45 64 60<br />

49 64 65<br />

47 66 60<br />

57 66 65<br />

48 68 60<br />

53 68 65<br />

51 70 60<br />

54 70 65<br />

56 72 60<br />

64 72 65<br />

;<br />

run;<br />

proc reg data = prod;<br />

model prod = temp light templight; run;<br />

Interaction not significant; eliminate it.<br />

proc reg data = prod; model prod = temp light;<br />

plot prod*p.;<br />

plot student.*p.;<br />

plot student.*temp;<br />

plot student.*light;<br />

output out = new p=yhat student=sresid; run;<br />

proc univariate data=new normal; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square;<br />

title 'Probability plot of residuals'; run;<br />

R² = 79.92%, model p-value = 0.0163;<br />

Residuals are normal: p-value = 0.8184;<br />

Plots: may be increasing variation, curvature vs. temp, and increasing variability with<br />

light.<br />



18.7 goptions csymbol = black htext = 2;<br />

symbol1 value = dot;<br />

data quarterback; infile 'quarterback.dat';<br />

input Rank 1-2 Player $ 5-22 Team $ 25-27 Comp 30-32 Att 35-37<br />

Pct 40-43 AttPerGame 46-49 Yds 52-55 Avg 58-60 YdsPerGame 63-67<br />

TD 70-71 Int 74-75 FirstDown 77-80 FirstDownPct 83-86<br />

Over20 89-90 Over40 93-94 Sack 97-98 Rating 101-105; run;<br />

proc reg data = quarterback;<br />

model rating = comp pct yds int sack;<br />

plot rating*p.;<br />

plot student.*p.;<br />

plot student.*comp;<br />

plot student.*pct;<br />

plot student.*yds;<br />

plot student.*int;<br />

plot student.*sack;<br />

output out=new p=yhat student=sresid; run;<br />

proc univariate data=new normal; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square; run;<br />

R² = 0.9433, model p-value < 0.0001.<br />

Sack is not needed in the model: p-value = 0.4234.<br />

Residuals not quite normal: p-value = 0.0137.<br />

Otherwise residual plots look OK.<br />



MODULE 19: MULTIPLE REGRESSION<br />

CHOOSING THE BEST MODEL<br />

19.1 data one; infile 'gas.dat';<br />

input disp power torque ratio axle barrels speeds car_ln car_wd<br />

car_wt trans mileage; run;<br />

proc reg;<br />

model mileage = disp power torque speeds car_wt car_ln /<br />

influence collin spec; run;<br />

19.2 data one; infile 'gas.dat';<br />

input disp power torque ratio axle barrels speeds car_ln car_wd<br />

car_wt trans mileage; run;<br />

a proc reg;<br />

model mileage = disp power torque speeds car_wt car_ln<br />

/ selection=stepwise; run;<br />

b proc reg;<br />

model mileage = disp power torque speeds car_wt car_ln<br />

/ selection=backward; run;<br />

c proc reg;<br />

model mileage = disp power torque speeds car_wt car_ln<br />

/ selection=forward; run;<br />

d proc reg;<br />

model mileage = disp power torque speeds car_wt car_ln<br />

/ selection=rsquare cp adjrsq mse; run;<br />

19.5 data grades; infile 'grades.dat';<br />

input quiz 9-10 exam1 12-14 exam2 16-18 lab 20-22 final 25-27;<br />

run;<br />

proc reg;<br />

model final = exam1 exam2 quiz lab / influence collin spec; run;<br />



19.6 data grades; infile 'grades.dat';<br />

input quiz 9-10 exam1 12-14 exam2 16-18 lab 20-22 final 25-27;<br />

run;<br />

a proc reg; model final = exam1 exam2 quiz lab<br />

/ selection = stepwise; run;<br />

b proc reg; model final = exam1 exam2 quiz lab<br />

/ selection = backward; run;<br />

c proc reg; model final = exam1 exam2 quiz lab<br />

/ selection = forward; run;<br />

d proc reg; model final = exam1 exam2 quiz lab<br />

/ selection = rsquare cp adjrsq mse; run;<br />

19.8 data pharmacy; infile 'pharmacy.dat';<br />

input pharmacy 1-2 volume 5-6 floor_space 9-12 rx_space 15-16<br />

parking 19-20 shop_center 23 income 26-27;<br />

format shop_center scfmt.;<br />

sc_fs=shop_center*floor_space; sc_rxs=shop_center*rx_space;<br />

sc_p=shop_center*parking; sc_i=shop_center*income; run;<br />

* Model allowing for interactions between shopping center<br />

and other variables;<br />

proc reg data = pharmacy;<br />

model volume = floor_space rx_space parking shop_center income<br />

sc_fs sc_rxs sc_p sc_i; run;<br />

None of the interactions is significant.<br />

After sequentially eliminating non-significant terms we obtain the final model.<br />
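
Before dropping terms one at a time, the four interaction terms can also be tested jointly. The following is a hedged sketch using the TEST statement of PROC REG; the label interact is a hypothetical name, and this joint test complements, rather than records, the elimination carried out in the manual.

```sas
/* Sketch: joint F test that all four shopping-center interactions are zero.  */
/* Assumes the pharmacy data set built in 19.8; "interact" is an arbitrary label. */
proc reg data = pharmacy;
  model volume = floor_space rx_space parking shop_center income
                 sc_fs sc_rxs sc_p sc_i;
  interact: test sc_fs = 0, sc_rxs = 0, sc_p = 0, sc_i = 0;
run;
```

A large p-value for this test would support removing all four interaction terms together.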

goptions csymbol = black htext = 2;<br />

symbol1 value = dot; symbol2 value = square;<br />

* Final model;<br />

proc reg data = pharmacy;<br />

model volume = floor_space rx_space; plot volume*p.;<br />

output out=new p=yhat student=sresid; run;<br />

proc univariate data=new normal; var sresid;<br />

probplot sresid / normal (mu = est sigma = est) square; run;<br />

Normality of residuals OK.<br />

proc gplot data = new;<br />

plot sresid*yhat=shop_center / vref = 0;<br />

plot sresid*floor_space=shop_center / vref = 0;<br />

plot sresid*rx_space=shop_center / vref = 0; run;<br />

Residual plots all look OK.<br />



19.12 data quarterback; infile 'quarterback.dat';<br />

input Rank 1-2 Player $ 5-22 Team $ 25-27 Comp 30-32 Att 35-37<br />

Pct 40-43 AttPerGame 46-49 Yds 52-55 Avg 58-60 YdsPerGame 63-67<br />

TD 70-71 Int 74-75 FirstDown 77-80 FirstDownPct 83-86<br />

Over20 89-90 Over40 93-94 Sack 97-98 Rating 101-105; run;<br />

a proc reg data = quarterback;<br />

model rating = Comp Att Pct AttPerGame Yds Avg YdsPerGame<br />

TD Int FirstDown FirstDownPct Over20 Over40 Sack<br />

/ selection = stepwise; run;<br />

Using defaults, variables in final model: Avg, TD, Int, Pct, Att, Comp.<br />

b proc reg data = quarterback;<br />

model rating = Comp Att Pct AttPerGame Yds Avg YdsPerGame<br />

TD Int FirstDown FirstDownPct Over20 Over40 Sack<br />

/ selection = backward; run;<br />

Using defaults, variables in final model: Pct, AttPerGame, Yds, Avg, YdsPerGame, TD,<br />

Int, FirstDown, FirstDownPct, Over40.<br />

c proc reg data = quarterback;<br />

model rating = Comp Att Pct AttPerGame Yds Avg YdsPerGame<br />

TD Int FirstDown FirstDownPct Over20 Over40 Sack<br />

/ selection = forward; run;<br />

Using defaults, variables in final model: Avg, TD, Int, Pct, Over40, Att, Comp,<br />

YdsPerGame, AttPerGame, Yds.<br />



MODULE 20: TESTS FOR CATEGORICAL DATA<br />

20.1 data debate; infile 'debate.dat';<br />

input id school gender compare argue research reason speak ;<br />

if school = 3 or school = 5 or school = 6 or school = 8;<br />

if research = 2 or research = 3 then research = 4; run;<br />

proc freq data=debate;<br />

tables (research reason speak argue)*school / chisq expected;<br />

run;<br />

20.2 Use data step from 20.1 and then<br />

data skyline; set debate; if school = 8; run;<br />

proc freq data=skyline;<br />

tables gender*(compare argue research reason speak)<br />

/ chisq expected; run;<br />

20.3 proc format; value gfmt 1 = 'Female' 5 = 'Male'; run;<br />

data src; infile 'src.dat';<br />

input id gender environ quality air health plants<br />

jobslost pop jobs hours income age party libcon;<br />

if environ =1 or environ =2 or environ =3 then env =1;<br />

else if 4


20.5 proc format;<br />

value fsfmt 0 = 'Student' 1 = 'Faculty/Staff';<br />

value yn 1 = 'Yes' 2 = 'No';<br />

value yndk 1 = 'Yes' 2 = 'No' 3 = 'Dont Know';<br />

value statfmt 1='OnCampus' 2='OffCampus' 3='On+Off' 4='DontWork';<br />

value perfmt 1 = 'No' 2 = 'Yearly' 3 = 'Quarterly';<br />

value usrn 1='Usually' 2='Sometimes' 3='Rarely' 4='Never'; run;<br />

data park; infile 'parking.dat';<br />

input id miles bus_convenient carpool years status bus Monday<br />

Tuesday Wednesday Thursday Friday drive permit meters lots;<br />

if id = 400 then fac_staff = 0;<br />

if bus_convenient = 99 then bus_convenient = .;<br />

if id = . then fac_staff = .;<br />

if miles = 99 then miles = .;<br />

if carpool = 99 then carpool = .;<br />

if years = 99 then years = .;<br />

if status = 99 then status = .;<br />

if bus = 99 then bus = .;<br />

if Monday = 99 then Monday = .;<br />

if Tuesday = 99 then Tuesday = .;<br />

if Wednesday=99 then Wednesday = .;<br />

if Thursday = 99 then Thursday = .;<br />

if Friday = 99 then Friday = .;<br />

if drive = 99 then drive = .;<br />

if permit = 99 then permit = .;<br />

if meters = 99 then meters = .;<br />

if lots = 99 then lots = .;<br />

format fac_staff fsfmt. bus yn. bus_convenient yndk.<br />

status statfmt. permit perfmt. meters usrn. lots usrn.; run;<br />

proc freq data = park;<br />

b table fac_staff*bus_convenient/chisq expected cellchi2;<br />

d table fac_staff*meters/chisq expected cellchi2;<br />

f table bus*permit/chisq expected cellchi2; run;<br />

proc sort data =park; by fac_staff; run;<br />

proc freq data =park;<br />

table bus*permit/chisq expected cellchi2;<br />

by fac_staff; run;<br />



20.6 proc format;<br />

value mfmt 1 = 'Never' 2 = 'Occasional' 3 = 'Regular';<br />

value pfmt 1 = 'Neither' 2 = 'One' 3 = 'Both'; run;<br />

data a;<br />

input s_marijuana p_alc_drug count;<br />

datalines;<br />

1 1 141<br />

1 2 68<br />

1 3 17<br />

2 1 54<br />

2 2 44<br />

2 3 11<br />

3 1 40<br />

3 2 51<br />

3 3 19<br />

; run;<br />

proc freq data = a;<br />

table p_alc_drug*s_marijuana / chisq expected cellchi2;<br />

weight count;<br />

format s_marijuana mfmt. p_alc_drug pfmt.; run;<br />

20.7 proc format;<br />

value agefmt 1 = '15-54' 2 = '55-64' 3 = '65-74' 4 = 'Over 74';<br />

value locfmt 1 = 'Home' 2 = 'Acute-Care' 3 = 'Chronic-Care';<br />

run;<br />

data a;<br />

input Age Location Count;<br />

datalines;<br />

1 1 94<br />

1 2 418<br />

1 3 23<br />

2 1 116<br />

2 2 524<br />

2 3 34<br />

3 1 156<br />

3 2 581<br />

3 3 109<br />

4 1 138<br />

4 2 558<br />

4 3 238<br />

; run;<br />

proc freq data = a;<br />

table age*location / chisq expected cellchi2;<br />

weight count;<br />

format age agefmt. location locfmt.; run;<br />



20.8 proc format;<br />

value sfmt 1 = 'Male' 2 = 'Female'; run;<br />

data btt; infile 'btt.dat';<br />

input childid 1-4 sex 6 momeduc 29 mmedaid 31 socio 33 smoke5 69<br />

medaid5 71 socio5 73;<br />

format sex sfmt.; run;<br />

proc freq data = btt;<br />

a tables sex / chisq testp = (0.5, 0.5);<br />

b tables momeduc*socio / chisq expected;<br />

c tables smoke5*socio5 / chisq expected;<br />

d tables medaid5*socio5 / chisq expected; run;<br />

Note: In (b), (c), and (d) there are many low expected frequencies. Combining the<br />

socioeconomic categories 3 and 4 into a single category may help. Fisher’s exact test is a<br />

better solution, but the computations can take a long time if the sample size is large.<br />
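
The two remedies in the note can be sketched as follows. This is an illustration only and is not part of the original solution; the data set name btt2 is hypothetical, and coding the combined level as 3 is an arbitrary choice.

```sas
/* Sketch: collapse socioeconomic categories 3 and 4, then request Fisher's exact test. */
/* Assumes the btt data set from 20.8; btt2 is a hypothetical name.                     */
data btt2; set btt;
  if socio  = 4 then socio  = 3;   /* combined category coded as 3 (arbitrary) */
  if socio5 = 4 then socio5 = 3;
run;

proc freq data = btt2;
  tables momeduc*socio smoke5*socio5 medaid5*socio5 / chisq expected fisher;
run;
```

The fisher option requests the exact test for tables larger than 2 x 2, which is where the long computation times mentioned in the note can arise.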



MODULE 21: NON-PARAMETRIC TESTS

21.1 data one; infile 'taillite.dat';
input id type group position zone resptime folltime; run;
a proc npar1way wilcoxon data = one;
where zone = 30;
class type; var resptime; run;

p-value


21.4 proc format; value lsfmt 1 = 'Athletic' 2 = 'Sedentary'; run;
data athlete;
infile 'athlete.dat'; input sbp 1-3 dbp 6-7 sex $ 10 ls 13;
label sbp = 'Systolic Blood Pressure'
dbp = 'Diastolic Blood Pressure'
ls = 'Lifestyle';
format ls lsfmt.; run;
proc npar1way wilcoxon data = athlete;
class sex; var sbp dbp; run;
SBP: p-value = 0.0366. SBP differs significantly between males and females.
DBP: p-value < 0.0001. DBP differs significantly between males and females.
proc sort data = athlete;
by sex;
run;
* Check normality assumption that would be needed for t-test;
proc univariate normal data = athlete;
var sbp dbp; by sex; run;
SBP: Shapiro-Wilk p-values: Female = 0.0455, Male = 0.0170. SBP is not quite normal.
DBP: Shapiro-Wilk p-values: Female = 0.5903, Male = 0.4913. DBP is consistent with normality.
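Because the Shapiro-Wilk results give no evidence against normality for DBP in either sex, a two-sample t-test on DBP is a reasonable parametric companion to the Wilcoxon test. A minimal sketch, assuming the athlete data set created above; this step is an addition and not part of the original solution.

```sas
* Sketch only: two-sample t-test for DBP by sex;
proc ttest data = athlete;
class sex; var dbp; run;
```

For SBP, where normality is doubtful, the Wilcoxon result is the safer one to report.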

21.6 data btt; infile 'btt.dat';
input childid 1-4 bweight 8-11 momeduc 29; run;
proc npar1way wilcoxon anova data = btt;
class momeduc; var bweight; run;
Kruskal-Wallis p-value = 0.0881.
ANOVA p-value = 0.0931.



MODULE 22: ANALYSIS OF COVARIANCE

22.1 data one; infile 'gas.dat';
input @45 mileage 4. @43 trans $1. @25 speeds $1. @38 car_wt 4.
@11 torque 3.; run;
a proc glm; class trans speeds;
model mileage = trans speeds car_wt / solution; run;
b proc glm; class trans speeds;
model mileage = trans speeds torque / solution; run;

22.2 data two; infile 'dummy.dat';
input species $ 1 impactor $ 3-5 stiff1 stiff2 calcium magnesium;
run;
a proc glm; class species impactor;
model stiff1 = species impactor calcium / solution; run;
b proc glm; class species impactor;
model stiff1 = species impactor magnesium; run;

22.4 proc format; value sfmt 1 = 'Male' 2 = 'Female'; run;
data btt; infile 'btt.dat';
input childid 1-4 sex 6 bweight 8-11 gestage 13-14 mmedaid 31;
format sex sfmt.; run;
proc glm data = btt;
* momeduc is not read in the data step above, so it is left out of the class statement;
class sex mmedaid;
model bweight = sex mmedaid sex*mmedaid gestage; run;



MODULE 23: LOGISTIC REGRESSION

23.1 proc format;
value fsfmt 0 = 'Student' 1 = 'Faculty/Staff';
value yn 1= 'Yes' 2 = 'No';
value yndk 1= 'Yes' 2 = 'No' 3 = 'Dont Know'; run;
data park; infile 'parking.dat';
input id miles bus_convenient carpool years status bus;
if id = 400 then fac_staff = 0;
if carpool = 99 then carpool = .;
if years = 99 then years = .;
if bus = 99 then bus = .;
if bus_convenient = 99 then bus_convenient = .;
format fac_staff fsfmt. bus yn. bus_convenient yndk.; run;
a proc logist data = park;
model bus_convenient = fac_staff years; run;
b proc logist data = park;
model bus = fac_staff years; run;
c proc logist data = park;
model carpool = fac_staff years; run;

23.4 proc format;
value sfmt 1 = 'Male' 2 = 'Female'; run;
data btt; infile 'btt.dat';
input childid 1-4 sex 6 bweight 8-11 gestage 13-14 momage 16-17
parity 19 mdbp 21-23 msbp 25-27 mmedaid 31;
format sex sfmt.; run;
a proc logist data = btt;
model sex = bweight gestage parity; run;
proc logist data = btt;
model sex = parity; run;
b proc logist data = btt;
model mmedaid = bweight gestage momage parity mdbp msbp;
run;
proc logist data = btt;
model mmedaid = momage; run;



MODULE 24: MATRIX COMPUTATIONS

24.1 to 24.3 require the following initial creation of matrices A, B, and C.
proc iml;
A = { 2 1 0 3, -1 0 2 4, 4 -2 7 0};
B = {-4 3 5 1, 2 2 1 -1, 3 2 -4 5};
C = {5, 4, 8};
print A B C;

24.1 a D = A+B;
b E = A-B;
c F = A#B;
d G = A/B;
print D E F G;

24.2 a H = A//B;
b I = A||B;
c J = A(|,3|);
d K = B(|2,|);
e L = B(|1:2,3:4|);
print H I J K L;

24.3 a M=T(B);
b D=A*t(B);
c N=det(D);
d O=trace(D);
e P = diag(D); * Note diag produces a diagonal matrix;
Q = vecdiag(D);
f R = solve(D, C);
print M D, N O, P Q R;
quit;



24.4 data a; input x1 x2 x3;
datalines;
1 4 0.2
1 5 0.2
1 6 0.2
1 7 0.2
1 4 0.3
1 5 0.3
1 6 0.3
1 7 0.3
1 4 0.4
1 5 0.4
1 6 0.4
1 7 0.4
run;
proc iml;
use A; * To make data set A available within proc iml;
read all var {x1 x2 x3} into X;
Y = {4.3, 5.5, 6.8, 8.0, 4.0, 5.2, 6.6, 7.5, 2.0, 4.0, 5.7, 6.5};
I12=I(12);
J12=J(12, 12, 1);
print X Y I12 J12;
a B=inv(X`*X)*X`*Y;
b A=X*B;
c C=Y`*Y-Y`*J12*Y/12;
d D=Y`*Y-B`*X`*Y;
e E=Y-X*B;
f F=C-D;
g G=D/9;
h H=X*inv(X`*X)*X`;
k K=Y`*(I12-H)*Y;
l L=Y`*(H-J12/12)*Y;
m M=G*inv(X`*X);
n N=sqrt(diag(M));
o O=(I12-H)*Y;
print B A C E, D F G, H, K L M N O;
* Create a SAS data set containing a matrix for use in 24.5;
create ydata from y[colname={y}]; append from y;
quit;
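For reference, the IML statements in 24.4 appear to compute the usual matrix quantities of multiple regression, with n = 12 observations and p = 3 columns in X, so that n - p = 9 is the divisor in part (g). The symbols below are an assumption made for illustration (J is the 12 x 12 matrix of ones stored in J12); the textbook's own notation may differ.

```latex
\begin{aligned}
\mathbf{b} &= (X'X)^{-1}X'Y && \text{(a) coefficient estimates, stored in B}\\
\hat{Y} &= X\mathbf{b} && \text{(b) fitted values, stored in A}\\
SSTO &= Y'Y - \tfrac{1}{n}\,Y'JY && \text{(c) total sum of squares, C}\\
SSE &= Y'Y - \mathbf{b}'X'Y && \text{(d) error sum of squares, D}\\
\mathbf{e} &= Y - X\mathbf{b} && \text{(e) residuals, E}\\
SSR &= SSTO - SSE && \text{(f) regression sum of squares, F}\\
MSE &= \frac{SSE}{n-p} && \text{(g) mean squared error, G}\\
H &= X(X'X)^{-1}X' && \text{(h) hat matrix, H}\\
SSE &= Y'(I-H)\,Y && \text{(k) same value as D, K}\\
SSR &= Y'\bigl(H - \tfrac{1}{n}J\bigr)Y && \text{(l) same value as F, L}\\
\widehat{\operatorname{Var}}(\mathbf{b}) &= MSE\,(X'X)^{-1} && \text{(m) estimated covariance matrix, M}\\
\mathbf{e} &= (I-H)\,Y && \text{(o) same vector as E, O}
\end{aligned}
```

Part (n) takes the square roots of the diagonal of M, which gives the standard errors of the coefficient estimates. The repeated quantities (K against D, L against F, O against E) provide a quick check on the printed output.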

24.5 data reg;
merge a ydata; run;
proc reg data = reg;
model y = x2 x3 / p r influence; run;



MODULE 25: MACRO VARIABLES AND PROGRAMS

25.1 goptions csymbol = black htext = 2;
proc format;
value lsfmt 1 = "Athletic" 2 = "Sedentary";
value $sfmt 'M' = 'Male' 'F'= 'Female'; run;
data athlete; infile 'athlete.dat';
input sbp 1-3 dbp 6-7 sex $ 10 ls 13;
label sbp = 'Systolic Blood Pressure'
dbp = 'Diastolic Blood Pressure'
ls = 'Lifestyle';
format ls lsfmt. sex $sfmt.; run;
%macro boxt(data, y, x);
proc sort data = &data; by &x; run;
(i) proc boxplot data = &data;
plot &y*&x / boxstyle=schematic; run;
(ii) proc ttest data = &data;
class &x; var &y; run;
%mend boxt;
a %boxt(athlete, sbp, sex);
b %boxt(athlete, dbp, sex);
c %boxt(athlete, sbp, ls);
d %boxt(athlete, dbp, ls);



25.2 goptions csymbol = black htext = 2;
symbol1 value = dot;
symbol2 value = square;
data elec; infile 'electric.dat';
input hs 1-3 fi 6-11 acc 14-16 ai 19-23 fm 26-28 phl 31-35;
label hs = 'House Size'
fi = 'Family Income'
acc = 'Air Conditioning Capacity'
phl = 'Peak Hour Load'; run;
%MACRO simplereg(data, yvar, xvar);
(i) proc gplot data = &data;
plot &yvar * &xvar;
title "Plot of &yvar vs. &xvar";
run;
(ii) proc corr data = &data;
var &yvar &xvar;
title "Correlation of &yvar vs. &xvar";
run;
(iii) proc reg data = &data;
model &yvar = &xvar;
(iv) plot &yvar * p. p.*p. / overlay;
plot student.*p.; plot student.* &xvar;
title "Regression of &yvar vs. &xvar and model-checking plots";
run;
%MEND simplereg;
a %simplereg(elec, phl, hs);
b %simplereg(elec, phl, fi);
c %simplereg(elec, phl, acc);
