
Basic data operations in SPSS - ISS - University of Leeds


Information Systems Services

Basic data operations in SPSS for Windows 17

TUT 113

Version 6 (November 2009)



Contents

1 Introduction
  1.1 About data management
  1.2 Requirements
  1.3 Documentation
  1.4 Getting Started
    Task 1 Download example data sets
  1.5 The SPSS Command Language
  1.6 The Syntax Window
    Task 2 Using the Syntax Window
  1.7 The Syntax Reference Guide
    Task 3 Using the SPSS Syntax Reference Guide
2 Data Transformations
  2.1 Creating New Variables
    Task 4 Using Compute to create a new variable
  2.2 Computing Counts Using COUNT
    Task 5 Computing a Count
  2.3 Recoding variables
    Task 6 Recoding into the same variable
    Task 7 Recoding into a new variable
    Task 8 Performing an automatic recode
  2.4 Conditional transformations
    Task 9 Performing a conditional compute
3 Working with subsets of data
    Task 10 Selecting a subset of cases
    Task 11 Deleting selected cases
    Task 12 Sub-group processing
4 Merging SPSS data sets
    Task 13 Adding cases
    Task 14 Adding variables
    Task 15 Finishing SPSS

Information Systems Services Page 2 of 36

Version 4.1 (December 2006) tut113_vn17nov2009.doc



Format conventions

In this document the following format conventions are used:

Commands that you must type in and menu items are shown in bold. Example: Name

Keys that you press and options that you select are enclosed in angle brackets.

Feedback

If you notice any mistakes in this document please contact the Information Officer. Email should be sent to the address info-officer@leeds.ac.uk.

Copyright

This document is copyright University of Leeds. Permission to use material in this document should be obtained from the Information Officer (email should be sent to the address info-officer@leeds.ac.uk).


1 Introduction

1.1 About data management

The first step in performing a statistical analysis with SPSS is usually to define your data and enter it into SPSS. If you are lucky, this is all that will be required before you are able to carry out your analysis. More commonly, however, you will need to manipulate the data before you can perform the required analysis.

For example, it may be necessary to generate new variables from existing variables, or you may need to group the values of a variable into a small number of groups to facilitate the creation of simple summary statistics or simple graphics.

You may also find that the structure of your original data does not conform to that required by some of the statistical or graphical procedures in SPSS and that the data needs to be re-shaped prior to analysis.

This document will take you through some of the basic tasks of data manipulation that you may need to accomplish prior to performing statistical analyses in SPSS. The tasks build on one another, so you are advised to complete each task and its exercises before proceeding to the next one.

1.2 Requirements

It is assumed that you already know how to log in to the network and run the Microsoft Windows operating system. It is also assumed that you know how to run SPSS for Windows from the Windows desktop, create a new data sheet, enter text into a cell, run a simple analysis and print it out. If you do not yet know how to carry out these operations, you should first read and work through the following document:

Getting started with SPSS for Windows (BEG 14)

1.3 Documentation

If you require further information on the facilities in SPSS, the following documents are available:

Advanced data operations in SPSS for Windows (TUT 114)

Simple statistical analysis in SPSS for Windows (TUT 115)

Advanced statistical analysis in SPSS for Windows (TUT 116)

All the documents referred to above are available as PDF files for printing from the ISS web site:

http://iss.leeds.ac.uk/downloads/303/statistical_analysis

References are made in this document to the SPSS Syntax Reference Guide. This is now available online as part of the SPSS Help system.

1.4 Getting Started

A variety of SPSS data sets will be used for these exercises. The data sets are stored in a zip file on the ISS website. The first task is to copy the zip file into a directory of your choice and unpack it. You will need the files in order to complete these exercises.

Task 1 Download example data sets

Activity 1.1 Open a web browser such as Internet Explorer and go to the URL:


http://iss.leeds.ac.uk/downloads/303/statistical_analysis<br />

Activity 1.2 Scroll down to locate the item titled ‘Basic data operations in SPSS for Windows 17’. Double click on this item and click on Download on the screen that follows.

Activity 1.3 From the Save As dialog box, choose a suitable directory to store the file in and save it. Close the web browser.

Activity 1.4 Go into Windows Explorer and double click on the zip file. Click File>Extract to unzip the data files.

Activity 1.5 Locate the SPSS icon from the Statistics menu and double-click the icon to open SPSS. After a short period the SPSS data editor window will be displayed.

1.5 The SPSS Command Language

The use of a standard Windows-based interface for applications software allows learning to transfer between applications: mastering one menu system makes it easier to master another. In addition, the SPSS graphical user interface greatly simplifies the specification of analyses and other operations by relieving the user of the need to write ‘programs’.

The SPSS graphical user interface caters for most of the functionality provided by SPSS. However, there are some operations that cannot be handled using this approach. For example, you may need to re-shape data input from an external source into a form more appropriate for analysis. For such operations it may be necessary to resort to the use of the SPSS Command Language.

Versions of SPSS prior to the emergence of the Windows operating system required users to specify all their requests by using SPSS programs written in the SPSS command language. In practice, this usually involved writing out programs on paper before attempting to load them into SPSS on a computer. Nowadays, most users ‘program’ directly by interacting with the SPSS graphical user interface. However, the SPSS Command Language remains at the heart of the SPSS system, even when specifications are made interactively.

1.6 The Syntax Window

SPSS provides a special window, called the Syntax Window, in which the SPSS command language can be used. Syntax windows can be used in one of two ways. An empty syntax window can be opened using File … New … Syntax, enabling commands to be entered by the user. Alternatively, after requesting an analysis using the menu interface, the commands underlying the analysis can be inspected, and possibly edited, by using the Paste button to display them in a syntax window prior to executing the request.

Task 2 Using the Syntax Window

Activity 2.1 Open the SPSS data file statan.sav.

Activity 2.2 (i) From the Analyze menu, select Descriptive Statistics ... Frequencies.


(ii) From the list of variables displayed select height as the variable for analysis.

(iii) Untick the box labelled Display frequency tables.

Activity 2.4 Click on the Statistics button.

In the dialogue box displayed:

(i) Select Quartiles, Std. deviation and Mean.

(ii) Select Percentiles, enter 5 in the box alongside and click on Add.

(iii) Enter 95 in the percentiles box and click on Add.

(iv) Click on Continue.

Activity 2.5 Click on the Charts button.

In the dialogue box displayed:

(i) Select Histograms and With normal curve.

(ii) Click on Continue.

Activity 2.6 Once the specification of an analysis is complete, the usual action is to execute the analysis by clicking on the OK button.

(i) Instead of clicking on OK, click on the button labelled Paste.

A Syntax Window, shown in Figure 1 below, will be displayed containing commands that SPSS has assembled to perform the Frequencies analysis you requested interactively.

Figure 1: Syntax window containing Frequencies commands

Note the generation of sub-commands corresponding to the various options specified in the dialogue windows. For example, /HISTOGRAM NORMAL has been generated in response to selecting a histogram with a superimposed Normal curve in the charts options window.

You are free to edit this program before running it. In fact, there are a few procedures in SPSS where some of the options cannot be specified interactively and must be specified by editing a partially complete program assembled in this way.
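
As a rough guide to what to expect in the syntax window, and assuming the selections made in Activities 2.2 to 2.5, the pasted commands would be expected to look something like the sketch below. The exact sub-commands, their order and the number formatting may differ between SPSS versions.

```spss
* Sketch of the syntax pasted from the Frequencies dialog (layout may vary by version).
FREQUENCIES VARIABLES=height
  /FORMAT=NOTABLE
  /NTILES=4
  /PERCENTILES=5 95
  /STATISTICS=STDDEV MEAN
  /HISTOGRAM NORMAL
  /ORDER=ANALYSIS.
```

In this sketch /FORMAT=NOTABLE corresponds to unticking Display frequency tables, /NTILES=4 to selecting Quartiles, and /PERCENTILES to the two values added in the Statistics dialogue box.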


Activity 2.7 To run the program select Run from the menu. (In some situations it may be appropriate to highlight a subset of commands for execution before selecting Run.) The output obtained from executing this program is displayed in Figure 2.

Figure 2: Output from Frequencies analysis

A syntax window can also be obtained from the File menu by selecting File … New … Syntax. This allows you to bypass the menu system and to take complete control over the writing of SPSS programs. This approach is most commonly used for writing non-standard input programs rather than for specifying analyses.

One advantage of using the syntax window is that it allows you to save programs generated by SPSS for reuse at a later stage. Syntax files are stored with a file type of .sps. Reusing programs can save you time and effort by eliminating the need to repeat specifications using the menu system. Saved programs also serve as a precise record of the analysis carried out. This can be very helpful if problems arise in carrying out an analysis.

Activity 2.8 Save the contents of the syntax window.

(i) From the menu, select Window and *Syntax1 - SPSS Syntax Editor. The syntax window will come to the foreground.

(ii) Select File … Save As and save the contents in a file named testsyntax.

(iii) Close the Syntax window using the X button at the top right of the window.

Activity 2.9 Re-open the saved syntax file.

(i) From the Data Editor menu, select File … Open … Syntax and re-open the syntax file. This can now be edited and/or re-executed as required.

(ii) Close the Syntax window using the X button at the top right of the window.
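
A saved syntax file can also be run from another syntax window without opening it. The sketch below assumes the file was saved as testsyntax.sps in a hypothetical directory C:\spssdata; substitute the directory you actually chose.

```spss
* Run the commands stored in a saved syntax file.
* The path below is an assumption for illustration only.
INSERT FILE='C:\spssdata\testsyntax.sps'.
```

The data file that the saved commands refer to (here statan.sav) must already be open when the file is run.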


1.7 The Syntax Reference Guide

The full syntax of the SPSS language is defined in the SPSS Syntax Reference Guide available from SPSS UK Limited. The guide is also available electronically as part of the SPSS software.

Task 3 Using the SPSS Syntax Reference Guide

Activity 3.1 Select Help … Command Syntax Reference from the menu.

Activity 3.2 From the index on the left-hand side of the display, select Frequencies.

You should see a definition of the syntax of the Frequencies command as shown in Figure 3.

Figure 3: SPSS Online Help

The emphasis throughout this text will remain on using SPSS interactively. However, to aid learning of the SPSS command language, the SPSS commands generated by the SPSS menu system will be shown following the description of the interactive method. These commands will be highlighted in the text using a special symbol.

For a number of topics, however, the use of SPSS commands is essential. SPSS commands are also used occasionally in preference to the interactive style simply because they provide a clearer record of the operation being performed.


2 Data Transformations

The data currently available is often not in the form required for analysis. For example, examination of the data may reveal the need to create new variables for analysis derived from values of one or more existing variables. If data values are to hand, these can be entered at the keyboard. However, if data values are to be derived from values of other variables, one of the data transformation facilities must be used.

2.1 Creating New Variables

The Compute facility on the Transform menu allows you to create new variables from values of existing variables.

Task 4 Using Compute to create a new variable

Objective Create a new variable based upon values of existing variables.

Comments Compute can also be used to modify the value of an existing variable.

Problem In the data file employeedata.sav, the variables jobtime and prevexp record the number of months employed and the number of months of previous work experience.

We need to calculate the total number of years worked rounded to the nearest whole number.

Activity 4.1 Open the SPSS data file employeedata.sav.

Activity 4.2 Select Transform from the menu.

The Transform menu, pictured in Figure 4, provides access to a range of tools for performing data transformations in SPSS.

Figure 4: Transform menu

Activity 4.3 Select Compute Variable from the menu.

The Compute Variable facility is used to create new variables whose values can be expressed as a mathematical function of values of existing variables.

Activity 4.4 In the Compute Variable dialog window, enter workyrs as the name of the Target Variable.

Activity 4.5 Assemble the expression RND((jobtime + prevexp)/12) in the Numeric Expression box.

To assemble the required expression,

(i) Scroll down the Function group box and select Arithmetic.


(ii) Having selected Arithmetic functions, scroll down the list of Functions and Special Variables and highlight the RND function. Click on the upward arrow key to select this function.

(iii) Replace the symbol ?, representing the argument of RND, by the expression (jobtime + prevexp)/12.

The term jobtime+prevexp represents the total number of months worked. Division by 12 converts this into years. The RND function rounds the result to the nearest integer.

On completion of the expression, the dialog window should resemble Figure 5.

Figure 5: Compute dialog window

Activity 4.6 Click on OK.

The result is shown in Figure 6, below.

Figure 6: Data Editor showing computed variable workyrs

Note that if missing values had been present in either jobtime or prevexp, the result variable workyrs would have been assigned the system missing value.

Figure 7 shows the SPSS commands used to perform the transformation.

COMPUTE workyrs = RND((jobtime + prevexp) / 12) .

EXECUTE .

Figure 7: COMPUTE command
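
The effect of missing values on COMPUTE can be tried on a few made-up cases. The sketch below does not use employeedata.sav: the three cases are invented for illustration, and the variable totmonths and the use of the SUM function are additions that are not part of the task.

```spss
* Three invented cases; the third has no value for prevexp.
DATA LIST LIST / jobtime prevexp.
BEGIN DATA
98 144
81 36
97 .
END DATA.

* Arithmetic on a missing value gives the system-missing value.
COMPUTE workyrs = RND((jobtime + prevexp) / 12).

* SUM adds only the valid values, so it still returns a result.
COMPUTE totmonths = SUM(jobtime, prevexp).
EXECUTE.

LIST.
```

If the sketch behaves as described, LIST should show workyrs as 20 and 10 for the first two cases and system-missing for the third, while totmonths for the third case is 97.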


Note the use of the command EXECUTE. This forces the transformation to be executed. Without this command, the execution of the transformation would be delayed until the next procedure is encountered.

2.1.2 SPSS functions

The RND function used in the previous example is one of many functions provided by SPSS. Table 1, below, lists some commonly used functions.

Function                   Description
ABS(arg)                   Returns the absolute value of arg, e.g. ABS(-3.5) = 3.5.
CONCAT(str1,str2,…)        Concatenates the text strings specified by str1, str2, …
DATE.DMY(d,m,y)            Returns a date value given day, month and year indicated by integer values in d, m and y.
LN(arg)                    Returns the base-e logarithm of arg (arg must be greater than 0).
MAX(value1,value2[,...])   Returns the maximum value of its arguments that have valid values. This function requires two or more arguments.
MEAN(arg1,arg2[,...])      Returns the arithmetic mean of its arguments that have valid values. This function requires two or more arguments, which must be numeric.
MIN(value1,value2[,...])   Returns the minimum value of its arguments that have valid values. This function requires two or more arguments.
MOD(arg,modulus)           Returns the remainder when arg is divided by modulus, e.g. MOD(13,12) = 1.
RND(arg)                   Returns the integer that results from rounding arg, e.g. RND(2.4) = 2; RND(2.5) = 3; RND(-2.5) = -3.
SQRT(arg)                  Returns the positive square root of arg.
SUBSTR(string,pos,len)     Returns the substring of string of length len beginning at position pos, e.g. SUBSTR('JAN99',4,2) = '99'.
SYSMIS(numvar)             Returns 1 or true if the value of numvar is system-missing. The argument numvar must be the name of a numeric variable in the working data file.
TRUNC(arg)                 Returns the value of arg truncated to an integer (towards 0), e.g. TRUNC(2.7) = 2; TRUNC(-2.7) = -2.
XDATE.MDAY(arg)            Returns the day number in the month from a date specified by arg.
XDATE.MONTH(arg)           Returns the month number from a date specified by arg.
XDATE.YEAR(arg)            Returns a four-digit year from a date specified by arg.
YRMODA(y,m,d)              Returns the number of days from October 15, 1582, to the date represented by the arguments y, m, and d.

Table 1: A selection of SPSS functions

For a complete list of functions available, consult the SPSS Online Help system or the SPSS Syntax Reference Guide.
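
Several of the functions in Table 1 can be checked against the worked values given there. The sketch below uses invented values and hypothetical variable names (x, rounded, truncated, absval, rem, yr, xmas, xyear); it is an illustration only and is not part of the tasks.

```spss
* Invented values chosen to match the examples in Table 1.
DATA LIST FREE / x.
BEGIN DATA
2.4 2.5 -2.5 -2.7
END DATA.

COMPUTE rounded = RND(x).
COMPUTE truncated = TRUNC(x).
COMPUTE absval = ABS(x).
COMPUTE rem = MOD(13, 12).

* A string result needs a string variable declared first.
STRING yr (A2).
COMPUTE yr = SUBSTR('JAN99', 4, 2).

* Build a date from day, month and year, then extract the year again.
COMPUTE xmas = DATE.DMY(25, 12, 2009).
COMPUTE xyear = XDATE.YEAR(xmas).
FORMATS xmas (DATE11).
EXECUTE.

LIST.
```

The FORMATS command is included because a date value is stored as a number of seconds and is only readable as a date once it has a date display format.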


2.1.3 SPSS System Variables

SPSS uses a set of special variables, called system variables, with reserved names to hold the values of commonly used data items. Their names begin with a dollar sign ($).

Table 2 lists a selection of these variables and their values.

Variable    Description
$CASENUM    Records the number of cases read up to and including the current case.
$SYSMIS     System-missing value.
$JDATE      Current date as the number of days since October 14, 1582.
$DATE       Current date in the form dd-mmm-yy.
$DATE11     Current date in the form dd-mmm-yyyy.
$TIME       Current date and time. The value is the number of seconds from midnight,
            October 14, 1582 to the time when the value of $TIME is used.

Table 2: A selection of SPSS system variables

Although their values cannot be modified, system variables may be used in the same way as normal variables within data transformations.
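To make the day and second counts behind $JDATE and $TIME concrete, here is a small Python sketch of the arithmetic described in Table 2. The function names jdate and spss_time are hypothetical, and the sketch assumes Python's (proleptic Gregorian) calendar agrees with the SPSS calendar for the dates used.

```python
from datetime import date, datetime

SPSS_EPOCH = datetime(1582, 10, 14)  # midnight, October 14, 1582

def jdate(day):
    """Whole days from October 14, 1582 to the given date (the $JDATE convention)."""
    return (day - SPSS_EPOCH.date()).days

def spss_time(moment):
    """Seconds from midnight, October 14, 1582 to the given moment (the $TIME convention)."""
    return (moment - SPSS_EPOCH).total_seconds()

print(jdate(date(1582, 10, 15)))                 # 1
print(spss_time(datetime(1582, 10, 14, 0, 1)))   # 60.0
print(jdate(date.today()))                       # day number for today's date
```

Dividing a value in seconds by 86400 (the number of seconds in a day) converts it to the corresponding day count.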

2.2 Computing Counts Using COUNT

A common requirement is the need to form a count of the number of occurrences of a particular value across a list of variables. For example, patients’ medical data may contain variables recording the presence or absence of a host of symptoms. If the presence of a symptom is coded using ‘1’, the number of symptoms exhibited by the patient is just the number of variables with the value ‘1’. Although the Compute facility can deal with this problem, the Count facility provides a more efficient means of performing this task.
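The idea of counting a value across a list of variables can be sketched in a few lines of Python. This is only an illustration of the logic, not of how SPSS implements Count; the function name count_occurrences and the symptom variable names are made up for the example, and None stands in for a missing value.

```python
def count_occurrences(case, variables, value):
    """Number of the named variables whose value in this case equals `value`."""
    return sum(1 for name in variables if case.get(name) == value)

# One hypothetical patient record: 1 = symptom present, 0 = absent.
patient = {"fever": 1, "cough": 0, "rash": 1, "nausea": None}
symptoms = ["fever", "cough", "rash", "nausea"]

print(count_occurrences(patient, symptoms, 1))  # 2
```

A missing value never equals the target value, so it simply does not contribute to the count.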

Task 5 Computing a Count

Objective Create a count of the number of occurrences of a value in a set of variables.

Problem The SPSS data file census.sav contains data derived from a national census. The variables natenvir, natheal, natcrime, natdrug, nateduc, natdef and natmass record the respondents’ views on the adequacy of the Government’s spending in several areas of national importance, such as Health and Education.

The responses to these variables are coded as follows:

Code   Label
0      Missing
1      Too Little
2      About Right
3      Too Much

A simple (albeit crude) statistic measuring the overall satisfaction with Government spending might be the count of the number of these variables for which the respondent replied ‘About Right’. This is easily obtained using the Count facility.

Activity 5.1 Open the SPSS data file census.sav.

Activity 5.2 Select Transform … Count Values within Cases from the menu. In the window displayed:

(i) Enter spending as the name of the Target Variable.

(ii) Enter Satisfaction with Government Spending as the Target Label.

(iii) Highlight natenvir, natheal, natcrime, natdrug, nateduc, natdef and natmass in the list of variables and use the arrow button to transfer them to the list of selected variables.

The window should now resemble Figure 8.

Figure 8: Specifying variables in the Count dialog box

Activity 5.3 Click on Define Values to specify the values that contribute to the count.

(i) Enter 2 (the code for ‘About Right’) in the Value box.

Figure 9: Defining values for Count

(ii) Click Add followed by Continue.

Activity 5.4 Click OK in the initial Count window to execute the request. Figure 10 shows the result.

Figure 10: The result of the count operation

The variable spending, added to the end of the list of variables, contains the appropriate count (its width and decimal places have been changed to 1 and 0 respectively using the Variable View window, since the value of spending is a whole number in the range 0 to 7). If required, value labels may be defined for spending using the Variable View window.

We can examine the distribution of values of the variable spending using Frequencies or Graph. The results are shown in Figure 11. The list of frequencies reveals that only 0.5% of those surveyed feel that spending is ‘About Right’ in all of the areas of Government spending.

Figure 11: Frequency distribution and bar chart for spending
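For readers who want to see what a frequency distribution involves, the following Python sketch tallies a list of values and reports each count as a percentage of all cases. The function name frequency_table is hypothetical and the sample values are invented; they are not taken from census.sav.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count, percent) rows, sorted by value."""
    counts = Counter(values)
    total = len(values)
    return [(v, n, 100.0 * n / total) for v, n in sorted(counts.items())]

spending = [0, 2, 2, 3, 7, 1, 2, 0]  # invented counts in the range 0 to 7

for value, n, percent in frequency_table(spending):
    print(f"{value:>5} {n:>5} {percent:6.1f}%")
```

In this invented sample, one case in eight has the value 7, which corresponds to answering ‘About Right’ for all seven variables.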

Figure 12 shows the SPSS commands used to perform the transformation.

COUNT spending = natenvir natheal natcrime natdrug nateduc
  natdef natmass (2) .
VARIABLE LABELS
  spending 'Satisfaction with Government Spending' .
EXECUTE .

Figure 12: COUNT command

2.3 Recoding variables

There are several situations where modifications to the values of existing variables need to be made. Examples include correcting errors in data, changing the codes used for variables and forming categories based upon values of a given variable.

SPSS offers two facilities for re-coding data: Recode and Automatic Recode. The Recode facility is the more versatile of the two and allows variables to be re-coded either into the same variable or into different variables. Automatic Recode addresses the specific problem of replacing a set of character codes by a set of numeric codes. This can be useful as a preliminary to using some of the statistical modelling procedures which require categorical variables to be defined as numerical variables.

The following three tasks illustrate both styles of Recode and the Automatic Recode facility.

Task 6 Recoding into the same variable

Objective Correct errors in a variable.

Comment The Recode facility can be used to change values of an existing variable. This can be useful for correcting errors in the data.

Problem Figure 13 shows the frequency distribution for the variable Zodiac. Note the single case with a code of 13. This is an incorrect value that needs correcting.

Figure 13: Frequency table for the variable zodiac

We can use Recode to change the value 13 to the SPSS system missing value.

Activity 6.1 Open the SPSS data file census.sav if it is not already open.

Activity 6.2 Select Transform>Recode>Into Same Variables.

(i) Select Zodiac from the list of variables displayed and click on the transfer arrow.

Activity 6.3 Define old and new values.

Figure 14: Recode into Same Variables dialog

(i) In the area on the left labelled Old Value, enter the value 13 into the box labelled Value.

(ii) In the area labelled New Value, select System-missing.

(iii) Click on Add. The value 13 will disappear from the box on the left-hand side and the mapping between the old and the new value will appear in the large box on the right-hand side, as shown in Figure 15.

Figure 15: Recode showing mapping of old value to new value

Note the use of the keyword SYSMIS. This is a reserved name in SPSS and is used to represent the SPSS system missing value.

(iv) Click Continue.

Activity 6.4 Click OK to execute the re-code operation.

Figure 16 shows the result of re-running Frequencies on the modified variable.

Figure 16: Revised frequency table after recoding zodiac

The entry corresponding to code 13 has been replaced by an entry labelled ‘System’ under the general heading ‘Missing’. Note the changes to the columns ‘Valid Percent’ and ‘Cumulative Percent’ resulting from the exclusion of this case from the analysis.

Figure 17 shows the SPSS commands used to perform the transformation.

RECODE
  zodiac (13=SYSMIS) .
EXECUTE .

Figure 17: RECODE into ‘Same Variable’
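The effect of recoding a bad value to missing, and the resulting change in ‘Valid Percent’, can be sketched in Python. This is an illustration of the arithmetic only: None stands in for the system missing value, the function names are hypothetical, and the sample values are invented rather than taken from census.sav.

```python
def recode_to_missing(values, bad_value):
    """Replace every occurrence of bad_value with None (standing in for system-missing)."""
    return [None if v == bad_value else v for v in values]

def percent(values, target):
    """Percentage of all cases equal to target."""
    return 100.0 * values.count(target) / len(values)

def valid_percent(values, target):
    """Percentage of non-missing cases equal to target."""
    valid = [v for v in values if v is not None]
    return 100.0 * valid.count(target) / len(valid)

zodiac = [1, 5, 13, 5, 12]               # invented sample with one bad code
cleaned = recode_to_missing(zodiac, 13)

print(cleaned)                           # [1, 5, None, 5, 12]
print(percent(cleaned, 5))               # 40.0
print(valid_percent(cleaned, 5))         # 50.0
```

The valid percentage is larger because the missing case is excluded from the denominator.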

Task 7 Recoding into a new variable

Objective Recode a variable into a new variable.

Comment A very common requirement in data analysis is to be able to define a new variable by re-coding a range of values of an existing variable into a single category. This can serve a number of purposes. It is often used as a method of reducing the dimensionality of the data, allowing some analysis to be performed where otherwise no analysis would be possible. It is also used to alter the format of summary tables and displays.

Problem The chart in Figure 18 shows a histogram obtained for the variable income.

Figure 18: Histogram of the variable income

Suppose that a chart showing fewer bars is considered more appropriate. We can use Recode, as follows, to define a new variable, incrange, each of whose values corresponds to a range of income values. Observing that income values range from 24000 to 68200, we will use seven income ranges, each of width 10000.

Activity 7.1 Select Transform>Recode>Into Different Variables …

Activity 7.2 Define variable names and labels.

(i) Select Income as the variable to be re-coded.

(ii) In the box labelled Name, enter incrange for the name of the new variable.

(iii) In the box labelled Label, enter Income Range.

(iv) Click on Change to assign incrange as the name of the Output variable.

The window should now resemble Figure 19.

Figure 19: Recode into different variables

Activity 7.3 Define the mappings between old and new values.

(i) Click on Old and New Values.

Note the different ways of specifying values to be re-coded.

Figure 20: Defining old and new values

You can specify individual values, system or user missing values or ranges of values as the values to be re-coded. In this example, we will re-code ranges of values of income into single codes 1, 2, 3 … 7.

(ii) Select the Range specification labelled Range, LOWEST through value (currently greyed), and enter 10000 in the box alongside.

Enter 1 in the box labelled New Value and click on Add.

‘Lowest thru 10000 -> 1’ will appear in the Old --> New box. This defines the first of our mappings.

(iii) Select the first of the Range specifications and enter the values 10000 and 20000 in the boxes provided.

Enter 2 in the box labelled New Value and click on Add.

‘10000 thru 20000 -> 2’ will be added to the list of mappings.

(iv) Define similar mappings for each of the income ranges 20000–30000, 30000–40000, 40000–50000 and 50000–60000 using codes 3, 4, 5 and 6.

(v) Define the final range using the Range specification labelled Range, value through HIGHEST.

Enter 60000 in the box alongside, enter 7 in the box labelled New Value and click on Add.

The specification ‘60000 thru Highest -> 7’ will be added to the list of mappings.

Note that the apparent ambiguity surrounding the mapping of values which appear as both the upper bound of one range and the lower bound of the next (such as 10000) is resolved automatically by SPSS. SPSS assigns such values to the first of the two ranges in which the value appears. Hence 10000 will be mapped into the value 1.

Figure 21 shows the status of the window after completing all the re-code specifications.

Figure 21: Completed recode specifications

(vi) Click on Continue to return to the earlier window.

Activity 7.4 Click OK to perform the re-code.

The variable incrange will appear as a new column on the right of the data editor window.

Activity 7.5 Use the Variable View window to define value labels for incrange as follows:

Code    Value label
1       …

… Legacy Dialogs … Bar on the variable incrange. Your graph should resemble Figure 22.

Figure 22: Bar chart of the variable incrange

Figure 23 shows the SPSS commands used to perform the transformation.

RECODE income
  (Lowest thru 10000=1) (10000 thru 20000=2)
  (20000 thru 30000=3) (30000 thru 40000=4)
  (40000 thru 50000=5) (50000 thru 60000=6)
  (60000 thru Highest=7) INTO incrange .
VARIABLE LABELS incrange 'Income Range'.
EXECUTE .

Figure 23: RECODE into ‘Different Variable’
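The rule that a boundary value such as 10000 goes to the first range in which it appears can be illustrated with a short Python sketch. It assumes, as the task describes, that each range includes both of its end points and that the ranges are tried in the order in which they are listed. The names RANGES and recode_income are hypothetical.

```python
import math

# (low, high, new_code), in the order given in the RECODE command.
RANGES = [
    (-math.inf, 10000, 1),
    (10000, 20000, 2),
    (20000, 30000, 3),
    (30000, 40000, 4),
    (40000, 50000, 5),
    (50000, 60000, 6),
    (60000, math.inf, 7),
]

def recode_income(income, ranges=RANGES):
    """Return the code of the first range containing income (None stays None)."""
    if income is None:
        return None
    for low, high, code in ranges:
        if low <= income <= high:   # both end points belong to the range
            return code
    return None

print(recode_income(10000))   # 1  (first matching range wins)
print(recode_income(10001))   # 2
print(recode_income(24000))   # 3
print(recode_income(68200))   # 7
```

Because the search stops at the first match, the order of the specifications matters whenever ranges share an end point.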

Task 8 Performing an automatic recode

Objective To recode a block of data in order to group some categories together.

Comment Categorical variables may be coded using alphanumeric codes or numeric codes. For example, a variable such as gender may be coded using either ‘F’ and ‘M’ or numeric codes 1 and 2. For the purpose of obtaining descriptive statistics, the choice of coding would not usually be critical. However, some of the statistical procedures for modelling categorical data, such as the loglinear models procedure, require that categorical data be encoded using numerical codes.

Whilst Recode could be used to change each alphanumeric code into a numeric code, this could be a tedious process if many values required re-coding. Instead, the Automatic Recode facility can be used.

Problem In the data file census.sav, the variable gender is coded using ‘f’ and ‘m’. To change these into numeric codes, we need to create a new variable which is numeric (SPSS does not let you recode into the same variable and change the type of the variable).

Activity 8.1 From the Transform menu select Automatic Recode.

Activity 8.2 Select Respondent’s sex [gender] from the list of variables and transfer it to the Variable -> New Name box.

Activity 8.3 Enter sex in the box below and click on the button. The completed dialog box should resemble Figure 24 below.

Figure 24: Automatic recode of gender

Activity 8.4 Click OK and inspect the output at the bottom of the output window.

SPSS produces a summary table in the output window (similar to that shown in Figure 25) showing the old and new codes.

Figure 25: Summary of automatic recode

Note that the new variable, sex, has inherited the variable label and value labels of the old variable, gender.

Figure 26 shows the SPSS commands used to perform the transformation.

AUTORECODE
  VARIABLES=gender /INTO sex
  /PRINT.

Figure 26: AUTORECODE
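As a rough picture of what an automatic recode does, the Python sketch below numbers the distinct string values 1, 2, 3 and so on. It assumes the codes are assigned in ascending alphabetical order of the original values; the function name autorecode is hypothetical and the sample data are invented.

```python
def autorecode(values):
    """Return (codes, mapping), numbering the distinct values 1, 2, 3, ... in sorted order."""
    mapping = {old: new for new, old in enumerate(sorted(set(values)), start=1)}
    return [mapping[v] for v in values], mapping

gender = ["m", "f", "f", "m", "f"]   # invented sample
sex, mapping = autorecode(gender)

print(mapping)   # {'f': 1, 'm': 2}
print(sex)       # [2, 1, 1, 2, 1]
```

The mapping plays the same role as the summary table of old and new codes: it records which numeric code replaced each original value.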

2.4 Conditional transformations

All the transformations described so far have been examples of unconditional transformations. These are transformations that affect every case in the data set in the same way. Situations arise, however, where the form of the transformation to be carried out varies from case to case. Such transformations are called conditional transformations. They can be applied to the data transformation commands Compute and Recode.

All conditional transformations are specified using a logical expression. Usually, this expression is constructed using the dialog window appropriate to the transformation being carried out. Alternatively, it may be constructed using an IF statement as part of an SPSS program compiled within a syntax window.

Task 9 Performing a conditional compute

Objective Apply a data transformation to a subset of data cases which satisfy a specified condition.

Comment This requires a variation of the Compute command used in Task 4.

Problem The data file employeedata.sav contains data relating to employees of a company. The variable salary records the employees’ current salaries. An enlightened new CEO is concerned that the mean salary of female employees is only about $26,000 compared with a figure of about $41,400 for men (see Figure 27 below).

Figure 27: Salaries of employees broken down by gender

The CEO decides to rectify this situation immediately by awarding a ten percent salary increment to all female employees, irrespective of the length of service of the employee or of any other factors such as job function.

We can implement this modification using a conditional compute transformation.

Activity 9.1 Open the SPSS data file employeedata.sav.

Activity 9.2 From the Transform menu select Compute Variable.

Activity 9.3 Define the transformation to be applied.

(i) Enter salincr in the box labelled Target Variable.

(ii) Assemble the expression 0.1*salary in the Numeric Expression box.

The dialog box should resemble Figure 28.

Figure 28: Conditional compute – defining the transformation

(iii) Click on the button marked If… and select Include if case satisfies condition.

(iv) Select gender from the list of variables and transfer it to the box on the right using the transfer arrow.

(v) Complete the condition by typing ='f' after the word ‘gender’.

The dialog box should now look like Figure 29.

Figure 29: Conditional compute – specifying the condition

(vi) Click on Continue.

If we were to click on OK at this stage, the value of salincr for male employees would be undefined. It would be assigned the system missing value. To avoid this, we could have performed an initial unconditional compute operation, assigning a value of 0 to salincr for all employees. An alternative method is to use the Paste button and insert this extra transformation into the SPSS commands.

Activity 9.4 Define the action required for cases which do not satisfy the criterion for an increment. This requires that we arrange for all males to receive an increment of 0.

(i) Click on Paste.

An SPSS syntax window will be displayed containing the following commands:

Figure 30: Conditional transformation commands generated by SPSS

(ii) Directly before these two commands, insert the command:

compute salincr = 0.

The window should now look like Figure 31.

Figure 31: Pre-setting values of computed variable to zero

(iii) Select Run>All to execute the transformations.

When these commands are executed, the value of 0 will first be assigned unconditionally to the variable salincr for each case. For females only, this value will then be overwritten by the value calculated by the expression in the IF statement.
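The two-step logic (set every case to 0, then overwrite the value for females only) can be sketched in Python as follows. This is an illustration of the order of operations, not SPSS syntax; the function name award_increment and the two sample employees are invented, and None stands in for the system missing value.

```python
def award_increment(cases, preset_zero=True):
    """Add salincr to each case: 10% of salary for females, else 0 (or None if not preset)."""
    result = []
    for case in cases:
        new_case = dict(case)
        # Unconditional step: salincr = 0 for every case (skipped when preset_zero is False).
        new_case["salincr"] = 0 if preset_zero else None
        # Conditional step: only cases with gender 'f' are recomputed.
        if new_case["gender"] == "f":
            new_case["salincr"] = 0.1 * new_case["salary"]
        result.append(new_case)
    return result

staff = [{"gender": "f", "salary": 30000}, {"gender": "m", "salary": 40000}]
with_preset = award_increment(staff)
without_preset = award_increment(staff, preset_zero=False)

print([c["salincr"] for c in with_preset])
print([c["salincr"] for c in without_preset])
```

With the preset, the male employee receives an increment of 0; without it, his value of salincr is left missing, which is the situation the extra compute command is designed to avoid.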

Figure 32 shows a sample of the cases after running the transformation. The Variable View window was used to change the type of the variable salincr to Dollar and the width and decimal places to 8 and 0 respectively.

Figure 32: Data window showing result of conditional transformation

3 Working with subsets of data

A common requirement in analysis is to be able to analyse subsets of data.

SPSS allows selections to be made on either a temporary or permanent basis. Temporary selections are performed using a filter. When filtering is active, all cases remain in the data editor window but only the cases selected by the filter are used for analysis. If permanent selection is used, cases not included in the selected subset are deleted from the data editor window.

If only a few analyses are to be carried out on subsets, filtering would be the best approach. However, if extensive analysis is to be performed on a selected subset, it would be more efficient to use a permanent selection. If permanent selection is used, however, it is advisable to save a copy of the selected data to a new SPSS system file before proceeding with analysis. This will protect against the danger of saving the data to the original SPSS data file, with a resultant loss of data.

Data selection is performed using the Select Cases facility on the Data menu.

When the same analysis is required for more than one subset of data, the repeated use of the Select Cases facility can become tedious. If the data set contains a variable whose values identify the multiple subsets, the Split File facility can be used instead to reduce the effort in carrying out the analyses on multiple subsets.
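The difference between the three approaches can be sketched in Python. This is a conceptual illustration only: the function names filter_cases, select_cases and split_file are hypothetical, as are the variable names marital and income and the three sample cases.

```python
from collections import defaultdict

def filter_cases(cases, keep):
    """Temporary selection: every case is retained and merely flagged."""
    return [dict(case, selected=bool(keep(case))) for case in cases]

def select_cases(cases, keep):
    """Permanent selection: cases failing the condition are dropped."""
    return [case for case in cases if keep(case)]

def split_file(cases, by):
    """Group cases by the values of one variable so an analysis can be run per group."""
    groups = defaultdict(list)
    for case in cases:
        groups[case[by]].append(case)
    return dict(groups)

cases = [
    {"id": 1, "marital": "single", "income": 25000},
    {"id": 2, "marital": "married", "income": 40000},
    {"id": 3, "marital": "single", "income": 15000},
]

def wanted(case):
    return case["marital"] == "single" and case["income"] >= 20000

filtered = filter_cases(cases, wanted)
selected = select_cases(cases, wanted)
groups = split_file(cases, "marital")

print(len(filtered), len(selected))   # 3 1
print(sorted(groups))                 # ['married', 'single']
```

Filtering keeps all three cases and only marks which ones to use, whereas the permanent selection leaves a single case; this is why saving a copy first is advisable.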

Task 10 Select<strong>in</strong>g a subset <strong>of</strong> cases<br />

Objective To select a subset <strong>of</strong> <strong>data</strong> cases for analysis.<br />

Comment This illustrates the use <strong>of</strong> a temporary selection or filter.<br />

Problem The <strong>data</strong> file census.sav conta<strong>in</strong>s <strong>in</strong>formation relat<strong>in</strong>g to <strong>in</strong>dividuals. It is required to restrict<br />

analysis to <strong>in</strong>dividuals who are unmarried and earn<strong>in</strong>g at least £20,000 per annum.<br />

Activity 10.1 Open the <strong>SPSS</strong> <strong>data</strong> file census.sav.<br />

Activity 10.2 Select Select Cases from the Data menu. The Select Cases dialog w<strong>in</strong>dow will be displayed.<br />

Figure 33: Select Cases dialog w<strong>in</strong>dow<br />

Note the different ways <strong>in</strong> which cases may be selected.<br />

The choice ‘If condition is satisfied’ restricts cases to those satisfy<strong>in</strong>g a logical condition<br />

entered by the user. The next choice, ‘Random Sample <strong>of</strong> Cases’, selects a random subset <strong>of</strong><br />

cases based on a pseudo-random number generated by <strong>SPSS</strong>. The third option allows case<br />

selection to be based on a range <strong>of</strong> times or on a range <strong>of</strong> case numbers. The f<strong>in</strong>al option, ‘Use<br />

filter variable’, allows case selection to be based upon values <strong>of</strong> an exist<strong>in</strong>g variable. Cases<br />

with values other than 0 or missing will be selected.<br />

Activity 10.3 Specify the criterion for selection.<br />

(i) Select the option .<br />

(ii) Click on .<br />

The criterion for selection to be used in this example is ‘unmarried and earning at least £20,000<br />

per annum’. In terms of SPSS variables this can be expressed as ‘marital = 5 and income >= 20000’.<br />

(iii) Use the w<strong>in</strong>dow displayed to assemble the required logical expression.<br />

Figure 34: Select Cases dialog show<strong>in</strong>g selection criterion<br />

(iv) Click on .<br />

Activity 10.4 Execute the transformation.<br />

(i) Click .<br />

Figure 35 shows the effect of applying the selection.<br />

Figure 35: Select Cases dialog show<strong>in</strong>g selection criterion<br />

Cases that fail to meet the selection criterion are indicated by a diagonal line in the row labels of<br />

the Data Editor, as shown in Figure 35, and the status line also indicates that filtering is turned<br />

on.<br />

The <strong>SPSS</strong> commands used to perform this selection are shown <strong>in</strong> Figure 36.<br />

COMPUTE filter_$=(marital = 5 & income >= 20000).<br />

VARIABLE LABEL filter_$ 'marital = 5 & income >= 20000 (FILTER)'.<br />

VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.<br />

FORMAT filter_$ (f1.0).<br />

FILTER BY filter_$.<br />

EXECUTE.<br />

Figure 36: Commands used to perform case selection us<strong>in</strong>g filter<strong>in</strong>g<br />

Inspection <strong>of</strong> Figure 36 reveals that <strong>SPSS</strong> creates a special variable called filter_$, coded 1 or<br />

0 depend<strong>in</strong>g on whether or not the case meets the selection criterion.<br />

Activity 10.5 Inspect the result <strong>of</strong> the selection.<br />

(i) Scroll to the last column in the Data Editor.<br />

(ii) Click on the icon to display value labels. Case number 22, which<br />

satisfied the criterion, will show the label Selected in the filter_$ column.<br />

Note that the filter variable filter_$ is a temporary variable and will not be saved on exit from<br />

<strong>SPSS</strong>. To keep the variable for future use, rename it us<strong>in</strong>g the Variable View w<strong>in</strong>dow and save<br />

the <strong>data</strong> before exit<strong>in</strong>g from <strong>SPSS</strong>.<br />

A case selection criterion specified <strong>in</strong> this manner will rema<strong>in</strong> <strong>in</strong> force until a new specification is<br />

issued.<br />

Activity 10.6 Cancel the selection.<br />

(i) Click on the icon.<br />

(ii) Click on .<br />

The Select Cases dialog w<strong>in</strong>dow will be re-displayed.<br />

(iii) Click on the radio button.<br />

(iv) Click .<br />

All cases will now appear as selected <strong>in</strong> the <strong>SPSS</strong> Data Editor.<br />
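
The cancellation can also be issued from a syntax window. The commands below are a minimal sketch written from standard SPSS command syntax; they are an assumption about what the dialog steps correspond to, not text pasted from the dialog.<br />

```spss
* Turn off the active filter so that all cases are used again.
FILTER OFF.
USE ALL.
EXECUTE.
```

After these commands run, the status line should no longer show that filtering is turned on.<br />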

Task 11 Delet<strong>in</strong>g selected cases<br />

Objective To delete cases from an active <strong>SPSS</strong> <strong>data</strong> set.<br />

Comment This use <strong>of</strong> Select Cases is more drastic <strong>in</strong> effect and results <strong>in</strong> the permanent deletion <strong>of</strong> <strong>data</strong><br />

cases from the active <strong>data</strong> set.<br />

Activity 11.1 Click on the Dialog Recall icon and click on . The Select Cases dialog w<strong>in</strong>dow<br />

will be re-displayed.<br />

Activity 11.2 Select .<br />

The previous selections will still be <strong>in</strong> place.<br />

Activity 11.3 Specify that permanent case selection is required.<br />

(i) Click on the radio button and click .<br />

The Data Editor will now conta<strong>in</strong> just the 13 cases that satisfy the selection criterion.<br />
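
For comparison with the filter commands in Task 10, a permanent selection can be sketched in command syntax as follows. This is an illustrative sketch that reuses the criterion from Task 10; it is assumed, rather than confirmed, to match what the dialog generates.<br />

```spss
* Permanently keep only unmarried respondents earning at least 20000.
* Cases that fail the condition are removed from the active data set.
SELECT IF (marital = 5 & income >= 20000).
EXECUTE.
```

Unlike FILTER, SELECT IF removes the unselected cases from the active data set. The file on disk is unchanged until the data is saved.<br />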

When permanent case selection is used in SPSS, great care is needed to ensure that<br />

important data is not lost from the data files on disk. If, after making a permanent selection,<br />

you were to use the Save command to save the remaining cases without specifying a new<br />

file name, the original SPSS data file would be overwritten with the reduced set of data, and<br />

the deleted cases would be lost. If you do want to save the selected cases, use Save As instead to save<br />

them to a new data file.<br />
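
In command syntax, the equivalent of Save As is to name a new output file explicitly. The sketch below uses a hypothetical file name, census_subset.sav, which is not part of the example data sets and is written to the current working directory.<br />

```spss
* Save the reduced data set under a new name (hypothetical file name).
* The original census.sav on disk is left untouched.
SAVE OUTFILE='census_subset.sav'.
```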

A similar danger arises if you decide to open a new <strong>data</strong> set. <strong>SPSS</strong> will ask you if you want to<br />

save the current <strong>data</strong>, the subset <strong>of</strong> 13 cases just selected. If you reply ‘Yes’, your orig<strong>in</strong>al <strong>data</strong><br />

file will be overwritten. This is illustrated <strong>in</strong> the next activity.<br />

Activity 11.4 Exercise caution when open<strong>in</strong>g a new <strong>data</strong> file.<br />

(i) Select File>New>Data from the menu.<br />

At this po<strong>in</strong>t EXTREME CAUTION is required!<br />

If you were to reply ‘Yes’ to the prompt to save the current data, you would overwrite the original data file! If you did want to<br />

save this subset, you should reply Cancel and use File>Save As to save the data to a new file.<br />

Instead <strong>of</strong> sav<strong>in</strong>g this subset, revert to us<strong>in</strong>g the orig<strong>in</strong>al <strong>data</strong> file.<br />

(ii) Click on .<br />

(iii) Select File>Open>Data and select census.sav. Click .<br />

The orig<strong>in</strong>al <strong>data</strong> set, census.sav, will be re-displayed.<br />

Task 12 Sub-group process<strong>in</strong>g<br />

Objective Perform the same analysis on sub-groups <strong>of</strong> <strong>data</strong> def<strong>in</strong>ed by values <strong>of</strong> a specified variable.<br />

Comment Split File provides a more efficient way <strong>of</strong> perform<strong>in</strong>g the same analysis on different subsets <strong>of</strong><br />

<strong>data</strong> than the repeated use <strong>of</strong> Select Cases.<br />

Problem The file census.sav (which should be currently open follow<strong>in</strong>g completion <strong>of</strong> the previous task)<br />

conta<strong>in</strong>s a variable polviews <strong>in</strong>dicat<strong>in</strong>g the political views <strong>of</strong> the respondent. A second variable,<br />

genelec, <strong>in</strong>dicates the party for whom the respondent voted <strong>in</strong> the 1992 general election. It is<br />

required to exam<strong>in</strong>e the distribution <strong>of</strong> votes for each political party by <strong>in</strong>dividuals shar<strong>in</strong>g the<br />

same political views.<br />

Activity 12.1 Select Split File from the Data menu.<br />

Activity 12.2 Def<strong>in</strong>e the groups.<br />

(i) Click the radio button.<br />

(ii) Select Political views [polviews] from the list <strong>of</strong> variables and transfer it to the box<br />

labelled Groups Based on. The display should resemble Figure 37.<br />

Figure 37: Split File dialog box<br />

The default action is to sort the cases by the split file variable. If the data is already sorted, the<br />

sort can be avoided by clicking the ‘File is already sorted’ radio button.<br />

(iii) Click .<br />

The Split File On message appears on the status l<strong>in</strong>e <strong>of</strong> the Data Editor.<br />

Activity 12.3 To see the effect <strong>of</strong> select<strong>in</strong>g Split File process<strong>in</strong>g, run a Frequencies analysis on the variable<br />

genelec.<br />

(i) Select Analyze>Descriptive Statistics>Frequencies.<br />

(ii) Select .<br />

(iii) Click on .<br />

A table <strong>of</strong> frequencies, similar to that <strong>in</strong> Figure 38, is produced for each category <strong>of</strong> political<br />

views.<br />

Figure 38: Frequencies on genelec with split file on polviews<br />

Activity 12.4 Before cont<strong>in</strong>u<strong>in</strong>g, turn <strong>of</strong>f split file process<strong>in</strong>g:<br />

(i) Select Split File from the Data menu.<br />

(ii) Select and click .<br />

The full <strong>data</strong> set will now be available for subsequent analyses.<br />
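
The whole of Task 12 can be sketched in command syntax as shown below. The sketch assumes that output is to be organised separately for each group (the SEPARATE keyword); if the groups are to be compared within a single table, LAYERED would be used instead. Which of the two the dialog steps above produce depends on the radio button chosen in Activity 12.2.<br />

```spss
* Sort by the grouping variable, as Split File requires grouped cases.
SORT CASES BY polviews.
* Report each procedure separately for every value of polviews.
SPLIT FILE SEPARATE BY polviews.
FREQUENCIES VARIABLES=genelec.
* Return to analysing the full data set.
SPLIT FILE OFF.
```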

4 Merg<strong>in</strong>g <strong>SPSS</strong> <strong>data</strong> sets<br />

Collections of data are often built up over time, and more than one person may be involved in<br />

data collection and data entry. For instance, if a survey is being carried out in the field, several<br />

interviewers may be involved; if each interviewer enters data directly into a separate SPSS file, the files will<br />

have to be merged.<br />

The Merge facility allows <strong>data</strong> from two files to be merged. It can be used either to comb<strong>in</strong>e cases from two<br />

files which share exactly the same variables (the Add Cases option) or to comb<strong>in</strong>e variables from two files (the<br />

Add Variables option). These two <strong>operations</strong> are mutually exclusive.<br />

If the two files to be merged conta<strong>in</strong> exactly the same variables with exactly the same structure, the cases from<br />

the second file will be added at the end <strong>of</strong> the first file.<br />

If the files do not match exactly, <strong>SPSS</strong> will produce a list <strong>of</strong> unpaired variables. This list will conta<strong>in</strong>:<br />

Variables from either <strong>data</strong> file that do not match a variable name <strong>in</strong> the other (<strong>in</strong> this case, pairs<br />

can be created from the unpaired set and these pairs can be <strong>in</strong>cluded <strong>in</strong> the new merged file).<br />

Variables defined as numeric data in one file and string data in the other file (numeric variables<br />

cannot be merged with string variables).<br />

Str<strong>in</strong>g variables <strong>of</strong> unequal width (<strong>in</strong> this case it would be necessary to modify the structure <strong>of</strong><br />

such variables <strong>in</strong> one <strong>of</strong> the files).<br />

The new <strong>data</strong> file will conta<strong>in</strong> all those variables that match exactly, all the new pairs selected, and any<br />

unpaired variables that have been matched to create new pairs.<br />

Any rema<strong>in</strong><strong>in</strong>g unpaired variables which are <strong>in</strong>cluded will conta<strong>in</strong> miss<strong>in</strong>g <strong>data</strong> for the cases from the file that<br />

does not conta<strong>in</strong> that variable.<br />

Before merg<strong>in</strong>g the files, any unwanted variables can be deselected.<br />

Task 13 Add<strong>in</strong>g cases<br />

Objective To add <strong>data</strong> cases to the work<strong>in</strong>g file from another <strong>SPSS</strong> <strong>data</strong> file.<br />

Comment This illustrates the use <strong>of</strong> the Add Cases facility under Merge.<br />

Problem The <strong>data</strong> file possums.sav conta<strong>in</strong>s 462 <strong>data</strong> cases and 16 variables. The file possums1.sav<br />

conta<strong>in</strong>s 2 <strong>data</strong> cases for the same variables. It is necessary to comb<strong>in</strong>e the two sets <strong>of</strong> cases<br />

<strong>in</strong>to one <strong>data</strong> file.<br />

Activity 13.1 Open the file possums1.sav from the directory <strong>in</strong> which your <strong>SPSS</strong> example <strong>data</strong> sets are<br />

stored.<br />

Activity 13.2 Select Data>Merge Files>Add Cases. The Merge Files menu will be displayed.<br />

Figure 39: Merge Files menu<br />

Activity 13.3 Choose the file possums.sav. Click on Open.<br />

Figure 40: Select<strong>in</strong>g the file to be merged<br />

The follow<strong>in</strong>g dialog w<strong>in</strong>dow will be opened.<br />

Figure 41: Variable specification <strong>in</strong> Merge<br />

Note that the left-hand pane of the dialog box lists a pair of variables which do not<br />

match. The reason for this is that the declared width of the variable complics differs in the two<br />

files.<br />

Activity 13.4 Try to pair the variables by click<strong>in</strong>g on the first variable, complics

Note: If a column displays asterisks (*****), simply widen the column to see the <strong>data</strong>. This can<br />

be done by po<strong>in</strong>t<strong>in</strong>g the mouse to the boundary between the cells, and dragg<strong>in</strong>g to the right<br />

when the mouse po<strong>in</strong>ter changes to a double headed arrow.<br />

Figure 44: Widen<strong>in</strong>g the column width<br />

A practical consideration <strong>in</strong> perform<strong>in</strong>g a merge is the order <strong>in</strong> which the two files are specified.<br />

They can be specified <strong>in</strong> either order, but the choice is important because it can affect the<br />

variable def<strong>in</strong>itions assigned to common variables <strong>in</strong> the merged file. For example, if a numeric<br />

variable common to both files has a decimal places value <strong>of</strong> 0 <strong>in</strong> the active file and 2 <strong>in</strong> the file<br />

be<strong>in</strong>g added, the value assigned <strong>in</strong> the merged file would be 0. If the active file is our master<br />

file, then this might be acceptable. But if the file be<strong>in</strong>g added was our master file, then this<br />

would probably not be acceptable s<strong>in</strong>ce it would mean that we are over-rid<strong>in</strong>g def<strong>in</strong>itions which<br />

we have laid down <strong>in</strong> our master file.<br />

The conclusion might be, therefore, to always open the master file first and to add cases from<br />

the second file. Whilst this would prevent the possibility <strong>of</strong> master def<strong>in</strong>itions be<strong>in</strong>g changed, it<br />

would also prevent us from chang<strong>in</strong>g any other def<strong>in</strong>itions <strong>in</strong> the active file prior to the merge.<br />

Accord<strong>in</strong>gly, <strong>in</strong> this example, we have chosen to open the file conta<strong>in</strong><strong>in</strong>g the new cases first so<br />

that we can change the active file prior to the merge. This approach does require, <strong>of</strong> course,<br />

that care is taken to ensure that master def<strong>in</strong>itions are not <strong>in</strong>advertently changed. The motto<br />

must therefore be: caution.<br />
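
The order in which the two files are named is explicit in command syntax. The sketch below follows the order used in this task, with the file of new cases opened first; it assumes that both files are in the current working directory and that the mismatch in the width of complics has already been resolved.<br />

```spss
* Open the file holding the new cases; it becomes the active data set.
GET FILE='possums1.sav'.
* Append the cases from possums.sav; the asterisk denotes the active data set.
ADD FILES /FILE=* /FILE='possums.sav'.
EXECUTE.
```

Swapping the two /FILE subcommands would name possums.sav first, which corresponds to opening the master file first as discussed above.<br />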

Task 14 Add<strong>in</strong>g variables<br />

Objective To add variables to the work<strong>in</strong>g file from another <strong>SPSS</strong> <strong>data</strong> file.<br />

Comment This illustrates the use of the Add Variables facility under Merge.<br />

Problem The <strong>data</strong> file possums1.sav conta<strong>in</strong>s 2 <strong>data</strong> cases and 16 variables. The file possums2.sav<br />

conta<strong>in</strong>s 3 new variables plus the common key variable patid with values 600 and 601. It is<br />

necessary to comb<strong>in</strong>e the two sets <strong>of</strong> variables <strong>in</strong>to one <strong>data</strong> file.<br />

Activity 14.1 Open the file possums1.sav from the directory <strong>in</strong> which your <strong>SPSS</strong> example <strong>data</strong> sets are<br />

stored.<br />

Activity 14.2 Select Data>Merge Files>Add Variables.<br />

Activity 14.3 Choose the file possums2.sav.<br />

This file conta<strong>in</strong>s three variables id, height and weight with entries for patients with patid 600<br />

and 601.<br />

The follow<strong>in</strong>g dialog w<strong>in</strong>dow will be opened.<br />

Figure 45: Add Variables dialog<br />

The box labelled ‘Excluded Variables’ lists the variable patid which is common to both files.<br />

Activity 14.4 Select ‘Match cases on key variables <strong>in</strong> sorted files’, highlight the variable patid and select it<br />

as the key variable <strong>in</strong>to the ‘Key Variables’ box. Click .<br />

A warning message, shown below, will be displayed, reminding you that each file must be<br />

pre-sorted on values of the key variable(s). In this case, we are safe to proceed, but if your<br />

data is not pre-sorted, and you still choose to proceed, some information may be lost in the<br />

merge.<br />

Click .<br />

Figure 46: Warn<strong>in</strong>g to ensure <strong>data</strong> is sorted by key variables<br />

Activity 14.5 Scroll to the right of the Data View. The variables height and weight will appear as the<br />

right-most columns.<br />

Figure 47: Merged data cases<br />

Activity 14.6 Use File>Save As to save the merged data to a new file (use the filename allvars.sav).<br />
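
Task 14 can be sketched in command syntax as follows. The sketch assumes that both files are in the current working directory and that possums2.sav is already sorted on patid; the active data set is sorted explicitly before the merge.<br />

```spss
* Open the first file and make sure it is sorted on the key variable.
GET FILE='possums1.sav'.
SORT CASES BY patid.
* Add the variables from possums2.sav, matching cases on patid.
* possums2.sav is assumed to be sorted on patid already.
MATCH FILES /FILE=* /FILE='possums2.sav' /BY patid.
EXECUTE.
* Save the merged data under a new name.
SAVE OUTFILE='allvars.sav'.
```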

Task 15 F<strong>in</strong>ish<strong>in</strong>g <strong>SPSS</strong><br />

Objective Term<strong>in</strong>ate the <strong>SPSS</strong> session.<br />

Comments You should always quit any computer program when you have finished your session. Never<br />

switch off the computer when SPSS is still running unless absolutely necessary, as this may<br />

corrupt your data files. Also, never leave a computer whilst you are still logged on to it, as<br />

others may use your account and could potentially damage your files.<br />

Activity 15.1 Select the Exit option from the File menu.<br />

Activity 15.2 <strong>SPSS</strong> will ask you if you want to save the contents <strong>of</strong> various w<strong>in</strong>dows before it lets you quit, so<br />

a dialogue box similar to Figure 48 appears:<br />

Figure 48: Sav<strong>in</strong>g the <strong>SPSS</strong> <strong>data</strong><br />

Click either or depending on whether you want the data saved. If you choose to<br />

save it, the file is saved under the same name on disk. Once the data file has been saved, it can be<br />

reopened by SPSS and edited.<br />

You will also be asked if you want to save the contents <strong>of</strong> your output w<strong>in</strong>dow – usually the<br />

numerical results <strong>of</strong> all your analysis work. Normally you would want to save this, but here you<br />

should click .<br />

Once you have done this <strong>SPSS</strong> will quit. If you click on <strong>in</strong> any <strong>of</strong> the boxes you will be<br />

returned to your unsaved work to cont<strong>in</strong>ue.<br />
