Basic data operations in SPSS - ISS - University of Leeds
Information Systems Services
Basic data operations in SPSS for Windows 17
TUT 113
Version 6 (November 2009)
Contents
1 Introduction
1.1 About data management
1.2 Requirements
1.3 Documentation
1.4 Getting Started
Task 1 Download example data sets
1.5 The SPSS Command Language
1.6 The Syntax Window
Task 2 Using the Syntax Window
1.7 The Syntax Reference Guide
Task 3: Using the SPSS Syntax Reference Guide
2 Data Transformations
2.1 Creating New Variables
Task 4 Using Compute to create a new variable
2.2 Computing Counts Using COUNT
Task 5 Computing a Count
2.3 Recoding variables
Task 6 Recoding into the same variable
Task 7 Recoding into a new variable
Task 8 Performing an automatic recode
2.4 Conditional transformations
Task 9 Performing a conditional compute
3 Working with subsets of data
Task 10 Selecting a subset of cases
Task 11 Deleting selected cases
Task 12 Sub-group processing
4 Merging SPSS data sets
Task 13 Adding cases
Task 14 Adding variables
Task 15 Finishing SPSS
Information Systems Services Page 2 of 36
Version 4.1 (December 2006) tut113_vn17nov2009.doc
Format conventions
In this document the following format conventions are used:
Commands that you must type in and menu items are shown in bold. Example: Name
Keys that you press and options that you select are enclosed in angle brackets.
Feedback
If you notice any mistakes in this document please contact the Information Officer. Email should be sent to the address info-officer@leeds.ac.uk.
Copyright
This document is copyright University of Leeds. Permission to use material in this document should be obtained from the Information Officer (email should be sent to the address info-officer@leeds.ac.uk).
1 Introduction
1.1 About data management
The first step in performing a statistical analysis with SPSS is usually to define your data and enter your data into SPSS. If you are lucky, this is all that will be required before you are able to carry out your analysis. More commonly, however, you may find it necessary to manipulate the data before you can perform the required analysis.
For example, it may be necessary to generate new variables from existing variables, or you may need to group the values of a variable into a small number of groups to facilitate the creation of simple summary statistics or simple graphics.
You may also find that the structure of your original data does not conform to that required by some of the statistical or graphical procedures in SPSS and that the data needs to be re-shaped prior to analysis.
This document will take you through some of the basic tasks of data manipulation that you may need to accomplish prior to performing statistical analyses in SPSS. The tasks have been designed in such a way that you are advised to complete a task and its exercises before proceeding to the next one.
1.2 Requirements
It is assumed that you already know how to log in to the network and run the Microsoft Windows operating system. It is also assumed that you know how to run SPSS for Windows from the Windows desktop, create a new data sheet, enter text into a cell, run a simple analysis and print it out. If you do not yet know how to achieve these operations it will be necessary for you to read and work through the following document:
Getting started with SPSS for Windows (BEG 14)
1.3 Documentation
If you require further information on the facilities in SPSS, the following documents are available:
Advanced data operations in SPSS for Windows (TUT 114)
Simple statistical analysis in SPSS for Windows (TUT 115)
Advanced statistical analysis in SPSS for Windows (TUT 116)
All the documents referred to above are available as PDF files for printing from the ISS web site:
http://iss.leeds.ac.uk/downloads/303/statistical_analysis
References are made in this document to the SPSS Syntax Reference Guide. This is now available online as part of the SPSS Help system.
1.4 Getting Started
A variety of SPSS data sets will be used for these exercises. The data sets are stored in a zip file on the ISS website. The first task will be to copy the zip file into your own chosen directory and unpack the zip file. You will need the files in order to complete these exercises.
Task 1 Download example data sets
Activity 1.1 Open a web browser such as Internet Explorer and go to the URL:
http://iss.leeds.ac.uk/downloads/303/statistical_analysis
Activity 1.2 Scroll down to locate the item titled ‘Basic data operations in SPSS for Windows 17’. Double click on this item and click on Download on the screen that follows.
Activity 1.3 From the Save As dialog box, choose a suitable directory to store the file in and confirm the save. Close the web browser.
Activity 1.4 Go into Windows Explorer and double click on the zip file. Click File>Extract to unzip the data files.
Activity 1.5 Locate the SPSS icon from the Statistics menu and double-click the icon to open SPSS. After a short period the SPSS data editor window will be displayed.
1.5 The SPSS Command Language
The use of a standard Windows-based interface for applications software allows transfer of learning to take place when moving between different applications. Mastering one menu system makes it easier to master another. In addition, the SPSS graphical user interface greatly simplifies the specification of analyses and other operations by relieving the user of the need to write ‘programs’.
The SPSS graphical user interface caters for most of the functionality provided by SPSS. However, there are some operations that cannot be handled using this approach. For example, you may need to re-shape data input from an external source into a form more appropriate for analysis. For such operations it may be necessary to resort to the use of the SPSS Command Language.
Versions of SPSS prior to the emergence of the Windows operating system required users to specify all their requests by using SPSS programs written in the SPSS command language. In practice, this usually involved writing out programs on paper before attempting to load them into SPSS on a computer. Nowadays, most users ‘program’ directly by interacting with the SPSS graphical user interface. However, the SPSS Command Language remains at the heart of the SPSS system, despite the fact that specifications are made interactively.
1.6 The Syntax Window
To enable the SPSS command language to be used, SPSS provides a special window called the Syntax Window.
Syntax windows can be used in one of two ways. An empty syntax window can be opened using File … New … Syntax, enabling commands to be entered by the user. Alternatively, after requesting an analysis using the menu interface, the commands underlying the analysis can be inspected, and possibly edited, by using the Paste button to display them in a syntax window prior to executing the request.
Task 2 Using the Syntax Window
Activity 2.1 Open the SPSS data file statan.sav.
Activity 2.2 (i) From the Analyze menu, select Descriptive Statistics … Frequencies.
(ii) From the list of variables displayed select height as the variable for analysis.
(iii) Untick the box labelled Display frequency tables.
Activity 2.4 Click on the Statistics button.
In the dialogue box displayed:
(i) Select Quartiles, Std. deviation and Mean.
(ii) Select Percentiles, enter 5 in the box alongside and click on Add.
(iii) Enter 95 in the percentiles box and click on Add.
(iv) Click on Continue.
Activity 2.5 Click on the Charts button.
In the dialogue box displayed:
(i) Select Histograms and With normal curve.
(ii) Click on Continue.
Activity 2.6 Normally, once the specification of an analysis is complete, the usual action is to execute the analysis by clicking on the OK button.
(i) Instead of clicking on OK click on the button labelled Paste.
A Syntax Window, shown in Figure 1 below, will be displayed containing commands that SPSS has assembled to perform the Frequencies analysis you requested interactively.
Figure 1: Syntax window containing Frequencies commands
Note the generation of sub-commands corresponding to the various options specified in the dialogue windows. For example /HISTOGRAM NORMAL has been generated in response to selecting a histogram with a super-imposed Normal Curve in the charts options window.
You are free to edit this program before running it. In fact, there are a few procedures in SPSS where some of the options cannot be specified interactively and must be specified by editing a partially-complete program assembled in this way.
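As an illustration of the sub-commands described above, the following is a sketch of the kind of syntax that Paste assembles for the choices made in Activities 2.2 to 2.5. It assumes that statan.sav is the active data file; the exact order and layout of the sub-commands may differ between SPSS versions.

```spss
* Sketch of the pasted Frequencies request (layout may vary by version).
* NOTABLE suppresses the frequency table and NTILES=4 requests quartiles.
FREQUENCIES VARIABLES=height
  /FORMAT=NOTABLE
  /NTILES=4
  /PERCENTILES=5 95
  /STATISTICS=STDDEV MEAN
  /HISTOGRAM NORMAL
  /ORDER=ANALYSIS.
```

Each sub-command corresponds to one dialogue choice, so editing the program is often quicker than re-opening the dialogues: adding a further value to /PERCENTILES, for example, requests an extra percentile.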
Activity 2.7 To run the program select Run from the menu. (In some situations it may be appropriate to highlight a sub-set of commands for execution before selecting Run). The output obtained from executing this program is displayed in Figure 2.
Figure 2: Output from Frequencies analysis
A syntax window can also be obtained from the File menu by selecting File … New … Syntax. This allows you to by-pass the menu system and to take complete control over the writing of SPSS programs. This approach is most commonly used for writing non-standard input programs rather than for specifying analyses.
One advantage of using the syntax window is that it allows you to save programs generated by SPSS for re-use at a later stage. Syntax files are stored with a file type of .sps. Re-using programs can save you time and effort by eliminating the need to repeat specifications using the menu system. They also serve as a precise record of the analysis carried out. This can be very helpful if problems arise in carrying out an analysis.
Activity 2.8 Save the contents of the syntax window.
(i) From the menu, select Window and *Syntax1 - SPSS Syntax Editor. The syntax window will come to the foreground.
(ii) Select File … Save As and save the contents in a file named testsyntax.
(iii) Close the Syntax window using the X button at the top right of the window.
Activity 2.9 Re-open the saved syntax file.
(i) From the Data Editor menu, select File … Open … Syntax and re-open the syntax file. This can now be edited and/or re-executed as required.
(ii) Close the Syntax window using the X button at the top right of the window.
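A saved syntax file can also be re-run without opening it for editing. The sketch below uses the INSERT command from a new syntax window. The folder C:\spssdata is a hypothetical location, so substitute the directory in which you saved testsyntax.sps and unpacked the data files.

```spss
* Hypothetical folder - replace with your own directory.
GET FILE='C:\spssdata\statan.sav'.
* Run every command stored in the saved syntax file.
INSERT FILE='C:\spssdata\testsyntax.sps'.
```

This is one way of repeating an analysis exactly as it was recorded, for example after the data file has been corrected.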
1.7 The Syntax Reference Guide
The full syntax of the SPSS language is defined in the SPSS Syntax Reference Guide available from SPSS UK Limited. The guide is also available electronically as part of the SPSS software.
Task 3: Using the SPSS Syntax Reference Guide
Activity 3.1: Select Help … Command Syntax Reference from the menu.
Activity 3.2: From the index on the left-hand side of the display, select Frequencies.
You should see a definition of the syntax of the Frequencies command as shown in Figure 3.
Figure 3: SPSS Online Help
The emphasis throughout this text will remain on using SPSS interactively. However, to aid learning of the SPSS command language, the SPSS commands generated by the SPSS menu system will be shown following the description of the interactive method. These commands will be highlighted in the text using a special symbol.
For a number of topics, however, the use of SPSS commands is essential. SPSS commands are also used occasionally in preference to the interactive style simply because they provide a clearer record of the operation being performed.
2 Data Transformations
The data currently available is often not in the form required for analysis. For example, examination of the data may reveal the need to create new variables for analysis derived from values of one or more existing variables. If data values are to hand, these can be entered at the keyboard. However, if data values are to be derived from values of other variables, one of the data transformation facilities must be used.
2.1 Creating New Variables
The Compute facility on the Transform menu allows you to create new variables from values of existing variables.
Task 4 Using Compute to create a new variable
Objective Create a new variable based upon values of existing variables.
Comments Compute can also be used to modify the value of an existing variable.
Problem In the data file employeedata.sav, the variables jobtime and prevexp record the number of months employed and the number of months of previous work experience. We need to calculate the total number of years worked rounded to the nearest whole number.
Activity 4.1 Open the SPSS data file employeedata.sav.
Activity 4.2 Select Transform from the menu.
The Transform menu, pictured in Figure 4, provides access to a range of tools for performing data transformations in SPSS.
Figure 4: Transform menu
Activity 4.3 Select Compute Variable from the menu.
The Compute Variable facility is used to create new variables whose values can be expressed as a mathematical function of values of existing variables.
Activity 4.4 In the Compute Variable dialog window, enter workyrs as the name of the Target Variable.
Activity 4.5 Assemble the expression RND((jobtime + prevexp)/12) in the Numerical Expression box.
To assemble the required expression,
(i) Scroll down the Function group box and select Arithmetic.
(ii) Having selected Arithmetic functions, scroll down the list of Functions and Special Variables and highlight the RND function. Click on the upward arrow key to select this function.
(iii) Replace the symbol ?, representing the argument of RND, by the expression (jobtime + prevexp)/12.
The term jobtime+prevexp represents the total number of months worked. Division by 12 converts this into years. The RND function rounds the result to the nearest integer.
On completion of the expression, the dialog window should resemble Figure 5.
Figure 5: Compute dialog window
Activity 4.6 Confirm the dialog to run the transformation.
The result is shown in Figure 6, below.
Figure 6: Data Editor showing computed variable workyrs
Note that if missing values had been present in either jobtime or prevexp, the result variable workyrs would have been assigned the system-missing value.
Figure 7 shows the SPSS commands used to perform the transformation.
COMPUTE workyrs = RND((jobtime + prevexp) / 12) .
EXECUTE .
Figure 7: COMPUTE command
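The effect of missing values on this calculation can be sketched with a small made-up data set. The three cases and the code -9 for an unknown prevexp are assumptions for illustration only, and the SUM function is not used elsewhere in this document; it is shown simply as a contrast to the + operator.

```spss
* Hypothetical three-case data set, with -9 standing for an unknown prevexp.
DATA LIST FREE / jobtime prevexp.
BEGIN DATA
98 144
98 -9
64 36
END DATA.
MISSING VALUES prevexp (-9).

* The + operator returns system-missing when any operand is missing.
COMPUTE workyrs = RND((jobtime + prevexp) / 12).
* SUM() adds only the valid arguments, so case 2 is based on jobtime alone.
COMPUTE workyrs2 = RND(SUM(jobtime, prevexp) / 12).
EXECUTE.
LIST.
```

With these values workyrs is 20, system-missing and 8 for the three cases, whereas workyrs2 is 20, 8 and 8. Whether a partial total of this kind is meaningful depends on the analysis, so the default behaviour of + is usually the safer choice.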
Note the use of the command EXECUTE. This forces the transformation to be executed. Without this command, the execution of the transformation would be delayed until the next procedure is encountered.
2.1.2 SPSS functions
The RND function used in the previous example is one of many functions provided by SPSS. Table 1, below, lists some commonly used functions.
Function                  Description
ABS(arg)                  Returns the absolute value of arg, e.g. ABS(-3.5) = 3.5.
CONCAT(str1, str2,…)      Concatenates the text strings specified by str1, str2, …
DATE.DMY(d,m,y)           Returns a date value given day, month and year indicated by integer values in d, m and y.
LN(arg)                   Returns the base-e logarithm of arg (arg > 0).
MAX(value1,value2[,...])  Returns the maximum value of its arguments that have valid values. This function requires two or more arguments.
MEAN(arg1,arg2[,...])     Returns the arithmetic mean of its arguments that have valid values. This function requires two or more arguments, which must be numeric.
MIN(value1,value2[,...])  Returns the minimum value of its arguments that have valid values. This function requires two or more arguments.
MOD(arg,modulus)          Returns the remainder when arg is divided by modulus, e.g. MOD(13,12) = 1.
RND(arg)                  Returns the integer that results from rounding arg, e.g. RND(2.4) = 2; RND(2.5) = 3; RND(-2.5) = -3.
SQRT(arg)                 Returns the positive square root of arg.
SUBSTR(string,pos,len)    Returns the substring of string of length len beginning at position pos, e.g. SUBSTR('JAN99',4,2) = '99'.
SYSMIS(numvar)            Returns 1 or true if the value of numvar is system-missing. The argument numvar must be the name of a numeric variable in the working data file.
TRUNC(arg)                Returns the value of arg truncated to an integer (towards 0), e.g. TRUNC(2.7) = 2; TRUNC(-2.7) = -2.
XDATE.MDAY(arg)           Returns day number in month from a date specified by arg.
XDATE.MONTH(arg)          Returns month number from a date specified by arg.
XDATE.YEAR(arg)           Returns a four-digit year from a date specified by arg.
YRMODA(y,m,d)             Returns the number of days from October 15, 1582, to the date represented by the arguments y, m, and d.
Table 1: A selection of SPSS functions
For a complete list of functions available, consult the SPSS Online Help system or the SPSS Syntax Reference Guide.
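Several of the functions in Table 1 can be tried out in a syntax window. The following is a minimal sketch only: the single case, the variable names and the date 14 November 2009 are made up for illustration.

```spss
* One hypothetical case so that the COMPUTE commands have something to act on.
DATA LIST FREE / x1 x2 x3.
BEGIN DATA
2 4 9
END DATA.

COMPUTE absval = ABS(-3.5).
COMPUTE remain = MOD(13, 12).
COMPUTE rounded = RND(2.5).
COMPUTE cut = TRUNC(-2.7).
COMPUTE average = MEAN(x1, x2, x3).

* Build a date from day, month and year, then extract the year again.
COMPUTE issued = DATE.DMY(14, 11, 2009).
FORMATS issued (DATE11).
COMPUTE yr = XDATE.YEAR(issued).

* String results need a target variable declared with STRING first.
STRING yy (A2).
COMPUTE yy = SUBSTR('JAN99', 4, 2).
EXECUTE.
LIST.
```

The numeric results should match the examples quoted in Table 1. Without the FORMATS command, the date variable issued would be displayed as a large number of seconds rather than as a date.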
2.1.3 SPSS System Variables<br />
SPSS uses a set of special variables, called system variables, with reserved names to hold the values of<br />
commonly used data items. Their names begin with a dollar sign ($).<br />
Table 2 lists a selection of these variables and their values.<br />
Variable Description<br />
$CASENUM Records the number of cases read up to and including the current case.<br />
$SYSMIS System Missing Value<br />
$JDATE Current date in number of days since October 14, 1582<br />
$DATE Current date in the form dd-mmm-yy<br />
$DATE11 Current date in the form dd-mmm-yyyy<br />
$TIME Current date and time. Value is the number of seconds from midnight,<br />
October 14, 1582 to the time when the value of $TIME is used.<br />
Table 2: A selection of SPSS system variables<br />
Although their values cannot be modified, system variables may be used in the same way as normal variables<br />
within data transformations.<br />
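A minimal sketch of system variables used on the right-hand side of a transformation follows. The target names caseid,<br />
run_time and run_year are hypothetical, and the DATETIME20 display format is just one reasonable choice.<br />

```spss
* caseid, run_time and run_year are hypothetical variable names.
COMPUTE caseid = $CASENUM.
COMPUTE run_time = $TIME.
FORMATS run_time (DATETIME20).
COMPUTE run_year = XDATE.YEAR($TIME).
EXECUTE.
```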
2.2 Computing Counts Using COUNT<br />
A common requirement is the need to form a count of the number of occurrences of a particular value across a<br />
list of variables. For example, patients’ medical data may contain variables recording the presence or absence<br />
of a host of symptoms. If the presence of a symptom is coded using ‘1’, the number of symptoms exhibited by<br />
the patient is just the number of variables with the value ‘1’. Although the Compute facility can deal with this<br />
problem, the Count facility provides a more efficient means of performing this task.<br />
Task 5 Computing a Count<br />
Objective Create a count of the number of occurrences of a value in a set of variables.<br />
Problem The SPSS data file census.sav contains data derived from a national census. The variables<br />
natenvir, natheal, natcrime, natdrug, nateduc, natdef and natmass record the respondents’<br />
views on the adequacy of the Government’s spending in several areas of national importance,<br />
such as Health and Education.<br />
The responses to these variables are coded as follows:<br />
Code Label<br />
0 Miss<strong>in</strong>g<br />
1 Too Little<br />
2 About Right<br />
3 Too Much<br />
A simple (albeit crude) statistic measuring the overall satisfaction with Government spending<br />
might be the count of the number of these variables for which the respondent replied ‘About<br />
Right’. This is easily obtained using the Count facility.<br />
Activity 5.1 Open the SPSS data file census.sav.<br />
Activity 5.2 Select Transform … Count Values within Cases from the menu. In the window displayed:<br />
(i) Enter spending as the name of the Target Variable.<br />
(ii) Enter Satisfaction with Government Spending as the Target Label.<br />
(iii) Highlight natenvir, natheal, natcrime, natdrug, nateduc, natdef and natmass in the<br />
list of variables and use the arrow button to transfer them to the list of selected variables.<br />
The window should now resemble Figure 8.<br />
Figure 8: Specifying variables in the Count dialog box<br />
Activity 5.3 Click on Define Values… to specify the values that contribute to the count.<br />
(i) Enter 2 (the code for ‘About Right’) in the Value box.<br />
Figure 9: Defining values for Count<br />
(ii) Click Add followed by Continue.<br />
Activity 5.4 Click OK in the initial Count window to execute the request. Figure 10 shows the result.<br />
Figure 10: The result of the count operation<br />
The variable spending, added to the end of the list of variables, contains the appropriate count (width and<br />
decimal places have been changed to 1 and 0 respectively using the Variable View window since the value of<br />
spending is a number in the range 0 to 7). If required, value labels may be defined for spending using the<br />
Variable View window.<br />
We can examine the distribution of values of the variable spending using Frequencies or Graph. The results<br />
are shown in Figure 11. The list of frequencies reveals that only 0.5% of those surveyed feel that spending is<br />
‘About Right’ in all of the areas of Government spending.<br />
Figure 11: Frequency distribution and bar chart for spending<br />
Figure 12 shows the SPSS commands used to perform the transformation.<br />
COUNT spending = natenvir natheal natcrime natdrug nateduc<br />
natdef natmass (2) .<br />
VARIABLE LABELS<br />
spending 'Satisfaction with Government Spending' .<br />
EXECUTE .<br />
Figure 12: COUNT command<br />
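The value list in COUNT need not be a single code. As a sketch, the commands below count how many of the seven<br />
questions received any substantive answer (codes 1 to 3) and then tabulate both counts. The variable name answered and<br />
its label are hypothetical.<br />

```spss
* answered is a hypothetical variable name.
COUNT answered = natenvir natheal natcrime natdrug nateduc
    natdef natmass (1 THRU 3).
VARIABLE LABELS answered 'Number of spending questions answered'.
FREQUENCIES VARIABLES=spending answered.
```

Because FREQUENCIES is a procedure, it causes the pending COUNT to be carried out, so no separate EXECUTE is needed here.<br />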
2.3 Recoding variables<br />
There are several situations where modifications to the values of existing variables need to be made.<br />
Examples include correcting errors in data, changing the codes used for variables and forming categories<br />
based upon values of a given variable.<br />
SPSS offers two facilities for re-coding data: Recode and Automatic Recode. The Recode facility is the more<br />
versatile of the two facilities and allows variables to be re-coded either into the same variable or into different<br />
variables. Automatic Recode addresses the specific problem of replacing a set of character codes by a set of<br />
numeric codes. This can be useful as a preliminary to using some of the statistical modelling procedures which<br />
require categorical variables to be defined as numerical variables.<br />
The following three tasks illustrate both styles of Recode and the Automatic Recode facility.<br />
Task 6 Recoding into the same variable<br />
Objective Correct errors in a variable.<br />
Comment The Recode facility can be used to change values of an existing variable. This can be useful<br />
for correcting errors in the data.<br />
Problem Figure 13 shows the frequency distribution for the variable Zodiac. Note the single case with a<br />
code of 13. This is an incorrect value that needs correcting.<br />
Figure 13: Frequency table for the variable zodiac<br />
We can use Recode to change the value 13 to the SPSS system missing value.<br />
Activity 6.1 Open the SPSS data file census.sav if it is not already open.<br />
Activity 6.2 Select Transform>Recode>Into Same Variables.<br />
(i) Select Zodiac from the list of variables displayed and click on the arrow button.<br />
Activity 6.3 Define old and new values.<br />
Figure 14: Recode into Same Variables dialog<br />
(i) In the area on the left labelled Old Value, enter the value 13 into the box labelled Value.<br />
(ii) In the area labelled New Value, select System-missing.<br />
(iii) Click on Add. The value 13 will disappear from the box on the left hand side and the<br />
mapping between the old and the new value will appear in the large box on the right<br />
hand side, as shown in Figure 15.<br />
Figure 15: Recode showing mapping of old value to new value<br />
Note the use of the keyword SYSMIS. This is a reserved name in SPSS and is used to<br />
represent the SPSS system missing value.<br />
(iv) Click Continue.<br />
Activity 6.4 Click OK to execute the re-code operation.<br />
Figure 16 shows the result of re-running Frequencies on the modified variable.<br />
Figure 16: Revised frequency table after recoding zodiac<br />
The entry corresponding to code 13 has been replaced by an entry labelled ‘System’ under the<br />
general heading ‘Missing’. Note the changes to the columns ‘Valid Percent’ and ‘Cumulative<br />
Percent’ resulting from the exclusion of this case from the analysis.<br />
Figure 17 shows the SPSS commands used to perform the transformation.<br />
RECODE<br />
zodiac (13=SYSMIS) .<br />
EXECUTE .<br />
Figure 17: RECODE into ‘Same Variable’<br />
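Recoding to system-missing discards the original code. A possible alternative, sketched here as an assumption rather than<br />
as part of the tutorial's method, is to leave the value 13 in the data and declare it user-missing. Note that MISSING<br />
VALUES replaces any missing-value definition already attached to zodiac.<br />

```spss
* Alternative sketch: keep the code 13 but treat it as user-missing.
MISSING VALUES zodiac (13).
FREQUENCIES VARIABLES=zodiac.
```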
Task 7 Recoding into a new variable<br />
Objective Recode a variable into a new variable.<br />
Comment A very common requirement in data analysis is to be able to define a new variable by re-coding a<br />
range of values of an existing variable into a single category. This can serve a number of<br />
purposes. It is often used as a method of reducing the dimensionality of the data allowing some<br />
analysis to be performed where otherwise no analysis would be possible. It is also used to alter<br />
the format of summary tables and displays.<br />
Problem The chart in Figure 18 shows a histogram obtained for the variable income.<br />
Figure 18: Histogram of the variable income<br />
Suppose that a chart showing fewer bars is considered more appropriate. We can use Recode,<br />
as follows, to define a new variable, incrange, each of whose values corresponds to a range of<br />
income values. Observing that income values range from 24000 to 68200, we will use seven<br />
income ranges, each of width 10000.<br />
Activity 7.1 Select Transform>Recode>Into Different Variables …<br />
Activity 7.2 Define variable names and labels.<br />
(i) Select Income as the variable to be re-coded.<br />
(ii) In the box labelled Name, enter incrange for the name of the new variable.<br />
(iii) In the box labelled Label, enter Income Range.<br />
(iv) Click on Change to assign incrange as the name of the Output variable.<br />
The window should now resemble Figure 19.<br />
Figure 19: Recode into different variables<br />
Activity 7.3 Define the mappings between old and new values.<br />
(i) Click on Old and New Values….<br />
Note the different ways of specifying values to be re-coded.<br />
Figure 20: Defining old and new values<br />
You can specify individual values, system or user missing values or ranges of values as the<br />
values to be re-coded. In this example, we will re-code ranges of values of income into single<br />
codes 1,2,3 … 7.<br />
(ii) Select the Range specification that runs from the lowest value (currently greyed), and enter<br />
10000 in the box alongside.<br />
Enter 1 in the box labelled New Value and click on Add.<br />
‘Lowest thru 10000 -> 1’ will appear in the Old --> New box. This defines the first of our<br />
mappings.<br />
(iii) Select the first of the Range specifications and enter the values 10000 and 20000 in the<br />
boxes provided.<br />
Enter 2 in the box labelled New Value and click on Add.<br />
‘10000 thru 20000 -> 2’ will be added to the list of mappings.<br />
(iv) Define similar mappings for each of the income ranges 20000–30000, 30000–40000,<br />
40000–50000 and 50000–60000 using codes 3,4,5 and 6.<br />
(v) Define the final range using the Range specification that runs up to the highest value.<br />
Enter 60000 in the box alongside, enter 7 in the box labelled New Value and click on<br />
Add.<br />
The specification ‘60000 thru Highest -> 7’ will be added to the list of mappings.<br />
Note that the apparent ambiguity surrounding the mapping of the values which appear as both<br />
the upper bound of one range and the lower bound of the next range (such as 10000) is<br />
resolved automatically by SPSS. SPSS assigns such values to the first of the two ranges in<br />
which the value appears. Hence 10000 will be mapped into the value 1.<br />
Figure 21 shows the status of the window after completing all the re-code specifications.<br />
Figure 21: Completed recode specifications<br />
(vi) Click on Continue to return to the earlier window.<br />
Activity 7.4 Click OK to perform the re-code.<br />
The variable incrange will appear as a new column on the right of the data editor window.<br />
Activity 7.5 Use the Variable View window to define value labels for incrange as follows:<br />
Code Value label<br />
1 …<br />
Run Graphs>Legacy Dialogs…Bar on the variable incrange. Your graph should resemble Figure 22.<br />
Figure 22: Bar chart of the variable incrange<br />
Figure 23 shows the SPSS commands used to perform the transformation.<br />
RECODE income<br />
(Lowest thru 10000=1) (10000 thru 20000=2)<br />
(20000 thru 30000=3) (30000 thru 40000=4)<br />
(40000 thru 50000=5) (50000 thru 60000=6)<br />
(60000 thru Highest=7) INTO incrange .<br />
VARIABLE LABELS incrange 'Income Range'.<br />
EXECUTE .<br />
Figure 23: RECODE into ‘Different Variable’<br />
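The rule that a boundary value goes to the first range in which it appears can be checked on a small made-up data set.<br />
The sketch below is an illustration only: the six income values are invented, and DATA LIST replaces the active data set,<br />
so it is best run in a separate session after saving any work.<br />

```spss
* Small made-up data set for checking the boundary rule.
DATA LIST FREE / income.
BEGIN DATA
9999 10000 10001 20000 60000 68200
END DATA.
RECODE income
  (LOWEST THRU 10000=1) (10000 THRU 20000=2)
  (20000 THRU 30000=3) (30000 THRU 40000=4)
  (40000 THRU 50000=5) (50000 THRU 60000=6)
  (60000 THRU HIGHEST=7) INTO incrange.
LIST.
```

Under the rule described in Activity 7.3, the listing should show 10000 coded as 1, 20000 coded as 2 and 60000 coded as 6.<br />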
Task 8 Performing an automatic recode<br />
Objective To convert the alphanumeric codes of a variable into numeric codes automatically.<br />
Comment Categorical variables may be coded using alphanumeric codes or numeric codes. For example,<br />
a variable such as gender may be coded using either ‘F’ and ‘M’ or numeric codes 1 and 2. For<br />
the purpose of obtaining descriptive statistics, the choice of coding would not usually be critical.<br />
However, some of the statistical procedures for modelling categorical data, such as the loglinear<br />
models procedure, require that categorical data be encoded using numerical codes.<br />
Whilst Recode could be used to change each alphanumeric code into a numeric code, this<br />
could be a tedious process if many values required re-coding. Instead, the Automatic Recode<br />
facility can be used.<br />
Problem In the data file census.sav, the variable gender is coded using ‘f’ and ‘m’. To change these into<br />
numeric codes, we need to create a new variable which is numeric (SPSS does not let you recode<br />
into the same variable and change the type of the variable).<br />
Activity 8.1 From the Transform menu select Automatic Recode.<br />
Activity 8.2 Select Respondent’s sex [gender] from the list of variables and transfer it to the Variable -><br />
New Name box.<br />
Activity 8.3 Enter sex in the box below and click on the button. The completed dialog box<br />
should resemble Figure 24 below.<br />
Figure 24: Automatic recode of gender<br />
Activity 8.4 Click OK and inspect the output at the bottom of the output window.<br />
SPSS produces a summary table in the output window (similar to that shown in Figure 25)<br />
showing the old and new codes.<br />
Figure 25: Summary of automatic recode<br />
Note that the new variable, sex, has inherited the variable label and value labels of the old<br />
variable, gender.<br />
Figure 26 shows the SPSS commands used to perform the transformation.<br />
AUTORECODE<br />
VARIABLES=gender /INTO sex<br />
/PRINT.<br />
Figure 26: AUTORECODE<br />
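For comparison, the manual route mentioned in the Comment can be sketched with RECODE. The name sex2 and the value<br />
labels are assumptions made for this example. With only two codes the manual version is short, but each string value has<br />
to be listed explicitly, which is what becomes tedious for variables with many categories.<br />

```spss
* sex2 is a hypothetical name and the labels are assumed, not taken from census.sav.
RECODE gender ('f'=1) ('m'=2) INTO sex2.
VALUE LABELS sex2 1 'Female' 2 'Male'.
EXECUTE.
```

AUTORECODE assigns its numeric codes in ascending order of the original values, so it also gives ‘f’ the code 1 and ‘m’ the code 2.<br />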
2.4 Conditional transformations<br />
All the transformations described so far have been examples of unconditional transformations. These are<br />
transformations that affect every case in the data set in the same way. Situations arise, however, where the<br />
form of the transformation to be carried out varies from case to case. Such transformations are called<br />
conditional transformations. They can be applied to the data transformation commands Compute and Recode.<br />
All conditional transformations are specified using a logical expression. Usually, this expression is constructed<br />
using the dialog window appropriate to the transformation being carried out. Alternatively, it may be<br />
constructed using an IF statement as part of an SPSS program compiled within a syntax window.<br />
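In syntax, a condition can be attached to a single computation with IF, or to a block of commands (such as a RECODE)<br />
with DO IF … END IF. The sketch below is illustrative only: x, y, z and group are hypothetical numeric variables.<br />

```spss
* x, y, z and group are hypothetical variables.
IF (group = 1) y = 2 * x.
DO IF (group = 2).
RECODE x (LOWEST THRU 0=0) (0 THRU HIGHEST=1) INTO z.
END IF.
EXECUTE.
```

Cases that do not satisfy a condition are left untouched by the corresponding command, so a newly created target variable stays system-missing for them.<br />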
Task 9 Performing a conditional compute<br />
Objective Apply a data transformation to a subset of data cases which satisfy a specified condition.<br />
Comment This requires a variation of the Compute command used in Task 4.<br />
Problem The data file employeedata.sav contains data relating to employees of a company. The<br />
variable salary records the employees’ current salaries. An enlightened new CEO is concerned<br />
that the mean salary of female employees is only about $26,000 compared with a figure of<br />
about $41,400 for men (see Figure 27 below).<br />
Figure 27: Salaries of employees broken down by gender<br />
The CEO decides to rectify this situation immediately by awarding a ten percent salary<br />
increment to all female employees, irrespective of the length of service of the employee or of<br />
any other factors such as job function.<br />
We can implement this modification using a conditional compute transformation.<br />
Activity 9.1 Open the SPSS data file employeedata.sav.<br />
Activity 9.2 From the Transform menu select Compute Variable.<br />
Activity 9.3 Define the transformation to be applied.<br />
(i) Enter salincr in the box labelled Target Variable.<br />
(ii) Assemble the expression 0.1*salary in the Numeric Expression box.<br />
The dialog box should resemble Figure 28.<br />
Figure 28: Conditional compute – defining the transformation<br />
(iii) Click on the button marked If… and select Include if case satisfies condition.<br />
(iv) Select gender from the list of variables and transfer it to the box on the right using the<br />
transfer arrow.<br />
(v) Complete the condition by typing ='f' after the word ‘gender’.<br />
The dialog box should now look like Figure 29.<br />
Figure 29: Conditional compute – specifying the condition<br />
(vi) Click on Continue.<br />
If we were to click on OK at this stage, the value of salincr for male employees would be<br />
undefined. It would be assigned the system missing value. To avoid this, we could have<br />
performed an initial unconditional compute operation, by assigning a value of 0 to salincr for all<br />
employees. An alternative method is to use the Paste button and insert this extra transformation<br />
into the SPSS commands.<br />
Activity 9.4 Define the action required for cases which do not satisfy the criterion for an increment. This<br />
requires that we arrange for all males to receive an increment of 0.<br />
(i) Click on Paste.<br />
An SPSS syntax window will be displayed containing the following commands:<br />
Figure 30: Conditional transformation commands generated by SPSS<br />
(ii) Directly before these two commands, insert the command:<br />
compute salincr = 0.<br />
The window should now look like Figure 31.<br />
Figure 31: Pre-setting values of computed variable to zero<br />
(iii) Select Run>All to execute the transformations.<br />
When these commands are executed, the value of 0 will first be assigned unconditionally to the<br />
variable salincr for each case. For females only, this value will be overwritten by the value<br />
calculated by the expression in the IF statement.<br />
Figure 32 shows a sample of the cases after running the transformation. The Variable View<br />
window was used to change the type of the variable salincr to Dollar and the width and<br />
decimal places to 8 and 0 respectively.<br />
Figure 32: Data window showing result of conditional transformation<br />
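Figures 30 and 31 are screen images, so the commands are written out here as a sketch. It assumes the pasted commands<br />
take the usual IF form. The FORMATS line is an optional syntax counterpart to the Variable View change described above.<br />

```spss
* Set the increment to 0 for everyone, then overwrite it for female employees.
COMPUTE salincr = 0.
IF (gender = 'f') salincr = 0.1 * salary.
FORMATS salincr (DOLLAR8).
EXECUTE.
```

The order matters: if the COMPUTE came after the IF, every case would end up with an increment of 0.<br />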
3 Working with subsets of data<br />
A common requirement in analysis is to be able to analyse subsets of data.<br />
SPSS allows selections to be made on either a temporary or permanent basis. Temporary selections are<br />
performed using a filter. When filtering is active, all cases remain in the data editor window but only those<br />
cases selected by the filter are used for analysis. If permanent selection is used, cases not included in the selected subset<br />
are deleted from the data editor window.<br />
If only a few analyses are to be carried out on subsets, filtering would be the best approach. However, if<br />
extensive analysis is to be performed on a selected subset, it would be more efficient to use a permanent<br />
selection. If permanent selection is used, however, it is advisable to save a copy of the selected data to a new<br />
SPSS system file before proceeding with analysis. This will protect against the danger of saving the data to the<br />
original SPSS data file, with a resultant loss of data.<br />
Data selection is performed using the Select Cases facility on the Data menu.<br />
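In syntax, a permanent selection is made with SELECT IF. The sketch below uses the condition from Task 10 and saves the<br />
reduced data under a new, hypothetical file name straight away, in line with the advice above.<br />

```spss
* The output file name is hypothetical and should differ from the original file.
SELECT IF (marital = 5 & income >= 20000).
SAVE OUTFILE='census_subset.sav'.
```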
When the same analysis is required for more than one subset of data, the repeated use of the Select Cases<br />
facility can become tedious. If the data set contains a variable whose values identify the multiple subsets, the<br />
Split File facility can be used instead to reduce the effort in carrying out the analyses on multiple subsets.<br />
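A minimal sketch of Split File in syntax follows, assuming census.sav is open and the variable spending from Task 5<br />
exists. The data must be sorted by the split variable first.<br />

```spss
* Repeat the same analysis for each value of gender.
SORT CASES BY gender.
SPLIT FILE LAYERED BY gender.
FREQUENCIES VARIABLES=spending.
SPLIT FILE OFF.
```

SPLIT FILE stays in force for every later procedure until it is switched off.<br />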
Task 10 Selecting a subset of cases<br />
Objective To select a subset of data cases for analysis.<br />
Comment This illustrates the use of a temporary selection or filter.<br />
Problem The data file census.sav contains information relating to individuals. It is required to restrict<br />
analysis to individuals who are unmarried and earning at least £20,000 per annum.<br />
Activity 10.1 Open the SPSS data file census.sav.<br />
Activity 10.2 Select Select Cases from the Data menu. The Select Cases dialog window will be displayed.<br />
Figure 33: Select Cases dialog window<br />
Note the different ways in which cases may be selected.<br />
The choice ‘If condition is satisfied’ restricts cases to those satisfying a logical condition<br />
entered by the user. The next choice, ‘Random Sample of Cases’, selects a random subset of<br />
cases based on a pseudo-random number generated by SPSS. The third option allows case<br />
selection to be based on a range of times or on a range of case numbers. The final option, ‘Use<br />
filter variable’, allows case selection to be based upon values of an existing variable. Cases<br />
with values other than 0 or missing will be selected.<br />
Activity 10.3 Specify the criterion for selection.<br />
(i) Select the option .<br />
(ii) Click on .<br />
The criterion for selection to be used in this example is ‘unmarried and earning at least £20,000<br />
per annum’. In terms of SPSS variables this can be expressed as ‘Marital=5 and Income<br />
>=20000’.<br />
(iii) Use the window displayed to assemble the required logical expression.<br />
Figure 34: Select Cases dialog showing selection criterion<br />
(iv) Click on .<br />
Activity 10.4 Execute the transformation.<br />
(i) Click .<br />
Figure 35 shows the effect of applying the selection.<br />
Figure 35: Select Cases dialog showing selection criterion<br />
Cases that fail to meet the selection criterion are indicated by a diagonal line in the row labels of<br />
the Data Editor, as shown in Figure 35, and the status line also indicates that filtering is turned<br />
on.<br />
The SPSS commands used to perform this selection are shown in Figure 36.<br />
COMPUTE filter_$=(marital = 5 & income >= 20000).<br />
VARIABLE LABEL filter_$ 'marital = 5 & income >= 20000 (FILTER)'.<br />
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.<br />
FORMAT filter_$ (f1.0).<br />
FILTER BY filter_$.<br />
EXECUTE.<br />
Figure 36: Commands used to perform case selection using filtering<br />
Inspection of Figure 36 reveals that SPSS creates a special variable called filter_$, coded 1 or<br />
0 depending on whether or not the case meets the selection criterion.<br />
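While the filter is in force, procedures use only the selected cases. As a quick check, and assuming income is the numeric variable used in the selection criterion, a run such as the following sketch should report a minimum income of at least 20000:<br />
* Sketch only: summarises income for the cases that pass the filter.<br />
DESCRIPTIVES VARIABLES=income /STATISTICS=MEAN MIN MAX.<br />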
Activity 10.5 Inspect the result of the selection.<br />
(i) Scroll to the last column in the Data Editor.<br />
(ii) Click on the icon to display value labels. Case number 22, which<br />
satisfied the criterion, will show the label Selected in the filter_$ column.<br />
Note that the filter variable filter_$ is a temporary variable and will not be saved on exit from<br />
SPSS. To keep the variable for future use, rename it using the Variable View window and save<br />
the data before exiting from SPSS.<br />
A case selection criterion specified in this manner will remain in force until a new specification is<br />
issued.<br />
Activity 10.6 Cancel the selection.<br />
(i) Click on the icon.<br />
(ii) Click on .<br />
The Select Cases dialog window will be re-displayed.<br />
(iii) Click on the radio button.<br />
(iv) Click .<br />
All cases will now appear as selected in the SPSS Data Editor.<br />
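In syntax, cancelling a filter amounts to switching filtering off and making all cases available again. A minimal sketch of commands that achieve this (the commands pasted by your version of the dialog may differ slightly):<br />
FILTER OFF.<br />
USE ALL.<br />
EXECUTE.<br />
These commands only stop the variable filter_$ from being used as a filter; they do not delete it from the active data set.<br />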
Task 11 Deleting selected cases<br />
Objective To delete cases from an active SPSS data set.<br />
Comment This use of Select Cases is more drastic in effect and results in the permanent deletion of data<br />
cases from the active data set.<br />
Activity 11.1 Click on the Dialog Recall icon and click on . The Select Cases dialog window<br />
will be re-displayed.<br />
Activity 11.2 Select .<br />
The previous selections will still be in place.<br />
Activity 11.3 Specify that permanent case selection is required.<br />
(i) Click on the radio button and click .<br />
The Data Editor will now contain just the 13 cases that satisfy the selection criterion.<br />
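Permanent selection corresponds to the SELECT IF command, which drops every case that fails the condition from the active data set. A sketch using the same criterion as Task 10 (the dialog may paste additional commands):<br />
* Cases that do not satisfy the condition are deleted from the active data set.<br />
SELECT IF (marital = 5 & income >= 20000).<br />
EXECUTE.<br />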
When permanent case selection is used in an SPSS session, great care is needed to ensure that<br />
important data is not lost from physical data files. If, after making a permanent data selection,<br />
you were to use the Save command to save the remaining data cases without specifying a new<br />
file name, the original SPSS data file would be overwritten with the new reduced set of data with<br />
a resulting loss of data. If you do want to save the selected cases, use Save As instead to save<br />
the cases to a new data file.<br />
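The same precaution can be taken in syntax by naming a new output file explicitly. A minimal sketch, in which the file name census_subset.sav is a hypothetical example and the file is written to the current working directory:<br />
* Writes the reduced data set to a new file, leaving census.sav untouched.<br />
SAVE OUTFILE='census_subset.sav'.<br />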
A similar danger arises if you decide to open a new data set. SPSS will ask you if you want to<br />
save the current data, the subset of 13 cases just selected. If you reply ‘Yes’, your original data<br />
file will be overwritten. This is illustrated in the next activity.<br />
Activity 11.4 Exercise caution when opening a new data file.<br />
(i) Select File>New>Data from the menu.<br />
At this point EXTREME CAUTION is required!<br />
If you were to reply ‘Yes’ to this, you would overwrite the original data file! If you did want to<br />
save this subset, you should reply Cancel and use File>Save As to save the data to a new file.<br />
Instead of saving this subset, revert to using the original data file.<br />
(ii) Click on .<br />
(iii) Select File>Open>Data and select census.sav. Click .<br />
The original data set, census.sav, will be re-displayed.<br />
Task 12 Sub-group processing<br />
Objective Perform the same analysis on sub-groups of data defined by values of a specified variable.<br />
Comment Split File provides a more efficient way of performing the same analysis on different subsets of<br />
data than the repeated use of Select Cases.<br />
Problem The file census.sav (which should be currently open following completion of the previous task)<br />
contains a variable polviews indicating the political views of the respondent. A second variable,<br />
genelec, indicates the party for which the respondent voted in the 1992 general election. It is<br />
required to examine the distribution of votes for each political party by individuals sharing the<br />
same political views.<br />
Activity 12.1 Select Split File from the Data menu.<br />
Activity 12.2 Define the groups.<br />
(i) Click the radio button.<br />
(ii) Select Political views [polviews] from the list of variables and transfer it to the box<br />
labelled Groups Based on. The display should resemble Figure 37.<br />
Figure 37: Split File dialog box<br />
The default action is to sort the cases by the split file variable. If the data is already sorted, the<br />
sort can be avoided by clicking the ‘File is already sorted’ radio button.<br />
(iii) Click .<br />
The Split File On message appears on the status line of the Data Editor.<br />
Activity 12.3 To see the effect of selecting Split File processing, run a Frequencies analysis on the variable<br />
genelec.<br />
(i) Select Analyze>Descriptive Statistics>Frequencies.<br />
(ii) Select .<br />
(iii) Click on .<br />
A table of frequencies, similar to that in Figure 38, is produced for each category of political<br />
views.<br />
Figure 38: Frequencies on genelec with split file on polviews<br />
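The steps of Activities 12.1 to 12.3 can be sketched in syntax as follows. Which grouping option to choose is a matter of presentation; the SEPARATE keyword (one set of output per group) is an assumption here, and LAYERED would instead present the groups together in one table.<br />
* Split File requires the cases to be sorted by the grouping variable.<br />
SORT CASES BY polviews.<br />
SPLIT FILE SEPARATE BY polviews.<br />
* Produces a frequency table of genelec for each value of polviews.<br />
FREQUENCIES VARIABLES=genelec.<br />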
Activity 12.4 Before continuing, turn off split file processing:<br />
(i) Select Split File from the Data menu.<br />
(ii) Select and click .<br />
The full data set will now be available for subsequent analyses.<br />
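In syntax, split file processing is turned off with a single command, after which procedures again treat the data as one group:<br />
SPLIT FILE OFF.<br />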
4 Merging SPSS data sets<br />
Collections of data are commonly built up over time, and more than one person may be involved in<br />
data collection and data entry. For instance, if a survey is being carried out in the field, several<br />
interviewers may be involved; if each interviewer enters the data directly into an SPSS file, the files may<br />
have to be merged.<br />
The Merge facility allows data from two files to be merged. It can be used either to combine cases from two<br />
files which share exactly the same variables (the Add Cases option) or to combine variables from two files (the<br />
Add Variables option). These two operations are mutually exclusive.<br />
If the two files to be merged contain exactly the same variables with exactly the same structure, the cases from<br />
the second file will be added at the end of the first file.<br />
If the files do not match exactly, SPSS will produce a list of unpaired variables. This list will contain:<br />
Variables from either data file that do not match a variable name in the other (in this case, pairs<br />
can be created from the unpaired set and these pairs can be included in the new merged file).<br />
Variables defined as numeric data in one file and string data in the other file. Numeric variables<br />
cannot be merged with string variables.<br />
String variables of unequal width (in this case it would be necessary to modify the structure of<br />
such variables in one of the files).<br />
The new data file will contain all those variables that match exactly, all the new pairs selected, and any<br />
unpaired variables that have been matched to create new pairs.<br />
Any remaining unpaired variables which are included will contain missing data for the cases from the file that<br />
does not contain that variable.<br />
Before merging the files, any unwanted variables can be deselected.<br />
Task 13 Adding cases<br />
Objective To add data cases to the working file from another SPSS data file.<br />
Comment This illustrates the use of the Add Cases facility under Merge.<br />
Problem The data file possums.sav contains 462 data cases and 16 variables. The file possums1.sav<br />
contains 2 data cases for the same variables. It is necessary to combine the two sets of cases<br />
into one data file.<br />
Activity 13.1 Open the file possums1.sav from the directory in which your SPSS example data sets are<br />
stored.<br />
Activity 13.2 Select Data>Merge Files>Add Cases. The Merge Files menu will be displayed.<br />
Figure 39: Merge Files menu<br />
Activity 13.3 Choose the file possums.sav. Click on Open.<br />
Figure 40: Selecting the file to be merged<br />
The following dialog window will be opened.<br />
Figure 41: Variable specification in Merge<br />
Note that the left hand window of the dialog box has listed a pair of variables which do not<br />
match. The reason for this is that the declared width of the variable complics differs in the two<br />
files.<br />
Activity 13.4 Try to pair the variables by clicking on the first variable, complics
Note: If a column displays asterisks (*****), simply widen the column to see the data. This can<br />
be done by pointing the mouse to the boundary between the cells, and dragging to the right<br />
when the mouse pointer changes to a double headed arrow.<br />
Figure 44: Widening the column width<br />
A practical consideration in performing a merge is the order in which the two files are specified.<br />
They can be specified in either order, but the choice is important because it can affect the<br />
variable definitions assigned to common variables in the merged file. For example, if a numeric<br />
variable common to both files has a decimal places value of 0 in the active file and 2 in the file<br />
being added, the value assigned in the merged file would be 0. If the active file is our master<br />
file, then this might be acceptable. But if the file being added was our master file, then this<br />
would probably not be acceptable since it would mean that we are over-riding definitions which<br />
we have laid down in our master file.<br />
The conclusion might be, therefore, to always open the master file first and to add cases from<br />
the second file. Whilst this would prevent the possibility of master definitions being changed, it<br />
would also prevent us from changing any other definitions in the active file prior to the merge.<br />
Accordingly, in this example, we have chosen to open the file containing the new cases first so<br />
that we can change the active file prior to the merge. This approach does require, of course,<br />
that care is taken to ensure that master definitions are not inadvertently changed. The motto<br />
must therefore be: caution.<br />
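The file order discussed above is explicit in syntax: in ADD FILES, the asterisk denotes the active data set, and files are read in the order listed. A minimal sketch of the merge in this task, assuming that possums1.sav is the active file, that possums.sav is in the current working directory, and that the variable definitions in the two files have already been reconciled:<br />
* The active file (*) is listed first, so its cases come first in the merged file.<br />
ADD FILES /FILE=* /FILE='possums.sav'.<br />
EXECUTE.<br />
Swapping the two FILE subcommands reverses the roles of the files, which is the choice the preceding paragraphs ask you to make deliberately.<br />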
Task 14 Adding variables<br />
Objective To add variables to the working file from another SPSS data file.<br />
Comment This illustrates the use of the Add Variables facility under Merge.<br />
Problem The data file possums1.sav contains 2 data cases and 16 variables. The file possums2.sav<br />
contains 3 new variables plus the common key variable patid with values 600 and 601. It is<br />
necessary to combine the two sets of variables into one data file.<br />
Activity 14.1 Open the file possums1.sav from the directory in which your SPSS example data sets are<br />
stored.<br />
Activity 14.2 Select Data>Merge Files>Add Variables.<br />
Activity 14.3 Choose the file possums2.sav.<br />
This file contains three variables id, height and weight with entries for patients with patid 600<br />
and 601.<br />
The following dialog window will be opened.<br />
Figure 45: Add Variables dialog<br />
The box labelled ‘Excluded Variables’ lists the variable patid which is common to both files.<br />
Activity 14.4 Select ‘Match cases on key variables in sorted files’, highlight the variable patid and move it<br />
into the ‘Key Variables’ box. Click .<br />
A warning message, shown below, will be displayed to remind you that each file must be<br />
pre-sorted on values of the key variable(s). In this case, we are safe to proceed, but if your<br />
data is not pre-sorted and you still choose to proceed, some information may be lost in the<br />
merge.<br />
Click .<br />
Figure 46: Warning to ensure data is sorted by key variables<br />
Activity 14.5 (i) Scroll to the right of the data view. The variables height and weight will appear at the<br />
right-hand end of the data vector.<br />
Figure 47: Merged data cases<br />
Activity 14.6 (ii) Use File>Save As to save the merged data to a new file (use the filename allvars.sav).<br />
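Activities 14.1 to 14.6 can be sketched in syntax as below. Both files are sorted on the key first, which is the precondition the warning message refers to. The intermediate file name possums2_sorted.sav is hypothetical, and both data files are assumed to be in the current working directory.<br />
* Sort the second file by the key and keep a sorted copy.<br />
GET FILE='possums2.sav'.<br />
SORT CASES BY patid.<br />
SAVE OUTFILE='possums2_sorted.sav'.<br />
* Open and sort the working file, then match the two files on patid.<br />
GET FILE='possums1.sav'.<br />
SORT CASES BY patid.<br />
MATCH FILES /FILE=* /FILE='possums2_sorted.sav' /BY patid.<br />
EXECUTE.<br />
* Save the merged data under a new name.<br />
SAVE OUTFILE='allvars.sav'.<br />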
Task 15 Finishing SPSS<br />
Objective Terminate the SPSS session.<br />
Comments You should always quit any computer program when you have finished your session. Never<br />
switch off the computer while SPSS is still running unless absolutely necessary, as this may<br />
corrupt your data files. Also, never leave a computer whilst you are still logged on to it, as<br />
others may use your account and could potentially damage your files.<br />
Activity 15.1 Select the Exit option from the File menu.<br />
Activity 15.2 SPSS will ask you if you want to save the contents of various windows before it lets you quit, so<br />
a dialogue box similar to Figure 48 appears:<br />
Figure 48: Saving the SPSS data<br />
Click either or depending on whether you want the data saved. If you choose to<br />
save it, the file is saved under the same name on disk. Once the data file has been saved it can be<br />
reopened by SPSS and edited.<br />
You will also be asked if you want to save the contents of your output window – usually the<br />
numerical results of all your analysis work. Normally you would want to save this, but here you<br />
should click .<br />
Once you have done this SPSS will quit. If you click on in any of the boxes you will be<br />
returned to your unsaved work to continue.<br />