A Guide to the P-STAT Programming Language (PPL) - P-STAT, Inc.
P-STAT®<br />
A Guide to the<br />
P-STAT Programming<br />
Language (PPL)<br />
P-STAT: A Guide to the P-STAT Programming Language (PPL),<br />
Second Edition, January 2013<br />
This publication corresponds to P-STAT Version 3, January 2013. This publication is designed for those<br />
already familiar with the P-STAT system, either from the menu or the command language interface, and is<br />
intended to be a complete description of the programming language.<br />
Please direct any questions to:<br />
P-STAT, Inc.<br />
230 Lambertville-Hopewell Rd.<br />
Hopewell, New Jersey 08525-2809<br />
U.S.A.<br />
Telephone: 609-466-9200<br />
Fax: 609-466-1688<br />
Internet: support@pstat.com<br />
Web Page URL: http://www.pstat.com<br />
All rights reserved. Except as permitted under the United States Copyright Act of 1976, no part of this<br />
publication may be reproduced or distributed in any form or by any means, electronic or mechanical,<br />
including photocopying, recording, or any information storage and retrieval system without the prior written<br />
permission of P-STAT, Inc.<br />
P-STAT is a registered trademark of P-STAT, Inc. Windows is a registered trademark of Microsoft Corp.<br />
Copyright © 1972-2013 P-STAT, Inc. Printed in the US. Published by P-STAT, Inc.<br />
CONTENTS<br />
PPL: Introduction<br />
THE VARIABLES<br />
VECTORS AND ARRAYS<br />
THE COMMANDS<br />
P-STAT SYSTEM FILE: CURRENT OR PREVIOUS<br />
ORGANIZATION<br />
PPL: Basics of the Programming Language<br />
CASE AND VARIABLE SELECTION<br />
Case Selection<br />
Variable Selection<br />
Using Ranges in Selection Clauses<br />
Multiple Variable Selections<br />
Reordering Variables<br />
Masks and Wildcards<br />
MODIFYING AND GENERATING VARIABLES<br />
Modifying Variables with SET<br />
Using INCREASE and DECREASE Instead of SET<br />
Creating New Variables with GENERATE<br />
Numeric Operators and their Order<br />
Functions<br />
LOGICAL SELECTION OF CASES<br />
Logical Operators<br />
The Special Operators MISSING and GOOD<br />
AND and OR Relationships<br />
Common Errors in Complex Expressions<br />
AMONG and NOTAMONG<br />
MISSING DATA with AMONG and NOTAMONG<br />
INRANGE and OUTRANGE<br />
ANY and ALL<br />
INSTRUCTIONS AFTER IF<br />
Conditional Case Selection<br />
Conditional Modification<br />
Three-Way Logic of IF Statements<br />
Renaming Variables<br />
PPL: MODIFY, PROCESS and PUT<br />
FILE MODIFICATION<br />
How Modifications Are Processed<br />
Temporary Modifications<br />
Permanent Modifications and the MODIFY Command<br />
TEMPLATE Files<br />
On-the-Fly Concatenation of Files<br />
Repeating Cases<br />
OTHER INSTRUCTIONS AFTER IF<br />
GOTO To Process Modifications Selectively<br />
Cleaning Data With PUT<br />
Report Writing Using PUT and PUTL<br />
STANDALONE PPL COMMANDS AND PROCESS<br />
Scratch Variables and Standalone PPL<br />
The PROCESS Command and More PUT Information<br />
COMMENTS<br />
QUITTING A PROCESS<br />
PPL: NCOT and RECODE<br />
The NCOT Function<br />
The RECODE Function: Single Argument Usage<br />
COMPLEX RECODES<br />
RECODE: The Arguments<br />
The RECODE Tests<br />
Defining a Set of Constants<br />
The Result Values<br />
RECODE or IF/SET<br />
RECODE Pointers<br />
XRECODE<br />
PPL: DO LOOPS and IF-THEN-ELSE Blocks<br />
DO LOOPS<br />
DO USING a Variable List<br />
DO Stepping Through a Range<br />
DO Loops: Other Features<br />
GENERATE AND RENAME<br />
Using GENERATE in DO Loops<br />
Using RENAME in DO Loops<br />
Masks for RENAME and GENERATE<br />
IF-THEN-ELSE BLOCKS<br />
IF-THEN-ELSE: Other Features<br />
IF-THEN-ELSE: Another Example<br />
PPL: Functions and System Variables<br />
ONE-EXPRESSION FUNCTIONS<br />
Rounding Functions<br />
Floor and Ceiling<br />
Exponential and Trigonometric Functions<br />
The Factorial Function<br />
Creating Dummy Variables with the LOC Function<br />
Creating a Single Variable from Dummy Variables<br />
LIST FUNCTIONS<br />
Numeric List Functions<br />
Character and Numeric List Functions<br />
SPECIAL FUNCTIONS<br />
The LAG and DIF Functions<br />
Modular (Remainder) Arithmetic<br />
Setting PLACES in Specific Variables<br />
Extracting Digits Using NUMEX<br />
COMBINATIONS of N things, K at a time<br />
EXPAND ONE OR MORE VARIABLES<br />
Overall Syntax of a PPL EXPAND Statement<br />
Numeric Input Values<br />
Character Input Values<br />
The GENERATE or GEN phrase<br />
Options With Several Input Variables<br />
Options When the Input Variables Are Character<br />
SYSTEM VARIABLES<br />
Referencing Good and Missing Data<br />
Selecting Variables with .NEW. and .OTHERS.<br />
Referencing the Number of Variables in the File<br />
Referencing the Current Case Number<br />
Referencing Numeric and Character Variables<br />
Accessing the PUT Counter<br />
File, Date, Page and Line References<br />
Random Number and Distribution Functions<br />
RANDOM NUMBER FUNCTIONS<br />
Normal and Uniform Distributions<br />
Binary and User's Tabled Distributions<br />
DISTRIBUTION FUNCTIONS<br />
Probability Distributions<br />
Inverse Probability Distributions<br />
THE FUZZY EQUALS PROBLEM<br />
The Fuzzy Functions<br />
Fuzzy Logical Operators<br />
How Fuzzy Operators Work<br />
FUZZY Summary<br />
PPL: Across-Case Modifications<br />
BASIC ACROSS-CASE AGGREGATION<br />
Accessing FIRST and LAST Cases<br />
Scratch Variables<br />
The Permanent Vector<br />
User-defined Arrays<br />
Interaction of FIRST, LAST and Other PPL<br />
Example: Checking a List of Variables<br />
Example: Selecting a Block of Cases<br />
THE SPLIT FUNCTION<br />
Splitting a Case<br />
CARRYing Identifying Variables<br />
Selecting Variables To USE<br />
Defining New Variables with CREATE<br />
Wildcard Notation and Masks<br />
INDEXing Cases<br />
Ordering Variables with STEP and CYCLE<br />
How SPLIT Interacts With Other PPL<br />
THE COLLECT FUNCTION<br />
Collecting BY Groups<br />
CARRYing Common Information<br />
Ordering Cases with INDEX and SORT<br />
COLLECT System Variables<br />
PPL: Modification of Character Variables<br />
BASIC CHARACTER PROCEDURES<br />
Generating New Character Variables<br />
Modifying Existing Character Variables<br />
Logical Selection of Character Variables<br />
Locating Non-Missing Character Data<br />
CHARACTER OPERATORS<br />
The CONTAINS and XCONTAINS Operators<br />
The Concatenate Operator<br />
The Trim Concatenate Operator<br />
Exactly Equal Operator<br />
CHARACTER FUNCTIONS<br />
Centering and Justifying Strings<br />
Changing the Case of Strings<br />
Length and Size of Strings<br />
Locating Strings Within Variables<br />
Extracting Substrings and Words<br />
Blanking Out and Changing Strings<br />
Squeezing Out Specified Characters<br />
Trimming Strings<br />
Padding Strings<br />
Converting Numbers to Characters and Vice Versa<br />
Character/Integer Translation<br />
Complex Character Expressions<br />
Using the Name of a Variable as a Character Value<br />
The MATCHES and XMATCHES Operators<br />
MATCHES: Meta-Characters and Syntax<br />
CLAG: A Lag using a character argument<br />
CONCATENATION OF CHARACTER CONSTANTS<br />
PPL: Date and Time Commands and Functions<br />
DATE AND TIME FUNCTIONS<br />
Functions Which Create or Use Dates<br />
Six Simple Date Functions<br />
DATE and TIME Function Details<br />
DATE AND TIME COMMANDS<br />
The DATE.LANGUAGE Command<br />
The DATE.ORDER Command<br />
Changing the Case and Length of Names<br />
Month and Weekday Names<br />
DATE LOGICAL OPERATORS<br />
FORMAT.DATE<br />
TEXTWRITER: Report Writing<br />
OVERVIEW<br />
Justification<br />
The “No-Break” Character<br />
PPL INSTRUCTIONS PUT AND PUTL<br />
Character Strings<br />
Values of Variables<br />
Expressions and Functions<br />
A Sample Report<br />
Comments in PPL Clauses<br />
OPTIONAL IDENTIFIERS<br />
CASE and STREAM: The Modes of Operation<br />
JUSTIFY, BLANKS, PUTL.CHAR and SPREAD<br />
MARGIN, LEADBLANK and WIDTH<br />
Optional Files: LABELS and OUT<br />
CONTROL WORDS<br />
Control Words to Produce a Letter<br />
Positioning Columns<br />
Positioning Lines<br />
Positioning Words<br />
Labeling Values<br />
Specifying Missing Characters<br />
A Complex Report<br />
Control Word Summary<br />
COMPARING TEXTWRITER AND OTHER COMMANDS<br />
OPTIONAL IDENTIFIERS: PostScript<br />
PostScript Page Changes<br />
Setting the Fonts<br />
TEXTWRITER Control Words: The Fonts<br />
Control Words: Positioning the Text<br />
Indenting Text<br />
Colors in PostScript Output<br />
Underlining Text<br />
P-STAT MACROS<br />
MACRO FORMAT<br />
Types of Macros<br />
Storing and Activating Macros<br />
Comments Within a Macro<br />
Macros With Arguments<br />
Using Arguments<br />
Default Values for Arguments<br />
Nested Instream Macros<br />
Instream Macros in a Command<br />
Instream Macros in Subcommands<br />
Using Lots of Instream Macros<br />
MACRO COMMANDS<br />
CORRECTING MACROS IN THE EDITOR<br />
BLOCK MACROS<br />
Executing a Block Macro<br />
Macro Substitution Using Strings<br />
Scope of Temporary Scratch Variables<br />
Scratch Variables and Nested Macros<br />
Temporary Files in Macros<br />
Subcommands in Macros<br />
Conditional Execution of Commands<br />
DIALOG<br />
Format of the DIALOG command<br />
Does the File Exist<br />
SUBFILES<br />
SUBFILES Optional Identifiers<br />
SUBFILES Looping<br />
SUBFILES System Variables<br />
FIGURES<br />
Basic Types of Modifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2<br />
Format of <strong>the</strong> SET Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.8<br />
Format of <strong>the</strong> IF Clause . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.11<br />
AND and OR: Evaluations of Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.13<br />
IF and Missing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.17<br />
Permanent Modifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3<br />
Template Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4<br />
Renaming All <strong>the</strong> Variables in a File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5<br />
Repeating Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.6<br />
Using GOTO and PUT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.8<br />
Using PUT To Produce a Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.10<br />
Accessing <strong>the</strong> Variable Name Within a Report . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.11<br />
PROCESS: Counting Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.13<br />
NCOT: Numeric Recodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2<br />
Multi-Variable RECODE With Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.9<br />
RECODE or IF/SET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.11<br />
EQ and NE Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.12<br />
Simple DO Loop with a List of Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2<br />
DO With Two Scratch Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3<br />
DO: Range and Stepsize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4<br />
DO Loops: An Example of Each Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.5<br />
Labelled DO, EXITDO and NEXTDO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.6<br />
Rename Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.10<br />
GENERATE: Generated Versus Original . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12<br />
Dynamic Array, Wildcard, Prefix and GENERATE . . . . . . . . . . . . . . . . . . . . . . . 5.12<br />
Complex MASK: Generate Variable Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.13<br />
IF or IF-THEN-ELSE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14<br />
IF-THEN with F.ELSE and M.ELSE in a Simple Hot Deck Example . . . . . . . . . 5.16<br />
IF-THEN-ELSE: The Data and the Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.18<br />
IF-THEN-ELSE Block with Nested IF and a DO Loop . . . . . . . . . . . . . . . . . . . . . 5.19<br />
Calculating Variable Positions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.4<br />
Using LAG and DIF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.8<br />
Interaction of LAG and IF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.9<br />
EXPAND Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.14<br />
Showing the Differences Between .N., .HERE. and .USED. . . . . . . . . . . . . . . . 6.17<br />
FIRST and LAST with Subgroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2<br />
Using Scratch Variables and FIRST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.4<br />
Creating a Summary Case with FIRST and LAST . . . . . . . . . . . . . . . . . . . . . . . . . . 8.5<br />
Moving Values Between Files with the P Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.6<br />
DEFINE.ARRAY and SHOW.ARRAYS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.8<br />
One-dimensional Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.9<br />
Two-dimensional Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.10<br />
Checking Variables Using PUT and Scratch Variables . . . . . . . . . . . . . . . . . . . . . 8.11<br />
Using CARRY in the SPLIT Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.13<br />
Selecting Variables for SPLIT with USE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.14<br />
Naming the New Variables with CREATE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.15<br />
Multiple CREATE Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.16<br />
Producing an Index Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.17<br />
Using STEP and CYCLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.18<br />
A Simple COLLECT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.19<br />
Collecting BY Group Membership . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.20<br />
Collecting Cases in a Specified Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.21<br />
Sorting the Collected Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.22<br />
A Complex Modification Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.23<br />
A Second Complex Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.25<br />
Before and After COLLECT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.26<br />
The XEQ Operator for Tests that Respect Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.6<br />
The CVAL Function for Bells and Whistles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.15<br />
Nesting Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.16<br />
Using VARNAME, SPLIT and COLLECT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.18<br />
File of Character Data for MATCHES and XMATCHES . . . . . . . . . . . . . . . . . . . 9.19<br />
MATCHES and XMATCHES: Meta-Characters . . . . . . . . . . . . . . . . . . . . . . . . . 9.21<br />
DATE Logical Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.16<br />
FORMAT.DATE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.18<br />
FORMAT.DATE Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.19<br />
Producing a Report: The Input Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.4<br />
Producing a Report: The TEXTWRITER Command . . . . . . . . . . . . . . . . . . . . . . 11.5<br />
Producing a Report: The Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.6<br />
A Form Letter: The Input File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.9<br />
A Form Letter: The TEXTWRITER Command . . . . . . . . . . . . . . . . . . . . . . . . . 11.11<br />
A Form Letter: One Letter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.12<br />
TEXTWRITER: Displaying all the Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.14<br />
A Complex Report: The Input and Labels Files . . . . . . . . . . . . . . . . . . . . . . . . . 11.15<br />
A Complex Report: The Report (Two Pages) . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.16<br />
A Complex Report: The TEXTWRITER Command . . . . . . . . . . . . . . . . . . . . . 11.17<br />
PostScript Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.21<br />
Justification in PostScript Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.23<br />
Changing Fonts Text in a PostScript Paragraph . . . . . . . . . . . . . . . . . . . . . . . . . . 11.23<br />
Font Changes in a Justified PostScript Paragraph . . . . . . . . . . . . . . . . . . . . . . . . 11.24<br />
TEXTWRITER: Tabular Output with PostScript . . . . . . . . . . . . . . . . . . . . . . . . . 11.25<br />
PostScript: Tables with Proportional Fonts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.26<br />
Indenting the Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.28<br />
Underlining the Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.30<br />
Activating Three Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.3<br />
Block Macro With Keyword Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.4<br />
Block Macro With Positional Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.4<br />
Macro With Positional Arguments and Default Values . . . . . . . . . . . . . . . . . . . . . 12.6<br />
Macros Can Call Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.7<br />
Instream Macros in Subcommand Records. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.9<br />
Lots of Instream Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.10<br />
Defining a Block Macro . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.12<br />
The RUN Command and Partial Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.13<br />
Macros: Temporary File Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.15<br />
Macros: Supplying Subcommands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.16<br />
Macro with Conditional Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.17<br />
Macros: Reversing the Order of Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.18<br />
Macros: DIALOG Provides an Interactive Front End . . . . . . . . . . . . . . . . . . . . . 12.19<br />
Macro With SUBFILES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.23<br />
The SUBFILE Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.24<br />
1 PPL: Introduction<br />
P-STAT accepts information in many different forms. Information may be numeric, such as average yearly rainfall<br />
or total automobile production, or it may be text or character, such as a name or an address. P-STAT accepts information<br />
from a variety of sources, including disk, tape, or the user's terminal, and holds this information in a<br />
compressed rectangular format called a “P-STAT system file”. This file is composed of rows (cases or records)<br />
which contain one or more variables (fields). The first step in using P-STAT is to convert your data into P-STAT<br />
system file format. The commands which create a P-STAT system file are described in “P-STAT Introductory<br />
Manual” and “P-STAT: Utility Commands”. They include:<br />
1. MAKE when the data are in ASCII format on an external disk or tape, or when full<br />
screen capabilities are not available on the terminal.<br />
2. TEXTFILE.IN when the data are in ASCII (text) format delimited by tabs, commas or blanks.<br />
The first row of data may contain variable labels.<br />
3. FILE.IN primarily used when the data in an external file are in a binary format.<br />
4. SPSS.IN when the data are in SPSS export format.<br />
The P-STAT Programming Language, called PPL for short, is a language within the P-STAT program. Both<br />
simple and complex manipulations may be done using PPL instructions, operators, functions and system variables.<br />
PPL permits logical testing and selection of cases and variables, modification of existing variables, and generation<br />
of new variables.<br />
PPL may be used to modify any P-STAT system file as that file is read by any command. Modifications are<br />
temporary unless a new output file is produced. Both numeric and character variables may be tested, selected and<br />
modified. Most of the basic PPL instructions and operators are applicable to either numeric or character variables.<br />
However, there is a class of functions such as SQRT, the square root function, which applies only to numeric data,<br />
and there is another class of functions such as MATCHES, the string matching function, which applies only to<br />
character data.<br />
An important concept in understanding how PPL works is that an input file is not changed in any way by the<br />
programming language statements. When you request a given P-STAT command such as LIST or MODIFY:<br />
1. the P-STAT executive routines determine which command is required and pass control to that<br />
command;<br />
2. the command prepares to do its job and then, when it is ready, asks the executive routines for a row<br />
of data from the input file;<br />
3. the executive routines determine if there is any PPL to apply. It is these routines which create new variables,<br />
recode existing variables, and process any logical selections;<br />
4. if, during the processing of the PPL, the executive routines determine that the current case has failed<br />
in some way and is not needed by the command, they stop processing that case and read the next case.<br />
Thus the command that is currently executing has no knowledge of the original input case. It knows<br />
only about those cases which survive the PPL, and it knows about those cases only in their post-PPL<br />
form.<br />
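The four steps above can be pictured with a small model. The sketch below is written in Python, not PPL, and only illustrates the data flow. The names `apply_ppl`, `example_ppl` and `list_command` are hypothetical and are not part of P-STAT. It assumes that a case can be represented as a dictionary and that a failed selection is signalled by returning `None`.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Case = Dict[str, float]


def apply_ppl(cases: Iterable[Case],
              ppl: Callable[[Case], Optional[Case]]) -> Iterator[Case]:
    """Stand-in for the executive routines: hand over one post-PPL case at a time."""
    for original in cases:
        modified = ppl(dict(original))  # work on a copy; the input file is never changed
        if modified is None:            # the case failed a selection: read the next case
            continue
        yield modified


def example_ppl(case: Case) -> Optional[Case]:
    """Hypothetical modification: select adults and generate one new variable."""
    if case["age"] < 18:
        return None
    case["age_months"] = case["age"] * 12
    return case


def list_command(rows: Iterable[Case]) -> List[Case]:
    """Stand-in for a command such as LIST: it sees only the cases it is handed."""
    return list(rows)


input_file = [{"age": 12}, {"age": 30}, {"age": 45}]
seen_by_command = list_command(apply_ppl(input_file, example_ppl))
```

In this model `seen_by_command` holds only the two surviving cases, each with the generated variable, while `input_file` is left exactly as it was.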
When a command like MODIFY produces an output file, any PPL that is done to the input file is permanent in<br />
the sense that the output file reflects that PPL.<br />
While PPL is most frequently used to modify a case of data in an existing P-STAT system file, there are provisions<br />
for passing data between cases within a file, between files, and even between P-STAT commands. This<br />
makes it possible to get summary information, to do conditional execution of the PPL within a command, and also<br />
to change the direction of a job stream depending on the data that are found or the results of a previous<br />
computation.<br />
1.1 THE VARIABLES<br />
There are three types of variables. The first type is a variable in a P-STAT system file. The variables in a P-STAT<br />
file may be numbers or character strings. Every case (row) of a file contains 1 or more such variables. Each variable<br />
has a name. The name of a variable can contain letters, digits, dots, underscores and, if starting with a tag,<br />
two colons. It has at most 64 characters and must start with a letter. If a tag is supplied, it may be 1 to 16 characters<br />
long and MUST be followed by the double colon (::).<br />
The variables in a given P-STAT system file can only be modified when the rows of that file are read by a<br />
P-STAT command. The names of P-STAT system variables and scratch variables, described below, can be at most<br />
16 characters long and do not have a tag.<br />
The second type of variable is a P-STAT system variable. These variables are not part of a P-STAT file. Instead,<br />
they reside in memory. They contain values such as the current date and the current page number. These<br />
variables, some of them numeric and some of them character strings, are created and maintained by the P-STAT<br />
executive routines and are available for your use. For example, .DATE. is the system variable for the current date.<br />
Most system variables cannot be changed by a user; .PAGE., the current page number, is an exception. System<br />
variable names look like regular variable names except that they always begin and end with a decimal point.<br />
The third type of variable is the scratch variable. Scratch variables also exist in memory rather than in a<br />
P-STAT system file. Scratch variables, which can be either numeric or character, are created by you as you need<br />
them. Scratch variables come in two flavors which are distinguished by the way they are named.<br />
A scratch variable with a name that begins with a single pound sign<br />
(#) only exists for the duration of the current command or macro. This temporary form of scratch variable is usually<br />
used either to hold intermediate results in a series of computations or to pass information between cases in<br />
a P-STAT file.<br />
A scratch variable with a name that begins with two pound signs (##) exists from the time it is created until<br />
the end of the P-STAT session. This permanent form of scratch variable allows information to be passed between<br />
files and between commands. Because a permanent scratch variable exists between commands, it can be created<br />
and changed even when there is no active P-STAT file.<br />
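The naming conventions in this section can be summarized as a small classifier. The sketch below is in Python, not PPL, and the function `classify` is hypothetical. It rests on several assumptions that the text does not settle: that a tag uses the same characters as a name, that the 64-character limit covers the tag and the two colons, and that the 16-character limit on scratch names is counted after the pound signs. No length check is applied to system variable names.

```python
import re

# Characters named in the text: letters, digits, dots and underscores, letter first.
_BODY = r"[A-Za-z][A-Za-z0-9._]*"
_FILE_VAR = re.compile(rf"(?:(?P<tag>{_BODY})::)?(?P<name>{_BODY})")
_SCRATCH = re.compile(rf"(?P<marks>##?)(?P<name>{_BODY})")
_SYSTEM = re.compile(r"\.[A-Za-z][A-Za-z0-9._]*\.")


def classify(name: str) -> str:
    """Return the kind of variable a name denotes, or 'invalid'."""
    if _SYSTEM.fullmatch(name):                 # begins and ends with a decimal point
        return "system"
    m = _SCRATCH.fullmatch(name)
    if m and len(m["name"]) <= 16:              # assumed: limit excludes the # marks
        return "permanent scratch" if m["marks"] == "##" else "temporary scratch"
    m = _FILE_VAR.fullmatch(name)
    if m and len(name) <= 64 and (m["tag"] is None or len(m["tag"]) <= 16):
        return "file variable"                  # assumed: 64 covers tag and colons
    return "invalid"


kinds = {n: classify(n) for n in ["Age", "Demo::Age", ".DATE.", "#total", "##Project", "9lives"]}
```

Here `Age` and `Demo::Age` are classified as file variables, `.DATE.` as a system variable, `#total` and `##Project` as temporary and permanent scratch variables, and `9lives` is rejected because it does not start with a letter.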
1.2 MATCHING NAMES<br />
With the longer names there is an increasing need to be able to refer to them in some abbreviated manner. Wildcards<br />
are one way to do this. In Version 3 they can be used anywhere that a variable name is referenced. A<br />
wildcard reference contains at least one question mark (?). Wildcards can be used in both commands and subcommands.<br />
They are discussed in detail in the next chapter.<br />
1.3 VECTORS AND ARRAYS<br />
A vector is a one-dimensional array of values. The variables that are represented in a case of data can be thought<br />
of as an array. This array is referenced as the V vector. Using the V vector, the variables in a case can be addressed<br />
with array notation. The variable V(1) is the first variable in the case. The variable V(23) is the twenty-third variable<br />
in the case. The V vector has a dimension that corresponds to the number of variables in the current P-STAT<br />
system file. The V vector can only be referenced as the P-STAT system file is being read into a command.<br />
There is a second vector that is also available for your use. This vector is known as the P vector. It contains<br />
as many double precision numeric elements as the maximum number of variables in a file. In most versions of
P-STAT the P vector has 6000 elements. The elements of the P vector are initialized to missing when the P-STAT<br />
run begins and remain missing until you change them. The P vector provides an easy way to pass a large number<br />
of values across cases or between commands. Since the P vector exists in memory rather than in a P-STAT system<br />
file it can be referenced even when there is no active P-STAT system file.<br />
A third type of vector, which uses the variables in your P-STAT system file, is also available. If you wish a<br />
group of variables to be addressed with vector notation, you must name them in such a way that all the variables<br />
to be included in the vector and only those variables have the same prefix or suffix. This prefix or suffix, combined<br />
with the wildcard character “?”, is used to denote the members of a vector that can be addressed with a<br />
subscript. This feature is usually used either to simplify the instructions when selecting variables with the KEEP<br />
or DROP instruction, or in conjunction with DO loops which provide a powerful mechanism for creating<br />
subscripts.<br />
Similar to the dynamic vectors are multi-dimensional user-defined arrays which can hold either numbers or<br />
characters. These are discussed in full in Chapter 8 “PPL: Across-Case Modifications”.<br />
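The idea of a vector defined by a shared prefix or suffix can be modelled in a few lines. The sketch below is in Python, not PPL, and the functions `dynamic_vector` and `element` are hypothetical. It assumes that the question mark stands for any run of characters (including none), that names are compared exactly as written, that members keep their order in the file, and that subscripts start at 1 as they do for V(1).

```python
from typing import List


def dynamic_vector(variable_names: List[str], pattern: str) -> List[str]:
    """Return the names selected by a prefix ("Q1?") or suffix ("?a") wildcard."""
    if pattern.endswith("?"):
        prefix = pattern[:-1]
        return [n for n in variable_names if n.startswith(prefix)]
    if pattern.startswith("?"):
        suffix = pattern[1:]
        return [n for n in variable_names if n.endswith(suffix)]
    raise ValueError("pattern must begin or end with '?'")


def element(variable_names: List[str], pattern: str, subscript: int) -> str:
    """Return the member at a 1-based subscript (assumed convention)."""
    members = dynamic_vector(variable_names, pattern)
    if not 1 <= subscript <= len(members):
        raise IndexError("subscript is outside the vector")
    return members[subscript - 1]


file_variables = ["ID", "Q1a", "Q1b", "Q2a", "Q1c", "Age"]
q1 = dynamic_vector(file_variables, "Q1?")   # every variable whose name begins with Q1
second = element(file_variables, "Q1?", 2)   # the second member of that vector
```

Because membership depends only on the names, a variable that happens to share the prefix joins the vector whether or not that was intended, which is why the text requires that all the intended variables, and only those, carry the prefix or suffix.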
1.4 THE COMMANDS<br />
PPL can be used any time that a P-STAT system file is read by any P-STAT command. The input file remains<br />
unchanged, but the cases that are processed by the command reflect the modifications. There are five commands<br />
which do not have a statistical or display function but which are specifically associated with PPL. These commands<br />
are covered in detail in the following chapters.<br />
The MODIFY command is used to read an existing P-STAT system file and produce a new file which contains<br />
the cases after the PPL is applied. The MODIFY command is especially useful when you are preparing a<br />
new study for analysis and need to clean the data.<br />
The COMPARE command takes two files and compares the contents. A major use of COMPARE is to compare<br />
the input and output from a MODIFY command as a check that the resulting output file is as expected.<br />
The CHECK command examines an existing P-STAT file for problems and stores the results in system variables<br />
that can then be tested or printed. The CHECK command should always be used when there has been a<br />
power failure or system crash while a P-STAT file was being processed. It is also useful when you need to know<br />
if a file has any remaining cases after a MODIFY.<br />
The TEXTWRITER command is a vehicle for PPL, with additional controls to format the printed page.<br />
The PROCESS command has a P-STAT system file as input but has no output file and does no computation.<br />
PROCESS is used to store the information in the P vector, arrays, or in permanent scratch variables which can then<br />
be accessed by subsequent commands.<br />
In addition, a number of PPL operators can be used as standalone commands. They can be used with system<br />
variables, scratch variables, the P vector and the user-defined arrays. These standalone PPL commands are: IF,<br />
SET, INCREASE, DECREASE, GENERATE, PUT, PUTL, IF-THEN-ELSE, DIALOG, BRANCH and DO<br />
loops.<br />
PUT .DATE. $<br />
GEN ##Project:C40 = 'ABC, Inc. January 2008 Report' $<br />
GEN ##Constant = SQRT ( 43.265 ) $<br />
1.5 P-STAT SYSTEM FILE: CURRENT OR PREVIOUS<br />
P-STAT keeps track of the previous and current versions of each P-STAT system file that you create. You supply<br />
a file name of sixteen or fewer characters, and P-STAT adds the extension (suffix) “.PS1” or “.PS2”. As that file<br />
is modified, the extension name alternates. However, at all times, P-STAT knows which file is the current one<br />
and which is the previous one. You use only the name you gave the file:<br />
PLOT Cells;
for example, and P-STAT inputs the current version to PLOT. However, if you want the prior version for some<br />
reason, use the PPL instruction PREVIOUS:<br />
PLOT Cells [ PREVIOUS ] ;<br />
The PPL instruction PREVIOUS is enclosed in square brackets and follows directly after the file name. It must<br />
be the first PPL clause. Additional PPL clauses may follow. The comparable instruction CURRENT is also available.<br />
When neither is used, it is assumed that the current file is the desired one.<br />
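The bookkeeping described in this section can be modelled as follows. The sketch is in Python, not PPL, and the class `SystemFile` with its methods `rewrite` and `resolve` is hypothetical. It assumes that the first version of a file is written with the extension “.PS1”, which the text does not state, and it says nothing about how P-STAT itself stores this information.

```python
class SystemFile:
    """Toy model of a P-STAT system file whose versions alternate between two extensions."""

    def __init__(self, name: str) -> None:
        if not 1 <= len(name) <= 16:
            raise ValueError("file names have sixteen or fewer characters")
        self.name = name
        self._current = ".PS1"        # assumption: the first version is written as .PS1
        self._has_previous = False

    def _other(self) -> str:
        return ".PS2" if self._current == ".PS1" else ".PS1"

    def rewrite(self) -> None:
        """Record a modification: the other extension now holds the current version."""
        self._current = self._other()
        self._has_previous = True

    def resolve(self, which: str = "CURRENT") -> str:
        """Return the file that CURRENT (the default) or PREVIOUS refers to."""
        if which == "CURRENT":
            return self.name + self._current
        if which == "PREVIOUS":
            if not self._has_previous:
                raise LookupError("no previous version exists yet")
            return self.name + self._other()
        raise ValueError("which must be CURRENT or PREVIOUS")


cells = SystemFile("Cells")
first = cells.resolve()               # the only version so far
cells.rewrite()
current = cells.resolve()             # what a plain reference to Cells would read
previous = cells.resolve("PREVIOUS")  # what Cells [ PREVIOUS ] would read
```

The user only ever supplies the name `Cells`. Which extension is current is tracked by the system, and a second modification swaps the roles of the two extensions again.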
1.6 ORGANIZATION<br />
This manual contains chapters describing the details of the programming language and the commands specifically<br />
associated with PPL.<br />
• “PPL: Basics of the Programming Language” covers PPL punctuation, case selection, variable selection<br />
and simple logical selection with “IF”.<br />
• “PPL: The Commands” explains more about temporary and permanent modifications and covers the<br />
MODIFY, PPL, and PROCESS commands in detail.<br />
• “PPL: NCOT and RECODE” covers the NCOT and RECODE functions, including multi-variable<br />
recodes.<br />
• “PPL: DO LOOPS and IF-THEN-ELSE Blocks” covers those functions in detail. This chapter includes<br />
the use of DO loops to generate and rename lists of variables.<br />
• “PPL: Functions and System Variables” covers the numeric functions and many of the P-STAT system<br />
variables.<br />
• Random Number and Distribution Functions also covers the “Fuzzy equals problem” and the functions<br />
to protect against this problem.<br />
• “PPL: Across-Case Modifications” covers the use of the SPLIT and COLLECT functions as well as<br />
uses of the P vector, permanent scratch variables and user-defined arrays.<br />
• “PPL: Modification of Character Variables” covers the character functions including the MATCHES<br />
function. MATCHES provides string matching capabilities similar to those found in the Unix commands<br />
lex and yacc.<br />
• PPL: Date and Time Commands and Functions.<br />
• TEXTWRITER: A Vehicle for PPL<br />
• MACROS
SUMMARY<br />
VARIABLES<br />
There are three different types of variables available in P-STAT.<br />
Fields in a P-STAT system file<br />
may be numeric or character strings. Variable names have 1-64 characters composed only of letters,<br />
numbers, underscores and decimal points. The first character must be a letter. Variable names may begin<br />
with a tag of 1-16 characters followed by 2 colons (::). These variables are only available when a P-STAT<br />
system file is read by a P-STAT command.<br />
System variables<br />
may be numeric or character. These variables are created and maintained by the P-STAT system itself<br />
to contain information such as the current date, current file name, or the results of the most recent command.<br />
These variables, which always have names that both begin and end with a period, for example<br />
.DATE. , are stored in memory and can be used (printed, interrogated, etc.) but, with exceptions such as<br />
.PAGE., not changed by the user.<br />
Scratch variables<br />
may be numeric or character. These variables, which are created by the user as needed, reside in memory.<br />
Temporary scratch variables, which only exist for the duration of a command or macro, have names that<br />
begin with a single pound (#) sign. Permanent scratch variables exist for the remainder of the P-STAT<br />
session and have names that begin with two pound (##) signs. Scratch variables are limited to 16 characters<br />
starting with a letter and containing letters, numbers, underscores and decimal points.<br />
VECTORS AND ARRAYS<br />
Groups of related variables may be considered a vector of values. These are typically used in DO loops.<br />
V vector<br />
The V vector references the current row (case) of data in a P-STAT system file. Variables may be referred<br />
to by their names (Age, Q1, Density, etc.) or by their position in the file, for example: v(3) or V(#j).<br />
The subscript may be a constant or an expression (such as a scratch variable) that evaluates to a position.<br />
This vector is only available when a file is being read by a P-STAT command.<br />
P vector<br />
The P vector is a numeric vector whose size depends on the maximum number of variables allowed in<br />
the version of P-STAT that is being used. The values in the P vector are set to missing when the P-STAT<br />
session begins. They are available for use in PPL and allow values to be passed between cases in a file<br />
and between commands.<br />
Dynamic vector<br />
Dynamic vectors depend on the naming of the variables in the P-STAT system file. Any group of variables<br />
with the same prefix or suffix can be referenced as a vector by combining the prefix or suffix with<br />
the wildcard character, the question mark (?). Thus Q1? refers to all variables in the file beginning with<br />
the characters “Q1”.<br />
User-defined arrays<br />
Arrays, one-dimensional and multi-dimensional, for numeric and character data can be defined and used.<br />
They are described in full in Chapter 8 “PPL: Across-Case Modifications”.<br />
COMMANDS<br />
The P-STAT Programming Language can be used with any P-STAT command. However, there are 6 commands<br />
of particular importance when PPL is considered.<br />
MODIFY<br />
takes an input P-STAT system file and applies PPL to produce an output P-STAT system file that is<br />
changed in some way.<br />
MODIFY Myfile [ here goes PPL ], OUT Newfile $<br />
CHECK<br />
examines a P-STAT system file and reports on its status. It is very useful when a system crash has occurred.<br />
It is also useful for obtaining information such as the number of cases in the file. The information<br />
from CHECK is stored in system variables which may be tested in subsequent PPL.<br />
COMPARE<br />
takes two P-STAT system files and compares their contents. The resulting differences are stored in a new<br />
P-STAT system file.<br />
PROCESS<br />
uses a P-STAT system file but produces neither an output file nor a printed report. It is used to accumulate<br />
information about the file and store it in the P vector or permanent scratch variables for use in<br />
subsequent commands.<br />
TEXTWRITER<br />
is a vehicle for PPL and the PUT function. It has additional features for formatting the output such as<br />
justification of the text, indenting, paragraph controls, and font changes for PostScript output.<br />
STANDALONE PPL COMMANDS<br />
Many PPL operators can be used as standalone commands. These standalone PPL commands are: IF,<br />
SET, INCREASE, DECREASE, GENERATE, PUT, PUTL, IF-THEN-ELSE, DIALOG, BRANCH and<br />
DO loops.<br />
PPL INSTRUCTIONS<br />
PPL instructions are enclosed in brackets and immediately follow the filename.<br />
CURRENT<br />
COMPARE Myfile [ CURRENT ] Myfile [ PREVIOUS ], OUT mydiffs $<br />
CURRENT selects the more recently created version of the P-STAT system file. CURRENT may be<br />
used with other PPL.<br />
PREVIOUS<br />
PREVIOUS selects the previous version of the P-STAT system file. PREVIOUS may be used with other<br />
PPL.<br />
2 PPL: Basics of the Programming Language<br />
This chapter explains the syntax and punctuation of the P-STAT Programming Language. Case and variable selection<br />
is covered in detail. In addition, generating new variables, recoding existing variables and logical selection<br />
using a simple “IF” are explained.<br />
2.1 CASE AND VARIABLE SELECTION<br />
PPL begins with a left bracket “[” and ends with a right bracket “]”. Individual clauses are terminated with either<br />
a semicolon “;” or a right bracket “]”. Clauses within brackets are separated by semicolons as in the first example<br />
below. If the right bracket is used, the next clause (if any) must begin with another left bracket. The following<br />
two command phrases are functionally equivalent:<br />
SURVEY Patients [ CASES 1 TO 10 ;<br />
KEEP Age Sex Race ],<br />
SURVEY Patients [ CASES 1 to 10 ]<br />
[ KEEP Age Sex Race ],<br />
Each is a single phrase that contains two modification clauses. The command name is SURVEY. Its argument is<br />
the filename Patients and both of the modification clauses which are to be applied to that file. In this example, the<br />
modifications are a case selection, indicated by the word CASES (ROWS is a synonym) in the first modification<br />
clause, and a variable selection, indicated by the word KEEP in the second modification clause.<br />
The first word in each clause tells P-STAT what kind of modification is involved. P-STAT recognizes CASES<br />
or CASE as the keyword for case selection and either KEEP or DROP as keywords for variable selection. IF<br />
is the keyword for logical selection. SET is the keyword for recoding or setting an existing variable to a new value.<br />
GENERATE is the keyword for generating or creating a new variable. Figure 2.1 contains examples of the basic<br />
types of modifications: case selection, variable selection, logical selection, recoding of existing variables, and<br />
creation of new variables. File Dogs contains six variables and three cases. The results of each modification<br />
clause are shown after the command that produces them.<br />
Many modification clauses may be used within the single command phrase which describes an input file.<br />
Each clause is used in turn to modify the cases of the file as it is read. The command itself is executed after the<br />
modifications have taken place. A comma following a right bracket means that the PPL for that file is finished,<br />
and some totally different command clause is about to begin. Therefore, you should NOT put commas between<br />
sets of PPL brackets.<br />
LIST Dogs [ KEEP Name Sex; IF Sex EQ 2, RETAIN ] $ is correct<br />
LIST Dogs [ KEEP Name Sex ] [ IF Sex EQ 2, RETAIN ] $ is correct<br />
LIST DOGS [ KEEP Name Sex ], [ IF Sex EQ 2, RETAIN ] $ is an ERROR<br />
The comma in a command is a signal that the next word is an identifier, a keyword, recognized by the command.<br />
The string “[ IF ...” is part of the PPL and not an identifier for the LIST command. It is easy to avoid this<br />
error if you use brackets only for major pieces of PPL and use the semicolon as the terminator for individual<br />
clauses.<br />
__________________________________________________________________________<br />
Figure 2.1 Basic Types of Modifications<br />
File Dogs: Before Modifications<br />
Name  Sex  Age  Wt  Ht   Diet<br />
Max   1    2    15  12   1<br />
Spot  2    7    24  18   1<br />
Rags  1    4    10  -    2<br />
File Dogs: After Modifications<br />
CASES to select cases:<br />
LIST Dogs [ CASES 1 3 ] $<br />
Name  Sex  Age  Wt  Ht   Diet<br />
Max   1    2    15  12   1<br />
Rags  1    4    10  -    2<br />
KEEP to select variables:<br />
LIST Dogs [ KEEP Name Diet ] $<br />
Name  Diet<br />
Max   1<br />
Spot  1<br />
Rags  2<br />
DROP to omit variables:<br />
LIST Dogs [ CASE 1 ; DROP Wt ] $<br />
Name  Sex  Age  Ht   Diet<br />
Max   1    2    12   1<br />
IF for logical selection:<br />
LIST Dogs [ IF Sex EQ 2, RETAIN ] $<br />
Name  Sex  Age  Wt  Ht   Diet<br />
Spot  2    7    24  18   1<br />
SET to modify existing variables:<br />
LIST Dogs [ SET Ht = Ht / 12 ] $<br />
Name  Sex  Age  Wt  Ht   Diet<br />
Max   1    2    15  1.0  1<br />
Spot  2    7    24  1.5  1<br />
Rags  1    4    10  -    2<br />
GENERATE to create new variables:<br />
LIST Dogs [ GEN Ratio = Ht / Wt ; KEEP Name Ratio ] $<br />
Name  Ratio<br />
Max   .80<br />
Spot  .75<br />
Rags  -<br />
__________________________________________________________________________<br />
Modification clauses are part of the P-STAT command structure. As such, they are free-format and may be<br />
continued on successive lines. However, each individual word or label must fit entirely on one line; it must not<br />
be broken across lines.<br />
There is a limit to the number of modifications which can be done at one time. This limit varies with the size<br />
of P-STAT that is being used. The size of the PPL workspace, measured in 4-byte words, is:<br />
Whopper II = 250,000 Whopper IV = 1,500,000<br />
An error message is printed when the limit is exceeded. The data modification area is adequate for most uses.<br />
However, if the space should prove too small to do a particular series of modifications in a single pass of the data<br />
file, the modifications may be done using the MODIFY command several times, creating temporary intermediate<br />
files.<br />
2.2 Case Selection<br />
Cases in a P-STAT system file are synonymous with rows in a file, despite the fact that the data for each case may<br />
have originally been collected on multiple records or may list on a terminal or printer over several lines. Case<br />
selection takes the following form:<br />
[ CASES 125 TO 199 345 ]<br />
It is indicated by the word CASES or CASE immediately following the left bracket. (Either ROWS or ROW may<br />
also be used.)<br />
Case selection uses the position of the case in the file to determine which cases are selected. Case references<br />
must be in ascending order whenever P-STAT files are accessed sequentially. Each of the following is a legal<br />
case selection clause:<br />
[ CASES 33 49 105 TO 200 223 300 TO 305 700 .ON. ]<br />
[ CASE 1 ]<br />
[ CASE 3 .ON. ]<br />
The use of the system variable .ON. in the first and third examples means “continue selecting cases from the current<br />
case onward until all the cases have been read”. You can tell that “.ON.” is a system variable because of the<br />
name. System variables have names that look like legal P-STAT names except that they always begin and end<br />
with a decimal point.<br />
A case may not be repeated in a case selection clause. (However, there are other ways to include a case more<br />
than once. See the REPEAT instruction later in this manual.) Case selection acts as a filter on the file and is done<br />
before any other modifications take place, regardless of the position of the CASE clause among the other modifications.<br />
If ten cases are selected from a file with 2000 cases, the tenth of the selected cases is processed as if it<br />
were the last case in the file. A file may be modified by no more than one case selection clause.<br />
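The selection rules above can be modelled outside P-STAT. The Python fragment below is only an illustrative sketch, not P-STAT code:<br />
the function name expand_cases is hypothetical, and raising an error for out-of-order or repeated references is an<br />
assumption about how such clauses would be rejected. It expands the body of a CASES clause into the case positions<br />
that would be passed on to the command.<br />

```python
def expand_cases(clause, n_cases):
    """Expand the body of a CASES clause, e.g. "33 49 105 TO 200 700 .ON.",
    into the case positions selected from a file holding n_cases cases.
    Illustrative model only; it is not part of P-STAT."""
    tokens = clause.upper().split()
    selected = []
    last = 0  # highest case position referenced so far
    i = 0
    while i < len(tokens):
        start = int(tokens[i])
        if i + 1 < len(tokens) and tokens[i + 1] == ".ON.":
            end = max(start, n_cases)  # from this case through the last case
            i += 2
        elif i + 2 < len(tokens) and tokens[i + 1] == "TO":
            end = int(tokens[i + 2])   # an explicit range
            i += 3
        else:
            end = start                # a single case
            i += 1
        # Assumed behaviour: references must ascend and may not repeat.
        if start <= last or end < start:
            raise ValueError("case references must be ascending and unique")
        last = end
        selected.extend(range(start, min(end, n_cases) + 1))
    return selected


print(expand_cases("1 3", 3))
print(expand_cases("3 .ON.", 6))
```

In this model a trial run on a large file is simply a short clause such as "1 TO 20"; removing the clause restores the full file.<br />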
A major reason for using case selection is for test runs. If you have a large file and are doing transformations,<br />
it is prudent to do a trial run, selecting a few cases and printing the results so they can be examined before the final<br />
run is made. When a trial run looks correct, the case selection is removed and the final run is done.<br />
2.3 Variable Selection<br />
There are two keywords which indicate variable selection: 1) KEEP, which is followed by a list of variables to be<br />
used, and 2) DROP, which is followed by a list of variables to be omitted. Variables may be selected by referencing<br />
either their names or their positions in the file.<br />
These are selections of variables by their names (variable labels):<br />
LIST Myfile [ KEEP Sex Age Education ] $<br />
SORT Myfile [ DROP Income Rent ] ,<br />
BY Education, OUT SortEduc $<br />
LIST File3 [ KEEP Id Sex Age .ON. ] $<br />
Each selection clause begins with either KEEP or DROP. These keywords identify a P-STAT variable selection.<br />
Each continues with a list of variable names, which are separated from each other by blanks. “.ON.” used for variable<br />
selection has the same meaning as it does for case selection. When used for variable selection it means<br />
“starting with the current variable do the KEEP or DROP to all the remaining variables”.<br />
It is often convenient to refer to a variable by position rather than by name, particularly when the variable<br />
name is long. There are some situations in which, by definition, a number can only refer to a position. There are<br />
other situations where a number could represent either a constant or a variable position. To distinguish between<br />
these two situations, the convention in P-STAT is that a constant is a number by itself, and a variable position is<br />
referenced with the notation V(n), where “n” is the position. V(1) is the variable in position 1 of the file. V(33) is<br />
the variable in position 33 of the file. With reference to the example in Figure 2.1,<br />
LIST Dogs [ KEEP V(1) V(2) V(6) ] $ is <strong>the</strong> same as<br />
LIST Dogs [ KEEP Name Sex Diet ] $<br />
Variable names and variable positions can be used in the same variable selection clause. The position of a variable<br />
is always the “current” position of that variable in the file. After variable selection or reordering, the initial positions<br />
of the variables may change. For example, this command:<br />
PLOT Tree [ KEEP V(10) V(3) TO V(6) ] ;<br />
inputs cases with five variables to the PLOT command. The variables are ordered as specified. A subsequent<br />
subcommand to plot variable 10 by variable 3:<br />
P V(10) * V(3) ;<br />
yields an error message, because there are only five variables in the file given to the PLOT command. The variable<br />
that previously was in position 10 is now in position 1; the variable that was in position 3 is now in position 2, and<br />
so on.<br />
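The renumbering described above can be traced with a small Python sketch. This is an illustration only, not P-STAT code;<br />
the function name positions_after_keep is hypothetical. It maps each new position to the original position of the<br />
variable that now occupies it.<br />

```python
def positions_after_keep(kept_positions):
    """Map each new (1-based) position to the original position of the
    variable stored there after a KEEP that lists positions in order."""
    return {new: old for new, old in enumerate(kept_positions, start=1)}


# KEEP V(10) V(3) TO V(6): the range V(3) TO V(6) covers positions 3, 4, 5, 6.
kept = [10] + list(range(3, 7))
new_to_old = positions_after_keep(kept)

print(new_to_old)        # new position -> original position
print(10 in new_to_old)  # whether a position 10 still exists after the KEEP
```

Since only positions 1 to 5 exist after the KEEP, a later reference to V(10) has nothing to refer to, which is why the<br />
subcommand in the example fails.<br />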
2.4 Variable Selection With Wildcards<br />
Consider a variable with the following name:<br />
age.oldest.surviving.child<br />
A wildcard like<br />
age?sur?ch?<br />
might be the most efficient way to refer to it. When compared to the above name, 'age' matches, the '?sur' says<br />
accept anything until 'sur' is found, the '?ch' says from there accept anything through a 'ch', and the final '?'<br />
says accept anything at all after that, if indeed there is anything else. Thus,<br />
age.oldest.surviving.child<br />
is matched by the strings 'age', 'sur' and 'ch', found in that order.<br />
A wildcard usage can be thought of as a template for name matching. Differences in case do not matter. A<br />
wildcard template can be used to specify which variable (or, in some situations, variables) are to be used. The<br />
template is matched against the name of each variable in the file. The template uses single (?) or double (??) question<br />
marks to indicate how the matching should be done.<br />
A wildcard template contains at least one single (?) or double (??) question mark, and at least one string. The<br />
question marks serve as 'move until' operators. A string consists of one or more ordinary characters that can be<br />
found in names. String matching ignores case. A template successfully matches a name when each template element<br />
progressively matches a part of the name, with the entire name being matched when the template is done.<br />
If a template starts with a string, the name being matched must also begin with that string. If the template<br />
ends with a single or double question mark, and the match has been successful so far, the rest of the name is accepted,<br />
and a match has occurred.<br />
A single question mark, followed by a string, matches the name through the NEXT remaining occurrence of that<br />
string. If no such string is found, the match fails. A double question mark, followed by a string, matches the name<br />
through the LAST remaining occurrence of that string. If no such string is found, the match fails.<br />
___________________________________________________________________________<br />
Figure 2.2 Examples of Wildcard Matching<br />
qq? will match any name that starts with 'qq'.<br />
?qq? will match any name that contains 'qq' anywhere.<br />
??qq will match any name that ends with 'qq' .<br />
ab?cde will match abxxcde, and also abcde.<br />
ab?cde will NOT match abcdecde, whereas<br />
ab??cde will.<br />
a?o?c will NOT match age.oldest.child .<br />
a?o?c? will, because the final ? moved to the name's end.<br />
a?ld will NOT, because it ends on the ld in old.<br />
a??ld will, because the ?? moved to the last ld.<br />
___________________________________________________________________________<br />
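The matching rules and the examples in Figure 2.2 can be checked with a short Python model. This is a sketch written from<br />
the description above, not P-STAT's own implementation; the function name wildcard_matches is hypothetical, and templates<br />
with three or more consecutive question marks are not considered.<br />

```python
import re


def wildcard_matches(template, name):
    """Return True if the wildcard template matches the whole name.
    '?'  followed by a string moves through the NEXT occurrence of it;
    '??' followed by a string moves through the LAST occurrence of it;
    a trailing '?' or '??' accepts the rest of the name.
    Illustrative model only; it is not part of P-STAT."""
    tokens = re.findall(r"\?\?|\?|[^?]+", template.lower())
    name = name.lower()          # matching ignores case
    pos = 0                      # how much of the name is matched so far
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in ("?", "??"):
            if i + 1 == len(tokens):
                return True      # trailing wildcard: accept the remainder
            target = tokens[i + 1]
            if token == "?":
                found = name.find(target, pos)    # next occurrence
            else:
                found = name.rfind(target, pos)   # last occurrence
            if found < 0:
                return False
            pos = found + len(target)
            i += 2
        else:
            # A leading string must match the start of the name.
            if not name.startswith(token, pos):
                return False
            pos += len(token)
            i += 1
    return pos == len(name)      # the entire name must be consumed


for template in ("a?o?c", "a?o?c?", "a?ld", "a??ld"):
    print(template, wildcard_matches(template, "age.oldest.child"))
```

The model never backtracks: once a single ? has stopped at the next occurrence of its string it does not try a later one,<br />
which is the behaviour the ab?cde and a?ld examples in the figure describe.<br />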
A wildcard can be used anywhere that the name of a variable could be used. A single match is usually what<br />
is expected. However:<br />
1. In KEEP or DROP phrases, and in LIST functions like SUM, a wildcard usage can have multiple<br />
matches, in which case all will be used. For example: [ KEEP ??income ]<br />
2. There can be, in PPL expressions, multiple matches to a wildcard usage if a subscript follows, in parentheses,<br />
to show which of the matches should be accessed at that point in the execution of the PPL.<br />
This permits looping through the matches.<br />
SORT xxx, BY pulse??pre pulse??post, OUT zzz $<br />
In the BY phrase each template should match one name, and the sort will be<br />
done on those two BY variables.<br />
[ SET tot? TO ?11?inc? + ?11?div? ]<br />
In the above, the actual variable names could be something like total_income_all_sources, year_2011.income<br />
and year_2011.dividends .<br />
[ KEEP ??income ]<br />
Wildcards can be used in KEEP or DROP phrases. The phrase shown above keeps all of the variables whose<br />
names end with INCOME. There can be one or more matches.<br />
[ SET total TO SUM( ??income) ]<br />
Wildcards may be used as input to the various LIST functions, which include sum, mean, max, first.good and<br />
such. The phrase shown above sets TOTAL to the sum of the variables whose names end with INCOME. There<br />
can be one or more matches. SET, INCREASE and DECREASE can be followed by a subscripted wildcard, as<br />
can the various operands in the rest of the expression.<br />
[ DO #j = 1,5; SET ??income(#j) = 0; ENDDO ]<br />
Here, ??income remembers the positions of the variables whose names end with 'income'. In this example, there<br />
are presumably five of them. Using ??income(#j) when #j=2 accesses the second of them, wherever in the file that<br />
variable may actually be. The example would set the 5 variables whose names end with 'income' to zero.<br />
2.5 Using Ranges in Selection Clauses<br />
Lists may contain single variables, many variables, and/or ranges of variables:<br />
[ KEEP Siblings TO Children Occup.Mother Race Age ]<br />
The cases received by the individual P-STAT commands contain all the variables from the variable named Siblings<br />
through the variable named Children, plus the three variables, Occup.Mother, Race and Age. In this case,<br />
there will be an error message if the variable named Children has a position in the file before that of the variable<br />
named Siblings, or if Occup.Mother, Race or Age have positions in the file between Siblings and Children. Other<br />
than these situations, the order of the individual variables or ranges does not matter.<br />
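The two error conditions above can be made concrete with a small Python model of range expansion. It is a sketch under<br />
stated assumptions, not P-STAT code: the function name expand_keep and the variables Id and Brothers are hypothetical,<br />
names are compared exactly, and errors are reported by raising ValueError.<br />

```python
def expand_keep(file_vars, clause):
    """Expand the body of a KEEP list such as "Siblings TO Children Race"
    into variable names, using the order of file_vars to resolve ranges.
    Illustrative model only; it is not part of P-STAT."""
    position = {name: i for i, name in enumerate(file_vars)}
    tokens = clause.split()
    kept = []
    i = 0
    while i < len(tokens):
        if i + 2 < len(tokens) and tokens[i + 1].upper() == "TO":
            first = position[tokens[i]]
            last = position[tokens[i + 2]]
            if last < first:
                raise ValueError("range end precedes range start")
            names = file_vars[first:last + 1]
            i += 3
        else:
            names = [file_vars[position[tokens[i]]]]
            i += 1
        for name in names:
            # A variable named separately must not also fall inside a range.
            if name in kept:
                raise ValueError(f"{name} is selected more than once")
            kept.append(name)
    return kept


file_vars = ["Id", "Siblings", "Brothers", "Children",
             "Occup.Mother", "Race", "Age"]
print(expand_keep(file_vars, "Siblings TO Children Occup.Mother Race Age"))
```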
Other valid selections are:<br />
[ CASES 1 10 TO 50 56 ]<br />
[ DROP V(13) TO V(16) Occupation ]<br />
[ KEEP V(1) Education TO V(23) Region V(3) ]<br />
The system variable .ON. may be used to make the referencing and typing of variable selections easier. The<br />
clause:<br />
[ KEEP V(6) Children .ON. ]<br />
instructs P-STAT to use the sixth variable in the file and all the variables from the one named Children through<br />
the last variable in the file. .ON. means “from here on through the end.” This is particularly useful if you have<br />
added a number of new variables to the file and are not certain just how many you currently have.<br />
The use of: 1) TO to indicate a range, and 2) .ON. to indicate “from the current item on through the last item,”<br />
are valid in both variable and case selection clauses.<br />
2.6 Multiple Variable Selections<br />
DROP and KEEP, unlike CASES, may be used in more than one modification clause. Variable selections take<br />
place in a sequential and cumulative manner. An initial variable selection often winnows out the unnecessary variables.<br />
A second selection occurs after all the modifications are done and selects only those variables actually<br />
needed as input for a given command:<br />
LIST Dept<br />
[ KEEP Name TO Race Test1 Test2 ;<br />
GENERATE Pass = 1;<br />
GENERATE Test.Average = ( Test1 + Test2 ) / 2 ;<br />
IF Test.Average LT 65, SET Pass = 0 ;<br />
DROP Test1 Test2 ] $<br />
Variables should not be selected out of the file before they are used. The following command causes an error<br />
because the variable named Year is not available when the IF clause is processed:<br />
LIST Produce [ DROP Year ;<br />
IF Year EQ 2002, RETAIN ] $<br />
The correct order of the variable selection clauses is:<br />
LIST Produce [ IF Year EQ 2002, RETAIN ;<br />
DROP Year ] $<br />
The value of the variable Year is tested and the case retained when it is 2002, and then variable Year is dropped<br />
from each case of the file as it is passed to the LIST command.<br />
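The effect of clause order can be traced with a small Python model in which each case is passed through the clauses one<br />
at a time. This is an illustration only, not P-STAT code: the function apply_clauses, the clause tags DROP and<br />
IF_EQ_RETAIN, and the variable Item are hypothetical names chosen for the sketch.<br />

```python
def apply_clauses(case, clauses):
    """Pass one case (a dict of variable name -> value) through PPL-like
    clauses in order. Returns the modified case, or None if the case is
    not retained. Illustrative model only; it is not part of P-STAT."""
    case = dict(case)  # work on a copy; the input file itself is unchanged
    for clause in clauses:
        kind = clause[0]
        if kind == "DROP":
            for name in clause[1]:
                del case[name]       # KeyError if the variable is already gone
        elif kind == "IF_EQ_RETAIN":
            _, name, value = clause
            if case[name] != value:  # KeyError if the variable was dropped
                return None
        else:
            raise ValueError(f"unknown clause type: {kind}")
    return case


row = {"Item": "Corn", "Year": 2002}
good_order = [("IF_EQ_RETAIN", "Year", 2002), ("DROP", ["Year"])]
bad_order = [("DROP", ["Year"]), ("IF_EQ_RETAIN", "Year", 2002)]

print(apply_clauses(row, good_order))
try:
    apply_clauses(row, bad_order)
except KeyError as missing:
    print("error: variable not available:", missing)
```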
DROP and KEEP require a great deal of overhead because both the variable names and the data are rearranged<br />
for each DROP or KEEP clause. Your run will be more efficient if you limit the number of DROP and KEEP<br />
clauses in any one command. For example:<br />
[ DROP V(1) ] [ DROP V(1) ] [ DROP V(1) ]<br />
is less efficient than<br />
[ DROP V(1) TO V(3) ]<br />
even though they do the same thing.<br />
RETAIN keeps the case. It is then passed to the next PPL clause or, if there are no more clauses, to the current<br />
command. DELETE does not pass a case to any subsequent PPL clauses or to the current command. PPL does<br />
not change the contents of the input file. PPL only affects what is sent to the current command. If the command<br />
creates an output file, the changes are “permanent”. If the command does not create an output file, the changes<br />
are temporary.<br />
2.7 Reordering Variables<br />
The KEEP instruction may be used to reorder variables. For example, any of the following clauses reorders the<br />
variables Rent, Sex and Age in listings of File1:<br />
LIST File1 [ KEEP Age Sex Rent ] $<br />
LIST File1 [ KEEP Sex Age Rent ] $<br />
LIST File1 [ KEEP Rent Age Sex ] $<br />
Often the rearrangement is done to place one or two variables at the left of the file. These two clauses are<br />
equivalent:<br />
[ KEEP V(16) V(23) V(1) TO V(15)<br />
V(17) TO V(22) V(24) .ON. ]<br />
[ KEEP V(16) V(23) .OTHERS. ]<br />
.OTHERS. is a system variable meaning all the variables which are not mentioned elsewhere in the KEEP<br />
clause. System variables are set by P-STAT. Most of the system variables cannot be changed by the user but are<br />
available for use and testing in PPL statements. System variable names always begin and end with a decimal point.<br />
Since variable names must begin with a letter, system variable names will never conflict with legal variable names.<br />
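The equivalence of the two clauses above can be checked with a short Python model of .OTHERS.. It is a sketch only, not<br />
P-STAT code; the function name keep_with_others is hypothetical, and V1 to V24 stand in for the names of 24 variables.<br />

```python
def keep_with_others(file_vars, keep_list):
    """Resolve a KEEP list that may contain .OTHERS.: every variable not
    named elsewhere in the list is inserted, in file order, at the place
    where .OTHERS. appears. Illustrative model only; not part of P-STAT."""
    named = [v for v in keep_list if v != ".OTHERS."]
    others = [v for v in file_vars if v not in named]
    result = []
    for item in keep_list:
        result.extend(others if item == ".OTHERS." else [item])
    return result


file_vars = [f"V{n}" for n in range(1, 25)]  # stand-ins for 24 variables
print(keep_with_others(file_vars, ["V16", "V23", ".OTHERS."]))
```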
.NEW., .CHARACTER. and .NUMERIC. are system variables which can be used after KEEP to select or reorder<br />
variables. .NEW. refers to any new variables which have been created in the previous PPL clauses.<br />
.CHARACTER. refers to all the character variables and .NUMERIC. refers to all the numeric variables.<br />
LIST Dept<br />
[ KEEP Name Test.1 Test.2 ;<br />
GENERATE Pass = 1 ;<br />
GENERATE Test.Average = ( Test.1 + Test.2) / 2 ;<br />
IF Test.Average LT 65, SET Pass = 0 ;<br />
KEEP Name .NEW. ] $<br />
In this example, Name, the original variable, and the new variables created in this command, are included in the<br />
list. .NEW. and .OTHERS. can be used both to rearrange and to select variables:<br />
[ KEEP .NEW. .OTHERS. ]<br />
[ KEEP .OTHERS. Age .NEW. Race ]<br />
2.8 Masks and Wildcards<br />
Masks and wildcards are shortcuts that make it easier to refer to variables that have either a pattern to their order<br />
in the input file or a common prefix or suffix in their names. Masks are strings of characters that mean “yes” and<br />
“no”. In this example, the parentheses contain a mask (MASK 100) with a length of three:<br />
[ KEEP Test1 .ON. (MASK 100) ]<br />
A variable or case selection mask is a string of digits that are either zeros or ones. Variable selection starts<br />
with the variable named Test1 and continues through the last variable in the file (.ON.), applying the mask to successive<br />
groups of three variables. Variables corresponding to mask values of 1 are kept (“yes”) and variables<br />
corresponding to mask values of 0 are dropped (“no”). In this example, if the variable named Test1 is in position<br />
6 in the file, variables 6, 9, 12, and so on, are selected. Variables 1 to 5, 7, 8, 10, 11, 13, 14, and so on, are not<br />
selected.<br />
Masks are particularly useful when the file contains repeating groups of variables and only some are needed<br />
for a particular analysis. Given these variables:<br />
Date1, Grade1, Date2, Grade2, .... Date9, Grade9<br />
this variable selection clause selects the Grade variables:<br />
[ KEEP Grade1 TO Grade9 (MASK 10) ]<br />
The following variable selection clause could be used to reorder the variables so that all those whose names begin<br />
with “Date” are followed by those whose names begin with “Grade”:<br />
[ KEEP Date1 TO Grade9 (MASK 10)<br />
Date1 TO Grade9 (MASK 01) ]<br />
Masks may also be used in case selection clauses:<br />
[ CASES 5 .ON. ( MASK 1000 ) ]<br />
The question mark “?” is used as a wildcard, that is, <strong>to</strong> refer <strong>to</strong> any variables with a common prefix or suffix<br />
in <strong>the</strong>ir names. (The question mark replaces <strong>the</strong> asterisk, used in earlier versions of P-<strong>STAT</strong>, as <strong>the</strong> wildcard character.<br />
This avoids any possible confusion of <strong>the</strong> wildcard with <strong>the</strong> symbol for multiplication, which is <strong>the</strong><br />
asterisk.) This selection clause:<br />
[ KEEP Grade? ]<br />
keeps all variables beginning with <strong>the</strong> character string “Grade”. This clause:<br />
[ KEEP ?Batch ]<br />
keeps all variables ending with the character string “Batch”. Wildcard notation can be used to reorder the variables<br />
so that all the variables beginning with “Date” are followed by all the variables beginning with “Grade” and then<br />
by any other variables in the same order that they occur in the input file.<br />
[ KEEP Date? Grade? .OTHERS. ]<br />
The prefix or suffix used with <strong>the</strong> wildcard ? must be unique <strong>to</strong> <strong>the</strong> desired variables. This KEEP instruction:<br />
[ KEEP Family.ID <strong>Inc</strong>ome.Male.HH <strong>Inc</strong>ome.Fem.HH <strong>Inc</strong>ome.Total ]<br />
may be shortened <strong>to</strong>:<br />
[ KEEP Family.ID <strong>Inc</strong>ome.? ]<br />
However, if <strong>the</strong> file also contains <strong>the</strong> variables <strong>Inc</strong>ome.Last.Yr and <strong>Inc</strong>ome.Child, <strong>the</strong>y will also be kept. Sometimes,<br />
an error situation results because <strong>the</strong> wildcard reference is not unique:
[ KEEP Family.ID <strong>Inc</strong>ome.? <strong>Inc</strong>ome.Total ]<br />
This KEEP instruction includes <strong>the</strong> variable <strong>Inc</strong>ome.Total twice, once in <strong>the</strong> middle of <strong>the</strong> file and a second time<br />
as <strong>the</strong> right-most variable. This is an error because each variable in a P-<strong>STAT</strong> system file must have a unique<br />
name.<br />
Case is ignored in wildcard selection. Variables ax and bx are both selected by ?x, or for that matter, by ?X.<br />
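The wildcard rules described in this section can be approximated in a few lines of Python. This is a sketch of the described behavior, not P-STAT code: the helper names expand_wildcard and keep are hypothetical, and only a single leading or trailing ? is handled. Whether ? may also stand for zero characters is not stated here, so the example does not depend on it.

```python
def expand_wildcard(pattern, names):
    """Expand a pattern with one leading or trailing '?' against names, ignoring case."""
    p = pattern.lower()
    if p.endswith("?"):
        return [n for n in names if n.lower().startswith(p[:-1])]
    if p.startswith("?"):
        return [n for n in names if n.lower().endswith(p[1:])]
    return [n for n in names if n.lower() == p]

def keep(patterns, names):
    """Build the kept list in the order requested; reject a variable kept twice."""
    kept = [n for pat in patterns for n in expand_wildcard(pat, names)]
    dupes = sorted({n for n in kept if kept.count(n) > 1})
    if dupes:
        raise ValueError(f"variable kept more than once: {dupes}")
    return kept

names = ["Family.ID", "Income.Male.HH", "Income.Fem.HH", "Income.Total"]
print(keep(["Family.ID", "Income.?"], names))
# keep(["Family.ID", "Income.?", "Income.Total"], names) raises ValueError,
# because Income.Total would appear twice in the result.
```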
2.9 MODIFYING AND GENERATING VARIABLES<br />
Values are changed using <strong>the</strong> SET instruction, which “sets” an existing variable <strong>to</strong> new values. New variables are<br />
created using <strong>the</strong> GENERATE instruction, which “generates” a new variable with <strong>the</strong> specified values. If SET is<br />
used with a name that is not <strong>the</strong> name of a variable in <strong>the</strong> file, an error message is printed. If GENERATE is used<br />
with a name that already belongs <strong>to</strong> a variable in <strong>the</strong> file, an error message is printed. In general it is a good practice<br />
<strong>to</strong> generate all <strong>the</strong> variables that will be needed before any recodes or logical selections.<br />
2.10 Modifying Variables with SET<br />
The keyword SET indicates <strong>to</strong> P-<strong>STAT</strong> that modification is <strong>to</strong> be done <strong>to</strong> an existing variable. There are four elements<br />
<strong>to</strong> a SET clause:<br />
1. The keyword SET;<br />
2. The name or position of <strong>the</strong> variable that is <strong>to</strong> be modified;<br />
3. An equal-sign (=);<br />
4. The value or expression <strong>to</strong> be used as <strong>the</strong> new value of that variable.<br />
The format of <strong>the</strong> SET instruction is illustrated in Figure 2.3.<br />
__________________________________________________________________________<br />
Figure 2.3 Format of <strong>the</strong> SET Instruction<br />
SET Var Opera<strong>to</strong>r Expression<br />
[ SET Score = Test ]<br />
[ SET Score = Test1 + Test2 ]<br />
[ SET Score = SQRT ( Score ) ]<br />
[ SET Notes = 'Late 2 days' ]<br />
[ SET V(1) = V(1) + Test ]<br />
[ SET V(1) = 1 + V(1) ]<br />
A variable may be referred <strong>to</strong> by its name or by its position. Note that in a SET clause, constants are often<br />
used. Character constants must be enclosed in quotes. There is often no way <strong>to</strong> infer from <strong>the</strong> context whe<strong>the</strong>r a<br />
number is a constant or <strong>the</strong> position of a variable. Therefore, <strong>the</strong> <strong>PPL</strong> syntax rule is that a number by itself is a<br />
constant, and a number indicated with <strong>the</strong> V(n) notation refers <strong>to</strong> a variable position.<br />
In addition <strong>to</strong> distinguishing between constants and variable positions, <strong>the</strong> “V” notation references <strong>the</strong> vec<strong>to</strong>r<br />
containing <strong>the</strong> values for <strong>the</strong> current case. The subscript (<strong>the</strong> contents of <strong>the</strong> paren<strong>the</strong>sis) pointing in<strong>to</strong> <strong>the</strong> V vec<strong>to</strong>r<br />
may be a number or an expression. V(17) points <strong>to</strong> <strong>the</strong> value of <strong>the</strong> variable in <strong>the</strong> 17th position of a given case,<br />
in o<strong>the</strong>r words <strong>to</strong> its 17th variable. V(Region) points <strong>to</strong> <strong>the</strong> value of <strong>the</strong> first variable if Region is equal <strong>to</strong> 1 and<br />
<strong>to</strong> <strong>the</strong> value of variable 33 if Region is equal <strong>to</strong> 33. Calculation of variable positions is discussed in detail later in<br />
this manual.<br />
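The distinction between a constant and a variable position can be pictured with a short Python model. It is only an analogy: the variable names and values below are invented for the example, and V here is an ordinary Python function standing in for the V(n) notation.

```python
# Hypothetical case: four variables, listed in file order (positions 1 to 4).
names = ["ID", "Test1", "Test2", "Region"]
case = [101, 88, 92, 3]

def V(n):
    """Value of the variable in position n of the current case (positions start at 1)."""
    if not 1 <= n <= len(case):
        raise IndexError(f"no variable in position {n}")
    return case[n - 1]

region = case[names.index("Region")]  # value of the variable named Region: 3
print(V(1) + 1)   # the constant 1 added to the variable in position 1 -> 102
print(V(region))  # the subscript is an expression: position 3 -> 92
```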
If <strong>the</strong> variable that follows <strong>the</strong> SET instruction is not found in <strong>the</strong> file, an error occurs:
ERROR... Variable Bad.Label, used in a <strong>PPL</strong> phrase,<br />
is not found in <strong>the</strong> file.<br />
In an interactive session, control is given <strong>to</strong> <strong>the</strong> internal P-<strong>STAT</strong> edi<strong>to</strong>r so that <strong>the</strong> error may be corrected and execution<br />
can continue.<br />
A series of SET instructions can be separated in<strong>to</strong> individual clauses or grouped <strong>to</strong>ge<strong>the</strong>r in <strong>the</strong> same modification<br />
clause:<br />
LIST File [ SET Score = SQRT (Score),<br />
SET Test = Test + 1,<br />
SET Inches = Feet * 12 ] $<br />
2.11 Using INCREASE and DECREASE Instead of SET<br />
These usages of SET increase or decrease <strong>the</strong> value of an existing variable ei<strong>the</strong>r by a constant or by an expression:<br />
[ SET Count = Count + 1 ]<br />
[ SET Total = Total + Score ]<br />
[ SET Used = Used - 3 ]<br />
SET clauses like <strong>the</strong>se may be expressed more simply using <strong>the</strong> instructions INCREASE and DECREASE:<br />
[ INCREASE Count ]<br />
[ INCREASE Total BY Score ]<br />
[ DECREASE Used BY 3 ]<br />
When BY is omitted, BY 1 is assumed. INCREASE may be abbreviated <strong>to</strong> INC and DECREASE may be abbreviated<br />
<strong>to</strong> DEC. Wearing new pants in <strong>the</strong> rain is an example of DECREASE.<br />
2.12 Creating New Variables with GENERATE<br />
The GENERATE instruction indicates that a new variable is <strong>to</strong> be created. It may be abbreviated <strong>to</strong> GEN. The<br />
format is like that of <strong>the</strong> SET instruction. GENERATE is immediately followed by <strong>the</strong> name of <strong>the</strong> variable <strong>to</strong> be<br />
created. This name must be one that does not already exist within <strong>the</strong> file. If a question mark (?) is used instead<br />
of a name, P-<strong>STAT</strong> generates a variable name. This name is <strong>the</strong> position of <strong>the</strong> variable in <strong>the</strong> file with <strong>the</strong> prefix<br />
VAR:<br />
LIST File [ GENERATE Total = Score1 + Score2 ] $<br />
LIST File [ GENERATE ID:C = Last.Name ] $<br />
LIST File [ GENERATE ? = MEAN ( XA TO XE ) ] $<br />
Character variables need “:C” or “:Cnn”, where nn is <strong>the</strong> maximum number of characters, directly after <strong>the</strong>ir<br />
names. When <strong>the</strong> number is not supplied, 16 is assumed. The following creates a new variable which can contain<br />
up <strong>to</strong> 30 characters:<br />
[ GENERATE ?:C30 = 'generated character variable';<br />
Once generated, the variables are referenced by just their names. The expression following the “=” in GENERATE<br />
is exactly like that following the “=” in SET. The MEAN function in the example above computes the mean<br />
of the variables in the list following the function name.<br />
The difference between SET and GENERATE is that <strong>the</strong> variable referenced by SET must already exist while<br />
<strong>the</strong> variable referenced by GENERATE must not yet exist. If <strong>the</strong> variable referenced by GENERATE does exist,<br />
an error occurs:<br />
Error... Attempting <strong>to</strong> GENERATE a new variable named var4,<br />
but <strong>the</strong> name already exists in position 4.<br />
The variable name (label) and <strong>the</strong> position it currently occupies (n) are both supplied in <strong>the</strong> error message.
The expression which follows <strong>the</strong> “=” in SET and GENERATE instructions can be ano<strong>the</strong>r variable, a constant,<br />
or a complicated expression involving variables, constants and functions. These are all valid expressions<br />
after an equal-sign:<br />
Age<br />
3.33<br />
'Sarah Wilson'<br />
SQRT ( V(3) + Age / 12 )<br />
RECODE ( Age, 80 TO 99 = 80 )<br />
The “+” and “/” are numeric opera<strong>to</strong>rs, whereas SQRT and RECODE are functions. SQRT is <strong>the</strong> square root function.<br />
The RECODE function allows individual values of a variable <strong>to</strong> be changed; it is discussed in detail later in<br />
this manual.<br />
If <strong>the</strong> GENERATED variable is not set <strong>to</strong> anything as in:<br />
[ GENERATE abc ]<br />
it is set <strong>to</strong> Missing 1.<br />
2.13 Numeric Opera<strong>to</strong>rs and <strong>the</strong>ir Order<br />
In <strong>the</strong> example above, <strong>the</strong> expression “Age / 12” is a numeric expression which requests <strong>the</strong> value of <strong>the</strong> variable<br />
Age, divided by <strong>the</strong> constant 12. The slash (/) is <strong>the</strong> symbol for division. The numeric or arithmetic opera<strong>to</strong>rs are:<br />
+ for addition<br />
- for subtraction<br />
* for multiplication<br />
/ for division<br />
** for exponentiation<br />
A series of unparen<strong>the</strong>sized numeric operations may not necessarily be performed from left <strong>to</strong> right. All exponentiation<br />
at a given paren<strong>the</strong>sis level is done first, followed by all multiplication and division, followed by all<br />
addition and subtraction. If <strong>the</strong>re is a series of additions and subtractions, <strong>the</strong>y are performed from left <strong>to</strong> right.<br />
If <strong>the</strong>re is a series of multiplications and divisions, <strong>the</strong>y are also performed from left <strong>to</strong> right. A series of exponentiations,<br />
however, is done from right <strong>to</strong> left. Therefore:<br />
A - B + C is done as ( A - B ) + C<br />
A / B * C is done as ( A / B ) * C<br />
A ** B ** C is done as A ** ( B ** C )<br />
A + B * C is done as A + ( B * C )<br />
A * B ** C is done as A * ( B ** C )<br />
If this order of execution is not <strong>the</strong> desired order, paren<strong>the</strong>ses may be used <strong>to</strong> enclose portions of a numeric<br />
expression. Operations within a pair of paren<strong>the</strong>ses are performed before operations outside, regardless of <strong>the</strong> order<br />
defined above. Thus,<br />
( A + B + C ) / 3<br />
would take <strong>the</strong> sum of A, B, and C and divide <strong>the</strong> result by 3. Without <strong>the</strong> paren<strong>the</strong>ses, <strong>the</strong> result would be C<br />
divided by 3 plus A and B, that is, A + B + ( C / 3 ).<br />
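Python happens to use the same symbols, precedence and associativity for these five operators, so the ordering rules above can be checked with a short script. The values of A, B and C are arbitrary choices for the illustration; this is Python, not PPL.

```python
A, B, C = 2.0, 3.0, 2.0

assert A - B + C == (A - B) + C       # addition and subtraction: left to right
assert A / B * C == (A / B) * C       # multiplication and division: left to right
assert A ** B ** C == A ** (B ** C)   # exponentiation: right to left, 2 ** 9
assert A + B * C == A + (B * C)       # multiplication before addition
assert A * B ** C == A * (B ** C)     # exponentiation before multiplication

print(A ** B ** C, (A ** B) ** C)     # 512.0 64.0
print((A + B + C) / 3 == A + B + C / 3)  # False: the parentheses change the result
```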
2.14 Functions<br />
P-<strong>STAT</strong> functions are special expressions which transform variables according <strong>to</strong> particular rules. For instance,<br />
<strong>the</strong> SQRT function calculates <strong>the</strong> square root of a variable or an expression. A variable or expression used by a<br />
function is called <strong>the</strong> “argument” of that function.<br />
Functions require at least one argument. Arguments follow <strong>the</strong> function name and are enclosed in paren<strong>the</strong>ses.<br />
Some of <strong>the</strong> functions, like <strong>the</strong> square-root function SQRT, require only a single argument, an expression
which is <strong>to</strong> be used by <strong>the</strong> function. This expression can be a variable name or position, a constant, ano<strong>the</strong>r function<br />
with its arguments, or a combination of such elements:<br />
[ SET Score = SQRT ( Score ) ]<br />
[ SET Score = SQRT ( 55 ) ]<br />
[ SET Score = SQRT ( Score + 33 ) ]<br />
[ SET Score = SQRT ( V(1) * .5 ) ]<br />
Functions like <strong>the</strong> square root function SQRT are called “numeric functions”. The argument for a numeric<br />
function is a single expression. If any of <strong>the</strong> elements in <strong>the</strong> expression is a missing value, <strong>the</strong> result is a missing<br />
value. If <strong>the</strong> expression yields a good value which is legal for <strong>the</strong> function, <strong>the</strong> function will produce an appropriate<br />
result. If <strong>the</strong> argument is invalid, like SQRT(-3), <strong>the</strong> result is set <strong>to</strong> missing.<br />
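The way missing values pass through a numeric function can be sketched in Python. The sketch rests on stated assumptions: None stands in for a P-STAT missing value, and the helper names add and sqrt_or_missing are invented for the illustration. It does not distinguish between the different types of missing.

```python
import math

MISSING = None  # assumption: None stands in for a missing value

def add(x, y):
    """Addition in which a missing operand makes the whole result missing."""
    return MISSING if x is MISSING or y is MISSING else x + y

def sqrt_or_missing(x):
    """Square root that yields MISSING for a missing or invalid (negative) argument."""
    if x is MISSING or x < 0:
        return MISSING
    return math.sqrt(x)

print(sqrt_or_missing(add(16, 9)))        # 5.0
print(sqrt_or_missing(add(16, MISSING)))  # None: a missing element makes the result missing
print(sqrt_or_missing(-3))                # None: invalid argument, result set to missing
```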
A number of functions, such as <strong>the</strong> MEAN function, operate on a list of arguments:<br />
[ SET AVERAGE = MEAN ( V(2) Test1 Test2 Test3 ) ]<br />
[ SET AVERAGE = MEAN ( V(2) Test1 TO Test3 ) ]<br />
Each argument is a numeric variable name or position. The function takes <strong>the</strong> list of arguments and yields a single<br />
value. For instance, <strong>the</strong> MEAN function illustrated above calculates <strong>the</strong> mean of <strong>the</strong> variables in <strong>the</strong> list.<br />
The functions, which are covered in detail later in this manual, can be broadly classified as:<br />
1. numeric functions such as SQRT<br />
2. list functions which operate on a list of variables such as MEAN<br />
3. character functions such as UPPER, LOWER and CAPS<br />
4. special functions. For example, RECODE and NCOT are used <strong>to</strong> recode <strong>the</strong> values of one or more<br />
variables. SPLIT and COLLECT are used for cross case data manipulation. Date and time functions<br />
have a chapter of <strong>the</strong>ir own later in this manual.<br />
2.15 LOGICAL SELECTION OF CASES<br />
Cases in a P-STAT system file may be selected or deleted from processing by logical testing. This is sometimes<br />
referred to as “filtering.” IF is the keyword that precedes all logical selections and modifications. The following<br />
is a discussion of the simple logical IF. Full IF-THEN-ELSE blocks are discussed later in this manual.<br />
__________________________________________________________________________<br />
Figure 2.4 Format of <strong>the</strong> IF Clause<br />
Logical<br />
IF Exp 1 Opera<strong>to</strong>r Exp 2 Action<br />
[ IF Test1 EQ Test2, DELETE ]<br />
[ IF Test1 LT 3, RETAIN ]<br />
[ IF Test1 - 3 GE Test4 * .5, SET .... ]<br />
[ IF Test1 - V(3) GT SQRT (Test3), SET .... ]<br />
[ IF SQRT (Test1) GT .2, SET .... ]<br />
[ IF School EQ 'Longwood', SET .... ]<br />
__________________________________________________________________________<br />
The IF itself is usually composed of five parts:<br />
1. <strong>the</strong> keyword IF<br />
2. an expression
3. an opera<strong>to</strong>r indicating <strong>the</strong> relationship between <strong>the</strong> expressions<br />
4. a second expression<br />
5. one or more action instructions <strong>to</strong> be taken.<br />
The format of <strong>the</strong> IF clause is illustrated in Figure 2.4. Expressions may be as simple as a variable name or a<br />
constant, or <strong>the</strong>y may be complex numeric or character expressions combining variables, constants and functions.<br />
Character constants need <strong>to</strong> be enclosed in single or double quotes.<br />
The V(n) notation provides a consistent means for differentiating between a constant and a variable position.<br />
Test1 - 3 means <strong>the</strong> value of <strong>the</strong> variable named Test1 minus <strong>the</strong> constant 3; Test1 - V(3) means <strong>the</strong> value of <strong>the</strong><br />
variable named Test1 minus <strong>the</strong> value of <strong>the</strong> variable located in position 3.<br />
Any of <strong>the</strong> expressions in <strong>the</strong> IF can be complex and refer <strong>to</strong> variables, constants or functions. The functions<br />
<strong>the</strong>mselves can call functions:<br />
[ IF INT ( SQRT (XA ) ) GT SQRT (XC) + 5, SET XF = 1 ]<br />
Here <strong>the</strong> square root of XA is computed. Then that result is truncated <strong>to</strong> an integer using <strong>the</strong> INT function. Finally,<br />
<strong>the</strong> result is compared with <strong>the</strong> result of <strong>the</strong> second expression, namely, <strong>the</strong> sum of <strong>the</strong> square root of XC and <strong>the</strong><br />
constant 5. If <strong>the</strong> comparison is evaluated as true, that is, if <strong>the</strong> value of <strong>the</strong> first expression is greater than <strong>the</strong><br />
value of <strong>the</strong> second expression when both expressions are non-missing, <strong>the</strong> specified action (SET XF = 1) occurs.<br />
The action taken after the evaluation of an IF clause is typically the modification of a variable’s value (SET),<br />
the keeping of a case (RETAIN), or the exclusion of a case (DELETE).<br />
RETAIN keeps a case and passes it <strong>to</strong> <strong>the</strong> next <strong>PPL</strong> clause or, if <strong>the</strong>re are no more clauses, <strong>to</strong> <strong>the</strong> current<br />
command. The case is passed through <strong>to</strong> <strong>the</strong> current command, unless it is deleted in a subsequent <strong>PPL</strong> clause.<br />
DELETE does not pass a case <strong>to</strong> any subsequent <strong>PPL</strong> clauses or <strong>to</strong> <strong>the</strong> current command. It deletes <strong>the</strong> case<br />
from any fur<strong>the</strong>r modification or testing, and from <strong>the</strong> current command. In o<strong>the</strong>r words, <strong>the</strong> processing of <strong>PPL</strong><br />
ceases for that case. The next case is read and <strong>PPL</strong> is restarted with <strong>the</strong> new case.<br />
The action that follows <strong>the</strong> IF test is usually taken only if <strong>the</strong> expression is true.<br />
IF Test GE 65, SET Pass = 'true', is <strong>the</strong> same as:<br />
IF Test GE 65, T.SET Pass = 'true',<br />
Any action can be prefaced by any combination of <strong>the</strong> letters “T” for true, “F” for false, and “M” for missing <strong>to</strong><br />
control how <strong>the</strong> results of <strong>the</strong> IF test are <strong>to</strong> be evaluated.<br />
IF Test GE 65, MF.SET Pass = 'false', T.SET Pass = 'true';<br />
The action section can contain multiple actions, each one prefaced with appropriate “TMF” combinations.<br />
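The effect of the T, F and M prefixes can be modelled as a lookup on a three-way test result. The Python sketch below is an illustration under assumptions, not PPL: None stands in for a missing value, the pass mark of 65 and the names compare_ge and pass_flag are chosen for the example, and an unprefixed action is treated as if it carried the prefix T.

```python
def compare_ge(x, y):
    """Three-way result of x GE y: 'T', 'F', or 'M' when either side is missing."""
    if x is None or y is None:
        return "M"
    return "T" if x >= y else "F"

def pass_flag(test):
    """Model of a test against 65 with one action for T and one action for M or F."""
    actions = [("T", "true"), ("MF", "false")]  # (prefix letters, value to set)
    result = compare_ge(test, 65)
    value = None
    for prefix, new_value in actions:
        if result in prefix:  # the action runs only if its prefix names the result
            value = new_value
    return value

print(pass_flag(70), pass_flag(50), pass_flag(None))  # true false false
```

With only the T action present, a missing Test would leave the value unset, which is why the M prefix matters when missing data must be handled explicitly.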
2.16 Logical Opera<strong>to</strong>rs<br />
The basic logical opera<strong>to</strong>rs are <strong>the</strong> following:<br />
Meaning Symbol<br />
equal EQ<br />
not equal NE<br />
less than LT<br />
less than or equal LE<br />
greater than GT<br />
greater than or equal GE<br />
Each expression in an IF clause is analyzed and a value is computed. The expressions are <strong>the</strong>n compared according<br />
<strong>to</strong> <strong>the</strong> logical opera<strong>to</strong>r that was used. If <strong>the</strong> logical opera<strong>to</strong>r correctly describes <strong>the</strong> relationship between<br />
<strong>the</strong> expressions, <strong>the</strong> IF statement is evaluated as true. If it is incorrect, <strong>the</strong> IF statement is false. If ei<strong>the</strong>r expression<br />
is missing so that <strong>the</strong> comparison cannot be made, <strong>the</strong> IF is evaluated as missing.<br />
The logical opera<strong>to</strong>rs may be prefaced with “X” for eXact comparisons of character strings:
[ IF Symp<strong>to</strong>m XEQ 'a', RETAIN ]<br />
When an exact comparison is specified, <strong>the</strong> case of <strong>the</strong> character string must be exactly <strong>the</strong> same as that of <strong>the</strong> test<br />
string for <strong>the</strong> IF statement <strong>to</strong> be true. In <strong>the</strong> example above, <strong>the</strong> value of Symp<strong>to</strong>m must be a lower case “a” for<br />
<strong>the</strong> case <strong>to</strong> be retained; an upper case “A” would be evaluated as false. When <strong>the</strong> logical opera<strong>to</strong>rs are not prefaced<br />
with “X”, <strong>the</strong> case of character strings is not relevant.<br />
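The difference between the ordinary and the exact comparison can be shown with two small Python functions. They are a hypothetical model of the behavior described above (the names eq and xeq are invented), and they ignore missing values for brevity.

```python
def eq(value, test):
    """Model of EQ on character strings: letter case is ignored."""
    return value.lower() == test.lower()

def xeq(value, test):
    """Model of XEQ: the strings must match exactly, including letter case."""
    return value == test

print(eq("A", "a"), xeq("A", "a"), xeq("a", "a"))  # True False True
```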
In addition <strong>to</strong> <strong>the</strong>se opera<strong>to</strong>rs <strong>the</strong>re are 6 logical opera<strong>to</strong>rs which are used <strong>to</strong> compare dates and times. They<br />
are described in Chapter 10, “<strong>PPL</strong>: Date and Time Commands and Functions”.<br />
2.17 The Special Opera<strong>to</strong>rs MISSING and GOOD<br />
The values .M. and .G. are the system values for missing and good. Missing can be further specified as .M1., .M2.,<br />
or .M3. Note that names for system values and system variables look much like variable names except that they<br />
begin and end with a decimal point. This:<br />
[ IF Age EQ .M., DELETE ]<br />
may be used <strong>to</strong> delete any case with a missing value of Age. Note that when an IF statement is used <strong>to</strong> explicitly<br />
test for missing or good values, it has only a true or false result.<br />
The two special opera<strong>to</strong>rs MISSING and GOOD can also be used <strong>to</strong> test whe<strong>the</strong>r or not missing data are present<br />
in an expression:<br />
[ IF Test1 MISSING, is <strong>the</strong> same as<br />
[ IF Test1 EQ .M.,<br />
[ IF Test1 GOOD, is <strong>the</strong> same as<br />
[ IF Test1 EQ .G.,<br />
The special opera<strong>to</strong>rs MISSING and GOOD combine <strong>the</strong> “EQ” (=) opera<strong>to</strong>r and <strong>the</strong> system value .M. or .G.<br />
in<strong>to</strong> a single keyword. MISSING1, MISSING2, and MISSING3 can be used in <strong>the</strong> same way as MISSING <strong>to</strong> test<br />
specifically for <strong>the</strong> individual types of missing.<br />
[ IF Age MISSING3, DELETE ]<br />
Here, a case is deleted if Age equals <strong>the</strong> system value for missing type 3.<br />
2.18 AND and OR Relationships<br />
An IF may consist of a series of logical relationships linked by AND or OR. For example:<br />
[ IF Age GE 14 AND Sex EQ 1, SET Membership = 2 ]<br />
[ IF Age LT 14 AND Sex EQ 1 OR V(1) EQ 77, DELETE ]<br />
There can be many ANDs and ORs and <strong>the</strong>y can be nested. Paren<strong>the</strong>ses control <strong>the</strong> order in which <strong>the</strong> parts of <strong>the</strong><br />
expression are evaluated:<br />
[ IF<br />
( Age GT 21 OR ( Voter EQ 2 AND Married EQ 1 ) )<br />
AND<br />
( Education GT 12 OR ( Job EQ 4 AND <strong>Inc</strong>ome GT 20000 ) ),<br />
RETAIN ]<br />
This example illustrates <strong>the</strong> types of complex expression that are possible. However, a frequent cause of an empty<br />
file (no cases found) is an IF with expressions so complex that <strong>the</strong> user cannot follow <strong>the</strong> logic.
__________________________________________________________________________<br />
Figure 2.5 AND and OR: Evaluations of Expressions<br />
In <strong>the</strong> following table, <strong>the</strong> evaluations of <strong>the</strong> expressions are:<br />
t for true, f for false and m for missing.<br />
EXPRESSIONS:      EVALUATIONS:<br />
Exp1    Exp2      Exp1 AND Exp2    Exp1 OR Exp2<br />
t       t         t                t<br />
t       f         f                t<br />
t       m         m                t<br />
f       t         f                t<br />
f       f         f                f<br />
f       m         f                m<br />
m       t         m                t<br />
m       f         f                m<br />
m       m         m                m<br />
__________________________________________________________________________<br />
Unless paren<strong>the</strong>ses indicate o<strong>the</strong>rwise, ANDs are done before ORs.<br />
Paren<strong>the</strong>ses determine clusters of logic that get evaluated as a piece. In <strong>the</strong> previous example, if <strong>the</strong> value of<br />
Age is not greater than 21 or if Age is missing, <strong>the</strong> expression:<br />
( Voter EQ 2 AND Married EQ 1 )<br />
needs <strong>to</strong> be evaluated. If this expression is also not true, <strong>the</strong> entire modification clause cannot be true, and <strong>the</strong> rest<br />
of <strong>the</strong> clause does not need <strong>to</strong> be processed. However, if this expression is true, <strong>the</strong> next expression:<br />
( Education GT 12 OR ( .... ) )<br />
is evaluated in <strong>the</strong> same manner. If Education is greater than 12, <strong>the</strong>re is no need <strong>to</strong> complete <strong>the</strong> evaluation of<br />
<strong>the</strong> expression. A true result is returned and <strong>the</strong> case continues <strong>to</strong> <strong>the</strong> next <strong>PPL</strong> clause. However, if Education is<br />
not greater than 12 or is missing, <strong>the</strong> evaluation proceeds because <strong>the</strong> expression following <strong>the</strong> OR might be true.<br />
Figure 2.5 contains a table which shows <strong>the</strong> interaction of true, false and missing evaluations with <strong>the</strong> AND<br />
and OR opera<strong>to</strong>rs. The following example illustrates OR with three different evaluations:<br />
( IF Occupation EQ 40 OR Education EQ 12 )<br />
Occupation Education Evaluation<br />
43 (f) 12 (t) true<br />
43 (f) 16 (f) false<br />
43 (f) - (m) missing<br />
In <strong>the</strong> third example, <strong>the</strong> first expression is false (Occupation is not 40), but <strong>the</strong> second expression is nei<strong>the</strong>r<br />
false nor true because <strong>the</strong> value for Education is missing.<br />
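The table in Figure 2.5 follows two simple rules: AND is false if either side is false, and OR is true if either side is true; otherwise a missing side makes the result missing. The Python sketch below encodes those rules and prints the nine rows. The function names and3 and or3 are assumptions of the example, and the letters t, f and m are used as in the figure.

```python
T, F, M = "t", "f", "m"

def and3(a, b):
    """Three-valued AND: false wins, then missing, otherwise true."""
    if a == F or b == F:
        return F
    if a == M or b == M:
        return M
    return T

def or3(a, b):
    """Three-valued OR: true wins, then missing, otherwise false."""
    if a == T or b == T:
        return T
    if a == M or b == M:
        return M
    return F

for a in (T, F, M):
    for b in (T, F, M):
        print(a, b, and3(a, b), or3(a, b))
```

For the Occupation and Education example, or3("f", "t") gives t, or3("f", "f") gives f, and or3("f", "m") gives m.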
Because AND has precedence over OR, some of the parentheses in the previous example can be omitted. This:<br />
[ IF ( Age GT 21 OR ( Voter EQ 2 AND Married EQ 1 ) )<br />
can be written as:<br />
[ IF ( Age GT 21 OR Voter EQ 2 AND Married EQ 1 )<br />
The full statement reduces <strong>to</strong>:<br />
[ IF ( Age GT 21 OR Voter EQ 2 AND Married EQ 1 )<br />
AND<br />
( Education GT 12 OR Job EQ 4 AND <strong>Inc</strong>ome GT 20000 ),<br />
However, <strong>the</strong> use of <strong>the</strong> paren<strong>the</strong>ses is recommended whenever <strong>the</strong> logic is complex with a mixture of AND and<br />
OR phrases.<br />
2.19 Common Errors in Complex Expressions<br />
The most common errors in constructing a complex expression occur because <strong>the</strong> relationship that follows <strong>the</strong> IF,<br />
OR, or AND is not complete. For example:<br />
[ IF Occupation EQ 3 OR Occupation EQ 4, .... is correct<br />
[ IF Occupation EQ 3 OR 4, .... is incorrect<br />
In <strong>the</strong> second example above, “Occupation EQ 3” is complete. It has <strong>the</strong> proper three parts with “Occupation” as<br />
<strong>the</strong> first expression, “EQ” as <strong>the</strong> opera<strong>to</strong>r, and “3” as <strong>the</strong> second expression. However,<br />
[ IF Occupation EQ 3 OR 4 EQ Occupation,<br />
is also allowed. Since a number on ei<strong>the</strong>r side of <strong>the</strong> opera<strong>to</strong>r is a valid expression, <strong>the</strong> 4 following <strong>the</strong> OR (in <strong>the</strong><br />
earlier example):<br />
[ IF Occupation EQ 3 OR 4, ....<br />
is interpreted as <strong>the</strong> first expression in a clause. The “,” which follows it is not a legal opera<strong>to</strong>r. Error messages<br />
indicate what was expected in <strong>the</strong> clause and what was found:<br />
LIST Patients [ IF Occupation EQ 3 or 4, RETAIN ] $<br />
ERROR... Expected a logical opera<strong>to</strong>r like EQ<br />
RETAIN ] $<br />
A second common source of error is to include the IF for each relationship in the complex statement. The<br />
following is correct:<br />
[ IF Occupation EQ 3 OR Occupation EQ 4, ....<br />
This is incorrect:<br />
[ IF Occupation EQ 3 OR IF Occupation EQ 4, ....<br />
In this example, an error message results because IF is a legal name for a variable:<br />
LIST KK [ IF Occupation EQ 3 OR IF Occupation EQ 4,<br />
RETAIN ] $<br />
ERROR... Expected a logical opera<strong>to</strong>r like EQ<br />
Occupation EQ 4,<br />
Since IF is a legal variable name, and a variable is a valid expression, P-<strong>STAT</strong> is expecting <strong>the</strong> next character<br />
string <strong>to</strong> be an opera<strong>to</strong>r such as EQ or LT. The variable name Occupation is not a legal opera<strong>to</strong>r and an error condition<br />
occurs.
<strong>PPL</strong>: Basics of <strong>the</strong> <strong>Programming</strong> <strong>Language</strong> 2.17<br />
2.20 AMONG and NOTAMONG<br />
Two o<strong>the</strong>r logical opera<strong>to</strong>rs, AMONG and NOTAMONG, simplify <strong>the</strong> specification of logical relationships.<br />
They follow an initial expression and require a list of values and variables (not a second expression) as <strong>the</strong>ir<br />
argument.<br />
The argument list for AMONG and NOTAMONG contains individual values and ranges of values. This logical<br />
clause:<br />
[ IF Test.Score AMONG ( 90 TO 100 ),<br />
SET High = 1 ]<br />
produces exactly <strong>the</strong> same result as:<br />
[ IF Test.Score GE 90 AND<br />
Test.Score LE 100, SET High = 1 ]<br />
AMONG is easier <strong>to</strong> type and <strong>to</strong> understand.<br />
NOTAMONG is used similarly:<br />
[ IF Test.Score NOTAMONG ( 90 TO 100 ), DELETE ]<br />
Any cases with values on Test.Score that are below 90 or over 100 are deleted. Cases with missing values are<br />
not deleted. Prefixing <strong>the</strong> consequence:<br />
[ IF Test.Score NOTAMONG ( 90 TO 100 ), TM.DELETE ]<br />
deletes cases with missing values as well. The system variables for missing values (.M., .M1., etc.) may be included in<br />
the argument list for AMONG or NOTAMONG.<br />
AMONG and NOTAMONG are particularly powerful when multiple values are specified. Thus:<br />
[ IF Religion EQ 1 OR ( Religion GE 3<br />
AND Religion LE 5 ) OR Religion EQ 7<br />
OR Religion EQ 9, SET Protestant = 1 ]<br />
is exactly <strong>the</strong> same as:<br />
[ IF Religion AMONG ( 1, 3 TO 5, 7, 9 ),<br />
SET Protestant = 1 ]<br />
The arguments for <strong>the</strong> opera<strong>to</strong>rs AMONG and NOTAMONG are lists of values (constants) and variables;<br />
<strong>the</strong>y cannot be complex expressions but <strong>the</strong>y can be scratch variables. The use of commas separating <strong>the</strong> AMONG<br />
values is optional. In this example, <strong>the</strong> arguments for NOTAMONG are variable names:<br />
[ SET Low.Score = MIN ( Test1 TO Test10 );<br />
SET High.Score = MAX ( Test1 TO Test10 );<br />
IF Final.Exam NOTAMONG ( Low.Score TO High.Score ),<br />
RETAIN ]<br />
The MIN function yields <strong>the</strong> minimum value of a list of variables, which can include ranges, wildcards and<br />
.ON. . The MAX function yields <strong>the</strong> maximum value. Here <strong>the</strong>se functions are used <strong>to</strong> find <strong>the</strong> lowest and highest<br />
scores on a series of tests. If <strong>the</strong> value of <strong>the</strong> variable named Final.Exam is less than <strong>the</strong> lowest value or above <strong>the</strong><br />
highest value, <strong>the</strong> case is retained. The retained cases are students who have done ei<strong>the</strong>r better or worse than expected,<br />
given <strong>the</strong>ir scores on Test1 <strong>to</strong> Test10.<br />
AMONG and NOTAMONG may be prefaced with “X” for eXact comparisons of character strings. When<br />
exact comparisons are specified, <strong>the</strong> string must be identical and <strong>the</strong> case (upper, lower or mixed) must also be<br />
identical:<br />
[ IF Symp<strong>to</strong>m XAMONG ( 'a' 'A' 'Aa' ), RETAIN ]<br />
For example, cases with “aa”, “AA” and “aA” as values of Symp<strong>to</strong>m would not be retained.
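The difference between exact and case-independent matching can be modelled outside P-STAT.<br />
The following Python sketch is only an illustration, not P-STAT code. The helper names among and xamong are<br />
hypothetical, and the sketch assumes that plain AMONG compares character strings without regard to case, as<br />
EQ is described as doing in the summary of logical operators.<br />

```python
def among(value, candidates):
    """Case-independent membership (assumed behaviour of AMONG on strings)."""
    return value.lower() in {c.lower() for c in candidates}


def xamong(value, candidates):
    """Exact membership: the characters and their case must both match."""
    return value in candidates


symptoms = ["a", "A", "Aa", "aa", "AA", "aA"]
allowed = ("a", "A", "Aa")

retained_exact = [s for s in symptoms if xamong(s, allowed)]
retained_loose = [s for s in symptoms if among(s, allowed)]

print(retained_exact)  # ['a', 'A', 'Aa']
print(retained_loose)  # all six values
```

Under the exact test, “aa”, “AA” and “aA” drop out, matching the statement above.<br />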
2.21 MISSING DATA with AMONG and NOTAMONG<br />
If <strong>the</strong> value being tested is a missing value, <strong>the</strong> result of <strong>the</strong> IF will be missing unless it matches a missing value<br />
in <strong>the</strong> argument list. If variable TestScore has a value of MISSING2 <strong>the</strong> following <strong>PPL</strong>:<br />
[ IF TestScore AMONG ( 60 TO 100 ), T.SET Grade = 'Pass',<br />
F.SET Grade = 'Fail',<br />
M.SET Grade = 'Incomplete' ]<br />
as expected, produces the missing result of “Incomplete”.<br />
[ IF TestScore AMONG ( 60 TO 100, .M1. ) ....<br />
also produces <strong>the</strong> missing result when variable TestScore is MISSING2. However, <strong>the</strong> statement:<br />
[ IF TestScore AMONG ( 60 TO 100, .M2. ) ....<br />
produces a result of true and variable Grade has a value of “Pass” for that case.<br />
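The rules above can be modelled in a few lines of Python. This is a sketch under stated assumptions, not<br />
P-STAT code: Missing, among and grade are hypothetical names, a range such as 60 TO 100 is written as the<br />
tuple (60, 100), and None stands for a missing result.<br />

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Missing:
    kind: int  # 1 models .M1., 2 models .M2., and so on


def among(value, args):
    """Return True, False, or None (None models a missing result)."""
    if isinstance(value, Missing):
        # A missing value matches only the same missing type in the list.
        return True if value in args else None
    for arg in args:
        if isinstance(arg, Missing):
            continue
        if isinstance(arg, tuple):      # inclusive range: lo TO hi
            lo, hi = arg
            if lo <= value <= hi:
                return True
        elif value == arg:              # single value
            return True
    return False


def grade(score, args):
    """Model of T.SET 'Pass', F.SET 'Fail', M.SET 'Incomplete'."""
    result = among(score, args)
    if result is None:
        return "Incomplete"
    return "Pass" if result else "Fail"


m2 = Missing(2)
print(grade(m2, [(60, 100)]))               # Incomplete
print(grade(m2, [(60, 100), Missing(1)]))   # Incomplete
print(grade(m2, [(60, 100), Missing(2)]))   # Pass
```

The same among helper also covers mixed lists such as ( 1, 3 TO 5, 7, 9 ), written as [1, (3, 5), 7, 9].<br />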
2.22 INRANGE and OUTRANGE<br />
INRANGE and OUTRANGE can be used when <strong>the</strong> test <strong>to</strong> be done is for a single range of values.<br />
[ IF TestScore INRANGE ( 60, 100 ), SET Grade = 'Pass' ]<br />
[ IF Age OUTRANGE ( 13, 19 ), DELETE ]<br />
These two examples can also be written using AMONG and NOTAMONG with the keyword “TO”. INRANGE and<br />
OUTRANGE are provided because their names are more intuitive in some situations than<br />
AMONG and NOTAMONG.<br />
2.23 ANY and ALL<br />
There are two o<strong>the</strong>r logical opera<strong>to</strong>rs that may follow an IF: ANY and ALL. They must be followed by a list of<br />
variables. ANY is equivalent <strong>to</strong> a series of ORs. This example:<br />
[ IF Q11 GT 10 OR Q12 GT 10 OR Q13 GT 10 OR Q14 GT 10, DELETE ]<br />
is the same as:<br />
[ IF ANY ( Q11 TO Q14 ) GT 10, DELETE ]<br />
ALL is equivalent to a series of ANDs. This example:<br />
[ IF Q11 GOOD AND Q12 GOOD AND Q13 GOOD<br />
AND Q14 GOOD, RETAIN ]<br />
is the same as:<br />
[ IF ALL ( Q11 TO Q14 ) GOOD, RETAIN ]<br />
The argument list which follows ALL and ANY may contain variable names or variable positions. The variable<br />
positions are indicated by V(n). A common use of ANY or ALL selects cases with good (non-missing) data<br />
on all of <strong>the</strong> variables. Ei<strong>the</strong>r of <strong>the</strong>se statements does this:<br />
[ IF ALL ( V(1) .ON. ) GOOD, RETAIN ]<br />
[ IF ANY ( V(1) .ON. ) MISSING, DELETE ]<br />
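Why the two statements select the same cases can be seen in a small Python model. It is an illustration<br />
only, not P-STAT code. None stands for a missing value, the sample data are made up, and the helper name<br />
is_good is hypothetical.<br />

```python
cases = [
    {"Q11": 4, "Q12": 7, "Q13": 2, "Q14": 9},
    {"Q11": 4, "Q12": None, "Q13": 2, "Q14": 9},
    {"Q11": None, "Q12": None, "Q13": None, "Q14": None},
]


def is_good(value):
    """A value is good when it is not missing."""
    return value is not None


# Model of: IF ALL ( V(1) .ON. ) GOOD, RETAIN
kept_by_all = [c for c in cases if all(is_good(v) for v in c.values())]

# Model of: IF ANY ( V(1) .ON. ) MISSING, DELETE
kept_by_any = [c for c in cases if not any(not is_good(v) for v in c.values())]

print(kept_by_all == kept_by_any)  # True
print(len(kept_by_all))            # 1
```

Only the first case, which has no missing values, survives either selection.<br />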
2.24 INSTRUCTIONS AFTER IF<br />
The IF statement is incomplete by itself. It must be followed by an instruction that describes <strong>the</strong> action <strong>to</strong> be taken<br />
as a consequence of <strong>the</strong> IF test. A comma (,) is <strong>the</strong> punctuation that separates <strong>the</strong> IF and <strong>the</strong> instruction which<br />
follows. The three most common instructions that follow an IF are: DELETE and RETAIN for conditional case
selection, and SET for variable recoding. Functions such as LAG and DIF which work across cases should seldom<br />
be used following an IF. See <strong>the</strong> discussions of LAG and DIF for an example.<br />
__________________________________________________________________________<br />
Figure 2.6 IF and Missing Data<br />
Given these five cases:<br />
Case Num   Age   Race<br />
1          29    2<br />
2          31    4<br />
3          -     2<br />
4          -     4<br />
5          32    -<br />
The cases below are the ones given to a P-STAT command as a result of evaluation of the IF clauses on the left:<br />
IF clause                                   Case Num   Age   Race<br />
1. IF Age LE 30, RETAIN                     1          29    2<br />
2. IF Age GT 30, DELETE                     1          29    2<br />
                                            3          -     2<br />
                                            4          -     4<br />
3. IF Age GOOD AND Race GT 3, RETAIN        2          31    4<br />
4. IF Age MISSING OR Race LE 3, DELETE      2          31    4<br />
                                            5          32    -<br />
__________________________________________________________________________<br />
2.25 Conditional Case Selection<br />
Cases may be retained or deleted as <strong>the</strong> result of logical evaluations. Figure 2.6 shows four conditional case selections<br />
which appear straightforward. In <strong>the</strong> first IF clause, <strong>the</strong>re is no ambiguity. Only <strong>the</strong> first of <strong>the</strong> five cases<br />
in <strong>the</strong> figure has a value for variable Age which is both non-missing and less than or equal <strong>to</strong> 30. Since action is<br />
taken only when <strong>the</strong> result of an IF is true, only that one case is retained.<br />
The second IF in Figure 2.6 looks like <strong>the</strong> first IF. The second and fifth cases, which have non-missing values<br />
greater than 30, are deleted. However, <strong>the</strong> third and fourth cases, which were not retained in <strong>the</strong> first IF because<br />
of missing values on Age, are not deleted in <strong>the</strong> second IF for <strong>the</strong> same reason. When a value is missing, <strong>the</strong> result<br />
of an IF is missing ra<strong>the</strong>r than true. Unless explicitly specified o<strong>the</strong>rwise, actions that follow an IF are done only<br />
when <strong>the</strong> result of <strong>the</strong> IF is true. If <strong>the</strong> result is false or missing, <strong>the</strong> action is not done.<br />
The fourth IF in Figure 2.6 is similarly affected because <strong>the</strong>re is a missing value of Race in case 5. Since <strong>the</strong><br />
result of <strong>the</strong> IF will be missing for that case, it is not deleted from <strong>the</strong> file.<br />
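The selections in Figure 2.6 can be reproduced with a small model of three-way logic. The Python sketch<br />
below is an illustration, not P-STAT code. None stands for both a missing value and a missing result, and<br />
all helper names are hypothetical. The and3 and or3 helpers follow the usual three-valued convention (false<br />
dominates AND, true dominates OR). The figure exercises only some of those combinations, so the rest are<br />
an assumption.<br />

```python
M = None  # models a missing value or a missing result

# (case number, Age, Race) for the five cases in Figure 2.6
cases = [(1, 29, 2), (2, 31, 4), (3, M, 2), (4, M, 4), (5, 32, M)]


def le(x, y):
    return M if x is M or y is M else x <= y


def gt(x, y):
    return M if x is M or y is M else x > y


def good(x):
    return x is not M


def missing(x):
    return x is M


def and3(a, b):
    if a is False or b is False:
        return False
    if a is M or b is M:
        return M
    return True


def or3(a, b):
    if a is True or b is True:
        return True
    if a is M or b is M:
        return M
    return False


def retain(test):
    """Keep a case only when the IF result is true."""
    return [n for n, age, race in cases if test(age, race) is True]


def delete(test):
    """Drop a case only when the IF result is true."""
    return [n for n, age, race in cases if test(age, race) is not True]


if1 = retain(lambda age, race: le(age, 30))
if2 = delete(lambda age, race: gt(age, 30))
if3 = retain(lambda age, race: and3(good(age), gt(race, 3)))
if4 = delete(lambda age, race: or3(missing(age), le(race, 3)))

print(if1, if2, if3, if4)  # [1] [1, 3, 4] [2] [2, 5]
```

The case numbers printed match the right-hand side of the figure.<br />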
2.26 Conditional Modification<br />
The keyword SET can ei<strong>the</strong>r begin a modification clause or it can be used as an instruction following an IF. In<br />
each of <strong>the</strong>se examples, <strong>the</strong> IF expression is evaluated first:<br />
-------- THE IF -------- -------- THE SET --------<br />
[IF Age EQ 1, SET Age = 99 ]
[IF <strong>Inc</strong>ome MISSING, SET <strong>Inc</strong>ome = 0 ]<br />
[IF Test3 LT Test1, SET Sum = Test2 / 2 ]<br />
[IF Total EQ V(1) + 3, SET Sum = Count * 2 ]<br />
[IF Sum - .5 LT 0, F.SET V(1) = V(3) ]<br />
[IF MEAN (T1 TO T3) LT 65, SET Sum = .M. ]<br />
Except in <strong>the</strong> fifth example, <strong>the</strong> SET is done only if <strong>the</strong> expression is true. In <strong>the</strong> fifth example, F.SET causes <strong>the</strong><br />
SET <strong>to</strong> occur only if <strong>the</strong> expression is false.<br />
In some situations, an IF test may have more than one desired consequence. Multiple instructions may directly<br />
follow the IF, within the same clause. In the following example, if Work.Status equals 3, four instructions<br />
follow (a RETAIN and three SETs):<br />
[ IF Work.Status EQ 3, RETAIN,<br />
SET Current.Job = 0,<br />
SET Current.<strong>Inc</strong>ome = 0,<br />
SET Total.Hours = .M1. ]<br />
2.27 Three-Way Logic of IF Statements<br />
Three-way (true, false and missing) logic in <strong>the</strong> evaluation of IF statements is powerful and gives precise control<br />
over data:<br />
[ IF Age GE 18, T.SET Voter = 1,<br />
FM.SET Voter = 0 ]<br />
However, <strong>to</strong> use this power and obtain <strong>the</strong> expected results, consideration must be given <strong>to</strong> <strong>the</strong> treatment of missing<br />
data.<br />
This is especially true with logical selection of cases. DELETE is occasionally useful, but RETAIN is better<br />
because its treatment of missing data is more natural. The action which follows <strong>the</strong> IF is normally done only if<br />
<strong>the</strong> result of <strong>the</strong> IF is true. However, it is possible <strong>to</strong> direct <strong>the</strong> action explicitly by using <strong>the</strong> prefixes T, F and M<br />
before the action instruction. The consequence DELETE actually means T.DELETE or delete if true. TM.DELETE<br />
deletes a case if the result of the IF is either true or missing and yields the expected result. Thus:<br />
[ IF Age GT 30, TM.DELETE ] is <strong>the</strong> same as<br />
[ IF Age LE 30, T.RETAIN ] which is <strong>the</strong> same as<br />
[ IF Age LE 30, RETAIN ]<br />
Similarly, F.DELETE means delete if <strong>the</strong> result of <strong>the</strong> IF is false.<br />
There may be multiple consequences of a given IF. The following are possible combinations of instructions:<br />
[ IF logical expression, T.SET ..., F.SET ..., M.SET ... ]<br />
[ IF logical expression, TM.SET ..., F.SET ... ]<br />
[ IF logical expression, TFM.SET ..., FM.SET ... ]<br />
All combinations of T, F and M, in any order, are permitted as prefixes <strong>to</strong> <strong>the</strong> consequences of an IF. TFM.SET<br />
causes <strong>the</strong> action <strong>to</strong> occur, whatever <strong>the</strong> result of <strong>the</strong> IF.<br />
If a prefix is not given, T is always assumed no matter what prefix was used in <strong>the</strong> previous consequence:<br />
[ IF Age GT 18, F.SET Minor.Child = 1,<br />
SET Voter = 1 ]<br />
The variable Voter is set <strong>to</strong> 1 if <strong>the</strong> expression is true.<br />
In this example, <strong>the</strong> consequences are more complex:<br />
[ IF Sex EQ 1 AND Work.Status GE 2,<br />
T.SET Occupation = Last.Occup,<br />
F.SET Occupation = Current.Occup,
M.DELETE ]<br />
The evaluation still returns a single logical result of true, false or missing, and <strong>the</strong> various actions are done<br />
accordingly.<br />
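The effect of the T, F and M prefixes can be summarised as a lookup: a consequence is carried out when the<br />
letter for the IF result appears in its prefix, and T is assumed when no prefix is given. The Python sketch<br />
below models that rule. It is not P-STAT code, and the names fires, gt and le are hypothetical. None stands<br />
for a missing value or result.<br />

```python
def fires(instruction, result):
    """Return True if the consequence should be carried out.

    instruction: a consequence keyword such as 'TM.DELETE' or 'RETAIN'.
    result: True, False, or None (None models a missing IF result).
    """
    head, dot, _ = instruction.partition(".")
    has_prefix = bool(dot) and bool(head) and set(head) <= set("TFM")
    prefix = head if has_prefix else "T"  # T is assumed when no prefix is given
    letter = {True: "T", False: "F", None: "M"}[result]
    return letter in prefix


def gt(x, y):
    return None if x is None else x > y


def le(x, y):
    return None if x is None else x <= y


ages = [29, 31, None, None, 32]

# Model of [ IF Age GT 30, TM.DELETE ]: keep cases where the delete does not fire.
kept_by_delete = [a for a in ages if not fires("TM.DELETE", gt(a, 30))]
# Model of [ IF Age LE 30, RETAIN ]: keep cases where the retain fires.
kept_by_retain = [a for a in ages if fires("RETAIN", le(a, 30))]

print(kept_by_delete, kept_by_retain)  # [29] [29]
```

Both selections keep the same single case, which is the equivalence stated earlier in this section.<br />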
2.28 Renaming Variables<br />
RENAME is <strong>the</strong> <strong>PPL</strong> instruction that is used <strong>to</strong> rename individual variables.<br />
[ RENAME Test1 TO Math121;<br />
RENAME V(2) TO Chem34 ]<br />
RENAME requires <strong>the</strong> existing name, TO, and <strong>the</strong> new name, which must be a unique name in <strong>the</strong> file. If you<br />
wish <strong>to</strong> rename most of <strong>the</strong> variables in <strong>the</strong> file with names that have no particular pattern you can use a MODIFY<br />
with an on-the-fly concatenation of files which is described in the chapter “PPL:MODIFY, PROCESS and PUT”.<br />
If you wish <strong>to</strong> rename a group of variables using a pattern such as a prefix, suffix, or sequence number, see <strong>the</strong><br />
chapter “<strong>PPL</strong>:DO LOOPS and IF-THEN-ELSE Blocks”.
<strong>PPL</strong><br />
SUMMARY<br />
<strong>Programming</strong> language modifications may be used in <strong>the</strong> MODIFY command or in any o<strong>the</strong>r P-<strong>STAT</strong><br />
command. <strong>PPL</strong> statements begin with a left bracket and end with a right bracket. They follow <strong>the</strong> input<br />
file directly (with no intervening punctuation):<br />
LIST Patients [ DROP Hospital ;<br />
IF Age GE 65, RETAIN ] ,<br />
BY Diagnosis, MEAN Length.of.Stay $<br />
Modifications are done in <strong>the</strong> programming language using:<br />
• Instructions<br />
• Opera<strong>to</strong>rs<br />
• Functions<br />
• System variables<br />
Both character and numeric variables may be modified. Some instructions, opera<strong>to</strong>rs and functions apply<br />
<strong>to</strong> both types of variables, and o<strong>the</strong>rs apply only <strong>to</strong> one type. Some system variables take on both character<br />
and numeric values, and o<strong>the</strong>rs take on only one type of value.<br />
Wildcards may be used anywhere that <strong>the</strong> name of a variable could be used<br />
[ KEEP ?test Weight?]<br />
or <strong>to</strong> request that P-<strong>STAT</strong> supply a name for a new variable. For example:<br />
[ GENERATE ? = V(2) / V(3) ]<br />
[ GEN ?:C = 'No comments' ]<br />
The “question mark” (?) is <strong>the</strong> wildcard character. Wildcards may be used in lists — in KEEP, DROP,<br />
SPLIT, COLLECT and DO loop instructions, after ANY and ALL opera<strong>to</strong>rs, and following list functions<br />
such as MEAN, SUM, MAX, MIN and SDEV.<br />
Comments may be interspersed among <strong>PPL</strong> clauses:<br />
[/* Selecting cases with outstanding balances */ ;<br />
IF Amount.Owed GT 0, RETAIN ]<br />
The whole comment is a <strong>PPL</strong> clause following ei<strong>the</strong>r a left bracket or a semicolon. The comment text<br />
follows “/*” and is followed by “*/”. <strong>PPL</strong> comments document modifications within a command.<br />
The C.TRANSPOSE command may be used <strong>to</strong> rotate a newly-modified file,<br />
C.TRANSPOSE File12 [ CASES 1 TO 10 ], OUT File12.Chr $<br />
producing an output file containing character representations of <strong>the</strong> data in <strong>the</strong> original file. In <strong>the</strong> transposed<br />
file, <strong>the</strong> variables (columns) are Variable, Case.1, Case.2, Case.3 and so on. The cases (rows) are<br />
<strong>the</strong> names and values of all of <strong>the</strong> variables. Thus, <strong>the</strong> first 10 or so cases in <strong>the</strong> file may be examined in<br />
a concise prin<strong>to</strong>ut — use LIST with FOLD, if necessary. FOLD causes long character variables <strong>to</strong> be<br />
broken in<strong>to</strong> pieces and printed on several lines.<br />
nn=number vnp=variable name/position vn=variable name exp=expression
<strong>PPL</strong> Instructions<br />
The following instructions may begin a modification clause. CASE selections are done before o<strong>the</strong>r<br />
modifications.<br />
CASES nn nn<br />
specifies a list of <strong>the</strong> positions (in ascending order) of cases <strong>to</strong> be selected:<br />
[ CASES 2 5 11 TO 99 333 .ON. ]<br />
Ei<strong>the</strong>r CASE or CASES may be used for case selection. (ROW and ROWS are synonyms.) Case selections<br />
are done before all o<strong>the</strong>r <strong>PPL</strong> modifications.<br />
DECREASE vnp<br />
recodes an existing numeric variable by decreasing its value by 1 or a specified amount:<br />
[ DECREASE Counter ;<br />
DEC Days BY 7 ;<br />
DEC Profit BY Expenses ]<br />
DEC is an abbreviation for DECREASE.<br />
DROP vnp vnp<br />
specifies a list of variables, by name or position, to be dropped:<br />
[ DROP Income V(4) TO V(10) V(26) .ON. ]<br />
Unspecified variables in the input file are kept. .ON. means “on through the end of the variables in the<br />
file.” Either KEEP or DROP may be used for variable selection. Wildcards, .NUMERIC., .CHARACTER.,<br />
and .NEW. can be used.<br />
DELETE<br />
specifies that the current case not pass to any subsequent PPL clauses or to the command in use. Cases<br />
not deleted are retained. DELETE is used as a consequence of an IF test.<br />
GENERATE vn = exp<br />
creates a new numeric or character variable:<br />
[ GENERATE Average = MEAN ( Score1 Score2 ) ;<br />
GEN Current.Age = Year - Birth.Year ;<br />
GEN Area.Code:C = '609' ]<br />
GENERATE requires a new variable name. If <strong>the</strong> new variable is a character variable, <strong>the</strong> name must be<br />
followed by “:C”, “:Cnn”, “:nn” or “:cnn”, where nn is a number indicating the maximum number of<br />
characters in <strong>the</strong> variable. When <strong>the</strong> number (nn) is not supplied, 16 is assumed. GENERATE may be<br />
abbreviated <strong>to</strong> GEN.<br />
IF exp op exp, consequence<br />
specifies a logical selection. The format of an IF clause is:<br />
[ IF exp logical opera<strong>to</strong>r exp , consequence ]<br />
[ IF Age LE 65 , RETAIN ]<br />
[ IF City EQ 'Miami' , DELETE ]<br />
[ IF (V(4) + 1) EQ V(5) , DEC V(4) ]<br />
[ IF 'yes' EQ Answer.4 , DELETE ]<br />
The expressions may be simple or complex numeric or character expressions. Character strings must be<br />
enclosed in single or double quotes. The logical opera<strong>to</strong>rs may be any of those in <strong>the</strong> subsequent section:<br />
<strong>PPL</strong> Logical Opera<strong>to</strong>rs. The consequences may be any of <strong>the</strong>se <strong>PPL</strong> instructions: DELETE, RETAIN,<br />
SET, INCREASE or DECREASE. (Additional instructions which may be used as consequences of an<br />
IF are explained and summarized in <strong>the</strong> second of <strong>the</strong> <strong>PPL</strong> chapters.)<br />
Consequences may be prefixed with T, F, or M, singly (T.SET ... , F.DELETE ... ) or in combination<br />
(FM.SET ... , TFM.INCREASE ... ), <strong>to</strong> direct whe<strong>the</strong>r <strong>the</strong> consequence should be performed when <strong>the</strong><br />
result of <strong>the</strong> IF is true, false or missing. T is assumed if no prefix is supplied.<br />
INCREASE vnp<br />
recodes an existing numeric variable by increasing its value by 1 or a specified amount:<br />
[ INCREASE Counter ;<br />
INC Days BY 7 ;<br />
INC Profit BY Sales ]<br />
INC is an abbreviation for INCREASE.<br />
KEEP vnp vnp<br />
specifies a list of variables, by name or position, <strong>to</strong> be kept or simply reordered:<br />
[ KEEP Name V(13) TO V(44) Education V(49) ]<br />
Unspecified variables in <strong>the</strong> input file are dropped. The system variables .NEW. and .OTHERS. may be<br />
used with KEEP <strong>to</strong> refer <strong>to</strong> variables newly generated in this command and any variables not explicitly<br />
mentioned:<br />
[ KEEP .NEW. ID.Number .OTHERS. ]<br />
Ei<strong>the</strong>r KEEP or DROP may be used for variable selection.<br />
Variables may be referenced with a subscript-type notation: V(2) means <strong>the</strong> variable in <strong>the</strong> second position<br />
from <strong>the</strong> left of <strong>the</strong> file. KEEP may also be followed by a wildcard <strong>to</strong> reference variables with a<br />
common prefix or suffix, and the system variables .NUMERIC., .CHARACTER., .NEW., .ON., and<br />
.OTHERS. .<br />
[ KEEP V(3) Score.? .CHARACTER. ]<br />
[ KEEP .NUMERIC. .OTHERS. ]<br />
RENAME vn TO vn<br />
renames an existing variable with a new variable name:<br />
[ RENAME VAR1 TO Age; RENAME VAR2 TO Income ]<br />
RETAIN<br />
specifies that the current case pass to the next PPL clause, or if there are no additional clauses, to the current<br />
command. Cases not retained are deleted. RETAIN is generally used as a consequence of an IF test.<br />
(CONTINUE is a synonym for RETAIN.)<br />
SET vnp = exp<br />
recodes an existing numeric or character variable:<br />
[ SET Height = Height / 12 ;<br />
SET City = 'Prince<strong>to</strong>n' ;<br />
SET V(2) = V(1) ]<br />
The expression following <strong>the</strong> equal-sign may be a simple or complex numeric or character expression.<br />
Character constants must be enclosed in single or double quotes.<br />
<strong>PPL</strong> Opera<strong>to</strong>rs: Logical<br />
Logical opera<strong>to</strong>rs are used in logical selection (IF) clauses. They permit comparisons between two expressions.<br />
The expressions may be ei<strong>the</strong>r both numeric or both character expressions. The evaluation of<br />
<strong>the</strong> comparison is true, false or missing: <strong>the</strong> evaluation is missing when one of <strong>the</strong> expressions is missing.<br />
Ei<strong>the</strong>r <strong>the</strong> character representation of <strong>the</strong> logical opera<strong>to</strong>r ( EQ ) or <strong>the</strong> equivalent symbol ( = ), where it<br />
exists, may be used.<br />
The logical opera<strong>to</strong>rs may be prefaced with “X” for eXact comparisons of character strings — <strong>the</strong>se comparisons<br />
respect <strong>the</strong> case (upper, lower or mixed) of <strong>the</strong> string as well as <strong>the</strong> literal characters. See<br />
Chapter 9, “<strong>PPL</strong>:Date and Time Commands and Functions” for a description of <strong>the</strong> 6 date/time logical<br />
opera<strong>to</strong>rs.<br />
EQ = equal<br />
[ IF City EQ 'Tray', SET City = 'Troy' ]<br />
The result may be true, false or missing. Comparisons of character strings are case independent: “troy”<br />
equals “Troy”. Leading blanks are characters: “ Troy” does not equal “Troy”. O<strong>the</strong>r opera<strong>to</strong>rs that are<br />
supported are: LE, LT, GE, and GT.<br />
XEQ exactly EQ<br />
[ IF Initial XEQ 'R', RETAIN ]<br />
Comparisons of character strings respect case when <strong>the</strong> logical opera<strong>to</strong>r is prefaced with “X”. O<strong>the</strong>r opera<strong>to</strong>rs<br />
that are supported are XLE, XLT, XGE, and XGT.<br />
NE ^= not equal<br />
[ IF Zip NE 11234, DELETE ]<br />
Cases with missing values of the variable Zip, in the example above, are not deleted. If the consequence is TM.DELETE<br />
rather than DELETE, deletion occurs when the result of the IF is either true or missing.<br />
XNE exactly NE<br />
[ IF Accept.Reject XNE 'F', SET Score = 'Pass' ]<br />
ALL (vnp list)<br />
tests all <strong>the</strong> values of <strong>the</strong> variables in <strong>the</strong> list:<br />
[ IF ALL ( Test.1 TO Test.5 ) GOOD, RETAIN ]<br />
All <strong>the</strong> relationships must be evaluated as true for <strong>the</strong> clause <strong>to</strong> be true. ALL is equivalent <strong>to</strong> a series of<br />
ANDs.<br />
AMONG (list of values and variables)<br />
tests whe<strong>the</strong>r <strong>the</strong> value of <strong>the</strong> specified variable is among <strong>the</strong> values in <strong>the</strong> list:<br />
[ IF Area AMONG ( 201 609 908 ), SET State = 'NJ' ]<br />
[ IF Name AMONG ( 'A' TO 'Mz' ), TM.DELETE ]<br />
The system variables for missing values (.M., .M1., etc.) may be included in <strong>the</strong> list of values following<br />
AMONG.<br />
XAMONG (list of values and variables)<br />
respects case in testing character values:<br />
[ IF Symptom XAMONG ( 'a', 'A', 'Aa' ), RETAIN ]<br />
AND<br />
links two logical relationships:<br />
[ IF Sex EQ 1 AND Age GE 21, RETAIN ]<br />
Both relationships must evaluate as true, or the entire clause is false or missing.<br />
ANY (vnp list)<br />
tests the values of the variables in the list, until one relationship is evaluated as true:<br />
[ IF ANY ( Test.1 TO Test.5 ) LT 65, RETAIN ]<br />
ANY is equivalent to a series of ORs.<br />
GOOD<br />
tests for good (non-missing) values. GOOD combines “=” with .G. , the system value for good values.<br />
The following are equivalent:<br />
[ IF ID GOOD , RETAIN ]<br />
[ IF ID EQ .G. , RETAIN ]<br />
INRANGE ( exp, exp )<br />
tests whether an expression is within the range expressed by the first (low) value and the second (high)<br />
value:<br />
[ IF TestScore INRANGE ( 91, 100 ), SET Grade = 'A' ]<br />
MISSING<br />
tests for missing (non-good) values. MISSING combines “=” with .M. , the system value for missing or<br />
non-good values. The following are equivalent:<br />
[ IF ID MISSING , DELETE ]<br />
[ IF ID EQ .M. , DELETE ]<br />
NOTAMONG (list of values and variables)<br />
tests whe<strong>the</strong>r <strong>the</strong> values of <strong>the</strong> specified variable are not among <strong>the</strong> values in <strong>the</strong> list:<br />
[ IF Age NOTAMONG ( 5, 7 TO 10 ), DELETE ]<br />
[ IF Sex NOTAMONG ('f', 'female'), DELETE ]<br />
In <strong>the</strong> above examples, cases with values not among <strong>the</strong> specified values are deleted. Cases with missing<br />
values are retained.<br />
XNOTAMONG (list of values and variables)<br />
respects case in testing character string values.<br />
OR<br />
links two logical relationships:<br />
[ IF Sex EQ 2 OR Age LT 21, DELETE ]<br />
Only one of the relationships need be true for the entire clause to be true.<br />
OUTRANGE ( exp, exp )<br />
tests whe<strong>the</strong>r an expression is outside that range specified by <strong>the</strong> two expressions.<br />
[ IF Age OUTRANGE ( 19, 65 ), DELETE ]<br />
<strong>PPL</strong> Opera<strong>to</strong>rs: Numeric<br />
Numeric (arithmetic) opera<strong>to</strong>rs are used between numeric values. Paren<strong>the</strong>ses (as well as nested paren<strong>the</strong>ses)<br />
indicate <strong>the</strong> desired order of operations. When paren<strong>the</strong>ses are not used, <strong>the</strong> precedence or order<br />
of operations is: exponentiation, multiplication and division, addition and subtraction. When <strong>the</strong>re is a<br />
series of additions and subtractions or multiplications and divisions, <strong>the</strong>y are performed from left <strong>to</strong> right.<br />
When <strong>the</strong>re is a series of exponentiations, <strong>the</strong>y are performed from right <strong>to</strong> left.<br />
** exponentiation<br />
[ SET Type = Code ** 2 ]<br />
* multiplication<br />
[ GENERATE Circumference = Pi * Diameter ]<br />
/ division<br />
[ IF V(4) NE 0, SET V(6) = 56089 / V(4) ]<br />
+ addition<br />
[ GENERATE F = ( ( 9/5 ) * C ) + 32 ]<br />
- subtraction<br />
[ SET Commission = .25 * Sales - 5 ]<br />
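Python happens to use the same precedence and grouping conventions as those described above, so the rules<br />
can be checked interactively there. This is an analogy rather than P-STAT code, and the sample values are<br />
made up.<br />

```python
# Exponentiation groups from right to left.
right_to_left = 2 ** 3 ** 2          # same as 2 ** (3 ** 2)

# Multiplication and division group from left to right.
left_to_right = 100 / 10 / 2         # same as (100 / 10) / 2

# Multiplication is done before subtraction.
commission = .25 * 200 - 5           # same as (.25 * 200) - 5

# Parentheses make the intended order explicit.
fahrenheit = ((9 / 5) * 100) + 32

print(right_to_left, left_to_right, commission, fahrenheit)
```

Grouping the exponentiation the other way, as (2 ** 3) ** 2, gives 64 rather than 512, which is why the<br />
right-to-left rule matters.<br />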
3<br />
<strong>PPL</strong>:<br />
MODIFY, PROCESS<br />
and PUT<br />
The previous <strong>PPL</strong> chapter covered <strong>the</strong> basics of data modification using <strong>the</strong> P-<strong>STAT</strong> <strong>Programming</strong> <strong>Language</strong><br />
(<strong>PPL</strong>). This chapter provides information about:<br />
• using <strong>the</strong> MODIFY command <strong>to</strong> save <strong>the</strong> results<br />
• “on-<strong>the</strong>-fly” concatenation and modification of multiple files<br />
• repeating cases in a file using <strong>the</strong> REPEAT instruction<br />
• the instructions GOTO, PUT, PUTL, QUITFILE, QUITCOMMAND and QUITRUN<br />
• use of the PROCESS and standalone PPL commands.<br />
3.1 FILE MODIFICATION<br />
Files are typically modified <strong>to</strong> “clean data” — that is, <strong>to</strong> detect and correct errors, and <strong>to</strong> select and possibly transform<br />
<strong>the</strong> variables needed for analysis. While it is <strong>the</strong>oretically possible <strong>to</strong> code and enter data and <strong>to</strong> make a file<br />
that is satisfac<strong>to</strong>ry for a series of runs, it is unlikely <strong>to</strong> happen. Sometimes <strong>the</strong>re are variables in <strong>the</strong> data which<br />
contain more information than is needed. O<strong>the</strong>r times, <strong>the</strong> variables desired are logical transformations of one or<br />
more of <strong>the</strong> original variables.<br />
If <strong>the</strong> data values are modified during <strong>the</strong> initial stages, <strong>the</strong> original information is no longer readily available.<br />
Thus, it is generally best <strong>to</strong> enter all <strong>the</strong> original data and <strong>the</strong>n, using data modifications and selections, create a<br />
second file that contains only <strong>the</strong> necessary modified variables. It is easier <strong>to</strong> search <strong>the</strong> first P-<strong>STAT</strong> file for any<br />
information that is needed later, than it is <strong>to</strong> go back <strong>to</strong> <strong>the</strong> original input records or coding sheets.<br />
A common sequence in readying a file for analysis is <strong>to</strong> make a P-<strong>STAT</strong> file, examine it, and <strong>the</strong>n <strong>to</strong> clean it<br />
up by modifying it as necessary. Appropriate variables are selected and changed in<strong>to</strong> <strong>the</strong> desired form. New variables<br />
are generated. Consistency checks for possible errors are made.<br />
For example, <strong>the</strong> variable Age, coded in years, could be collapsed in<strong>to</strong> five-year age groups. At <strong>the</strong> same time,<br />
a new variable, Age.Sex.Groups, can be generated with four categories: men under 30, men 30 and over, women<br />
under 30, and women 30 and over. A consistency check can be made <strong>to</strong> see that no males have had pregnancies,<br />
and inconsistent data can be converted <strong>to</strong> one of <strong>the</strong> missing values.<br />
The goal is <strong>to</strong> obtain a good file that can be saved and used as <strong>the</strong> basis for <strong>the</strong> rest of <strong>the</strong> analyses. After data<br />
cleaning, <strong>the</strong> number of transformations and selections needed for any given analysis is minimized. For example,<br />
you may wish <strong>to</strong> select only women for some runs and only respondents with good (non-missing) values on particular<br />
variables for o<strong>the</strong>r runs. Or you may want <strong>to</strong> use <strong>the</strong> natural log of <strong>the</strong> variable <strong>Inc</strong>ome. These selections<br />
and transformations may be done “on-<strong>the</strong>-fly”, as <strong>the</strong> analysis proceeds.<br />
3.2 How Modifications Are Processed<br />
When a P-<strong>STAT</strong> command program reads a case of data, it calls a system routine that reads P-<strong>STAT</strong> files. This<br />
routine reads a case of data, applies any specified modifications <strong>to</strong> <strong>the</strong> case, and sends <strong>the</strong> modified case of data<br />
<strong>to</strong> <strong>the</strong> calling program. In this example, each case of data received by <strong>the</strong> COUNT command contains only two<br />
variables, Age and Sex, in positions 1 and 2, respectively:
3.2 <strong>PPL</strong>: MODIFY, PROCESS and PUT<br />
COUNT Survey [ KEEP Age Sex ] $<br />
The file Survey, which contains 50 variables including Age in position 45 and Sex in position 46, remains<br />
unchanged.<br />
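As a rough model of this read-modify-deliver cycle, the following Python sketch (not PPL) passes each case through a list of modifications before handing it to the caller. The names `read_cases` and `keep` and the sample data are hypothetical, and the rule that case selection is processed first is not modelled.

```python
def read_cases(cases, modifications=()):
    """Yield a modified copy of each case; the stored cases are not touched."""
    for case in cases:
        current = dict(case)            # work on a copy of the case
        for modify in modifications:    # clauses run in the order given
            current = modify(current)
        yield current


def keep(*names):
    """Model of KEEP: retain only the named variables, in the order listed."""
    return lambda case: {name: case[name] for name in names}


# Hypothetical stand-in for the Survey file (only three variables shown).
survey = [
    {"ID": 1, "Age": 34, "Sex": 1},
    {"ID": 2, "Age": 51, "Sex": 2},
]

# What a command such as COUNT would receive for Survey [ KEEP Age Sex ].
received = list(read_cases(survey, [keep("Age", "Sex")]))
```

Each received case holds only Age and Sex, in positions 1 and 2, while `survey` itself still holds every variable.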
Modification clauses are executed in <strong>the</strong> order in which <strong>the</strong>y appear, except for case (row) selection which is<br />
processed first. Variable selection, followed by additional modification clauses, should include all variables needed<br />
by <strong>the</strong> clauses which follow. If <strong>the</strong>y are not included, an error message states that <strong>the</strong> variable was not found.<br />
Modifications apply <strong>to</strong> input files, not <strong>to</strong> output files. The modification clauses (enclosed in brackets) follow<br />
directly after <strong>the</strong> input filename:<br />
MODIFY TestFile [ KEEP ID TO Occupation ], OUT TestNew $<br />
LIST TestNew [ CASES 1 TO 10 ] $<br />
When an output file is produced, any modifications made <strong>to</strong> <strong>the</strong> input cases are reflected in it. File TestNew has<br />
only <strong>the</strong> variables ID through Occupation. The listing of TestNew shows only <strong>the</strong> first ten cases even though all<br />
<strong>the</strong> cases from file TestFile are represented in file TestNew.<br />
3.3 Temporary Modifications<br />
In P-<strong>STAT</strong>, temporary modifications may be done <strong>to</strong> any file when it is read by any P-<strong>STAT</strong> command. However,<br />
unless output files are created, <strong>the</strong>se modifications are not saved in new P-<strong>STAT</strong> files. They are not available for<br />
use during <strong>the</strong> remainder of <strong>the</strong> run or in a subsequent run. For example:<br />
SURVEY S1099<br />
[ SET Income = Income / 100 ] ;<br />
When <strong>the</strong> cases of data in file S1099 are sent <strong>to</strong> <strong>the</strong> SURVEY command, <strong>the</strong> values of <strong>the</strong> variable <strong>Inc</strong>ome are<br />
divided by 100. SURVEY uses <strong>the</strong>se new values for <strong>Inc</strong>ome when processing <strong>the</strong> data. However, <strong>the</strong> values in<br />
<strong>the</strong> input file S1099 remain in <strong>the</strong>ir original form. If you wish <strong>to</strong> do ano<strong>the</strong>r operation with <strong>Inc</strong>ome similarly modified,<br />
<strong>the</strong> modification must be done again:<br />
LIST S1099<br />
[ SET Income = Income / 100 ] $<br />
These modifications are sometimes called “on-the-fly” modifications because they are done on the spur of the moment,<br />
just as they are needed. This on-the-fly modification:<br />
SURVEY Families<br />
[ GEN Family.Income = Fathers.Income + Mothers.Income ] ;<br />
STUB Social.Class, BANNER Children, MEANS Family.<strong>Inc</strong>ome $<br />
creates <strong>the</strong> new variable, Family.<strong>Inc</strong>ome, which exists only as each case of <strong>the</strong> file is passed <strong>to</strong> <strong>the</strong> SURVEY command.<br />
It is not available for use after exiting from SURVEY.<br />
3.4 Permanent Modifications and the MODIFY Command<br />
If a file with <strong>the</strong> same modifications is <strong>to</strong> be used over and over, it makes sense <strong>to</strong> do <strong>the</strong> modifications only once<br />
and save <strong>the</strong> results as a new P-<strong>STAT</strong> file. An output file reflects <strong>the</strong> modifications done <strong>to</strong> <strong>the</strong> input file or files<br />
used <strong>to</strong> create it. This is true whe<strong>the</strong>r <strong>the</strong> output file comes from <strong>the</strong> MODIFY, CONCAT, SORT or LOOKUP<br />
commands, or any o<strong>the</strong>r P-<strong>STAT</strong> command which produces output files.<br />
The modification procedure is the same, regardless of which command is used. However, only the MODIFY<br />
command processes the specified modifications and does nothing else except produce an output file. (The<br />
SORT command sorts the cases in addition to producing an output file; the CONCAT command joins several files<br />
in producing the output file, and so on.)<br />
The MODIFY command produces an output file when the identifier OUT is used. It also produces a description<br />
file, which describes the modified (output) file, when the identifier DES is used. MODIFY usually requires the
name of <strong>the</strong> input file <strong>to</strong> be modified, various modification clauses, and <strong>the</strong> identifier OUT followed by a name<br />
for <strong>the</strong> new output file. The modifications are permanent because <strong>the</strong>ir results are contained in <strong>the</strong> output file,<br />
which can be used throughout <strong>the</strong> remainder of <strong>the</strong> run and in subsequent runs.<br />
The MODIFY command reads and writes cases of data. Since it receives each case after any modifications<br />
have been done, <strong>the</strong> cases that are written out reflect all changes and selections. The first phrase begins with <strong>the</strong><br />
command name MODIFY and includes <strong>the</strong> input filename and all <strong>the</strong> modification clauses which are <strong>to</strong> be applied<br />
<strong>to</strong> that file. It is <strong>the</strong> comma following <strong>the</strong> final modification clause which signals that <strong>the</strong> phrase is completed.<br />
The format of a phrase is:<br />
MODIFY FileName [ ; ; ; ] [ ; ],<br />
Figure 3.1 illustrates using <strong>the</strong> MODIFY command <strong>to</strong> produce a new output file, which is a permanent modification<br />
of <strong>the</strong> input file. The MODIFY command has two phrases. The first is <strong>the</strong> command MODIFY and its<br />
argument — <strong>the</strong> name of <strong>the</strong> input file followed by <strong>the</strong> modification clauses. The second is “OUT S1099B”,<br />
which supplies <strong>the</strong> name <strong>to</strong> be given <strong>to</strong> <strong>the</strong> output file.<br />
__________________________________________________________________________<br />
Figure 3.1 Permanent Modifications<br />
MODIFY S1099<br />
[ GENERATE Coded.Age;<br />
SET Occupation = Occupation / 100 ;<br />
SET Coded.Age = INT ( Age / 10 ) ;<br />
KEEP Occupation Sex Coded.Age Race Children Siblings ],<br />
OUT S1099B $<br />
__________________________________________________________________________<br />
The modification clauses create <strong>the</strong> new variable Coded.Age, modify Occupation and Coded.Age, and select<br />
specific variables. The cases written in <strong>the</strong> new output file S1099B contain only six variables, including <strong>the</strong> new<br />
values for variable Occupation and <strong>the</strong> newly created variable Coded.Age. At this point, files S1099 and S1099B<br />
are both available <strong>to</strong> any P-<strong>STAT</strong> commands that follow. S1099 contains all <strong>the</strong> original data. S1099B contains<br />
<strong>the</strong> modified data.<br />
3.5 TEMPLATE Files<br />
A template file may be given <strong>to</strong> <strong>the</strong> MODIFY command <strong>to</strong> select <strong>the</strong> desired variables and, perhaps, <strong>to</strong> specify a<br />
changed ordering of those variables. It has much <strong>the</strong> same effect as a KEEP phrase ending <strong>the</strong> <strong>PPL</strong>.<br />
MODIFY Class89<br />
[ IF ANY ( V(1) .ON. ) MISSING, DELETE ],<br />
TEMPLATE Class88,<br />
OUT Classes $<br />
Figure 3.2 shows an input file, a template file, a MODIFY command and the resulting output file. The output<br />
file contains all the variables in the template file, in the order of the template file. Because variable “b” in file Testfile<br />
is not one of the variables in the template, it is not moved to the output file. Because variable “d” is a template<br />
file variable that is not present in file Testfile, it is set to missing for all the cases in the output file.<br />
The IF test deletes any case that is missing on any variable in <strong>the</strong> input file. Therefore, <strong>the</strong> second case is not<br />
written <strong>to</strong> <strong>the</strong> output file even though <strong>the</strong> only missing value is on variable “b” which is not one of <strong>the</strong> variables<br />
in <strong>the</strong> template file.
__________________________________________________________________________<br />
Figure 3.2 Template Files<br />
FILE Testfile<br />
a b c<br />
1 1 1<br />
2 - 2<br />
3 3 3<br />
MAKE Temp, VARS a c d;<br />
$<br />
----------------MAKE completed----------------<br />
| P-<strong>STAT</strong> file temp has been created. |<br />
| It has 0 cases and 3 variables. |<br />
| |<br />
| Two delimiters were used: BLANK and COMMA. |<br />
----------------------------------------------<br />
MODIFY Testfile<br />
[ IF ANY ( V(1) .ON. ) MISSING, DELETE ],<br />
TEMPLATE Temp,<br />
OUT Newtest $<br />
FILE Newtest<br />
a c d<br />
1 1 -<br />
3 3 -<br />
__________________________________________________________________________<br />
If a file does not already exist with appropriate variable names, a null file, a file with only variable names and<br />
no cases, may be created <strong>to</strong> serve as a template. This MAKE of <strong>the</strong> template file in Figure 3.2 shows both <strong>the</strong><br />
command and <strong>the</strong> report from <strong>the</strong> MAKE command.<br />
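The outcome in Figure 3.2 can be reproduced with a small Python model (not PPL). The helper names and the use of `None` for a missing value are assumptions made for illustration. The sketch applies the IF test to the input variables first and the template afterwards, which is consistent with the second case being deleted because of variable “b”.

```python
MISSING = None  # stand-in for a P-STAT missing value


def drop_cases_with_missing(cases):
    """Model of [ IF ANY ( V(1) .ON. ) MISSING, DELETE ] on the input cases."""
    return [case for case in cases
            if all(value is not MISSING for value in case.values())]


def apply_template(cases, template_vars):
    """Keep the template's variables in template order; absent ones are missing."""
    return [{name: case.get(name, MISSING) for name in template_vars}
            for case in cases]


testfile = [
    {"a": 1, "b": 1, "c": 1},
    {"a": 2, "b": MISSING, "c": 2},
    {"a": 3, "b": 3, "c": 3},
]
template_vars = ["a", "c", "d"]   # variables of the null file Temp

newtest = apply_template(drop_cases_with_missing(testfile), template_vars)
```

The result has two cases with variables a, c and d, where d is missing throughout, matching file Newtest in the figure.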
Sometimes an additional copy of a file is all that is desired. Here, <strong>the</strong>re are no modification clauses, so file B<br />
is an exact copy of A:<br />
MODIFY A, OUT B $<br />
The number and names of <strong>the</strong> variables, as well as <strong>the</strong> data, are <strong>the</strong> same in both files.<br />
3.6 On-the-Fly Concatenation of Files<br />
Multiple files may be read by any P-<strong>STAT</strong> command. The “+” opera<strong>to</strong>r concatenates <strong>the</strong> files “on-<strong>the</strong>-fly” — as<br />
<strong>the</strong>y are read by a command:<br />
MODIFY A + B + C + D, OUT ABCD $<br />
The data from files A, B, C and D are passed <strong>to</strong> <strong>the</strong> MODIFY command, one case after <strong>the</strong> o<strong>the</strong>r.
__________________________________________________________________________<br />
Figure 3.3 Renaming All the Variables in a File<br />
File F1<br />
VAR1 VAR2 VAR3 VAR4<br />
22 20100 40 f<br />
24 18400 36 f<br />
31 31000 40 m<br />
33 35000 49 f<br />
MAKE Names;<br />
VARS Age Income Hours Sex:c;<br />
$<br />
MODIFY Names + F1, OUT F1 $<br />
File F1<br />
Age Income Hours Sex<br />
22 20100 40 f<br />
24 18400 36 f<br />
31 31000 40 m<br />
33 35000 49 f<br />
__________________________________________________________________________<br />
On-the-fly concatenation of files requires that the number of variables be the same in all the input files, and<br />
that corresponding variables be of the same data type (numeric or character). The variables may have different<br />
names, so it is up to you to ensure that corresponding variables hold the same kind of content. The first file may be a template file,<br />
that is, a file with no cases used only to supply variable names for the output file. Figure 3.3 illustrates the use of<br />
a file with no data records to rename the variables in an existing P-STAT system file.<br />
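A minimal Python model (not PPL) of these rules is sketched below, using the data of Figure 3.3. The function name, the `(names, types, rows)` representation and the type labels "N" and "C" are hypothetical. Taking the output names from the first file is inferred from the template-file remark above.

```python
def concat_on_the_fly(*files):
    """Join files given as (names, types, rows); names come from the first file."""
    names, types, _ = files[0]
    for other_names, other_types, _ in files[1:]:
        if len(other_names) != len(names):
            raise ValueError("files must have the same number of variables")
        if other_types != types:
            raise ValueError("corresponding variables must have the same type")
    # Cases are passed along one file after the other.
    rows = [row for _, _, file_rows in files for row in file_rows]
    return names, types, rows


# The null file Names supplies variable names only; it has no cases.
names_file = (["Age", "Income", "Hours", "Sex"], ["N", "N", "N", "C"], [])
f1 = (
    ["VAR1", "VAR2", "VAR3", "VAR4"],
    ["N", "N", "N", "C"],
    [(22, 20100, 40, "f"), (24, 18400, 36, "f"),
     (31, 31000, 40, "m"), (33, 35000, 49, "f")],
)

out_names, out_types, out_rows = concat_on_the_fly(names_file, f1)
```

Here `out_names` carries the names from Names while `out_rows` holds the four unchanged cases of F1.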
Often <strong>the</strong> modifications done <strong>to</strong> one file in on-<strong>the</strong>-fly concatenation are necessary for every file. The notation<br />
[ * ] is a shortcut specifying that <strong>the</strong> modifications done <strong>to</strong> <strong>the</strong> previous file be applied <strong>to</strong> <strong>the</strong> current file. This<br />
command:<br />
MODIFY A [ GENERATE Profit = Gross - Expenses ;<br />
KEEP Company TO Zip, Expenses ]<br />
+ B [ GENERATE Profit = Gross - Expenses ;<br />
KEEP Company TO Zip, Expenses ], OUT C $<br />
may be shortened <strong>to</strong> this equivalent one:<br />
MODIFY A [ GENERATE Profit = Gross - Expenses ;<br />
KEEP Company TO Zip, Expenses ]<br />
+ B [ * ], OUT C $<br />
In on-<strong>the</strong>-fly concatenation, when files are referenced without modification or with <strong>the</strong> [ * ] indicating exactly<br />
<strong>the</strong> same modifications, <strong>the</strong> files are treated as one single file. Thus, across-case functions, such as FIRST, LAST,<br />
SPLIT and COLLECT, operate across <strong>the</strong> files. If <strong>the</strong> files have different modifications, across-case functions
operate within only one file. For example, FIRST and LAST are reset as each file is processed. Across-case modification<br />
is discussed in detail in the chapter “PPL: Across-Case Modifications”.<br />
Sometimes it is useful to include the cases in a file two or more times. For example, this permits estimation<br />
of the time it takes to process a large file without actually creating it. On-the-fly concatenation of a file with itself<br />
accomplishes this. In effect, the file is reread and joined to itself:<br />
MODIFY Y + Y + Y, OUT C $<br />
The MODIFY command receives every case of file Y, followed by every case of Y a second time, followed by<br />
every case of Y a third time.<br />
3.7 Repeating Cases<br />
Cases in a P-<strong>STAT</strong> system file may be read more than once using <strong>the</strong> REPEAT instruction. REPEAT is followed<br />
by an integer, a variable that has an integer value, or an expression that reduces <strong>to</strong> an integer. This instruction<br />
repeats each case in <strong>the</strong> file five times:<br />
[ REPEAT 5 ]<br />
A case is repeated at <strong>the</strong> point REPEAT is encountered in a sequence of <strong>PPL</strong> instructions. Thus, some instructions<br />
may precede <strong>the</strong> REPEAT and some may follow it. The system variable .N., which is <strong>the</strong> case number,<br />
is not changed by repetition. The system variable .HERE., which is <strong>the</strong> count of cases processed at a given point,<br />
is changed by repetition. Thus, it is possible <strong>to</strong> test both <strong>the</strong> input case number and <strong>the</strong> output case number. Figure<br />
3.4 illustrates this procedure with a LIST command.<br />
__________________________________________________________________________<br />
Figure 3.4 Repeating Cases<br />
LIST Subjects<br />
[ CASES 1 TO 3 ; REPEAT 2;<br />
GEN Input.Case = .N., GEN Output.Case = .HERE. ;<br />
IF MOD (Output.Case, 2) = 0,<br />
SET Test.1 = .M., SET Test.2 = .M., SET Test.3 = .M. ] $<br />
Test Test Test Input Output<br />
ID Age Sex .1 .2 .3 Case Case<br />
785001 1 1 94 89 97 1 1<br />
785001 1 1 - - - 1 2<br />
785002 2 1 78 82 85 2 3<br />
785002 2 1 - - - 2 4<br />
785006 1 1 71 70 75 3 5<br />
785006 1 1 - - - 3 6<br />
__________________________________________________________________________<br />
In Figure 3.4, <strong>the</strong> MOD function, which does modulo arithmetic, is used <strong>to</strong> test for an even number — that<br />
is, a second case. Test values are set <strong>to</strong> missing in <strong>the</strong>se cases. Second semester test results could be added <strong>to</strong><br />
<strong>the</strong>se cases as <strong>the</strong>y become available. (Functions and system variables are discussed fully in later chapters of this<br />
manual.) REPEAT may be used with any other PPL instructions and functions, with two exceptions: it may not<br />
follow an IF, and it may not appear in the same command as SPLIT, COLLECT, FIRST or LAST.<br />
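The numbering in Figure 3.4 can be traced with a Python model (not PPL). The function name and the lower-case field names are hypothetical, and the fourth subject is invented only to show the effect of the CASES selection. The loop counter `n` plays the role of .N. and `here` the role of .HERE..

```python
def list_with_repeat(cases, first, last, times):
    """Model of [ CASES first TO last ; REPEAT times ; ... ] in Figure 3.4."""
    output = []
    here = 0                                    # models .HERE.
    for n, case in enumerate(cases, start=1):   # n models .N.
        if not first <= n <= last:
            continue
        for _ in range(times):
            here += 1
            row = dict(case, input_case=n, output_case=here)
            if here % 2 == 0:                   # MOD (Output.Case, 2) = 0
                row.update(test1=None, test2=None, test3=None)
            output.append(row)
    return output


subjects = [
    {"id": 785001, "test1": 94, "test2": 89, "test3": 97},
    {"id": 785002, "test1": 78, "test2": 82, "test3": 85},
    {"id": 785006, "test1": 71, "test2": 70, "test3": 75},
    {"id": 999999, "test1": 60, "test2": 61, "test3": 62},  # hypothetical
]

rows = list_with_repeat(subjects, first=1, last=3, times=2)
numbering = [(row["input_case"], row["output_case"]) for row in rows]
```

The input case number repeats (1, 1, 2, 2, 3, 3) while the output case number runs from 1 to 6, and every even-numbered output case has its test values set to missing.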
A file of random data may be generated by building a file with just one case and repeating it as many times<br />
as desired. Here, <strong>the</strong> single case in <strong>the</strong> file Random is repeated 100 times:
MOD Random<br />
[ REPEAT 100 ;<br />
SET Random.Number = ( RANNORM (0) * 2.8 ) + 24 ],<br />
OUT RandomX $<br />
The single variable Random.Number is then set equal to a random number generated by the RANNORM function.<br />
The random number is multiplied by 2.8 and added to 24 to shift the standard deviation from 1 to 2.8 and the mean<br />
from 0 to 24. (Random number functions are discussed in more detail later in this manual.)<br />
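The same scale-and-shift idea can be sketched in Python (not PPL). Here `random.gauss(0, 1)` stands in for RANNORM as a standard normal generator; the meaning of RANNORM's argument is not modelled, and the function name and `seed` parameter are hypothetical.

```python
import random


def make_random_file(count, mean=24.0, sd=2.8, seed=None):
    """Repeat a single case `count` times, drawing a new value each time."""
    rng = random.Random(seed)
    # A standard normal draw has mean 0 and standard deviation 1;
    # multiplying by sd and adding mean rescales and shifts it.
    return [{"Random.Number": rng.gauss(0, 1) * sd + mean}
            for _ in range(count)]


random_x = make_random_file(100, seed=1)
sample_mean = sum(case["Random.Number"] for case in random_x) / len(random_x)
```

With 100 cases the sample mean should fall close to 24, though any single run will differ slightly from the target values.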
Simple integer weighting of cases may be done using REPEAT. These instructions weight younger respondents<br />
(cases) twice as much as older ones:<br />
[ GEN #Wt = 1 ;<br />
IF Age LT 21, SET #Wt = 2 ;<br />
REPEAT #Wt ]<br />
The scratch (temporary) variable #Wt is generated equal to 1. It is reset to 2 for younger respondents. A case is<br />
repeated as many times as the value of #Wt for that case. (Scratch variables are explained in more detail in the<br />
discussion of across-case modification later in this manual.)<br />
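The effect of these three instructions can be modelled in Python (not PPL). The function name and sample ages are hypothetical, and missing ages are not handled in this sketch.

```python
def repeat_by_weight(cases):
    """Model of GEN #Wt = 1 ; IF Age LT 21, SET #Wt = 2 ; REPEAT #Wt."""
    weighted = []
    for case in cases:
        wt = 1                      # GEN #Wt = 1
        if case["Age"] < 21:        # IF Age LT 21, SET #Wt = 2
            wt = 2
        # REPEAT #Wt: the case is delivered wt times in a row.
        weighted.extend(dict(case) for _ in range(wt))
    return weighted


respondents = [{"Age": 18}, {"Age": 40}, {"Age": 20}, {"Age": 65}]
weighted = repeat_by_weight(respondents)
ages = [case["Age"] for case in weighted]
```

Four input cases become six, with each respondent under 21 appearing twice.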
Weighting is often done when different population subgroups have been sampled and <strong>the</strong>y are not representative<br />
of their real proportion in the population. Using REPEAT is not necessarily a good weighting technique,<br />
because only integer weighting of each case is possible. Non-integer weighting is done using <strong>the</strong> WEIGHT identifier<br />
and a weighting variable in commands such as COUNTS and SURVEY. The WEIGHT identifier permits<br />
appropriate fractional weights <strong>to</strong> be applied <strong>to</strong> each case of data as it is processed by a command. Also, using<br />
WEIGHT is faster than using REPEAT <strong>to</strong> weight cases. See <strong>the</strong> description of <strong>the</strong> BALANCE command, which<br />
computes weights for sample balancing, in the manual “P-STAT: The SURVEY, BALANCE and SAMPLE<br />
Commands”.<br />
3.8 OTHER INSTRUCTIONS AFTER IF<br />
Of the instructions that specify the action to take as a consequence of an IF test, RETAIN, DELETE, SET,<br />
INCREASE and DECREASE are the most useful and common. (These are fully explained in the second of the PPL<br />
chapters.) However, there are additional instructions that may either follow an IF test as possible consequences<br />
or, in some situations, be used alone.<br />
3.9 GOTO To Process Modifications Selectively<br />
The GOTO instruction (GO TO, with a blank between <strong>the</strong> two words, is a synonym) permits modification clauses<br />
<strong>to</strong> be conditionally omitted or repeated. Clauses are generally omitted when control transfers <strong>to</strong> clauses after <strong>the</strong><br />
current one (downward), and repeated when control transfers <strong>to</strong> clauses before <strong>the</strong> current one (upward). The only<br />
constraint is that <strong>the</strong> omitted section cannot contain phrases such as KEEP, DROP, or GENERATE that change<br />
<strong>the</strong> number or order of <strong>the</strong> variables.<br />
GOTO is generally <strong>the</strong> consequence of an IF test:<br />
[ IF Sex EQ 1, GOTO Male;<br />
GOTO is followed by <strong>the</strong> label of <strong>the</strong> clause <strong>to</strong> which control is <strong>to</strong> pass. If <strong>the</strong> value of Sex is 1, control passes <strong>to</strong><br />
<strong>the</strong> modification clause beginning with <strong>the</strong> label “Male:”:<br />
Male: IF Live.Births GOOD, ... ;<br />
The label must be followed by a colon (:) and it must begin a modification clause. Figure 3.5 illustrates using<br />
GOTO to execute different modification clauses for males and females.<br />
Labels may be followed by an instruction or <strong>the</strong>y may be null labels (not followed by an instruction) as in “Next:<br />
]”, <strong>the</strong> last clause in Figure 3.5. Null labels often provide a destination. Labels may also be simply informative, as<br />
in “Female: ... ”, <strong>the</strong> third clause in Figure 3.5, and not a destination.
__________________________________________________________________________<br />
Figure 3.5 Using GOTO and PUT<br />
MODIFY People<br />
[ IF Region = 3, RETAIN;<br />
IF Sex = 1, GOTO Male ]<br />
[ Female: IF Occupation AMONG ( 43, 56 TO 59, 72, 78 ),<br />
PUT 'Check Occupation of ' Name '. Occupation: ' Occupation ;<br />
IF Military.Service = 4,<br />
PUT 'Check Service Record of ' Name '.' ;<br />
GOTO Next ]<br />
[ Male: IF Live.Births GOOD,<br />
PUT 'Invalid Live.Births of ' Live.Births ' for ' Name '.',<br />
SET Live.Births = .M3. ;<br />
SET Military.Service = NCOT ( Military.Service, 2 ) ]<br />
[ Next: ], OUT People2 $<br />
__________________________________________________________________________<br />
Using GOTO makes for clearer logic when there is a series of modifications to be performed as the result of<br />
a specific IF test. In Figure 3.5, left and right brackets have been used within the PPL to emphasize the structure<br />
of an IF, a section for cases coded as female, a section for cases coded as male, and a final section.<br />
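For readers more used to structured branching, the control flow of Figure 3.5 can be approximated in Python (not PPL) as an if/else with a common exit. This is only a sketch under several assumptions: RETAIN is taken to drop cases that fail the test, the exact spacing of PUT output is not reproduced, the NCOT recode is left out, `None` stands in for the missing value .M3., and the code 2 for females and the second and third sample cases are invented.

```python
MISSING = None  # stand-in for a P-STAT missing value such as .M3.
FLAGGED_OCCUPATIONS = {43, 56, 57, 58, 59, 72, 78}


def clean_case(case, messages):
    """Return a cleaned copy of the case, or None if it is not retained."""
    if case["Region"] != 3:              # IF Region = 3, RETAIN
        return None
    case = dict(case)
    if case["Sex"] == 1:                 # IF Sex = 1, GOTO Male
        # Male: section
        if case["Live.Births"] is not MISSING:
            messages.append(
                f"Invalid Live.Births of {case['Live.Births']} for {case['Name']}."
            )
            case["Live.Births"] = MISSING
        # The NCOT recode of Military.Service is not modelled here.
    else:
        # Female: section, which ends with GOTO Next
        if case["Occupation"] in FLAGGED_OCCUPATIONS:
            messages.append(
                f"Check Occupation of {case['Name']}. Occupation: {case['Occupation']}"
            )
        if case["Military.Service"] == 4:
            messages.append(f"Check Service Record of {case['Name']}.")
    return case                          # Next:


people = [
    {"Name": "Sandy Sweet", "Region": 3, "Sex": 2, "Occupation": 57,
     "Military.Service": 1, "Live.Births": 2},
    {"Name": "Pat Example", "Region": 3, "Sex": 1, "Occupation": 10,
     "Military.Service": 2, "Live.Births": 1},
    {"Name": "Lee Example", "Region": 1, "Sex": 2, "Occupation": 43,
     "Military.Service": 4, "Live.Births": 0},
]

messages = []
cleaned = (clean_case(person, messages) for person in people)
people2 = [case for case in cleaned if case is not None]
```

The third case is outside Region 3, so it produces no messages and does not reach `people2`.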
3.10 Cleaning Data With PUT<br />
The PUT instruction prints informative messages and <strong>the</strong> values of cited variables. This is useful when cleaning<br />
up data prior <strong>to</strong> an analysis or when constructing a report. There is a complete list of <strong>the</strong> PUT control words in<br />
<strong>the</strong> summary section at <strong>the</strong> end of this chapter.<br />
PUT prints text or error messages and <strong>the</strong> values of <strong>the</strong> variables specified:<br />
[ Female: IF Occupation AMONG ( 43, 56 TO 59, 72, 78 ),<br />
PUT 'Check occupation of ' Name '. Occupation: ' Occupation ]<br />
If Occupation is equal <strong>to</strong> 57, for example, <strong>the</strong> text and variable values are printed. The message strings, with <strong>the</strong><br />
values of <strong>the</strong> variables Name and Occupation inserted, appear on <strong>the</strong> current output device.<br />
The text is supplied as character strings enclosed in angle brackets or in single or double quotes. It is usually<br />
easier <strong>to</strong> check <strong>the</strong> text for a proper beginning and end when angle brackets are used instead of <strong>the</strong> quotes. You<br />
can see it better and so can <strong>the</strong> scanning program. Use of <strong>the</strong> angle brackets reduces <strong>the</strong> chances for error and is,<br />
<strong>the</strong>refore, highly recommended.<br />
When variable names are used outside of a text string, <strong>the</strong> values are substituted at that point. For example:<br />
Check occupation of Sandy Sweet. Occupation: 57<br />
Using .ALL. causes all <strong>the</strong> variables of a case <strong>to</strong> be written, one after <strong>the</strong> o<strong>the</strong>r. Trailing blanks (on <strong>the</strong> right end)<br />
of <strong>the</strong> text are removed from character values. Note that variable names are not enclosed in quotes or angle brackets.<br />
When you wish <strong>to</strong> use an expression that is more complex than a variable name enclose it in paren<strong>the</strong>ses:<br />
[ PUT ( GNP / 1000. );<br />
Figure 3.5 illustrates using PUT with IF tests and GOTOs <strong>to</strong> locate a series of cases with miscodings and obtain<br />
a printed list of <strong>the</strong> errors. A new output file, containing corrections and recoded values, is also produced.<br />
PUT is commonly used after an IF test, although it may also be used by itself.
See also <strong>the</strong> section later in this manual on using .PUT., <strong>the</strong> system variable which keeps a count of <strong>the</strong> puts.<br />
It is possible <strong>to</strong> create a new file containing only <strong>the</strong> cases with questionable values on <strong>the</strong> tested variables. The<br />
values in this error file may <strong>the</strong>n be corrected and <strong>the</strong> UPDATE command used <strong>to</strong> produce a corrected master file.<br />
It is also possible to print text without creating any output file. The PROCESS command works just like MODIFY,<br />
but produces no output file. It is used merely to process PPL instructions, such as those that produce text<br />
describing erroneous variable values.<br />
3.11 Report Writing Using PUT and PUTL<br />
The PUT instruction writes reports by putting text and variables at specified locations in <strong>the</strong> output line. PUT is<br />
used in <strong>PPL</strong> clauses; thus, a report may be produced any time a file is read by any command.<br />
Figure 3.6 illustrates <strong>the</strong> use of PUT for reports. PUT and any o<strong>the</strong>r <strong>PPL</strong> are enclosed in brackets which follow<br />
<strong>the</strong> filename. Since no output file is desired, <strong>the</strong> PROCESS command is used here <strong>to</strong> process <strong>the</strong> <strong>PPL</strong><br />
instructions.<br />
Text <strong>to</strong> be put in <strong>the</strong> output line is enclosed in paired angle brackets:<br />
.... PUT @5 &lt;The claim for &gt; ....<br />
although single or double quotes can also be used:<br />
.... PUT @5 'The claim for ' ....<br />
The column pointer, the “at sign” (@), specifies a column location. “@5” specifies that the next string or<br />
expression is to be placed starting in column 5. When a column location is not given, the text begins in column 1.<br />
Expressions must be placed within parentheses. A variable name by itself or a scratch variable name (like<br />
#Count) is used without parentheses:<br />
[ .... (First.Name /// Last.Name) .... ;<br />
[ .... 'A check for $' Amt.Due .... ;<br />
The value of First.Name is concatenated with the value of Last.Name and placed in the output line. (The concatenated<br />
names are one expression, and thus need parentheses.) The value of Amt.Due is placed in the output line<br />
after the appropriate text.<br />
PUT may be an instruction by itself:<br />
PUT > ;<br />
or it may follow an IF test:<br />
IF Claim.Num MISSING OR Claim.Amt MISSING,<br />
PUT @5 (First.Name /// Last.Name )<br />
>, GOTO End;<br />
The first PUT places a blank line in the output. The second PUT is done only when the IF test is true. If either<br />
Claim.Num or Claim.Amt is missing, the specified text is put at column 5 in the output line, and control passes to<br />
the PPL clause with the label “End:”. The GOTO is also a consequence of the IF because the PUT phrase ended<br />
with a comma rather than a semicolon.<br />
The text produced by PUT continues onto subsequent lines as needed. The current line is printed when a given<br />
PUT finishes, unless the PUT ended with an @. In this event, the next PUT continues on the same line:<br />
IF Deduct.Amt GT 0,<br />
T.PUT @5 Deduct.Amt @ ,<br />
F.PUT @5 @ ;<br />
PUT > Policy.Num <br />
(First.Name /// Last.Name) @ ;<br />
Thus, subsequent text<br />
__________________________________________________________________________<br />
Figure 3.6 Using PUT To Produce a Report<br />
File Insurance:<br />
First               Policy   Deduct  Claim  Claim<br />
Name    Last Name   Num      Amt     Num    Amt<br />
Sharon  Wilson      8090564  250     6024   654.25<br />
Claire  Mc Donald   7035631  500     8122   -<br />
Neil    Haroldson   7469421  0       1005   490.56<br />
The Commands<br />
OUTPUT.WIDTH 70 $<br />
PROCESS Insurance<br />
[ GENERATE Amt.Due = Claim.Amt - Deduct.Amt ;<br />
PUT > ;<br />
IF Claim.Num MISSING OR Claim.Amt MISSING,<br />
PUT @5 ( First.Name /// Last.Name )<br />
>, GOTO End ;<br />
IF Deduct.Amt GT 0,<br />
T.PUT @5 Deduct.Amt @ ,<br />
F.PUT @5 @ ;<br />
PUT > Policy.Num <br />
(First.Name /// Last.Name) @ ;<br />
PUT Amt.Due<br />
> Claim.Num<br />
> Claim.Amt ;<br />
End: ] $<br />
The Report<br />
There is a deductible amount of $250 on Policy Number 8090564,<br />
issued to Sharon Wilson. A check for $404.25 is required in payment<br />
of Claim Number 6024 for $654.25.<br />
The claim for Claire Mc Donald is awaiting additional<br />
information.<br />
There is no deductible on Policy Number 7469421, issued to Neil<br />
Haroldson. A check for $490.56 is required in payment of Claim<br />
Number 1005 for $490.56.<br />
__________________________________________________________________________<br />
follows directly after this text with no intervening spaces. Column references may position the column pointer<br />
both backwards and forwards on the line. Thus, the column reference @40 may be followed by @1, and text will<br />
be placed in column 40 and then in column 1 of the same line. The reference @NEXT positions text at the beginning<br />
of the next line.<br />
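The movement of the column pointer can be pictured outside P-STAT. The following Python sketch is only an illustrative model and not P-STAT code: the class name OutputLine and its methods are hypothetical, and it assumes that text placed at an earlier column simply overwrites whatever already occupies those positions.

```python
class OutputLine:
    """Toy model of one PUT output line with a movable column pointer."""

    def __init__(self):
        self.chars = []      # characters of the line built so far
        self.pointer = 1     # 1-based column where the next text starts

    def at(self, column):
        """Model of @n: move the pointer to a 1-based column."""
        self.pointer = column
        return self

    def put(self, text):
        """Place text at the pointer, padding with blanks if needed."""
        start = self.pointer - 1
        end = start + len(text)
        if len(self.chars) < end:
            self.chars.extend(" " * (end - len(self.chars)))
        self.chars[start:end] = text
        self.pointer = end + 1   # later text follows with no intervening spaces
        return self

    def render(self):
        return "".join(self.chars)


# @40 followed by @1 on the same line, as in the paragraph above.
line = OutputLine()
line.at(40).put("Claim 6024").at(1).put("Wilson")
print(line.render())
```

In this model the pointer always ends just past the text most recently placed, which is why text written next lands directly after it.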
System variables may be referenced with PUT:<br />
PUT 'Dated: ' .DATE. ;<br />
System variables do not need to be enclosed in parentheses. The current value of the system variable .DATE. is<br />
output in the report. Using an IF test for the first case results in the date being output only once, rather than for<br />
each case. Character values (other than quoted text) automatically have blanks trimmed from the right end.<br />
PUTL puts the variable name, as well as the variable value, in the output line:<br />
PUTL Policy.Num Last.Name Claim.Num ;<br />
The text is on one line:<br />
Policy.Num = 8090564 Last.Name = Wilson Claim.Num = 6024<br />
unless it extends to subsequent lines. Placement of the variables on separate lines, centered about the equal-sign<br />
in column 22:<br />
Policy.Num = 8090564<br />
Last.Name = Wilson<br />
Claim.Num = 6024<br />
may be requested using @EQUAL22:<br />
PUTL @EQUAL22 Policy.Num Last.Name Claim.Num ;<br />
The column location of the equal-sign follows directly after the @EQUAL. Use of @EQUAL22:50 places two<br />
labeled values per line.<br />
The following puts all values of the case, in variable name = value format, three per line, with the equals<br />
placed in positions 22, 44, and 66:<br />
PUTL @EQUAL22:44:66 .ALL.;<br />
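The @EQUAL layout can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the P-STAT implementation: the function name putl_lines is hypothetical, and it assumes each label is right-aligned so that its equal-sign lands exactly on the requested column, with one blank on either side.

```python
def putl_lines(values, equal_columns):
    """Lay out name = value pairs with each '=' at a fixed 1-based column.

    values: list of (name, value) pairs.
    equal_columns: columns for the equal-signs; its length is the
    number of labeled values placed on each line.
    """
    lines = []
    per_line = len(equal_columns)
    for i in range(0, len(values), per_line):
        line = ""
        for (name, value), col in zip(values[i:i + per_line], equal_columns):
            label = f"{name} ="
            # Pad so that the label's '=' falls on column col.
            line = line.ljust(col - len(label)) + label + f" {value}"
        lines.append(line)
    return lines


case = [("Policy.Num", 8090564), ("Last.Name", "Wilson"), ("Claim.Num", 6024)]
for text in putl_lines(case, [22]):        # like @EQUAL22
    print(text)
for text in putl_lines(case, [22, 50]):    # like @EQUAL22:50
    print(text)
```

The sketch does not handle a label too long to fit before its column; how P-STAT treats that situation is not covered here.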
__________________________________________________________________________<br />
Figure 3.7 Accessing the Variable Name Within a Report<br />
PROCESS Rawdata[ CASE 1;<br />
PUT .file.<br />
@SKIP @3 ( VARNAME (1) ) @20 V(1)<br />
@NEXT @3 ( VARNAME (2) ) @20 V(2)<br />
@NEXT @3 ( VARNAME (3) ) @20 v(3)<br />
@NEXT @3 ( VARNAME (4) ) @20 V(4) @SKIP ]$<br />
Values from Rawdata<br />
Age = 13<br />
Income = 1350<br />
Hours = -<br />
Sex = m<br />
__________________________________________________________________________<br />
PUTL refers to variables (including scratch variables) by name. However, the VARNAME function may be<br />
used to label values, printed by PUT, that are referenced by position. Figure 3.7 illustrates the VARNAME function<br />
in a PUT using the PROCESS command to access the first case of data. When there are only 4 variables, the<br />
brute force usage in Figure 3.7 is possibly acceptable. However, the chapter “PPL: DO LOOPS and IF-THEN-ELSE<br />
Blocks” provides a much easier way when there are many repetitions to be done.<br />
3.12 STANDALONE PPL COMMANDS AND PROCESS<br />
Standalone PPL commands, which have neither an input nor an output file, are used to work with scratch variables,<br />
the P vector or user-defined arrays. The PROCESS command is used when you need information from a<br />
P-STAT system file but do not need an output file. This can also be done by using MODIFY with no output files,<br />
but MODIFY provides a brief report of its activity and PROCESS is silent.<br />
3.13 Scratch Variables and Standalone PPL<br />
The variable #Wt in the previous section was created as a “scratch” variable. This is a variable that does not exist<br />
in a file. Since it is independent of the file, it can be set and then used within a case, across cases or, in the case<br />
of a permanent scratch variable, across commands.<br />
                 Scratch Variable       Permanent Scratch Variable<br />
Rules for name   #scratch               ##scratch<br />
Exists           for a single command   across commands<br />
The scratch variable can be either character or numeric:<br />
GEN #Wt = SQRT ( Age/10 + Sex );<br />
GEN ##Study:C23 = "Study 1034: August 1994"<br />
If the scratch variable is created for use in later commands, it must have the double ## as a prefix. Variables of<br />
this type are often used to move information between commands. A scratch variable can be moved into a file as<br />
a regular variable by including it in a KEEP. The initial # or ## is removed to create a legal name which must not<br />
conflict with the names of other variables in the file.<br />
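The difference in lifetime between the two kinds of scratch variable can be pictured with a small Python model. It is a sketch only: the Session class and its run_command method are hypothetical names, and the model assumes nothing beyond the rule in the table above, namely that a # variable exists for a single command while a ## variable exists across commands.

```python
class Session:
    """Toy model: '#' names last one command, '##' names last the run."""

    def __init__(self):
        self.scratch = {}

    def run_command(self, instructions):
        """Run a list of callables as one command, then drop '#' names."""
        results = [step(self.scratch) for step in instructions]
        # Only permanent scratch variables survive the end of the command.
        self.scratch = {name: value for name, value in self.scratch.items()
                        if name.startswith("##")}
        return results


session = Session()
session.run_command([
    lambda s: s.update({"#Wt": 1.5}),
    lambda s: s.update({"##Study": "Study 1034: August 1994"}),
])
print(sorted(session.scratch))
```

Within a single command both names are visible to every instruction; only after the command ends does the single-# name disappear.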
The MODIFY command is designed to take an input file, modify it in some way, and produce an output file<br />
which reflects the modifications. The PROCESS command is designed to take an input file and use the values in<br />
the cases to create scratch variables for use in subsequent commands or as a vehicle for the PUT and PUTL commands<br />
to create a report. Standalone PPL commands are used to manipulate elements such as scratch variables<br />
and system variables that are not associated with a file. In the following example, #Num works because both<br />
instructions belong to one standalone PPL command.<br />
GEN #Num = 154362;<br />
PUT #Num > ( SQRT(#Num)) $<br />
The PPL keywords that can be used as standalone PPL commands are:<br />
1. IF<br />
2. SET, INCREASE and DECREASE<br />
3. GENERATE<br />
4. PUT and PUTL<br />
5. BRANCH<br />
6. DIALOG<br />
7. IF-THEN-ELSE-ENDIF<br />
8. DO LOOPS<br />
The last three are covered in the next chapters. The following are examples of standalone PPL commands:<br />
PUT SQRT ( 13562 ) $<br />
PUT .DATE. $<br />
GENERATE ##COUNTER = 0 $<br />
3.14 The PROCESS Command and More PUT Information<br />
The PROCESS command is often used to accumulate summary information about the input file. This information is<br />
stored in scratch variables or the permanent vector and is then available for subsequent commands.<br />
Figure 3.8 shows the use of the PROCESS command to count the total number of cases in the file as well as<br />
the number of cases with non-missing data. Scratch variables ##cases and ##good are first created with GENERATE<br />
used as a stand-alone command. Both of these variables must be created as permanent scratch variables with<br />
the double pound sign (##) so that they will exist across commands.<br />
__________________________________________________________________________<br />
Figure 3.8 PROCESS: Counting Cases<br />
File Testfile<br />
a b c<br />
1 - 1<br />
2 2 2<br />
3 3 3<br />
GEN ##cases = 0, GEN ##GOOD = 0$<br />
. PROCESS Testfile<br />
[ INCREASE ##cases;<br />
IF ALL ( V(1) .ON. ) GOOD, INCREASE ##good; ]<br />
$<br />
PUT ##CASES ><br />
@NEXT <br />
##good > $<br />
File Testfile has 3 cases.<br />
There are 2 cases with no missing data.<br />
__________________________________________________________________________<br />
The PROCESS command increases ##cases as each row is read. ##good is only increased when the IF test<br />
is true. When the PROCESS command is complete, PUT is used to write the results. Each time a PUT is executed<br />
it starts on a new line unless the “@” sign was used to end a previous PUT. Each PUT usually continues across<br />
lines until it is complete unless the @NEXT instruction is used to cause a line change. @SKIP may be used to<br />
cause a blank line. @PAGE may be used to cause a page change. Many of the controls that can be used with the<br />
TEXTWRITER command can also be used with the PUT instruction. See the chapter TEXTWRITER for more<br />
details.<br />
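The counting logic of Figure 3.8 can be traced in Python. This is an illustrative sketch, not P-STAT code: None stands in for a missing value, and a dictionary stands in for the permanent scratch variables ##cases and ##good.

```python
MISSING = None  # stand-in for a P-STAT missing value (an assumption of this sketch)

testfile = [
    {"a": 1, "b": MISSING, "c": 1},
    {"a": 2, "b": 2, "c": 2},
    {"a": 3, "b": 3, "c": 3},
]

# Counters are set to zero before the file is read.
counters = {"##cases": 0, "##good": 0}

for case in testfile:                      # one pass over the file, like PROCESS
    counters["##cases"] += 1               # counted for every case read
    if all(value is not MISSING for value in case.values()):
        counters["##good"] += 1            # counted only for complete cases

print(f"File Testfile has {counters['##cases']} cases.")
print(f"There are {counters['##good']} cases with no missing data.")
```

Because the counters live outside the loop over the cases, their values remain available after the pass is complete, which is the role the permanent scratch variables play in the figure.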
3.15 COMMENTS<br />
Comments may be included either between commands or as phrases within the PPL. A comment begins with /*<br />
and ends with */. For example:<br />
/* Feb. 16, 2010. Clean the data<br />
and generate new variables.<br />
*/<br />
MODIFY Myfile [<br />
/* Age recoded into 3 age groups */<br />
GEN Coded.Age = 1;<br />
IF Age GT 20, SET Coded.Age = 2;<br />
IF Age GT 30, SET Coded.Age = 3;<br />
/* Note: This assumes Age is never missing */<br />
], OUT Myfile $<br />
The first of the three comments in this example occurs between commands. The other two comments are in the<br />
PPL of the MODIFY command. Once the /* is found, the P-STAT executive routines look for the terminating */<br />
and then blank out the entire area including the /* and the */. Comments can extend across lines as in the example<br />
above or they can be part of a line. For example:<br />
/* List <strong>the</strong> output file */ LIST Myfile $<br />
MODIFY Myfile [<br />
GEN Coded.Age; /* 10 year age groups */ Gen Coded.Income; ]<br />
are both legal uses of comments.<br />
Because the comments are blanked out when a command is executed, they must be entered into a command<br />
stream using an external editor. If they are entered interactively they disappear when the command is executed.<br />
However, because they can be inserted freely both within the PPL and between commands, they provide an excellent<br />
way to document a run. Anything except the terminating characters can be entered in the comment:<br />
/* The following group of commands might better<br />
be packaged as a macro and executed by using<br />
RUN Mymacro $<br />
with /* style comments to document the macro<br />
parameters.<br />
*/<br />
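The blanking of comments can be imitated with a regular expression. The Python sketch below is a model of the behavior described in this section, not the P-STAT executive routine itself: the function name blank_comments is hypothetical, and keeping line breaks in place is an assumption made so that the remaining text stays on its original lines.

```python
import re


def blank_comments(text):
    """Replace each /* ... */ span with blanks, keeping line breaks."""
    def blank(match):
        return re.sub(r"[^\n]", " ", match.group())
    # Non-greedy: a comment ends at the first */ after its opening /*.
    return re.sub(r"/\*.*?\*/", blank, text, flags=re.DOTALL)


stream = "/* List the output file */ LIST Myfile $"
print(repr(blank_comments(stream)))
```

Since only the terminating characters end a comment, a second /* inside a comment is blanked along with the rest of the comment text.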
3.16 QUITTING A PROCESS<br />
There are three instructions that cause the processing of data to stop:<br />
1. QUITFILE requests that processing of the current file stop.<br />
2. QUITCOMMAND requests that processing of the current command stop.<br />
3. QUITRUN requests that the entire P-STAT run stop.<br />
The QUIT instructions are typically used after an IF test, although they may be used in a DO loop or alone.<br />
QUITFILE causes processing of a file to stop. Only cases prior to this point are passed to the current command.<br />
In the following example, QUITFILE causes processing to stop if the result of the IF test is true:<br />
LIST Bonded.Personnel<br />
[ IF Bonded EQ 'Yes' and Prison.Record GT 0, QUITFILE ] $<br />
Cases are listed until a bonded employee with a value greater than 0 on Prison.Record is found. Only<br />
employees prior to that case are listed.<br />
QUITCOMMAND causes a command to be aborted. If QUITCOMMAND were used in the prior example,<br />
no listing would be produced if any bonded employee had a value greater than 0 on Prison.Record. The LIST command<br />
would stop without receiving any cases.<br />
QUITRUN causes an entire P-STAT run to stop. This is most useful when many commands are executed in<br />
succession, possibly from a transfer file or a macro, or in batch mode. Quitting the entire run, rather than continuing<br />
processing, may save resources if a grave error is encountered.<br />
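The difference between QUITFILE and QUITCOMMAND can be modeled in Python. This is a hedged sketch rather than P-STAT code: the function names, the QuitCommand exception and the four sample employees are all hypothetical, chosen only to mirror the Bonded.Personnel example.

```python
class QuitCommand(Exception):
    """Raised to model QUITCOMMAND: the command is aborted."""


def cases_until_quitfile(cases, should_quit):
    """Model QUITFILE: pass on only the cases before the trigger."""
    passed = []
    for case in cases:
        if should_quit(case):
            break
        passed.append(case)
    return passed


def cases_or_abort(cases, should_quit):
    """Model QUITCOMMAND: any trigger means no cases reach the command."""
    for case in cases:
        if should_quit(case):
            raise QuitCommand
    return list(cases)


# Made-up data for illustration only.
personnel = [
    {"Name": "A", "Bonded": "Yes", "Prison.Record": 0},
    {"Name": "B", "Bonded": "No", "Prison.Record": 0},
    {"Name": "C", "Bonded": "Yes", "Prison.Record": 2},
    {"Name": "D", "Bonded": "Yes", "Prison.Record": 0},
]


def flagged(case):
    return case["Bonded"] == "Yes" and case["Prison.Record"] > 0


print([c["Name"] for c in cases_until_quitfile(personnel, flagged)])
try:
    cases_or_abort(personnel, flagged)
except QuitCommand:
    print("command aborted, nothing listed")
```

In the same picture, QUITRUN would correspond to ending the whole script rather than a single function call.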
SUMMARY<br />
PPL<br />
Data selection and modification may be done to any file as it is read by any P-STAT command. They<br />
may not be done to output files. When any P-STAT command is executed, each case of the input file is<br />
read and optionally modified before it is passed to the current command. The current command operates<br />
on the modified data while the input file remains unchanged. Thus, the modifications are temporary “on-the-fly”<br />
modifications.<br />
MODIFY<br />
Required:<br />
Permanent modifications are usually done using the MODIFY command but may be accomplished with<br />
any command that produces an output file incorporating the modifications. MODIFY does no particular<br />
statistical or file maintenance procedures, but it produces an output file of the data after all modifications<br />
and selections have been completed.<br />
MODIFY File<br />
[ KEEP ID Age Score ;<br />
GENERATE Coded.Age =<br />
RECODE ( Age, 1 TO 17 = 1, 18 TO 100 = 2 ) ],<br />
OUT New.File $<br />
MODIFY Males [ KEEP Test.ID Time Dexterity ]<br />
+ Females [ * ], OUT Students $<br />
Multiple files may be read by MODIFY, as well as by other commands, using the “+” operator. This produces<br />
“on-the-fly concatenation” of the files. The files should have the same number of variables, with<br />
corresponding data types the same. If the names of the variables differ, the variable names in the first<br />
input file are used. Different PPL modification phrases may follow each of the input files. If the same<br />
modifications are desired, as in the second example above, an asterisk in brackets should follow the additional<br />
file or files.<br />
MODIFY fn<br />
supplies the name of the required input file. MODIFY is described in more detail in the chapter<br />
“PPL: MODIFY and COMPARE”.<br />
Optional Identifiers:<br />
OUT fn<br />
provides a name for the requested output file. The output file will reflect the input file after all selections<br />
and modifications are performed.<br />
TEMPLATE fn<br />
specifies an input file which indicates the variables to be selected for the output file. Additional variables<br />
are ignored:<br />
fn=file name vn=variable name exp=expression<br />
MODIFY Diet2 [ ROWS 50 .ON. ],<br />
TEMPLATE Diet1, OUT Diet3 $<br />
If a file does not already exist with appropriate variable names, a null file, a file with only variable names<br />
and no cases, may be created to serve as a template.<br />
STANDALONE PPL COMMANDS<br />
PUT @PAGE (CVAL(27)) 'G' 'Bold On' $<br />
When PPL instructions do not require any information from a P-STAT file, they can be used as standalone<br />
commands. No input file is required and no output file is produced. These commands are typically used<br />
to pass instructions to a printer or to set values in the permanent vector, scratch variables or user-defined<br />
arrays. These are often tasks that do not require a P-STAT file. In the example above, the decimal value<br />
27 is an ASCII ESCAPE character. On some printers the combination of ESCAPE and the letter “G” is<br />
a signal to use a BOLD font. CVAL, a function which converts a number to its character equivalent, is<br />
described in the chapter on character functions.<br />
The PPL instructions that can be used as commands are: IF, SET, INCREASE, DECREASE, GENERATE,<br />
PUT, PUTL, IF-THEN-ELSE, DIALOG, BRANCH and DO loops.<br />
PROCESS<br />
Required:<br />
PROCESS Hist123a<br />
[ IF Term.Paper MISSING,<br />
PUT Last.Name ><br />
Paper.Due.Date ] $<br />
The PROCESS command processes PPL instructions. No output file is produced. It is typically used<br />
when the objective is printed text giving information about the values of the variables in the input file.<br />
PROCESS fn<br />
specifies the name of the required input file.<br />
PPL Instructions<br />
The PPL instructions DECREASE, DELETE, DROP, GENERATE, IF, INCREASE, KEEP, RETAIN,<br />
ROWS and SET are explained in the second PPL chapter. The additional instructions GOTO, PUT,<br />
PUTL, QUITFILE, QUITCOMMAND, QUITRUN and REPEAT are summarized below. The list of instructions<br />
which may follow after an IF test includes:<br />
CONTINUE FOR INCREASE QUITFILE SET<br />
DECREASE GENERATE PUT QUITRUN<br />
DELETE GOTO PUTL QUITCOMMAND<br />
GOTO label<br />
directs that the PPL processor go either up or down to wherever the PPL clause with the specified label<br />
is located. The label must be at the beginning of a PPL clause, and it must be followed by a colon (:).<br />
The bypassed phrases cannot change the number, order, or names of the variables (i.e., KEEP, DROP,<br />
GENERATE), nor can GOTO bypass REPEAT, SPLIT or COLLECT. GOTO may be used following an IF<br />
test:<br />
IF Sex EQ 1, GOTO Male;<br />
The label may be followed by an instruction or it may be a “null” label.<br />
[ GENERATE Factor;<br />
IF Treatment.Group EQ 'placebo', GOTO Not.Drug;<br />
SET Drug = RECODE ( Drug, 1 TO 3 = 1, G = 2 );<br />
SET Factor = SUM ( Test1 TO Test2 );<br />
GOTO Next.Test;<br />
Not.Drug: SET Drug = 0, SET Factor = 0 ;<br />
Next.Test: ; ..... ]<br />
QUITFILE<br />
specifies that processing of the current file stop. Only cases prior to this point are passed to the command<br />
processor. QUITFILE is commonly used following an IF test.<br />
QUITCOMMAND<br />
specifies that processing of the current command stop. The command is aborted at that point. QUITCOMMAND<br />
is commonly used following an IF test.<br />
QUITRUN<br />
specifies that the P-STAT run stop. QUITRUN is commonly used following an IF test. The entire run<br />
ends at this point.<br />
REPEAT exp<br />
requests that each case be repeated the specified number of times. The argument for REPEAT should be an<br />
expression (constant, variable, function or combination of these) that reduces to an integer. A case is repeated<br />
at the point in the PPL at which the REPEAT instruction is encountered. REPEAT is useful in<br />
generating a set of random data:<br />
MOD Random<br />
[ REPEAT 100;<br />
SET Random.Number = ( RANNORM (0) * 2.8 ) + 24 ],<br />
OUT RandomX $<br />
An initial file with one case (Random) is built and then it is modified to generate an output file with 100<br />
cases. The variable Random.Number is set equal to random numbers with mean 24 and standard deviation<br />
2.8. REPEAT may not be used as a consequent of an IF, within a DO loop, or within an IF-THEN-ELSE<br />
block. It also cannot be used in conjunction with SPLIT, COLLECT, FIRST or LAST.<br />
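The effect of REPEAT followed by SET can be reproduced in Python as a rough check of the arithmetic. This is an illustrative sketch, not P-STAT code: the function name repeat_case is hypothetical, Python's random.gauss stands in for RANNORM, and the fixed seed is an arbitrary choice made only so the run is repeatable.

```python
import random


def repeat_case(case, times, rng):
    """Model REPEAT n followed by SET: emit n modified copies of one case."""
    copies = []
    for _ in range(times):
        copy = dict(case)
        # Standard normal draw scaled to mean 24 and standard deviation 2.8.
        copy["Random.Number"] = rng.gauss(0, 1) * 2.8 + 24
        copies.append(copy)
    return copies


rng = random.Random(0)
random_x = repeat_case({"Random.Number": None}, 100, rng)
print(len(random_x))
```

Each copy is modified after it is produced, so the 100 output cases receive 100 different random values rather than one value repeated.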
PUT AND PUTL CONTROL ELEMENTS<br />
The following printing elements can follow a PUT or PUTL:<br />
1. Age . The name of a variable. Its value will be printed. PUTL will also label it (Age = 22)<br />
2. #name or ##name. A scratch variable, also labelled by PUTL<br />
3. V(3). A variable reference with a constant subscript, also labelled by PUTL<br />
4. .ALL. All the values of a case, also labelled by PUTL<br />
5. V(#j+2). PUTL does not label this<br />
6. P(3) or P(#J+2) or (expression) of any complexity<br />
7. 'string' or “string” or .<br />
These control elements can follow PUT or PUTL.<br />
@NEXT            move to the next line.<br />
@SKIP=3          write the current line and then three blank lines.<br />
@PARA            write the current line, write a blank line, and indent three positions in the new line.<br />
@20              move the pointer (which is where the next value will be written) to that position.<br />
@PLUS=(5)        move the pointer forward that far. This can be an expression.<br />
@MINUS=(3)       move the pointer back that far.<br />
@                can be used as the last element in a PUT or PUTL. The line is not flushed, so the<br />
                 next PUT or PUTL statement adds to it instead of starting a new line.<br />
@BEFORE=40       causes the next value to be placed so it ends at position 40. The string or value must<br />
                 be the next PUT element.<br />
@PLACES=3        causes succeeding numeric values to print with 3 decimal places.<br />
@NOPLACES        returns to the default mode, where integers print without places and fractional values<br />
                 get some number of places depending on the actual value.<br />
@COMMAS          inserts commas into the integer part of numbers.<br />
@NOCOMMAS        turns it off. Default is off.<br />
@LABEL           turns PUTL mode on. (@NAME is a synonym.)<br />
@NOLABEL         turns PUTL mode off. PUT default is off. PUTL default is on.<br />
@TRIM            default. Trims blanks from the right end of a character value.<br />
@NOTRIM          prints the entire character value, including trailing blanks.<br />
@EQUAL=20        when a labelled value (like Age = 40) is about to be written, place the = at position<br />
                 20. @EQUAL=20:40 prints 2 values per line with equal signs at positions 20 and 40.<br />
@NOEQUAL         turns it back off.<br />
@MISS='string'   use the string instead of -, --, or --- to represent missing values.<br />
@NOMISS          resets to -, --, or ---.<br />
PUT<br />
positions text and variables at specified column locations in the output line. Text strings are enclosed in<br />
quotes and variables are simply cited. Paired angle brackets may also be used as string<br />
delimiters in PUT statements.<br />
PUT @3 'The client is ' Name '.' @ ;<br />
Locations are specified with @:<br />
@3 at column 3<br />
@NEXT at the start of the next line<br />
@SKIP write a blank line and move to the start of the next line<br />
@PAGE issue a page change and move to the start of the first line<br />
A final @, at the end of a PUT, holds the text output location, so that subsequent text may follow directly<br />
after.<br />
PUT is often used after an IF test checking for erroneous data values. PUT specifies error messages to<br />
print:<br />
[ IF ID MISSING,<br />
PUT Last.Name<br />
SS.Num ]<br />
PUTL<br />
positions variable names as well as variable values in the output line. If @EQUAL22 is used:<br />
[ PUTL @EQUAL22 Name SS.Num ]<br />
the variable names and values are listed, one per line, centered on the equal-sign in column 22 (or any<br />
other specified column location). @EQUAL22:52 positions both variable names and values on one line,<br />
the first centered on the equal-sign in column 22 and the second centered on the equal-sign in column 52.<br />
COMMENTS<br />
/* comments can be inserted in the command stream<br />
wherever a command can be found. The initial characters<br />
are the /*. The terminating characters are the asterisk<br />
followed by the slash<br />
*/<br />
LIST Myfile $<br />
Comments can also be used in the PPL of a command as long as each comment is positioned as a PPL<br />
phrase and not inserted in the middle of such a phrase.<br />
MODIFY Myfile [ /* generate coded variables */ GEN Coded.Age;<br />
GEN Coded.Income;<br />
/* Age will be recoded into 10 year groups */<br />
SET Age = .... ]<br />
4<br />
PPL:<br />
NCOT and RECODE<br />
The previous PPL chapters covered the basics of data modification using the P-STAT Programming Language<br />
(PPL). This chapter provides information about the RECODE and NCOT functions. These functions often<br />
provide the easiest way to do complex recodes.<br />
Values may be changed to different values using either the RECODE or NCOT functions. Both numeric and<br />
character values may be recoded using RECODE; only numeric values may be changed with NCOT. RECODE<br />
permits any arbitrary changes, including the recoding of individual values, ranges of values, missing values, character<br />
strings and extra (“left over”) values. XRECODE permits case sensitive recodes of character data. NCOT<br />
recodes ranges of values, specified with “cutting points”, to consecutive constants.<br />
The RECODE function is usually used <strong>to</strong> test a single argument, which may be a variable, a constant or a more<br />
complex expression. However, it can also be used <strong>to</strong> test multiple arguments creating a result which is based on<br />
several different arguments such as setting:<br />
Group=1 when Age lt 30 and Sex eq male and <strong>Inc</strong>ome lt 20000<br />
Group=2 when Age ge 30 and Sex eq male and <strong>Inc</strong>ome lt 20000<br />
Group=3 when Age lt 30 and Sex eq female and <strong>Inc</strong>ome lt 20000, etc.<br />
This multi-argument use of RECODE often replaces a lengthy series of complex IF’s with a single statement that<br />
is both easier <strong>to</strong> read and <strong>to</strong> understand.<br />
4.1 The NCOT Function<br />
NCOT recodes numeric variable values <strong>to</strong> numeric constants. It does an N-way dicho<strong>to</strong>mization or division of <strong>the</strong><br />
values <strong>to</strong> be recoded, using cutting points supplied in <strong>the</strong> NCOT instructions. The cutting points divide <strong>the</strong> values<br />
in<strong>to</strong> groups or ranges of values. The ranges are recoded <strong>to</strong> consecutive integers.<br />
Because NCOT is a function, it begins with a left paren<strong>the</strong>sis and ends with a right paren<strong>the</strong>sis. The first element<br />
following <strong>the</strong> left paren<strong>the</strong>sis is <strong>the</strong> NCOT argument which must be a variable name or an expression. This<br />
is followed by additional arguments giving cutting points for <strong>the</strong> values. Each NCOT argument is separated from<br />
<strong>the</strong> next by a comma. NCOT is designed for use when a numeric variable is <strong>to</strong> be divided in<strong>to</strong> groups based on a<br />
series of ascending values or cutting points. It does not work with character values and it cannot be used for complex<br />
recoding.<br />
The cutting points for NCOT can be fractional values. The one restriction is that <strong>the</strong> cutting points must go<br />
in ascending order, from low (which may be negative) <strong>to</strong> high. Given:<br />
[ SET Hours = NCOT ( Hours, 20, 25, 30, 35, 40, 45, 50 ) ]<br />
everything less than or equal <strong>to</strong> <strong>the</strong> first value (20) becomes a “1”, everything above <strong>the</strong> first value, but not above<br />
<strong>the</strong> second value (25) becomes a “2”, and so on. The final value includes all <strong>the</strong> numbers greater than <strong>the</strong> final<br />
cutting point. Thus, the number of possible values is always one more than the number of cutting points.<br />
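The grouping rule can be checked outside P-STAT. The following is a minimal Python sketch, not PPL; the helper name ncot and the use of None for a missing value are assumptions made for illustration.<br />

```python
from bisect import bisect_left

def ncot(value, *cuts):
    """Return the 1-based group number of value for ascending cutting points."""
    if value is None:
        # Assumption: a missing value stays missing.
        return None
    if list(cuts) != sorted(cuts):
        raise ValueError("cutting points must be in ascending order")
    # bisect_left counts the cutting points strictly below value, so a
    # value equal to a cutting point falls in the lower group.
    return bisect_left(cuts, value) + 1

codes = [ncot(h, 20, 25, 30, 35, 40, 45, 50) for h in (20, 24, 40, 49, 62)]
print(codes)
```

With seven cutting points the sketch can return eight different codes, 1 through 8.<br />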
The NCOT function instructions can be abbreviated fur<strong>the</strong>r by providing a step size:<br />
[ SET Hours = NCOT ( Hours, 20, 50/5 ) ]<br />
The 20 is <strong>the</strong> first cutting point, 50 is <strong>the</strong> last cutting point and 5 is <strong>the</strong> step size. Thus, <strong>the</strong> cutting points are 20,<br />
25, 30, 35, 40, 45 and 50. The instructions:<br />
[ SET Hours =<br />
NCOT ( Hours, 20, 50/5, 100/10 ) ]<br />
create cutting points at 20, 25, 30, 35, 40, 45 and 50 (steps of 5), and also at 60, 70, 80, 90 and 100 (steps of 10).<br />
A value of 33, which is between <strong>the</strong> 3rd and 4th cutting point, becomes a “4” and a value of 85, which is between<br />
<strong>the</strong> 10th and 11th cutting point becomes an “11”.<br />
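The expansion of the step notation can be sketched in Python (not PPL). The function name expand_cuts and the (last, step) tuples are hypothetical stand-ins for the "last/step" notation; only whole-number steps are considered here.<br />

```python
from bisect import bisect_left

def expand_cuts(first, *steps):
    """Expand 'first, last/step, last/step, ...' into explicit cutting points."""
    cuts = [first]
    for last, step in steps:
        point = cuts[-1] + step
        while point <= last:
            cuts.append(point)
            point += step
    return cuts

# Stands for NCOT ( Hours, 20, 50/5, 100/10 )
cuts = expand_cuts(20, (50, 5), (100, 10))
print(cuts)

# Group number = count of cutting points strictly below the value, plus one.
code_33 = bisect_left(cuts, 33) + 1
code_85 = bisect_left(cuts, 85) + 1
print(code_33, code_85)
```

The sketch yields twelve cutting points, so thirteen codes are possible.<br />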
__________________________________________________________________________<br />
Figure 4.1 NCOT: Numeric Recodes<br />
File RawData<br />
Age <strong>Inc</strong>ome Hours Sex<br />
13 1350 - m<br />
22 20100 40 f<br />
24 18400 36 f<br />
31 31000 40 m<br />
33 35000 49 f<br />
37 27000 38 m<br />
42 20000 40 f<br />
49 45000 40 m<br />
50 61000 55 m<br />
55 31000 30 m<br />
62 24000 24 f<br />
73 16000 20 m<br />
MODIFY RawData [ GEN Coded.Age, GEN Coded.<strong>Inc</strong>ome, GEN Coded.Hours;<br />
SET Coded.Age = NCOT ( Age, 25, 40, 55 );<br />
SET Coded.<strong>Inc</strong>ome = NCOT ( <strong>Inc</strong>ome, 10000, 100000 / 10000 );<br />
SET Coded.Hours = NCOT ( Hours, 20, 50/5, 100/10 ) ],<br />
OUT NewData $<br />
File NewData<br />
Coded Coded Coded<br />
Age <strong>Inc</strong>ome Hours Sex Age <strong>Inc</strong>ome Hours<br />
13 1350 - m 1 1 -<br />
22 20100 40 f 1 3 5<br />
24 18400 36 f 1 2 5<br />
31 31000 40 m 2 4 5<br />
33 35000 49 f 2 4 7<br />
37 27000 38 m 2 3 5<br />
42 20000 40 f 3 2 5<br />
49 45000 40 m 3 5 5<br />
50 61000 55 m 3 7 8<br />
55 31000 30 m 3 4 3<br />
62 24000 24 f 4 3 2<br />
73 16000 20 m 4 2 1<br />
__________________________________________________________________________<br />
Figure 4.1 illustrates NCOT with three different patterns. The NCOT of Age provides 3 cutting points and<br />
results in 4 values. Values on Age less than or equal to 25 are a 1 in Coded.Age. Values greater than 25 and less<br />
than or equal <strong>to</strong> 40 are a 2 in Coded.Age. Values greater than 40 and less than or equal <strong>to</strong> 55 are a 3 in Coded.Age.<br />
And finally any value on Age that is greater than 55 is a 4 in Coded.Age.<br />
Coded.<strong>Inc</strong>ome is variable <strong>Inc</strong>ome in groups of 10,000. Coded.Hours is a more complex pattern with cutting<br />
points at 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100. With 12 cutting points <strong>the</strong>re is a possibility of codes between<br />
1 and 13.<br />
NCOT is a concise and clear way to recode numeric values when cutting points, either arbitrary or<br />
patterned, are required. When the recodes to be done are not in ascending order, RECODE is the function<br />
to use.<br />
4.2 The RECODE Function: Single Argument Usage<br />
RECODE changes (recodes) numeric or character variables. XRECODE, for eXact recodes, respects <strong>the</strong> case of<br />
characters in recoding <strong>the</strong>m <strong>to</strong> o<strong>the</strong>r characters or <strong>to</strong> numbers. This section describes simple recodes, ones with a<br />
single argument. Multiple-argument recodes are described later.<br />
Because RECODE is a function, it begins with a left paren<strong>the</strong>sis and ends with a right paren<strong>the</strong>sis. The first<br />
element following <strong>the</strong> left paren<strong>the</strong>sis is <strong>the</strong> RECODE argument which must be a variable name or an expression.<br />
This is followed by a series of recoding tests, separated by commas.<br />
[ SET Age =<br />
RECODE ( ROUND (Age), 0 TO 20 = 1, 21 TO 100 = 2 ) ]<br />
The format of <strong>the</strong> single argument RECODE function is:<br />
RECODE ( Argument, test, test, test, ..... )<br />
The argument may be a variable name or a complex expression. Each recode test is composed of a list of one or<br />
more values followed by an “=” sign and <strong>the</strong> new value that replaces <strong>the</strong> values in <strong>the</strong> list.<br />
value.list = new.value, value.list = new.value, ....<br />
The list may be a single value such as “2”, a range of values such as “3 TO 5”, or a combination of single<br />
values and ranges such as “12 TO 15 33 M2”. The list is followed by an equal sign "=" and <strong>the</strong> new value <strong>to</strong> be<br />
used. Each recode test is separated from <strong>the</strong> next by a comma. After <strong>the</strong> recoding, <strong>the</strong> new values must all be of<br />
one type, ei<strong>the</strong>r numeric or character, and <strong>the</strong>y must be <strong>the</strong> same data type as <strong>the</strong> variable being set or generated.<br />
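The way a value list is matched can be illustrated with a short Python sketch (not PPL). The function recode and its list-of-pairs input are hypothetical; a (low, high) tuple stands for "low TO high", and leaving an unmatched value unchanged is an assumption that holds only when the argument and result are the same type.<br />

```python
def recode(value, tests):
    """Apply single-argument recode tests in left to right order.

    tests is a list of (value_list, new_value) pairs. An item of a
    value list is either a single value or a (low, high) tuple that
    stands for the inclusive range 'low TO high'.
    """
    for value_list, new_value in tests:
        for item in value_list:
            if isinstance(item, tuple):
                low, high = item
                if low <= value <= high:
                    return new_value
            elif value == item:
                return new_value
    return value  # assumption: an unmatched value is left unchanged

# Stands for RECODE ( ROUND (Age), 0 TO 20 = 1, 21 TO 100 = 2 )
age_tests = [([(0, 20)], 1), ([(21, 100)], 2)]
coded = [recode(round(age), age_tests) for age in (13.4, 20.6, 55)]
print(coded)

# A value list may mix ranges and single values, as in: 12 TO 15 33 = 9
mixed = [([(12, 15), 33], 9)]
mixed_results = (recode(14, mixed), recode(33, mixed), recode(20, mixed))
print(mixed_results)
```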
Single values may be recoded <strong>to</strong> new values:<br />
[ GEN Gender:c;<br />
SET Sex = RECODE ( Sex, 0 = 1, 1 = 2 ) ;<br />
SET Gender = RECODE ( Sex, 1 = 'male', 2 = 'female') ;<br />
SET Gender =<br />
RECODE ( Gender, 'male' = 'boy', 'female' = 'girl' ) ]<br />
The first recode changes <strong>the</strong> values of <strong>the</strong> numeric variable Sex; “zeros” become “ones” and “ones” become<br />
“twos”. The second recode provides a value for a new character variable, Gender. Its values are recodes of <strong>the</strong><br />
numeric values of <strong>the</strong> existing variable Sex. “Ones” become “male” and “twos” become “female”. (Notice that<br />
character strings in <strong>the</strong> recode tests are enclosed in quotes.) The third recode changes <strong>the</strong> values of <strong>the</strong> character<br />
variable Gender; “male” becomes “boy” and “female” becomes “girl”.<br />
RECODE tests may recode values in any of <strong>the</strong>se combinations: numeric <strong>to</strong> numeric, character <strong>to</strong> character,<br />
numeric <strong>to</strong> character, or character <strong>to</strong> numeric. However, <strong>the</strong> resultant values must all be one data type, and that<br />
type must correspond with that of <strong>the</strong> variable being recoded or generated.<br />
Ranges of values may be recoded <strong>to</strong> new values:<br />
[ SET <strong>Inc</strong>ome = RECODE ( INT (<strong>Inc</strong>ome),<br />
0 TO 50000 = 1, 50001 TO 100000 = 2 ) ]<br />
This example recodes <strong>the</strong> integer portion of <strong>the</strong> values of <strong>Inc</strong>ome; values from zero through 50,000 become<br />
“ones”, values from 50,001 through 100,000 become “twos”.<br />
Ranges can also be used with character variables:<br />
[ GEN Session = RECODE ( Last.Name,<br />
'A' TO 'MZZZ' = 1, 'N' TO 'ZZZZ' = 2 ) ]<br />
This example generates <strong>the</strong> numeric variable Session, whose values are based on those of <strong>the</strong> character variable<br />
Last.Name. Cases with last names from “A” through “MZZZ” have “ones” for <strong>the</strong> value of Session, and cases<br />
with names from “N” through “ZZZZ” have “twos”. In RECODE, case does not matter. Therefore, 'a' TO 'MZZZ'<br />
and 'A' TO 'mzzz' are equivalent.<br />
Any non-missing values left over after the recoding is complete may be recoded using G (for Good). There<br />
may be only one G in a RECODE instruction:<br />
[ GENERATE Test.Score =<br />
RECODE ( ROUND ( (Correct / Total) * 100 ),<br />
65 TO 74 = 1, 75 TO 84 = 2, 85 TO 94 = 3,<br />
95 TO 100 = 4, G = 0 ) ]<br />
In this example, <strong>the</strong> RECODE argument is a complex expression — <strong>the</strong> number of correct items (Correct) divided<br />
by <strong>the</strong> <strong>to</strong>tal number of items (Total) and multiplied by 100. That value is rounded <strong>to</strong> a whole number (ROUND)<br />
and recoded <strong>to</strong> <strong>the</strong> specified values. If a non-missing value is not included in <strong>the</strong> recoding tests ( G ), Test.Score<br />
is zero. Thus, a value of 42 yields a Test.Score value of 0. The recode testing is done in a strict left <strong>to</strong> right order.<br />
Therefore, you should put any G= recode AFTER all recode tests that expect a good value.<br />
A common mistake in using RECODE with ranges of values is <strong>to</strong> specify <strong>the</strong> recode tests using integers, without<br />
provision for values which fall between <strong>the</strong> ranges. For example, <strong>the</strong> value 50.5 falls between 50 and 51:<br />
[ GENERATE Score = RECODE ( XA, 1 TO 50 = 1, 51 TO 100 = 2 ) ]<br />
Any such numbers are not recoded. To solve this problem, <strong>the</strong> recode tests may be specified as any of <strong>the</strong><br />
following:<br />
[ GEN Score =<br />
RECODE ( XA, 1 TO 50 = 1, 51 TO 100 = 2, G = M3 ) ]<br />
[ GEN Score =<br />
RECODE ( XA, 1 TO 50 = 1, 50 TO 100 = 2, G = M ) ]<br />
[ GEN Score =<br />
RECODE ( XA, .01 TO 50 = 1, 50.0001 TO 100 = 2, G = 3 ) ]<br />
The first example uses G to detect non-missing values that fall between the ranges in the recode tests: 50.5<br />
yields a value of MISSING3 for Score. The second example uses overlapping ranges to avoid gaps in the<br />
ranges: a value of 50 is recoded to 1, the first test in which 50 appears. The third example specifies all-inclusive<br />
ranges.<br />
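The gap problem is easy to reproduce in a Python sketch (not PPL). The names below are hypothetical: G is a stand-in object for the G test, the strings "M3" and "M1" are placeholders for the missing values, and each range is an inclusive (low, high) tuple.<br />

```python
G = object()   # stand-in for the G (any good value left over) test

def recode(value, tests):
    """Try each (test, new_value) pair in strict left to right order."""
    for test, new_value in tests:
        if test is G:
            return new_value
        low, high = test
        if low <= value <= high:
            return new_value
    return value   # nothing matched: the value is not recoded

integer_ranges = [((1, 50), 1), ((51, 100), 2)]
with_g = integer_ranges + [(G, "M3")]
overlapping = [((1, 50), 1), ((50, 100), 2), (G, "M1")]

print(recode(50.5, integer_ranges))   # falls in the gap between 50 and 51
print(recode(50.5, with_g))           # caught by G
print(recode(50, overlapping))        # first matching test wins
print(recode(50.5, overlapping))      # no gap is left
```

Because G matches every good value, placing it anywhere but last would hide the tests that follow it.<br />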
It is a good idea <strong>to</strong> also use G in <strong>the</strong> recode tests <strong>to</strong> include any non-missing values that may have been overlooked.<br />
For example, if G is used <strong>to</strong> set any overlooked values <strong>to</strong> MISSING3 (M3), <strong>the</strong> user need only search for<br />
MISSING3 in <strong>the</strong> output <strong>to</strong> locate any values that have not been recoded. When <strong>the</strong> recoding transforms character<br />
<strong>to</strong> numeric variables or numeric <strong>to</strong> character variables, ei<strong>the</strong>r all possible values must be recoded or G must be<br />
used <strong>to</strong> avoid an error situation.<br />
Missing values are not recoded unless <strong>the</strong> recode instructions specify how <strong>the</strong>y should be recoded. (G refers<br />
<strong>to</strong> only non-missing extra values.) The three different types of missing values can be explicitly referenced using<br />
M1, M2 and M3:<br />
LIST Kittens<br />
[ SET Sex = RECODE ( Sex,<br />
'm' = 'male', 'f' = 'female',<br />
G = M1, M2 = 'neuter' ) ] $<br />
In this listing, values of Sex that are MISSING2 are recoded <strong>to</strong> “neuter”, and extra values are recoded <strong>to</strong><br />
MISSING1. Any values of MISSING1 or MISSING3 remain <strong>the</strong> same.<br />
M by itself recodes any of MISSING1, MISSING2 or MISSING3 in <strong>the</strong> original value <strong>to</strong> a single new value.<br />
When M by itself is used as a new value, it is assumed <strong>to</strong> be MISSING1:<br />
LIST File1<br />
[ GENERATE New = RECODE ( Old, M = 0, 99 = M ) ] $<br />
Old New<br />
- 0<br />
-- 0<br />
--- 0<br />
99 -<br />
77 77<br />
All types of missing are recoded as “zeros”. However, since a value can be only one type of missing at a time,<br />
99=M is treated as 99=M1, so the number “99” is recoded as MISSING1. Since no test is given for 77 and since G= was not used, the 77 is not changed.<br />
(Note that <strong>the</strong> RECODE function references <strong>the</strong> system variables for <strong>the</strong> different types of missing data using a<br />
simplified notation. The regular notation for system variables may also be used — .M., .M1., .M2. and .M3.)<br />
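The File1 listing can be mimicked in Python (not PPL). The Missing enumeration and the function name recode_old are assumptions made for this sketch; the enumeration values reuse the symbols shown in the listing.<br />

```python
from enum import Enum

class Missing(Enum):
    M1 = "-"
    M2 = "--"
    M3 = "---"

def recode_old(value):
    """Emulate RECODE ( Old, M = 0, 99 = M )."""
    if isinstance(value, Missing):   # M as a test matches any type of missing
        return 0
    if value == 99:                  # M as a new value means MISSING1
        return Missing.M1
    return value                     # no test and no G=: left unchanged

old = [Missing.M1, Missing.M2, Missing.M3, 99, 77]
new = [recode_old(v) for v in old]
print(new)
```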
XRECODE is an eXact recode — it works just like RECODE except that <strong>the</strong> case (upper, lower or mixed)<br />
of <strong>the</strong> recoding instructions is respected:<br />
LIST File2<br />
[GEN Num = XRECODE ( Char, 'a' = 1, 'A' = 9 ) ] $<br />
Char Num<br />
a 1<br />
A 9<br />
In o<strong>the</strong>r aspects, XRECODE operates in <strong>the</strong> same manner that RECODE does.<br />
In summary, <strong>the</strong>re are RECODE instructions for:<br />
• Numbers;<br />
• Character strings;<br />
• Missing values (M, M1, M2, M3);<br />
• Any good (non-missing) value left over (G).<br />
The data type of all <strong>the</strong> recoded values for a given variable must be <strong>the</strong> same, and it must agree with that of <strong>the</strong><br />
modified or newly generated variable.<br />
Note: On an ASCII character set computer like a PC, you cannot use an XRECODE test of 'a' <strong>to</strong> 'Z' because<br />
'a' is 97 and 'Z' is 90, and <strong>the</strong> test is backwards. That would, however, be legal in EBCDIC on an IBM mainframe,<br />
where 'a' is 129 and 'Z' is 233.<br />
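The character codes quoted in the note can be confirmed with Python's standard codecs. Using code page cp037 to represent EBCDIC is an assumption of this sketch; other EBCDIC code pages exist.<br />

```python
# Character codes in ASCII
ascii_a, ascii_Z = ord("a"), ord("Z")

# Character codes in EBCDIC (code page 037, an assumption of this sketch)
ebcdic_a = "a".encode("cp037")[0]
ebcdic_Z = "Z".encode("cp037")[0]

# A range 'a' TO 'Z' is valid only when the first code is not above the second.
print(ascii_a, ascii_Z, ascii_a <= ascii_Z)
print(ebcdic_a, ebcdic_Z, ebcdic_a <= ebcdic_Z)
```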
4.3 COMPLEX RECODES<br />
The multiple argument usage of RECODE is exactly like <strong>the</strong> single argument usage in <strong>the</strong> way that <strong>the</strong> arguments<br />
are organized and in <strong>the</strong> values that can be supplied and tested. The RECODE syntax is:<br />
1. RECODE (<br />
2. <strong>the</strong> arguments <strong>to</strong> be used in <strong>the</strong> recode. If <strong>the</strong>re are multiple arguments <strong>the</strong>y are separated by a vertical<br />
bar, "|". A comma follows <strong>the</strong> final argument.<br />
3. optional definitions for a set of values. The definitions provide a label for a set of values that can<br />
<strong>the</strong>n be referenced by that label in <strong>the</strong> recode tests that follow. These definitions are enclosed in paren<strong>the</strong>ses<br />
and are described below.<br />
4. one or more recoding tests which are executed in left <strong>to</strong> right order. If <strong>the</strong>re are multiple arguments<br />
<strong>the</strong> sections of <strong>the</strong> test are separated by a vertical bar.<br />
5. a right paren<strong>the</strong>sis, ")"<br />
4.4 RECODE: The Arguments<br />
The composition of <strong>the</strong> arguments in a complex recode is exactly <strong>the</strong> same as in <strong>the</strong> single argument situation. An<br />
argument is often simply the name of a variable, but it can be a complex expression. When there is one argument:<br />
RECODE ( Age,<br />
all tests are made using the single variable Age. When there are two arguments:<br />
RECODE ( Age | <strong>Inc</strong>ome,<br />
test values must be supplied for both variables Age and <strong>Inc</strong>ome. A three argument example:<br />
RECODE ( Age | Husband.<strong>Inc</strong>ome + Wife.<strong>Inc</strong>ome | Region<br />
requires 3 values for each test. The first is a value for Age. The second is a value for <strong>the</strong> sum of variables Husband.<strong>Inc</strong>ome<br />
and Wife.<strong>Inc</strong>ome. The third value is for variable Region. The arguments and <strong>the</strong> first test for this<br />
RECODE might look like:<br />
RECODE ( Age |Husband.<strong>Inc</strong>ome + Wife.<strong>Inc</strong>ome | Region,<br />
le 30 | le 35000 | 'East' = 1, ...]<br />
The arguments can be numeric or character or a mixture of <strong>the</strong> two. Each value in a test must be <strong>the</strong> same<br />
data type as <strong>the</strong> corresponding argument. In <strong>the</strong> previous RECODE “30” and “35000” are appropriate values for<br />
<strong>the</strong> two numeric arguments and “East” is an appropriate value for <strong>the</strong> third argument, <strong>the</strong> character variable<br />
Region.<br />
4.5 The RECODE Tests<br />
There are two general tests which do not use a test segment for each argument. One is for missing arguments, <strong>the</strong><br />
o<strong>the</strong>r is for good arguments.<br />
M = result<br />
is successful when ANY of <strong>the</strong> recode arguments is missing.<br />
G = result<br />
is successful when ALL of <strong>the</strong> recode arguments are good (non-missing).<br />
M=, when used, is usually placed at either the beginning or the end of the tests. G=, when used, is usually placed<br />
at the end of the tests. Tests are processed in strict left to right order, and processing of a recode stops as soon as<br />
there is a successful test. Any set of arguments is either all good (G=) or has some missing values (M=). When both<br />
M= and G= are placed at the beginning of the tests, one of them is always successful and the remaining tests are<br />
never processed.<br />
[ GEN Group:c = RECODE ( Age | Region,<br />
M = M3,<br />
LT 30 | 'east' = 'one',<br />
GE 30 | NE 'east' = 'two',<br />
G = 'three' ) ]<br />
Each test, except for M= and G=, is composed of as many test segments as there are arguments. The vertical<br />
bar is used to separate the test segments within each test. In the example above there are 4 tests. The first test,<br />
“M=”, is not segmented. It returns MISSING3 when either Age or Region is missing. The next two tests have 2<br />
segments, one for each of the two arguments Age and Region. Finally “G=”, an unsegmented test, assigns 'three' to<br />
Group for any remaining case with non-missing values on both Age and Region.<br />
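The order of evaluation in the Group example can be traced with a Python sketch (not PPL). The function name group, the use of None for a missing argument and the string "M3" for MISSING3 are assumptions; the lower-casing reflects the statement that case does not matter in RECODE.<br />

```python
def group(age, region):
    """Emulate the two-argument RECODE that generates Group."""
    if age is None or region is None:              # M = M3
        return "M3"
    if age < 30 and region.lower() == "east":      # LT 30 | 'east' = 'one'
        return "one"
    if age >= 30 and region.lower() != "east":     # GE 30 | NE 'east' = 'two'
        return "two"
    return "three"                                 # G = 'three'

cases = [(25, "east"), (45, "West"), (25, "west"), (45, "east"), (None, "east")]
groups = [group(age, region) for age, region in cases]
print(groups)
```

Cases such as (25, 'west') and (45, 'east') match neither segmented test and are picked up by G.<br />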
A test segment consists of one or more comparisons. A comparison consists of:<br />
1. An optional logical opera<strong>to</strong>r such as<br />
LT (less than), LE (less than or equal),<br />
EQ (equal), NE (not equal),<br />
GE (greater than or equal), GT (greater than).<br />
EQ is assumed when no operator is given. LT, LE, GT and GE can only be used with a single numeric or character constant.<br />
2. The values <strong>to</strong> be tested. These are usually one or more constants but can also be a definition. Definitions<br />
are discussed below.<br />
The things that can be tested:<br />
1. numeric constant such as 12.7, 13 or 55<br />
2. numeric range such as 1 <strong>to</strong> 8<br />
3. character constant such as 'east'<br />
4. character range such as 'a' <strong>to</strong> 'e'<br />
5. G - any good value<br />
6. M - any missing value<br />
7. M1, M2, or M3 for MISSING1, MISSING2 or MISSING3<br />
8. (nnn) provides <strong>the</strong> number, for example “(123)”, of a definition containing <strong>the</strong> values <strong>to</strong> be tested.<br />
Definitions are described below.<br />
There can be several comparisons in a test segment. The test segment for an argument is successful when any<br />
one of <strong>the</strong> comparisons is successful. An example of a multiple comparison:<br />
LT 20, GT 30 = 11<br />
The segment is successful for arguments less than 20 or greater than 30. This is <strong>the</strong> same as:<br />
NE 20 TO 30 = 11<br />
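The equivalence of the two segments can be checked numerically. This is a Python sketch with hypothetical function names, assuming that a range such as 20 TO 30 includes both end points.<br />

```python
def lt_20_or_gt_30(x):
    return x < 20 or x > 30        # LT 20, GT 30: either comparison may succeed

def ne_20_to_30(x):
    return not (20 <= x <= 30)     # NE 20 TO 30

values = [5, 19.9, 20, 25, 30, 30.1, 99]
matches = [x for x in values if lt_20_or_gt_30(x)]
same = all(lt_20_or_gt_30(x) == ne_20_to_30(x) for x in values)
print(matches, same)
```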
In a long series of tests, it is often necessary <strong>to</strong> repeat a test segment several times. This repetition can be<br />
minimized by making use of <strong>the</strong> fact that a null segment au<strong>to</strong>matically repeats <strong>the</strong> previous test for that segment.<br />
[ SET <strong>Inc</strong>ome.Groups = RECODE<br />
( area code | income,<br />
609 908 201 215 | LE 30000 = 1,<br />
| GT 30000 = 2,<br />
NE 609 908 201 215 | LE 30000 = 3,<br />
| GT 30000 = 4 ) ]<br />
is <strong>the</strong> same as<br />
[ SET <strong>Inc</strong>ome.Groups = RECODE<br />
( area code | income,<br />
609 908 201 215 | LE 30000 = 1,<br />
609 908 201 215 | GT 30000 = 2,<br />
NE 609 908 201 215 | LE 30000 = 3,<br />
NE 609 908 201 215 | GT 30000 = 4 ) ]<br />
Note: “NE 609 908 201 215” is true when <strong>the</strong> area code value is not equal <strong>to</strong> ANY of <strong>the</strong>m.<br />
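The effect of a null segment can be sketched in Python (not PPL). In this illustration None stands for a null segment; the names LISTED, tests and income_group are hypothetical, and raising an error when nothing matches is an assumption of the sketch.<br />

```python
LISTED = {609, 908, 201, 215}

# Each test: (area-code segment, income segment, result).
# None stands for a null segment, which repeats the previous test
# written for that argument.
tests = [
    (lambda a: a in LISTED,     lambda i: i <= 30000, 1),
    (None,                      lambda i: i > 30000,  2),
    (lambda a: a not in LISTED, lambda i: i <= 30000, 3),   # NE: equal to none
    (None,                      lambda i: i > 30000,  4),
]

def income_group(area, income):
    previous = None
    for area_test, income_test, result in tests:
        area_test = area_test or previous     # fill in a null segment
        previous = area_test
        if area_test(area) and income_test(income):
            return result
    raise ValueError("no test matched")

results = [income_group(a, i) for a, i in
           [(609, 25000), (215, 40000), (312, 30000), (312, 30001)]]
print(results)
```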
4.6 Defining a Set of Constants<br />
When a set of constants is used repeatedly, <strong>the</strong>y can be defined as a group, given an integer label (from 1 <strong>to</strong><br />
999999), and referred <strong>to</strong> by using that label.<br />
[ SET <strong>Inc</strong>ome.Groups = recode (<br />
area code | income,<br />
( DEFINE 101 = 609 908 201 215),<br />
(101) | LE 30000 = 1,<br />
| GT 30000 = 2,<br />
NE (101) | LE 30000 = 3,<br />
| GT 30000 = 4 ) ]<br />
There can be many such definitions within a single RECODE. They must follow <strong>the</strong> list of arguments and<br />
precede <strong>the</strong> tests. The format is ei<strong>the</strong>r “DEFINE” or “DEF” followed by a numeric label, an equal sign (=) and a<br />
list of values. The entire definition is in paren<strong>the</strong>ses and is followed by a comma. The numeric labels in <strong>the</strong> definitions<br />
must be unique. The following definitions cause an error because both use <strong>the</strong> label “1”:<br />
( DEFINE 1 = 609 908 201 ),<br />
( DEFINE 1 = 215 412 610 717 814 ),<br />
A given definition set can be referenced many times, and can be used for any of <strong>the</strong> arguments. A given test<br />
segment can reference several definition sets and use additional values as well. Figure 4.2 contains both <strong>the</strong> command<br />
with PPL for a multiple-variable RECODE and the resulting output file. Because the recode action<br />
progresses from left to right, it is easy to flag as errors the cases which have conflicting postal zip codes and telephone<br />
area codes. Definitions 1 and 101 represent <strong>the</strong> area codes and zip codes for New Jersey. Definitions 2 and<br />
102 represent <strong>the</strong> area codes and zip codes for Pennsylvania.<br />
( DEFINE 1 = 201 609 908 ),<br />
( DEFINE 2 = 215 412 610 717 814 ),<br />
( DEFINE 101 = '07000' TO '07900' '08001' TO '08990' ),<br />
( DEFINE 102 = '15201' TO '19980' ),<br />
Given definitions 1 and 101, these two tests are the same:<br />
(1) M1 | (101) = 'New Jersey'<br />
201 609 908 M1 | '07000' TO '07900' '08001' <strong>to</strong> '08990' = 'New Jersey'<br />
Any case with a value on Area.code that is ei<strong>the</strong>r included in definition 1 or is missing, and with a value on variable<br />
Zip that is included in definition 101 is given a value of “New Jersey” on variable State:<br />
(1) M1 | (101) = 'New Jersey',<br />
A case is also coded as “New Jersey” if its value on Area.code is one of the definition 1 values and its value on<br />
Zip is either missing or among the values for definition 101:<br />
(1) | (101) M1 = 'New Jersey',<br />
__________________________________________________________________________<br />
Figure 4.2 Multi-Variable RECODE With Definitions<br />
MODIFY States [<br />
GEN State:c = RECODE ( Area.code | Zip,<br />
( DEFINE 1 = 201 609 908 ),<br />
( DEFINE 2 = 215 412 610 717 814 ),<br />
( DEFINE 101 = '07000' TO '07900' '08001' TO '08990' ),<br />
( DEFINE 102 = '15201' TO '19980' ),<br />
(1) M | (101) = 'New Jersey',<br />
(1) | (101) M = 'New Jersey',<br />
(2) M | (102) = 'Pennsylvania',<br />
(2) | (102) M = 'Pennsylvania',<br />
(1) (2) | (101) (102) = 'ERROR',<br />
M='Undefined', G='O<strong>the</strong>r' ) ], OUT States $<br />
File States<br />
Area<br />
code Zip State<br />
201 - New Jersey<br />
313 30225 O<strong>the</strong>r<br />
609 08525 New Jersey<br />
215 08525 ERROR<br />
412 16030 Pennsylvania<br />
- 19340 Pennsylvania<br />
215 - Pennsylvania<br />
__________________________________________________________________________<br />
The same procedure is used for <strong>the</strong> area codes and zip codes used <strong>to</strong> set State <strong>to</strong> “Pennsylvania”. Any case<br />
that has not passed one of <strong>the</strong>se 4 tests, and has a non-missing value for area code that is among <strong>the</strong> values in ei<strong>the</strong>r<br />
of definitions 1 or 2 with a zip code that is in ei<strong>the</strong>r of <strong>the</strong> definitions 101 or 102 has a coding problem: ei<strong>the</strong>r a<br />
New Jersey area code and a Pennsylvania zip code or a Pennsylvania area code and a New Jersey zip code. The<br />
value for State on <strong>the</strong>se cases is set <strong>to</strong> “ERROR”.<br />
Any case that has good values on both variables that are not among any of the lists in the definitions is caught<br />
by the “G=” test and is set to “Other”. Any case which has an undefined good value on one of the variables and<br />
is missing on the other variable is caught by the “M=” test and is set to “Undefined”, as are cases that are missing on<br />
both variables.<br />
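The left to right logic of Figure 4.2 can be traced in Python (not PPL). All names are hypothetical; None stands for a missing value, and zip codes are compared as five-character strings, which is an assumption about how character ranges are ordered.<br />

```python
NJ_AREA = {201, 609, 908}                      # definition 1
PA_AREA = {215, 412, 610, 717, 814}            # definition 2

def nj_zip(z):                                 # definition 101
    return "07000" <= z <= "07900" or "08001" <= z <= "08990"

def pa_zip(z):                                 # definition 102
    return "15201" <= z <= "19980"

def state(area, zip_code):
    """Emulate the RECODE of Area.code | Zip; None stands for missing."""
    a_missing, z_missing = area is None, zip_code is None
    a_nj = not a_missing and area in NJ_AREA
    a_pa = not a_missing and area in PA_AREA
    z_nj = not z_missing and nj_zip(zip_code)
    z_pa = not z_missing and pa_zip(zip_code)
    if (a_nj or a_missing) and z_nj:           # (1) M | (101)
        return "New Jersey"
    if a_nj and (z_nj or z_missing):           # (1) | (101) M
        return "New Jersey"
    if (a_pa or a_missing) and z_pa:           # (2) M | (102)
        return "Pennsylvania"
    if a_pa and (z_pa or z_missing):           # (2) | (102) M
        return "Pennsylvania"
    if (a_nj or a_pa) and (z_nj or z_pa):      # (1) (2) | (101) (102)
        return "ERROR"
    if a_missing or z_missing:                 # M = 'Undefined'
        return "Undefined"
    return "Other"                             # G = 'Other'

rows = [(201, None), (313, "30225"), (609, "08525"), (215, "08525"),
        (412, "16030"), (None, "19340"), (215, None)]
states = [state(a, z) for a, z in rows]
print(states)
```

The rows are the seven cases of file States, so the printed list can be compared with the State column in the figure.<br />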
4.7 The Result Values<br />
The result values can be any of the following:<br />
1. M1 or M missing 1<br />
2. M2 missing 2<br />
3. M3 missing 3<br />
4. nn a numeric constant such as 22 or 1.543<br />
5. 'ccccc' a character constant<br />
6. #tt a temporary scratch variable<br />
7. ##tt a permanent scratch variable<br />
8. ARGn <strong>the</strong> value of <strong>the</strong> cited argument. If <strong>the</strong> recode has 2 arguments, <strong>the</strong>y are referred<br />
<strong>to</strong> as ARG1 and ARG2.<br />
All <strong>the</strong> result values in a given recode must be <strong>the</strong> same type. In o<strong>the</strong>r words, you cannot use a numeric constant<br />
and a character scratch value as results in <strong>the</strong> same recode.<br />
Using scratch variables allows a recode <strong>to</strong> access o<strong>the</strong>r variables in a case.<br />
[ GENERATE #n = d;<br />
SET XYX = RECODE ( a|b|c, 1|2|3 = #n, etc.<br />
When none of <strong>the</strong> tests is successful and G= and M= are not used, <strong>the</strong> result depends on <strong>the</strong> number and type<br />
of <strong>the</strong> arguments. If <strong>the</strong> recode has one argument:<br />
1. if <strong>the</strong> argument and result types are <strong>the</strong> same, <strong>the</strong> argument is used as <strong>the</strong> result. This is useful when<br />
some values are <strong>to</strong> be changed, but <strong>the</strong> rest should remain <strong>the</strong> same.<br />
2. if <strong>the</strong> argument and result types differ, and <strong>the</strong> argument is missing, <strong>the</strong> result is set <strong>to</strong> <strong>the</strong> same kind<br />
of missing.<br />
3. if <strong>the</strong> argument and result types differ, and <strong>the</strong> argument is not missing, an error occurs.<br />
If <strong>the</strong> recode has more than one argument:<br />
4. if any argument is missing, <strong>the</strong> result is set <strong>to</strong> <strong>the</strong> same kind of missing.<br />
5. if all arguments are non-missing, an error occurs.<br />
In all situations except (possibly) <strong>the</strong> first, it is good practice <strong>to</strong> use M= and G=, so that <strong>the</strong> recode is fully<br />
defined.<br />
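The five rules can be condensed into one Python function (not PPL). The Missing enumeration, the helper kind and the "num"/"char" labels are assumptions of this sketch; when several arguments are missing, the sketch arbitrarily uses the first one, which the rules above do not specify.<br />

```python
from enum import Enum

class Missing(Enum):
    M1 = 1
    M2 = 2
    M3 = 3

def kind(value):
    """Classify a non-missing value as character or numeric."""
    return "char" if isinstance(value, str) else "num"

def fallback(args, result_kind):
    """Result when no test succeeded and neither G= nor M= was used."""
    missing = [a for a in args if isinstance(a, Missing)]
    if missing:
        # Rules 1, 2 and 4: a missing argument gives the same kind of missing.
        # Assumption: with several missing arguments, the first one is used.
        return missing[0]
    if len(args) == 1 and kind(args[0]) == result_kind:
        return args[0]                              # rule 1: value is kept
    raise ValueError("recode is not fully defined")  # rules 3 and 5

print(fallback([77], "num"))
print(fallback([Missing.M2], "char"))
print(fallback([Missing.M3, "east"], "num"))
```

The two error branches are the situations that M= and G= are meant to prevent.<br />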
4.8 RECODE or IF/SET<br />
Using RECODE is usually clearer and faster than using a series of IFs and SETs <strong>to</strong> do <strong>the</strong> same thing. For example,<br />
file AAA has variables AGE and REGION. We need a new variable named SECTOR <strong>to</strong> be created from <strong>the</strong><br />
values on age and region. We want SECTOR <strong>to</strong> be:<br />
• M1 if ei<strong>the</strong>r age or region is missing<br />
• 1 if age LT 30 and region EQ 'east'<br />
• 2 if age LT 30 and region EQ 'central'<br />
• 3 if age LT 30 and region EQ 'west'<br />
• 4 if age GE 30 and region EQ 'east'<br />
• 5 if age GE 30 and region EQ 'central'<br />
• 6 if age GE 30 and region EQ 'west'<br />
• M2 if age and region have GOOD values, but have not matched a previous test.<br />
Figure 4.3 contains the PPL statements to do this recode, first using IF and SET and then using a multi-argument<br />
RECODE.<br />
If file AAA had the age and region values listed after Figure 4.3, either of the MODIFY commands in the figure would<br />
produce the sector values shown in that listing.<br />
__________________________________________________________________________<br />
Figure 4.3 RECODE or IF/SET<br />
Using IF/SET:<br />
MODIFY aaa<br />
[ GENERATE sector = .m1. ;<br />
IF age good and region good, SET sector = .m2.;<br />
IF age lt 30 and region EQ 'east' SET sector = 1;<br />
IF age lt 30 and region EQ 'central' SET sector = 2;<br />
IF age lt 30 and region EQ 'west' SET sector = 3;<br />
IF age ge 30 and region EQ 'east' SET sector = 4;<br />
IF age ge 30 and region EQ 'central' SET sector = 5;<br />
IF age ge 30 and region EQ 'west' SET sector = 6;<br />
], out bbb $<br />
Using RECODE:<br />
MODIFY aaa<br />
[ GENERATE sector = RECODE ( age|region,<br />
M = m1,<br />
lt 30| 'east' = 1,<br />
| 'central' = 2,<br />
| 'west' = 3,<br />
ge 30| 'east' = 4,<br />
| 'central' = 5,<br />
| 'west' = 6,<br />
G = m2 ) ], out bbb $<br />
__________________________________________________________________________<br />
Age Region Sec<strong>to</strong>r<br />
22 -- -<br />
23 central 2<br />
44 west 6<br />
19 south --<br />
30 east 4<br />
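As a cross-check of the listing above, the multi-argument recode of Figure 4.3 can be imitated in Python. This is a sketch under stated assumptions, not P-STAT code: `None` stands for a missing input, the `Missing` enum stands for the M1 and M2 results, and the `sector` function and `TESTS` table are hypothetical names.

```python
from enum import Enum


class Missing(Enum):
    M1 = 1
    M2 = 2


# Each test: (age predicate, region constant, result), tried in order.
TESTS = [
    (lambda age: age < 30, "east", 1),
    (lambda age: age < 30, "central", 2),
    (lambda age: age < 30, "west", 3),
    (lambda age: age >= 30, "east", 4),
    (lambda age: age >= 30, "central", 5),
    (lambda age: age >= 30, "west", 6),
]


def sector(age, region):
    if age is None or region is None:       # M = m1: any argument missing
        return Missing.M1
    for age_ok, region_const, result in TESTS:
        # RECODE compares character values without regard to case
        if age_ok(age) and region.lower() == region_const:
            return result
    return Missing.M2                        # G = m2: good but unmatched


rows = [(22, None), (23, "central"), (44, "west"), (19, "south"), (30, "east")]
results = [sector(age, region) for age, region in rows]
```

The five cases give M1, 2, 6, M2 and 4, in the same order as the Sector column of the listing.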
4.9 RECODE Pointers<br />
If you are doing a very complex recode with many variables and values, <strong>the</strong>re may be some combinations that are<br />
far more likely than o<strong>the</strong>rs. You can improve <strong>the</strong> speed of <strong>the</strong> command by arranging your recodes so that <strong>the</strong><br />
most common results are among <strong>the</strong> early tests. Suppose you are recoding 60 country names in<strong>to</strong> integers. One<br />
approach would be to organize the tests alphabetically, so that 'albania'=22 precedes 'china'=12. If, however, half<br />
of the cases come from just five countries, the recode will be faster if those five tests are placed before the fifty-five<br />
others.<br />
In <strong>the</strong> same manner, putting M=m1 or such at <strong>the</strong> beginning of <strong>the</strong> tests will be faster when many of <strong>the</strong> cases<br />
have a missing value on <strong>the</strong> recode argument, but will be slightly slower when no cases have missing values on<br />
<strong>the</strong> recode argument.<br />
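The effect of test order can be illustrated by counting how many tests a first-match search tries. The sketch below is Python, not PPL, and everything in it is made up for illustration: the country names, the 1000 cases, and the assumption that the five frequent countries happen to sort last alphabetically.

```python
def recode_first_match(value, tests):
    """Return (result, number of tests tried); tests are tried in order."""
    for tried, (constant, result) in enumerate(tests, start=1):
        if value == constant:
            return result, tried
    return None, len(tests)


def total_tests_tried(values, tests):
    return sum(recode_first_match(v, tests)[1] for v in values)


countries = [f"country{i:02d}" for i in range(60)]            # hypothetical names
alphabetical = [(name, code) for code, name in enumerate(countries, start=1)]
common = set(countries[-5:])          # assume the five frequent ones sort last
frequent_first = ([t for t in alphabetical if t[0] in common]
                  + [t for t in alphabetical if t[0] not in common])

# 1000 hypothetical cases: half from the five common countries
frequent = sorted(common)
cases = ([frequent[i % 5] for i in range(500)]
         + [countries[i % 55] for i in range(500)])

cost_alphabetical = total_tests_tried(cases, alphabetical)
cost_frequent_first = total_tests_tried(cases, frequent_first)
```

Both orderings return the same codes; only the number of comparisons differs, and with this data the frequent-first ordering tries far fewer tests.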
When you are dealing with missing values, note that M= and M|M|M= are different. Consider<br />
a three-argument recode:<br />
M= is successful when ANY argument is missing,<br />
M|M|M= is successful when ALL arguments are missing.<br />
When you are using EQ (equal) and NE (not equal), the phrase:<br />
EQ 2 5 to 9 11<br />
should be thought of as<br />
EQ 2, OR EQ 5 to 9, OR EQ 11.<br />
The phrase:<br />
NE 2 5 to 9 11<br />
should be thought of as<br />
NE 2, AND NE 5 to 9, AND NE 11.<br />
Figure 4.4 shows successful EQ and NE comparisons for arguments of 1, 2, m1, m2 and m3 when compared to test<br />
constants of 1, 2, m1, m2, m3, m and g. S means a successful comparison.<br />
__________________________________________________________________________<br />
Figure 4.4 EQ and NE Comparisons<br />
---EQ comparisons with---<br />
argument 1 2 M1 M2 M3 M G<br />
1 S . . . . . S<br />
2 . S . . . . S<br />
M1 . . S . . S .<br />
M2 . . . S . S .<br />
M3 . . . . S S .<br />
---NE comparisons with---<br />
argument 1 2 M1 M2 M3 M G<br />
1 . S S S S S .<br />
2 S . S S S S .<br />
M1 . . . S S . S<br />
M2 . . S . S . S<br />
M3 . . S S . . S<br />
__________________________________________________________________________<br />
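The comparison rules can be restated as a small Python sketch that reproduces both halves of Figure 4.4. The rules in `eq_one` and `ne_one` are inferred from the figure rather than taken from a formal definition, and the representation is an assumption: the strings "M1" to "M3" stand for missing values, "M" and "G" for the any-missing and any-good constants, and a tuple `(low, high)` for a `low TO high` range.

```python
MISSING = {"M1", "M2", "M3"}


def eq_one(arg, const):
    if const == "M":
        return arg in MISSING
    if const == "G":
        return arg not in MISSING
    if isinstance(const, tuple):          # (low, high) stands for "low TO high"
        return arg not in MISSING and const[0] <= arg <= const[1]
    return arg == const


def ne_one(arg, const):
    good_const = const not in MISSING and const not in ("M", "G")
    if arg in MISSING and good_const:
        return False                      # per Figure 4.4: missing NE 1 is not successful
    return not eq_one(arg, const)


def eq_list(arg, consts):
    return any(eq_one(arg, c) for c in consts)    # EQ 2 5 to 9 11: joined by OR


def ne_list(arg, consts):
    return all(ne_one(arg, c) for c in consts)    # NE 2 5 to 9 11: joined by AND


ARGS = [1, 2, "M1", "M2", "M3"]
CONSTS = [1, 2, "M1", "M2", "M3", "M", "G"]


def table(test):
    return ["".join("S" if test(a, c) else "." for c in CONSTS) for a in ARGS]
```

For example, `eq_list(7, [2, (5, 9), 11])` is true because 7 falls in the range, while `ne_list(7, [2, (5, 9), 11])` is false for the same reason.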
4.10 XRECODE<br />
The X in XRECODE means eXact comparisons. Consider:<br />
RECODE( 'aBc', 'ABC'=1, 'aBc'=2, etc.<br />
XRECODE( 'aBc', 'ABC'=1, 'aBc'=2, etc.<br />
The RECODE returns 1, because recode ignores upper/lower case differences. Therefore, <strong>the</strong> argument value of<br />
aBc is matched by ABC. The XRECODE does not match aBc with ABC because <strong>the</strong> cases differ, and proceeds<br />
<strong>to</strong> <strong>the</strong> aBc test which succeeds, and returns 2.<br />
Suppose, however, you want to do a recode using two character arguments: REGION with case-independent<br />
comparisons (as in RECODE) and CODE with case-specific comparisons (as in XRECODE). This can be done using XRECODE by:<br />
1. converting values of REGION to upper case as the recode begins, and then<br />
2. using uppercase constants in its test segments.<br />
For example:<br />
XRECODE( UPPER(region)| code, 'EAST' | 'aBc' = 1, etc.
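The difference between the two functions, and the UPPER technique just shown, can be imitated in Python. This is an illustrative sketch, not P-STAT code; the three function names and the list-of-pairs test format are hypothetical.

```python
def recode(value, tests):
    """Case-independent comparison, as the text describes for RECODE."""
    for constant, result in tests:
        if value.upper() == constant.upper():
            return result
    return None


def xrecode(value, tests):
    """Exact comparison, as the text describes for XRECODE."""
    for constant, result in tests:
        if value == constant:
            return result
    return None


def xrecode_region_code(region, code, tests):
    """Two arguments: REGION case-independent, CODE case-specific."""
    key = (region.upper(), code)          # mirrors UPPER(region) | code
    for constant, result in tests:
        if key == constant:
            return result
    return None


TESTS = [("ABC", 1), ("aBc", 2)]
```

With these definitions `recode("aBc", TESTS)` stops at the first test and `xrecode("aBc", TESTS)` goes on to the second, as described above.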
SUMMARY<br />
PPL Functions:<br />
NCOT (exp, ncot instructions)<br />
does N-way dicho<strong>to</strong>mizations (divisions) of numeric values and recodes those values according <strong>to</strong> instructions<br />
given in <strong>the</strong> second argument. The arguments for NCOT must be enclosed in paren<strong>the</strong>ses.<br />
The first argument is an expression which may be a simple variable name or a complex expression. This<br />
is followed by cutting points and possibly a step size. All values less than or equal <strong>to</strong> <strong>the</strong> first cutting<br />
point become a “1”. All values greater than <strong>the</strong> first cutting point and less than or equal <strong>to</strong> <strong>the</strong> second<br />
cutting point become a “2”.<br />
MODIFY File1<br />
[ SET Age = NCOT ( Age, 14 ) ;<br />
GENERATE NN = NCOT ( FRAC (T1), .3, .6, .9 ) ;<br />
SET ZZ = NCOT ( ZZ, 20, 50/5, 90/10 ) ],<br />
OUT File2 $<br />
The final value includes all <strong>the</strong> numbers greater than <strong>the</strong> final cutting point. Thus, <strong>the</strong>re will always be<br />
one more possible output value than <strong>the</strong>re are cutting points. The instruction “20, 50/5” defines cutting<br />
points from 20 through 50 in steps of 5. This is a shorthand way of providing <strong>the</strong> cutting points 25, 30,<br />
35, and so on. In <strong>the</strong> example above, cutting points for <strong>the</strong> variable ZZ will occur at 20, 25, 30, 35, 40,<br />
45, 50, 60, 70, 80, and 90. The resulting values will be from 1 <strong>to</strong> 12.<br />
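The expansion of the step shorthand and the assignment of output values can be sketched in Python. The functions `expand_cut_points` and `ncot` are hypothetical names, an `(end, step)` tuple is this sketch's stand-in for the `end/step` notation, and missing values are not handled because the text above does not say how NCOT treats them.

```python
from bisect import bisect_left


def expand_cut_points(spec):
    """Expand items such as 20, (50, 5), (90, 10) into a list of cutting points.

    A tuple (end, step) continues from the previous cutting point up to `end`
    in steps of `step`, so it must follow at least one plain number.
    """
    points = []
    for item in spec:
        if isinstance(item, tuple):
            end, step = item
            nxt = points[-1] + step
            while nxt <= end:
                points.append(nxt)
                nxt += step
        else:
            points.append(item)
    return points


def ncot(value, points):
    """1 if value <= first point, 2 if <= second point, ..., len(points) + 1 beyond."""
    return bisect_left(points, value) + 1


zz_points = expand_cut_points([20, (50, 5), (90, 10)])
```

`zz_points` holds the eleven cutting points listed above, so `ncot` returns values from 1 through 12.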
RECODE (exp, recode instructions)<br />
recodes <strong>the</strong> numeric or character variable specified by <strong>the</strong> expression according <strong>to</strong> <strong>the</strong> instructions given<br />
in <strong>the</strong> second argument:<br />
MODIFY Nursery<br />
[ GENERATE Coded.Age =<br />
RECODE ( ROUND (Age), LE 4 = 1, 5 = 2, GE 6 = 3 ) ;<br />
SET Race =<br />
RECODE ( Race, 0 3 = 2, M3 = 1 ) ;<br />
GENERATE Gender:C =<br />
RECODE ( Sex, 1 = 'Boy', 2 = 'Girl', G = '?' ) ],<br />
OUT Nursery $<br />
RECODE is a function and its arguments must be enclosed within paren<strong>the</strong>ses. The first argument following<br />
<strong>the</strong> RECODE may be a variable name or a complicated expression.<br />
Recoding may be applied <strong>to</strong> numeric values:<br />
1 TO 5 = 1 one through five become one<br />
6 = 'F' sixes become F<br />
7 TO 9 14 = 3 seven, eight, nine, and fourteen become three<br />
Recoding may be applied <strong>to</strong> character values:<br />
'male' = 1 values of male become 1<br />
'male' = 'm' values of male become m<br />
'A' TO 'DZ' = 3 A through DZ become three<br />
After <strong>the</strong> recode, <strong>the</strong> new values must be all of one data type; that is, <strong>the</strong>y must be ei<strong>the</strong>r all numeric or<br />
all character. Recoding may be applied <strong>to</strong> missing values:<br />
M = 3 missing values become threes<br />
M1 = 'DK' missing one becomes DK<br />
Recoding can be applied <strong>to</strong> what is left over after <strong>the</strong> o<strong>the</strong>r recodes are completed:<br />
G = 4 unrecoded good values become 4<br />
G = '?' unrecoded good values become ?<br />
When <strong>the</strong> recoding transforms character <strong>to</strong> numeric variables or numeric <strong>to</strong> character variables, ei<strong>the</strong>r all<br />
possible values must be recoded or G must be used <strong>to</strong> avoid an error situation. There may be only one G<br />
= in a RECODE.<br />
RECODE multiple argument<br />
recode values are based on several variables or expressions.<br />
[ GEN Group:c = RECODE ( Age | Region,<br />
M = M3,<br />
LT 30 | 'east' = 'one',<br />
GE 30 | NE 'east' = 'two',<br />
G = 'three' ) ]<br />
In this example the recode is based on the combined values of Age and Region. There are four possible<br />
results. Variable Group = 'one' when Age is less than 30 and Region = 'east'. Variable Group = 'two'<br />
when Age is greater than or equal to 30 and Region does not equal 'east'. Any case that is missing on either<br />
variable is set to missing type 3 on variable Group. Any case with good values on both Age and Region<br />
that did not satisfy one of the previous tests has a value of 'three' on variable Group.<br />
XRECODE(exp, recode instructions)<br />
recodes character strings eXactly — that is, respecting <strong>the</strong> specified case (lower, upper or mixed) of <strong>the</strong><br />
string:<br />
[ GEN Symp<strong>to</strong>m = XRECODE (Note, 'a' = 1, 'b' = 2, 'A' = 0) ;<br />
XRECODE works like RECODE with regard <strong>to</strong> o<strong>the</strong>r aspects. Character strings may be exactly recoded<br />
<strong>to</strong> numbers or o<strong>the</strong>r character strings. XRECODE can be used for both simple and complex recodes.
5<br />
PPL: DO LOOPS and IF-THEN-ELSE Blocks<br />
The first section of this chapter documents DO loops. DO loops enable you <strong>to</strong> do repetitive operations easily. The<br />
second section covers <strong>the</strong> use of DO loops <strong>to</strong> generate or rename groups of variables. The last section covers <strong>the</strong><br />
use of IF-THEN-ELSE blocks <strong>to</strong> handle complex logic. (The use of a simple IF was covered in <strong>the</strong> previous<br />
chapters.)<br />
5.1 DO LOOPS<br />
DO loops specify repetitive instructions. They are useful when it is necessary <strong>to</strong> do <strong>the</strong> same modification on a<br />
number of different variables. Repetitive actions can of course be done one at a time, repeating <strong>the</strong> modification<br />
clauses and changing <strong>the</strong> variable names or positions as many times as necessary:<br />
LIST F1 [ SET V(1) = RECODE ( V(1), 6 TO 9 = 5 ) ;<br />
SET V(2) = RECODE ( V(2), 6 TO 9 = 5 ) ;<br />
SET V(3) = RECODE ( V(3), 6 TO 9 = 5 ) ;<br />
SET V(5) = RECODE ( V(5), 6 TO 9 = 5 ) ] $<br />
However, this is tedious and may be done more easily using a DO loop:<br />
LIST F1 [<br />
DO #J USING V(1) TO V(3) V(5);<br />
SET V(#J) = RECODE ( V(#J), 6 TO 9 = 5 );<br />
ENDDO ] $<br />
The DO statement above has five components:<br />
1. DO which is followed by<br />
2. #J a temporary or permanent numeric scratch variable. The value of this changes<br />
each time <strong>the</strong> loop is traversed<br />
3. USING which indicates that a list of variables follows<br />
4. V(1) TO V(3) V(5) a list of variables associated with <strong>the</strong> loop<br />
5. ; ends <strong>the</strong> list of variables and <strong>the</strong>refore ends <strong>the</strong> DO statement.<br />
The USING list has four variables; <strong>the</strong>refore <strong>the</strong> statements up <strong>to</strong> <strong>the</strong> ENDDO will be done four times. The<br />
scratch variable #J is set <strong>to</strong> <strong>the</strong> POSITION of <strong>the</strong> next variable in <strong>the</strong> list each time <strong>the</strong> loop repeats. Thus, in <strong>the</strong><br />
four iterations, it takes on <strong>the</strong> values 1, 2, 3, and 5.<br />
The V vec<strong>to</strong>r, as always, holds <strong>the</strong> data of <strong>the</strong> current case. In <strong>the</strong> SET statement, variable V(#J) is recoded.<br />
Therefore we recode variables 1, 2, 3, and 5 in <strong>the</strong> four iterations. Note that in V(#J) usage, <strong>the</strong> subscript expression<br />
(here, just <strong>the</strong> #J) must result in an integer that is within <strong>the</strong> range of variables in <strong>the</strong> file. In o<strong>the</strong>r words,<br />
fractional or negative subscripts like V(#J+.6) and V(-#J) would be errors.<br />
The DO loop always ends with <strong>the</strong> ENDDO instruction.
The USING loop is one form of DO loop. The other form is a range loop. In this form:<br />
DO #J = 5, 13, 1;<br />
the DO scratch variable takes its value from the DO range, which begins with the first number (5) and increases<br />
through the second number (13) in steps of the third number (1). Here #J begins with 5, becomes 6, then 7, etc.<br />
The final time through the loop #J has the value 13 and the loop has been executed 9 times. When the stepsize is<br />
1, it may be omitted.<br />
There is no limit to the number of PPL instructions that may be done between the DO statement and the ENDDO<br />
statement. The DO can contain other DOs or IF/THEN/ELSE blocks (described later in this chapter). Figure<br />
5.1 contains the input, the LIST command and the resulting printout for a simple DO loop with a list of variables.<br />
__________________________________________________________________________<br />
Figure 5.1 Simple DO Loop with a List of Variables<br />
FILE F1<br />
VAR1 VAR2 VAR3 VAR4 VAR5 VAR6<br />
1 9 2 7 6 8<br />
6 4 5 2 7 3<br />
LIST F1 [<br />
DO #J USING V(1) TO V(3) V(5);<br />
SET V(#J) = RECODE ( V(#J), 6 TO 9 = 5 );<br />
ENDDO ] $<br />
VAR1 VAR2 VAR3 VAR4 VAR5 VAR6<br />
1 5 2 7 5 8<br />
5 4 5 2 5 3<br />
__________________________________________________________________________<br />
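The loop in Figure 5.1 can be imitated in Python to check the printed result. This is a sketch, not PPL: the list `v` plays the role of the V vector (with 0-based indexing, hence `j - 1`), and `expand_using` is a hypothetical helper that turns `V(1) TO V(3) V(5)` into positions.

```python
def expand_using(spec):
    """Turn items such as (1, 3) and 5 into the positions 1, 2, 3, 5."""
    positions = []
    for item in spec:
        if isinstance(item, tuple):       # (first, last) stands for "first TO last"
            positions.extend(range(item[0], item[1] + 1))
        else:
            positions.append(item)
    return positions


def recode_6_to_9(value):
    return 5 if 6 <= value <= 9 else value    # RECODE ( ..., 6 TO 9 = 5 )


positions = expand_using([(1, 3), 5])
rows = [[1, 9, 2, 7, 6, 8],
        [6, 4, 5, 2, 7, 3]]
for v in rows:                            # one pass of the loop per case
    for j in positions:                   # #J takes the values 1, 2, 3, 5
        v[j - 1] = recode_6_to_9(v[j - 1])
```

After the loops, `rows` matches the second listing in the figure; positions 4 and 6 are untouched.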
5.2 DO USING a Variable List<br />
DO USING specifies a list of variables or values <strong>to</strong> which <strong>the</strong> subsequent instructions are <strong>to</strong> be applied. Both<br />
variable names and positions may be used in <strong>the</strong> list of variables. A user-supplied scratch variable must be provided.<br />
This scratch variable is <strong>the</strong>n available for general use within <strong>the</strong> loop.<br />
[ DO #J USING V(1) <strong>to</strong> V(3);<br />
The scratch variable is “#J” in this example, but it may be any legal temporary or permanent numeric scratch variable<br />
name. In this example <strong>the</strong> range of #J is from 1 through 3, <strong>the</strong> positions of variables V(1), V(2) and V(3).<br />
The subsequent modification instructions:<br />
SET V(#J) = RECODE ( V(#J), 6 TO 9 = 5 )<br />
are done three times, first when #J has <strong>the</strong> value 1,<br />
SET V(1) = RECODE ( V(1), 6 TO 9 = 5 )<br />
and <strong>the</strong>n when it has <strong>the</strong> values 2 and 3:<br />
SET V(2) = RECODE ( V(2), 6 TO 9 = 5 )<br />
SET V(3) = RECODE ( V(3), 6 TO 9 = 5 )<br />
In effect, a “loop” is set up: the first value of #J is used in the instruction, then the next, and so on, until the last<br />
value of #J is used. The loop stops when all the instructions have been processed for the last value of #J.<br />
Variable names and wildcards may be used in the DO list. If names are used in a DO instruction, for<br />
example:<br />
DO #J USING Math.Test TO English.Test;<br />
the positions of the variables in the file define the range of the scratch variable. If Math.Test is the first variable<br />
in the file and English.Test is the third, #J has the values 1 through 3. If Math.Test is the sixth variable in the file<br />
and English.Test is the ninth, #J has the values 6 through 9. In either case, the appropriate variables are referenced<br />
in the instruction following the DO:<br />
[ DO #J USING Math.Test TO English.Test ;<br />
SET V(#J) = RECODE ( V(#J), 6 TO 9 = 5 );<br />
ENDDO ]<br />
This DO phrase may be interpreted: for <strong>the</strong> scratch variable #J, which is initially <strong>the</strong> position of Math.Test<br />
and subsequently <strong>the</strong> positions of <strong>the</strong> o<strong>the</strong>r variables in <strong>the</strong> list, set each variable in turn <strong>to</strong> <strong>the</strong> RECODE function<br />
of itself, setting any value between 6 and 9 <strong>to</strong> 5. As <strong>the</strong> DO loop is processed, <strong>the</strong> variable position represented<br />
by V(#J) changes. Initially, the variable position is that of the first variable in the list. With each new loop the scratch<br />
variable takes as its value <strong>the</strong> position of <strong>the</strong> next variable in <strong>the</strong> USING list.<br />
DO #J USING *;<br />
requests <strong>the</strong> re-use of <strong>the</strong> USING list from <strong>the</strong> most recent DO USING loop.<br />
__________________________________________________________________________<br />
Figure 5.2 DO With Two Scratch Variables<br />
File cold<br />
Stuffy<br />
Date Headache Fever Nose Cough<br />
011593 1 0 0 0<br />
012293 0 1 1 1<br />
020993 0 1 0 1<br />
021093 1 0 0 0<br />
MODIFY Cold [ DO #J #N USING headache <strong>to</strong> Cough;<br />
IF V(#J) EQ 1, SET V(#J) = #N,<br />
F.SET V(#J) = .M1.;<br />
ENDDO ],<br />
OUT Cold $<br />
LIST Cold $<br />
Stuffy<br />
Date Headache Fever Nose Cough<br />
011593 1 - - -<br />
012293 - 2 3 4<br />
020993 - 2 - 4<br />
021093 1 - - -<br />
__________________________________________________________________________<br />
Ei<strong>the</strong>r form of DO loop may have a second scratch variable which has as its value <strong>the</strong> number of times <strong>the</strong><br />
DO is executed. Figure 5.2 illustrates this usage. The file has a series of dummy (0/1) variables. The purpose of
<strong>the</strong> MODIFY command is <strong>to</strong> convert <strong>the</strong> zeros <strong>to</strong> missing and <strong>the</strong> ones <strong>to</strong> <strong>the</strong> position of <strong>the</strong> variable in <strong>the</strong> DO<br />
list. This is an easy way <strong>to</strong> convert a series of multiple response questions which are coded as dummy variables<br />
in<strong>to</strong> <strong>the</strong> 1 through n type of code which <strong>the</strong> SURVEY command expects for multiple response banner (column)<br />
variables.<br />
MODIFY Cold [ DO #j #n USING headache <strong>to</strong> Cough;<br />
The first scratch variable, #j, takes on <strong>the</strong> positions (2-5) of <strong>the</strong> 4 variables in <strong>the</strong> USING list. The second<br />
scratch variable takes on <strong>the</strong> value 1 <strong>the</strong> first time through <strong>the</strong> loop, 2 <strong>the</strong> second time through <strong>the</strong> loop, 3 <strong>the</strong> third<br />
time through, and 4 in <strong>the</strong> final loop.<br />
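The roles of the two scratch variables in Figure 5.2 can be sketched in Python, where `enumerate` supplies the iteration count. This is an illustration only: `None` stands for missing type 1, and the list layout (Date in position 1) mirrors file Cold.

```python
rows = [["011593", 1, 0, 0, 0],
        ["012293", 0, 1, 1, 1],
        ["020993", 0, 1, 0, 1],
        ["021093", 1, 0, 0, 0]]

for v in rows:
    # j: position of the variable in the file (2-5); n: iteration count (1-4)
    for n, j in enumerate(range(2, 6), start=1):
        v[j - 1] = n if v[j - 1] == 1 else None   # None stands for .M1.
```

Each 1 is replaced by the ordinal of its variable within the USING list, and each 0 by missing, which is the transformation shown in the second listing of the figure.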
5.3 DO Stepping Through a Range<br />
The second form of <strong>the</strong> DO uses a range of numeric constants or expressions.<br />
DO #K = 15, 24;<br />
The range of <strong>the</strong> scratch variable #K is 15 through 24. Because <strong>the</strong>re is no third argument, a stepsize of 1 is assumed<br />
and <strong>the</strong> values of #K are 15, 16, 17, etc.<br />
DO #K = 15, 24, 2;<br />
Here the stepsize is 2 and #K takes on the values 15, 17, 19, etc. The constants and the stepsize can be any numeric<br />
value or expression that is available at that moment. They can, in o<strong>the</strong>r words, use values which change from case<br />
<strong>to</strong> case. The values can be real numbers with a fractional part. The only exception <strong>to</strong> this is when <strong>the</strong> DO is used<br />
<strong>to</strong> generate or rename a list of variables. The values in a GENERATE or RENAME loop must be available at <strong>the</strong><br />
beginning of <strong>the</strong> command and must have integer values.<br />
__________________________________________________________________________<br />
Figure 5.3 DO: Range and Stepsize<br />
File Tests<br />
pre post pre post pre post<br />
.1 .1 .2 .2 .3 .3<br />
68 75 92 94 89 88<br />
73 73 84 93 85 89<br />
78 79 72 80 73 75<br />
MODIFY Tests<br />
[ DO #j = 2, 6, 2;<br />
SET V(#j) = V(#j) - V(#j-1);<br />
ENDDO ],<br />
OUT Test2 $<br />
File Test2<br />
pre post pre post pre post<br />
.1 .1 .2 .2 .3 .3<br />
68 7 92 2 89 -1<br />
73 0 84 9 85 4<br />
78 1 72 8 73 2<br />
__________________________________________________________________________<br />
Figure 5.3 contains a small data set with 3 pairs of values, each a pre score and a post score. The MODIFY command<br />
is used to change each post value to the difference between the post value and the corresponding pre value.<br />
DO #J = 2, 6, 2;<br />
has a value for #J that is 2 <strong>the</strong> first time through <strong>the</strong> loop. Because <strong>the</strong> stepsize is 2, <strong>the</strong> second time through <strong>the</strong><br />
loop <strong>the</strong> value of #J is 4. The final time through <strong>the</strong> loop <strong>the</strong> value of #J is 6. Because <strong>the</strong> subscripts for <strong>the</strong> V<br />
vec<strong>to</strong>r can be expressions, <strong>the</strong> use of V(#j-1) points <strong>to</strong> each of <strong>the</strong> pre variables in turn as it takes on <strong>the</strong> values 1,<br />
3, and 5.<br />
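The subscript arithmetic can be checked with a short Python sketch of the loop in Figure 5.3. It is not PPL: the list `v` stands for the V vector and is 0-based, so `V(#j)` becomes `v[j - 1]` and `V(#j-1)` becomes `v[j - 2]`.

```python
rows = [[68, 75, 92, 94, 89, 88],
        [73, 73, 84, 93, 85, 89],
        [78, 79, 72, 80, 73, 75]]

for v in rows:
    for j in range(2, 7, 2):              # DO #j = 2, 6, 2  ->  2, 4, 6
        v[j - 1] = v[j - 1] - v[j - 2]    # SET V(#j) = V(#j) - V(#j-1)
```

Only the post columns change; each now holds post minus pre, as in file Test2.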
The DO numbers can be fractional values.<br />
DO #J = .5, .8, .1;<br />
This loop will have 4 iterations with #J as .5, .6, .7, and .8. The range can go backwards with either a supplied<br />
negative stepsize or a default of -1.<br />
DO #J = 3, -3, -2;<br />
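The sequence of values in a range loop can be sketched with a small Python generator of the list. The function name `do_range` is hypothetical. The sketch uses `Decimal` so that .5, .6, .7, .8 come out exactly; how P-STAT itself handles floating-point steps is not stated here, and returning an empty list when the step points away from the end value is likewise an assumption.

```python
from decimal import Decimal


def do_range(first, last, step=None):
    """Values taken by the DO scratch variable for DO #J = first, last, step."""
    first, last = Decimal(str(first)), Decimal(str(last))
    if step is None:
        step = 1 if first <= last else -1     # default stepsize of 1, or -1 going backwards
    step = Decimal(str(step))
    if step == 0:
        raise ValueError("stepsize must not be zero")
    values = []
    current = first
    while (current <= last) if step > 0 else (current >= last):
        values.append(current)
        current += step
    return values
```

For instance, `do_range(3, -3, -2)` yields 3, 1, -1, -3, and `do_range(5, 13)` yields nine values.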
The arguments for the DO can be expressions. If you wish to loop with a step argument through a list of variables<br />
and you know the variable names but not the locations, you can do the following:<br />
DO #J = loc(pre.1), loc(pre.3), 2;<br />
Figure 5.4 illustrates the difference between the types of DO loops. In the first command #J takes on the positions<br />
of the variables in the USING list. In the second command #J begins with 2, the value of VAR1, and ends<br />
with 6, the value of VAR3. In the third command #J steps from the position of VAR1 to the position of VAR3 in steps of 2.<br />
__________________________________________________________________________<br />
Figure 5.4 DO Loops: An Example of Each Type<br />
File XX<br />
VAR1 VAR2 VAR3<br />
2 4 6<br />
The Commands The Output<br />
PROCESS XX [ DO #J USING var1 TO var3; #J= 1 positions<br />
PUT #J; ENDDO ] $ #J= 2 of USING<br />
#J= 3 variables<br />
PROCESS XX [ DO #J = var1, var3; #J= 2 value var1<br />
PUT #J; #J= 3<br />
ENDDO ] $ #J= 4<br />
#J= 5<br />
#J= 6 value var3<br />
PROCESS XX [ DO #J = positions<br />
(loc)var1, (loc)var3, 2; #J= 1 var1<br />
PUT #J; #J= 3 var3<br />
ENDDO ] $<br />
__________________________________________________________________________
5.4 DO Loops: O<strong>the</strong>r Features<br />
Figure 5.5 illustrates <strong>the</strong> optional features of <strong>the</strong> DO. A DO can reference a label. For example:<br />
DO testloop #J ....<br />
This label is <strong>the</strong>n used as a statement label on <strong>the</strong> ENDDO statement:<br />
testloop: ENDDO;<br />
__________________________________________________________________________<br />
Figure 5.5 Labelled DO, EXITDO and NEXTDO<br />
File Myfile<br />
QA1 QA2 QA3 QB1 QB2 QB3<br />
2 3 4 - 1 1<br />
3 1 8 2 1 1<br />
7 3 6 1 1 1<br />
TEXT;<br />
First RECODE variables QA1 through QA3 in<strong>to</strong> <strong>the</strong> values 0-4. Then<br />
compute <strong>the</strong> average of QA1, QA2 and QA3. However, if <strong>the</strong> QB variable<br />
that corresponds <strong>to</strong> <strong>the</strong> QA variable is missing, exit <strong>the</strong> DO (EXITDO)<br />
and ignore <strong>the</strong> remaining values. If <strong>the</strong> QB variable that corresponds<br />
<strong>to</strong> <strong>the</strong> QA variable is a 2, move immediately (NEXTDO) <strong>to</strong> <strong>the</strong> next<br />
element in <strong>the</strong> loop and do not include <strong>the</strong> current value.<br />
$<br />
LIST Myfile [ GEN #Total = 0, GEN N = 0, GEN Average = .M.;<br />
...............<br />
DO testloop #j USING QA1 TO QA3;<br />
SET V(#j) = RECODE<br />
( V(#J), 0=M, 1 2=1, 3 TO 5=2, 6 8 9=3, G=4 );<br />
IF V(#J+3) MISSING, EXITDO;<br />
IF V(#J+3) EQ 2, NEXTDO;<br />
INCREASE #Total BY V(#J), INCREASE N;<br />
testloop: ENDDO;<br />
SET Average = #Total / N ] $<br />
QA1 QA2 QA3 QB1 QB2 QB3 N Average<br />
1 3 4 - 1 1 0 -<br />
2 1 3 2 1 1 2 2<br />
4 2 3 1 1 1 3 3<br />
__________________________________________________________________________<br />
Any statement that has a label can be used as the target of a GOTO. GOTO, which is discussed later in this chapter,<br />
provides a way to selectively execute the PPL. The label in a DO is also useful when there are nested DOs.<br />
DO loop1 #J ....;<br />
<strong>PPL</strong> here;<br />
DO Loop2 #K ....;<br />
More <strong>PPL</strong> here;<br />
loop2: ENDDO;<br />
And yet more <strong>PPL</strong>;<br />
loop1: ENDDO;<br />
You can exit from a DO loop at any time by using <strong>the</strong> EXITDO <strong>PPL</strong> statement. EXITDO has <strong>the</strong> effect of a<br />
branch <strong>to</strong> <strong>the</strong> <strong>PPL</strong> (if any) after its ENDDO. <strong>PPL</strong> processing continues <strong>the</strong>re. NEXTDO, on <strong>the</strong> o<strong>the</strong>r hand, is a<br />
branch <strong>to</strong> <strong>the</strong> ENDDO statement where <strong>the</strong> DO counters are incremented. In Figure 5.5 <strong>the</strong> last 5 lines of <strong>the</strong> LIST<br />
command could have been written as:<br />
IF V(#J+3) MISSING, GOTO NEXT;<br />
IF V(#J+3) EQ 2, GOTO testloop;<br />
INCREASE #Total BY V(#J), INCREASE N;<br />
testloop: ENDDO;<br />
NEXT: SET Average = #Total / N ] $<br />
In Figure 5.5, QA2 and QA3 for <strong>the</strong> first case are not recoded. This is because QB1 is missing. As soon as<br />
QA1 is recoded <strong>the</strong> statement<br />
IF V(#J+3) MISSING, EXITDO;<br />
is executed. Since QB1 is missing, <strong>the</strong> loop is exited without processing <strong>the</strong> remaining variables for that case.<br />
Case 2 has its average calculated on just the last two values. This is because QB1 on that case is a 2. The statement:<br />
IF V(#J+3) EQ 2, NEXTDO;<br />
causes a branch <strong>to</strong> <strong>the</strong> ENDDO without including QA1 in <strong>the</strong> <strong>to</strong>tals.<br />
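The control flow of Figure 5.5 maps onto `break` (for EXITDO) and `continue` (for NEXTDO) in Python. The sketch below is an illustration, not PPL. Its recode table maps 1 and 2 to 1 and 3 through 5 to 2, which is the mapping that reproduces the output listed in the figure; `None` stands for missing. A QA value recoded to missing is not handled, since the sample data contains none.

```python
def recode_qa(value):
    if value == 0:
        return None                       # 0 = M
    if value in (1, 2):
        return 1
    if 3 <= value <= 5:
        return 2
    if value in (6, 8, 9):
        return 3
    return 4                              # G = 4


def process(case):
    """case holds QA1-QA3 at indexes 0-2 and QB1-QB3 at indexes 3-5."""
    total, n = 0, 0
    for j in range(3):
        case[j] = recode_qa(case[j])
        if case[j + 3] is None:
            break                         # EXITDO: leave the loop
        if case[j + 3] == 2:
            continue                      # NEXTDO: skip to the next element
        total += case[j]
        n += 1
    average = total / n if n else None    # 0/0 is shown as missing in the figure
    return n, average


cases = [[2, 3, 4, None, 1, 1],
         [3, 1, 8, 2, 1, 1],
         [7, 3, 6, 1, 1, 1]]
summary = [process(case) for case in cases]
```

The first case leaves the loop after recoding QA1 only, so its QA2 and QA3 keep their original values.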
The DO scratch variable or variables are still defined when a DO exits. They remain set <strong>to</strong> whatever values<br />
<strong>the</strong>y had in <strong>the</strong> final DO iteration that was done.<br />
EXITDO and NEXTDO can be used in phrases like:<br />
IF Age GT 14, T.NEXTDO, F.EXITDO;<br />
EXITDO and NEXTDO can be followed by a DO statement label. Here, we exit all three loops from <strong>the</strong> innermost<br />
loop:<br />
DO aaa #J = 1, 2;<br />
DO bbb #K = 3, 4;<br />
DO ccc #M = 5, 6;<br />
EXITDO aaa;<br />
ccc: ENDDO;<br />
bbb: ENDDO;<br />
aaa: ENDDO;<br />
Even though we are out of <strong>the</strong> loops, #J and #K and #M can be used; <strong>the</strong>y have <strong>the</strong> values 1, 3, and 5. If <strong>the</strong> above<br />
EXITDO had no label, it would have exited only <strong>the</strong> DO ccc loop.<br />
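Python has no labelled loop exit, so the sketch below imitates `EXITDO aaa` with an exception that carries the label. The `ExitDo` class and both function names are hypothetical. The point is only to show that the loop variables keep the values they had when the exit happened. The values returned by the unlabelled variant are simply what Python produces and are not a statement about PPL.

```python
class ExitDo(Exception):
    """Carries the label of the loop to leave (a stand-in for EXITDO label)."""

    def __init__(self, label):
        super().__init__(label)
        self.label = label


def exit_labelled():
    j = k = m = None
    try:                                   # plays the role of "aaa: ENDDO"
        for j in range(1, 3):              # DO aaa #J = 1, 2
            for k in range(3, 5):          # DO bbb #K = 3, 4
                for m in range(5, 7):      # DO ccc #M = 5, 6
                    raise ExitDo("aaa")    # EXITDO aaa
    except ExitDo as signal:
        if signal.label != "aaa":
            raise
    return j, k, m


def exit_unlabelled():
    j = k = m = None
    for j in range(1, 3):
        for k in range(3, 5):
            for m in range(5, 7):
                break                      # leaves only the innermost loop
    return j, k, m
```

`exit_labelled()` returns `(1, 3, 5)`, the values quoted in the text above.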
5.5 GENERATE AND RENAME<br />
GENERATE and RENAME use <strong>the</strong> same conventions in creating variable names. When a single variable is involved<br />
<strong>the</strong>re is no need for a complex mask:
[ GENERATE Family.<strong>Inc</strong>ome;<br />
RENAME Test1 TO Math121;<br />
RENAME V(2) TO Chem34 ]<br />
RENAME requires the existing name, TO, and the new name, which must be a unique name in the file. GENERATE<br />
requires only the variable name.<br />
If a list of variables is to be generated or renamed, a DO loop may be used. A DO which contains a GENERATE<br />
or RENAME may not contain other PPL statements. Also, if the DO uses a range (as in DO #J = 1, 5),<br />
the control values must be integer constants or integer scratch variables whose values are known when the command<br />
begins. It is necessary to know the range of values before any cases are read in order to properly set up the<br />
renames or generates.<br />
5.6 Using GENERATE in DO Loops<br />
Typically, when GENERATE is used <strong>to</strong> create a new variable, a name for <strong>the</strong> variable is provided by <strong>the</strong> user or,<br />
if <strong>the</strong> “?” has been used, P-<strong>STAT</strong> generates a name. When GENERATE is used in a DO loop, multiple variables<br />
are created and unique names need <strong>to</strong> be provided or generated for <strong>the</strong>m. The format of a GENERATE within a<br />
DO loop is one of <strong>the</strong> following:<br />
GENERATE ? = value;<br />
GENERATE ? (mask) = value;<br />
GENERATE V(#J) (mask) = value;<br />
GENERATE V(##K) (mask) = value;<br />
If <strong>the</strong> variables are character, <strong>the</strong> :C or :C20 or such directly follows <strong>the</strong> mask or, if <strong>the</strong>re is no mask, <strong>the</strong> “?”.<br />
Masks are described below. The “= value” is optional; if not supplied, <strong>the</strong> variable is set <strong>to</strong> missing.<br />
When <strong>the</strong> “?” is used:<br />
[ DO #K USING Q3 TO Q5; GENERATE ? = SQRT ( V(#K) );<br />
ENDDO ]<br />
names for <strong>the</strong> three new variables are generated by P-<strong>STAT</strong>. If <strong>the</strong>re are ten variables in <strong>the</strong> file, <strong>the</strong> new variables<br />
are VAR11 (<strong>the</strong> square root of <strong>the</strong> variable named Q3), VAR12 (<strong>the</strong> square root of <strong>the</strong> variable named Q4) and<br />
VAR13 (<strong>the</strong> square root of <strong>the</strong> variable named Q5). The same format is used <strong>to</strong> generate a list of character<br />
variables:<br />
[ DO #K USING Q3 TO Q5;<br />
GENERATE ?:C = CHARACTER ( V(#K) ); ENDDO ]<br />
The ? is followed by “:C”. The length may be specified:<br />
GENERATE ?:C32<br />
A mask containing a prefix or suffix may be provided for <strong>the</strong> names being generated. The mask is enclosed<br />
in paren<strong>the</strong>ses and an ampersand (&) is used <strong>to</strong> represent <strong>the</strong> name of <strong>the</strong> current DO loop variable:<br />
[ DO #K USING Q3 TO Q5;<br />
GENERATE V(#K) ( 'Sqrt.' & ) = SQRT ( V(#K) ); ENDDO ]<br />
The new variable names are composed of <strong>the</strong> prefix “Sqrt.” followed by one of <strong>the</strong> names of <strong>the</strong> variables in <strong>the</strong><br />
DO list — <strong>the</strong> variable currently in <strong>the</strong> DO loop. Since <strong>the</strong> names of <strong>the</strong> variables in <strong>the</strong> DO list are Q3, Q4 and<br />
Q5, <strong>the</strong> names for <strong>the</strong> new variables are “Sqrt.Q3”, “Sqrt.Q4” and “Sqrt.Q5”. A suffix is created by moving <strong>the</strong><br />
“&” in <strong>the</strong> mask so that it precedes <strong>the</strong> string.<br />
[ DO #K USING Q3 TO Q5;<br />
GENERATE V(#K) ( & '.Sqrt' ) = SQRT ( V(#K) );<br />
ENDDO ]<br />
This creates new names “Q3.Sqrt”, “Q4.Sqrt” and “Q5.Sqrt”.<br />
If <strong>the</strong> new name is longer than 16 characters, <strong>the</strong> prefix or suffix is left intact and <strong>the</strong> current variable name<br />
is truncated. This may cause an error due <strong>to</strong> a repeated name.<br />
When GENERATE is used in a DO loop, <strong>the</strong> loop can have no o<strong>the</strong>r statements; i.e., it can have only DO,<br />
GENERATE and ENDDO.<br />
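The truncation rule can be sketched in a few lines of Python. This is an illustration of the rule as described above,<br />
not P-STAT code; the helper names with_prefix and with_suffix and the two long variable names are hypothetical.<br />

```python
MAX_NAME = 16  # name length limit stated in the text


def with_prefix(prefix, name):
    """Keep the whole prefix; drop characters from the end of the name if needed."""
    return prefix + name[:MAX_NAME - len(prefix)]


def with_suffix(name, suffix):
    """Keep the whole suffix; drop characters from the end of the name if needed."""
    return name[:MAX_NAME - len(suffix)] + suffix


print(with_prefix("Sqrt.", "Q3"))   # Sqrt.Q3
print(with_suffix("Q3", ".Sqrt"))   # Q3.Sqrt

# Two hypothetical 16-character names that differ only in their last character
# collapse to the same new name, which is the repeated-name error.
a = with_prefix("Sqrt.", "HouseholdIncome1")
b = with_prefix("Sqrt.", "HouseholdIncome2")
print(a, a == b)                    # Sqrt.HouseholdIn True
```

Long names that differ only near the end are the ones most likely to collide after truncation.<br />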
5.7 Using RENAME in DO Loops<br />
In <strong>the</strong> simplest form of RENAME, like <strong>the</strong> GENERATE illustrated above, all <strong>the</strong> renamed variables have <strong>the</strong><br />
specified prefix (or suffix) in <strong>the</strong>ir names:<br />
[ DO #J USING Item1 TO Item10; RENAME V(#J) ( 'Test.' & ); ENDDO]<br />
Here, variables Item1 through Item10 are renamed by prefixing <strong>the</strong>ir names with “Test.”. The variable previously<br />
named “Item1” is renamed “Test.Item1”, “Item2” is renamed “Test.Item2”, and so on. If <strong>the</strong> prefix plus <strong>the</strong> original<br />
name contains more than 16 characters, <strong>the</strong> entire prefix is used and characters are removed from <strong>the</strong> end of<br />
<strong>the</strong> original name until a 16-character name results.<br />
The format for RENAME within a DO loop is <strong>the</strong> following:<br />
1. RENAME<br />
2. a V(#J) usage. This identifies <strong>the</strong> variable <strong>to</strong> be renamed. It also provides its current name <strong>to</strong> <strong>the</strong><br />
mask.<br />
3. a mask in paren<strong>the</strong>ses which contains strings in quotes <strong>to</strong> be used exactly as entered. It also contains<br />
special characters such as <strong>the</strong> “&” which are used <strong>to</strong> select or omit letters from <strong>the</strong> input label and <strong>to</strong><br />
supply numbers using <strong>the</strong> DO loop scratch variable.<br />
4. a semicolon, ending <strong>the</strong> statement.<br />
This is an example of a simple mask:<br />
[ DO #j=21,35; RENAME V(#j) (XOOXX); ENDDO ]<br />
Here, a mask of (XOOXX) is supplied. The initial X says use <strong>the</strong> first input character, <strong>the</strong> OO says omit <strong>the</strong> next<br />
two characters, and <strong>the</strong> XX says use <strong>the</strong> next two (characters 4 and 5). This mask would rename VAR31 in<strong>to</strong><br />
V31.<br />
[ DO #n USING pre? ;<br />
RENAME V(#n) ( 'test' OOO & );<br />
ENDDO]<br />
In this loop, each name that starts with 'pre' is renamed. Each new name begins with 'test'. The first 3 characters<br />
of each old name, which are known <strong>to</strong> be 'pre', are bypassed (indicated by ooo), and <strong>the</strong> rest of <strong>the</strong> old name (indicated<br />
by &) is copied in<strong>to</strong> <strong>the</strong> new name area after 'test'. We are replacing 3 characters with 4. The & opera<strong>to</strong>r<br />
truncates if needed, so if a name started with 16 characters, it would get 'test' followed by characters 4 <strong>to</strong> 15. The<br />
OOO opera<strong>to</strong>r caused <strong>the</strong> & opera<strong>to</strong>r <strong>to</strong> start with character 4.<br />
When RENAME is used in a DO loop, the loop can have no other statements; i.e., it can have only DO,<br />
RENAME and ENDDO.<br />
5.8 Masks for RENAME and GENERATE<br />
A mask is used <strong>to</strong> create a name for a variable, ei<strong>the</strong>r by modifying <strong>the</strong> ? or V(#J) name preceding it or by<br />
creating a <strong>to</strong>tally different name. The mask activity begins with a pointer on <strong>the</strong> initial character of <strong>the</strong> input name.<br />
The pointer is moved on<strong>to</strong> <strong>the</strong> next character after each usage of X, O, c or C. Fur<strong>the</strong>r use of X-O-c-C is ignored<br />
when the pointer is beyond the final input character.<br />
__________________________________________________________________________<br />
Figure 5.6 Rename Examples<br />
FILE myfile<br />
Id Item1 Item2 Item3 Item4 Item5<br />
1 1 2 3 4 5<br />
Select the 1st and 5th characters:<br />
LIST Myfile [ DO #j USING Item2 TO Item5;<br />
RENAME v(#j) ( XOOOX ) ; ENDDO ] $<br />
Id Item1 I2 I3 I4 I5<br />
1 1 2 3 4 5<br />
Provide a prefix and use the DO loop number (d):<br />
LIST Myfile [ DO #J USING Item2 TO Item5;<br />
RENAME V(#j) ( 'Q.' d ); ENDDO; ] $<br />
Id Item1 Q.3 Q.4 Q.5 Q.6<br />
1 1 2 3 4 5<br />
Provide a prefix and use the DO loop counter (n):<br />
LIST Myfile [ DO #J USING Item2 TO Item5;<br />
RENAME V(#j) ( 'Question.' n ); ENDDO; ] $<br />
Id Item1 Question.1 Question.2 Question.3 Question.4<br />
1 1 2 3 4 5<br />
Use the original name and the DO loop number:<br />
LIST Myfile [ DO #J USING Item2 TO Item5;<br />
RENAME V(#j) ( & '.' d ); ENDDO; ] $<br />
Id Item1 Item2.3 Item3.4 Item4.5 Item5.6<br />
1 1 2 3 4 5<br />
Use 'Q.', the original name, '.' and the DO loop counter:<br />
LIST Myfile [ DO #J USING Item2 TO Item5;<br />
RENAME V(#j) ( 'Q.' & '.' n ); ENDDO; ] $<br />
Id Item1 Q.Item2.1 Q.Item3.2 Q.Item4.3 Q.Item5.4<br />
1 1 2 3 4 5<br />
__________________________________________________________________________<br />
1. x or X takes <strong>the</strong> current input character, if usable.<br />
2. o or O omits <strong>the</strong> current input character. NOTE: <strong>the</strong> digit '0' is also usable.<br />
3. c takes <strong>the</strong> current input character, if usable, and, if it is a letter, puts it in<strong>to</strong> lower case. C takes <strong>the</strong><br />
current input character, if usable, and, if it is a letter, puts it in<strong>to</strong> upper case.<br />
4. & takes all remaining usable input characters that can fit, starting at <strong>the</strong> current location of <strong>the</strong><br />
pointer.<br />
5. @4 places <strong>the</strong> pointer on<strong>to</strong> <strong>the</strong> 4th input character. (@4 xxx) and (ooo xxx) are identical.<br />
6. @-5 places <strong>the</strong> pointer on <strong>the</strong> 5th character from <strong>the</strong> right hand end.<br />
A character that has been <strong>the</strong> subject of any of <strong>the</strong> X-O-c-C-& opera<strong>to</strong>rs is no longer usable by any subsequent<br />
opera<strong>to</strong>r. As a result, ( @-4 OOOO @1 & 'post' ) could be used <strong>to</strong> inactivate <strong>the</strong> rightmost 4 characters, take <strong>the</strong><br />
rest, and add 'post' <strong>to</strong> it. In o<strong>the</strong>r words, <strong>the</strong> mask has replaced an existing 4-character suffix with a new one.<br />
Blanks are ignored in masks, which can markedly improve readability. For example:<br />
(XXXXXXXXX) and<br />
(XXX XXX XXX) are identical.<br />
O<strong>the</strong>r features of <strong>the</strong> mask are:<br />
1. 'ab.cde' moves <strong>the</strong> string contents in<strong>to</strong> <strong>the</strong> new name.<br />
2. D or d inserts the V subscript. This is based on the current value of the DO scratch variable. If<br />
V(#j) is used, 17 is inserted when #j=17. If V(#j+10) is used, 27 is inserted when #j=17. DD is like<br />
D, but forces 2 characters; 07 is used instead of 7. If DDD is used, three digits are inserted in the<br />
new name, so a 7 appears in the new label as 007.<br />
3. N or n inserts <strong>the</strong> current iteration count of <strong>the</strong> DO loop. If this is <strong>the</strong> third trip through <strong>the</strong> loop, 3<br />
is inserted. NN provides 2 digits, NNN 3 digits. You do not have <strong>to</strong> use a counter scratch variable<br />
in <strong>the</strong> DO statement in order <strong>to</strong> use 'N'.<br />
The following are some examples of a name, a mask and the resulting labels.<br />
Suppose we have:<br />
current name mask result<br />
abcdefg (xx @-4 xoxx ) abdfg<br />
abcdeF (Cccc @-4 cccc ) Abcdef<br />
abc12345def ('Var' @4 xxxxx) Var12345<br />
abc12345def ('Var' @4 & ) Var12345def<br />
abcd (@-6 xxxxxx ) abcd<br />
[ DO #j=11,13; RENAME v(#j) (a mask); ENDDO ]<br />
When #j is 12, meaning it is the second iteration, and the name of v(12) is “abcdef”, the masks behave as follows:<br />
abcdef ('item.' NN ) item.02<br />
abcdef ('PreTest.' DDD) PreTest.012<br />
abcdef ( xxx '.' D ) abc.12<br />
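The mask rules above can be checked against the examples with a small simulator. The following Python sketch is an<br />
illustration written from the rules in this section, not P-STAT's implementation. The function name apply_mask and its<br />
arguments subscript (the value D inserts) and count (the value N inserts) are hypothetical. Three behaviors are<br />
assumptions: an @ position before the first character is clamped to the first character, runs such as DD or NNN must<br />
be written without blanks, and a name that is too long is shortened by trimming the text supplied by &.<br />

```python
import re

MAX_NAME = 16  # name length limit stated in the text

# One token per match: quoted string, @position, run of D, run of N, or a single code.
TOKEN = re.compile(r"""'([^']*)'|"([^"]*)"|@(-?\d+)|([dD]+)|([nN]+)|([xXoO0cC&])""")


def apply_mask(mask, name, subscript=0, count=0):
    """Build a new name from `name` using a RENAME/GENERATE-style mask."""
    used = [False] * len(name)  # characters already handled by X, O, c, C or &
    pos = 0                     # pointer into the input name (0-based)
    pieces = []                 # [text, came_from_ampersand] pairs

    def take(transform):
        """Handle the current character, then advance the pointer."""
        nonlocal pos
        if 0 <= pos < len(name):
            if not used[pos] and transform is not None:
                pieces.append([transform(name[pos]), False])
            used[pos] = True
        pos += 1

    for m in TOKEN.finditer(mask):
        single, double, at, d_run, n_run, code = m.groups()
        if single is not None or double is not None:
            pieces.append([single if single is not None else double, False])
        elif at is not None:
            k = int(at)
            # @4 is the 4th character; @-5 is the 5th from the right (clamped: assumption)
            pos = max(k - 1 if k > 0 else len(name) + k, 0)
        elif d_run:
            pieces.append([str(subscript).zfill(len(d_run)), False])
        elif n_run:
            pieces.append([str(count).zfill(len(n_run)), False])
        elif code in "xX":
            take(lambda ch: ch)
        elif code in "oO0":
            take(None)
        elif code == "c":
            take(str.lower)
        elif code == "C":
            take(str.upper)
        else:  # "&": all remaining usable characters from the pointer onward
            rest = "".join(ch for i, ch in enumerate(name) if i >= pos and not used[i])
            for i in range(pos, len(name)):
                used[i] = True
            pos = len(name)
            pieces.append([rest, True])

    # Keep quoted strings and numbers intact; trim the & text if the name is too long.
    excess = sum(len(text) for text, _ in pieces) - MAX_NAME
    for piece in reversed(pieces):
        if excess <= 0:
            break
        if piece[1]:
            cut = min(excess, len(piece[0]))
            piece[0] = piece[0][:len(piece[0]) - cut]
            excess -= cut
    return "".join(text for text, _ in pieces)[:MAX_NAME]


print(apply_mask("xx @-4 xoxx", "abcdefg"))               # abdfg
print(apply_mask("Cccc @-4 cccc", "abcdeF"))              # Abcdef
print(apply_mask("'Var' @4 &", "abc12345def"))            # Var12345def
print(apply_mask("'PreTest.' DDD", "abcdef", 12, 2))      # PreTest.012
```

Because each character is marked once it has been handled, a later @ cannot reuse it, which is why the second example<br />
yields Abcdef rather than repeating the letters c and d.<br />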
Figure 5.6 contains 5 examples of RENAME masks and illustrates the use of 'X' and 'O', text strings, the original<br />
variable name and both DO loop values: the loop number (d) and the loop counter (n).<br />
Figure 5.7 illustrates the difference between the use of ? and V(#scratch) in a DO loop GENERATE. File<br />
work has four variables. The ? uses the generated labels as the labels on which to base any changes. In the first<br />
example in Figure 5.7 these labels are VAR5 through VAR8. When a scratch variable is used, the labels provided<br />
to the mask are the current DO loop variables. In the second example in Figure 5.7 these variables are V1 through<br />
V4. The only reason for ever using the V vector and a scratch variable in a RENAME or GENERATE is to use<br />
some of the characters found in the original variable name.<br />
__________________________________________________________________________<br />
Figure 5.7 GENERATE: Generated Versus Original<br />
LIST work [ DO #j = 1, 4;<br />
GEN ? ( 'New.' & );<br />
ENDDO ] $<br />
V1 V2 V3 V4 New.VAR5 New.VAR6 New.VAR7 New.VAR8<br />
1 2 3 4 - - - -<br />
LIST work [ DO #j = 1, 4;<br />
GEN v(#j) ( 'New.' & );<br />
ENDDO ] $<br />
V1 V2 V3 V4 New.V1 New.V2 New.V3 New.V4<br />
1 2 3 4 - - - -<br />
__________________________________________________________________________<br />
When variables have a common prefix, the combination of the DO and a dynamic vector created using a wildcard<br />
can be a powerful tool. A dynamic vector is created any time that a wildcard is used in a variable name list.<br />
Age Sex Pre.1 Post.1 Pre.2 Post.2 Pre.3 Post.3 Aptitude<br />
Given these variable names, the use of<br />
Pre?<br />
creates a dynamic vector containing the three variables in the list which begin with the characters “pre”. These<br />
variables can now be referenced in the same way that the variables in the V vector are referenced:<br />
DO #J #N USING Pre?;<br />
SET PRE?(#N) = ..... ;<br />
ENDDO;<br />
If there are 5 variables beginning with “pre”, the loop will be exercised 5 times and the scratch variable #N will<br />
take on the values 1, 2, 3, 4, and 5. Here, using V(#J) is the same as using PRE?(#N).<br />
__________________________________________________________________________<br />
Figure 5.8 Dynamic Array, Wildcard, Prefix and GENERATE<br />
LIST Tests<br />
[ DO #P #N USING pre?;<br />
GEN ? ( 'Diff.' n ) = post?(#N) - pre?(#N) ;<br />
ENDDO ] $<br />
pre.1 post.1 pre.2 post.2 pre.3 post.3 Diff.1 Diff.2 Diff.3<br />
68 75 92 94 89 88 7 2 -1<br />
73 73 84 93 85 89 0 9 4<br />
78 79 72 80 73 75 1 8 2<br />
__________________________________________________________________________<br />
__________________________________________________________________________<br />
Figure 5.9 Complex MASK: Generate Variable Names<br />
Given a file with variables “Variable01” “Variable02” and “Variable03”<br />
DO #j = 2, 3;<br />
GEN v(#j)<br />
( C D @3c 'mMm' @-4C NN o & 'zz' dd )= v(#j);<br />
ENDDO;<br />
The variable name created from input variable “Variable02” is:<br />
V2rmMmL0102zz02<br />
The mask:<br />
C produces V (first input character, in upper case)<br />
D produces 2 (the value of #J in the DO loop)<br />
@3c produces r (3rd character of input, in lower case)<br />
'mMm' adds mMm (strings enclosed in quotes are added as entered)<br />
@-4C produces L (4th character from the end of the input variable name, in upper case)<br />
NN adds 01 (iteration count in the DO, to 2 places)<br />
o omits the next letter in the input variable name: it skips the “e”<br />
& includes 02 (the rest of the input variable name)<br />
'zz' adds zz (a mask can have multiple strings)<br />
dd adds 02 (same as DD: the value of #J to 2 places)<br />
Spaces are used in masks to make them easier to follow. They are not<br />
required:<br />
(CD@3c'mMm'@-4CNNo&'zz'dd)<br />
The characters in the new name associated with scratch variable #J are:<br />
D = 2, NN = 01, dd = 02<br />
The characters in the new name taken from the original name are:<br />
C = V, @3c = r, @-4C = L, o& = 02<br />
Character strings added to the new name:<br />
mMm, zz<br />
___________________________________________________________________________<br />
Figure 5.8 contains the command and the resulting output file for a DO/GENERATE using dynamic arrays.<br />
The mask for the variable names ( 'Diff.' n ) asks for names beginning with the string “Diff.” followed by the DO<br />
loop counter “n”.<br />
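The calculation in Figure 5.8 can be reproduced outside P-STAT as a check. The Python sketch below is an illustration<br />
only: a case is held in a dictionary, and the function name add_diffs is hypothetical. It pairs the n-th “pre”<br />
variable with the n-th “post” variable, as the dynamic vectors pre? and post? do, and names each result with the<br />
loop counter.<br />

```python
def add_diffs(case):
    """Return a copy of `case` with Diff.n = post.n - pre.n for each pre? variable."""
    pre = [name for name in case if name.startswith("pre")]
    post = [name for name in case if name.startswith("post")]
    out = dict(case)
    for n, (a, b) in enumerate(zip(pre, post), start=1):
        out[f"Diff.{n}"] = case[b] - case[a]   # mask ( 'Diff.' n )
    return out


first_case = {"pre.1": 68, "post.1": 75, "pre.2": 92,
              "post.2": 94, "pre.3": 89, "post.3": 88}
result = add_diffs(first_case)
print(result["Diff.1"], result["Diff.2"], result["Diff.3"])  # 7 2 -1
```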
Figure 5.9 illustrates a complex mask which uses all the possible mask codes. The variable named<br />
“Variable02” in the input file is used in the creation of a new variable “V2rmMmL0102zz02”. The mask used<br />
in this example is: “(CD@3c'mMm'@-4CNNo&'zz'dd)”<br />
5.9 IF-THEN-ELSE BLOCKS<br />
Figure 5.10 contains an example of a series of IF/SET statements contrasted with a complex RECODE. These<br />
same IF/SET statements could have been written as an IF-THEN-ELSE block.<br />
The IF-THEN-ELSE block makes <strong>the</strong> logic easier <strong>to</strong> follow when <strong>the</strong>re is more than a single condition. The IF-<br />
THEN-ELSE block has additional advantages because <strong>the</strong>re can be any number of <strong>PPL</strong> statements and actions in<br />
<strong>the</strong> block, including nested IF-THEN-ELSE blocks and DO loops.<br />
__________________________________________________________________________<br />
Figure 5.10 IF or IF-THEN-ELSE<br />
MODIFY aaa<br />
[ GENERATE sec<strong>to</strong>r = .m1. ;<br />
IF age good and region good, SET sec<strong>to</strong>r = .m2.;<br />
IF age lt 30 and region EQ 'east' SET sec<strong>to</strong>r = 1;<br />
IF age lt 30 and region EQ 'central' SET sec<strong>to</strong>r = 2;<br />
IF age lt 30 and region EQ 'west' SET sec<strong>to</strong>r = 3;<br />
IF age ge 30 and region EQ 'east' SET sec<strong>to</strong>r = 4;<br />
IF age ge 30 and region EQ 'central' SET sec<strong>to</strong>r = 5;<br />
IF age ge 30 and region EQ 'west' SET sec<strong>to</strong>r = 6;<br />
], OUT bbb $<br />
MODIFY aaa<br />
[ GENERATE sec<strong>to</strong>r = .m1. ;<br />
IF age good and region good, SET sec<strong>to</strong>r = .m2.;<br />
IF age lt 30 THEN;<br />
IF region EQ 'east' SET sec<strong>to</strong>r = 1;<br />
IF region EQ 'central' SET sec<strong>to</strong>r = 2;<br />
IF region EQ 'west' SET sec<strong>to</strong>r = 3;<br />
F.ELSE;<br />
IF region EQ 'east' SET sec<strong>to</strong>r = 4;<br />
IF region EQ 'central' SET sec<strong>to</strong>r = 5;<br />
IF region EQ 'west' SET sec<strong>to</strong>r = 6;<br />
ENDIF ], out bbb $<br />
__________________________________________________________________________<br />
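The two-level logic of Figure 5.10 can be restated as a short Python sketch. It is an illustration only: None stands<br />
for a missing value, the strings "M1" and "M2" stand in for the missing codes .m1. and .m2., and the names sector<br />
and REGION_CODE are hypothetical.<br />

```python
REGION_CODE = {"east": 1, "central": 2, "west": 3}


def sector(age, region):
    """Mirror the IF-THEN-ELSE version of the sector assignment."""
    if age is None or region is None:
        return "M1"                 # GENERATE sector = .m1. is never overwritten
    code = REGION_CODE.get(region)
    if code is None:
        return "M2"                 # both values good, but no region test matched
    if age < 30:                    # IF age lt 30 THEN;
        return code
    return code + 3                 # F.ELSE; age is good and 30 or more


print(sector(25, "west"), sector(30, "east"), sector(None, "east"))  # 3 4 M1
```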
IF-THEN-ELSE-ENDIF blocks can be nested 9 deep. They can occur within a DO loop, as long as the block<br />
is ENTIRELY within the DO loop. The block begins with an IF statement, which starts with IF, may combine<br />
conditions with OR and AND, and ends with a THEN. The THEN, just like a consequence in a simple IF<br />
statement, can be preceded by a T, F or M qualifier, like:<br />
M.THEN<br />
For example:<br />
LIST X [ GEN Newvar = 99;<br />
IF Age GE 21 OR age EQ 19,<br />
THEN;<br />
SET oldvar = 22 ;<br />
SET abcdef = 33 ;<br />
ELSE;<br />
SET xyz = 44;<br />
ENDIF ] $<br />
The ELSE section is executed whenever the section before the ELSE is not executed. That is, given THEN, the<br />
statements before ELSE are executed when the IF is true, and the statements after the ELSE are executed when<br />
the IF is false or missing. Using FM.THEN reverses this: the statements following the FM.THEN are<br />
executed when the result of the IF is either false or missing, while the statements following the ELSE are<br />
executed whenever the result of the IF is true. In other words, the previous example is exactly the same as:<br />
LIST X [ GEN Newvar = 99;<br />
IF Age GE 21 OR age EQ 19,<br />
FM.THEN;<br />
SET xyz = 44;<br />
ELSE;<br />
SET oldvar = 22 ;<br />
SET abcdef = 33 ;<br />
ENDIF ] $<br />
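The equivalence of the two blocks can be checked with a small Python sketch of the three-way logic. It is an<br />
illustration under stated assumptions, not P-STAT code: None stands for a missing result, a plain THEN is treated as<br />
T.THEN, and the function names are hypothetical.<br />

```python
def section_runs(result, qualifier):
    """True if a section qualified by `qualifier` runs for this IF result.

    result is True, False or None (None stands for a missing result);
    qualifier is a string of the letters T, F and M, for example "T" or "FM".
    """
    key = "M" if result is None else ("T" if result else "F")
    return key in qualifier


def age_condition(age):
    """Age GE 21 OR Age EQ 19, with None standing for a missing Age."""
    if age is None:
        return None
    return age >= 21 or age == 19


def plain_then(age):
    # THEN; ... ELSE; ...
    if section_runs(age_condition(age), "T"):
        return {"oldvar": 22, "abcdef": 33}
    return {"xyz": 44}


def fm_then(age):
    # FM.THEN; ... ELSE; ... with the two sections swapped
    if section_runs(age_condition(age), "FM"):
        return {"xyz": 44}
    return {"oldvar": 22, "abcdef": 33}


for age in (None, 18, 19, 21):
    print(age, plain_then(age) == fm_then(age))   # True for every age
```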
5.10 IF-THEN-ELSE: O<strong>the</strong>r Features.<br />
F.ELSE and M.ELSE sections can be used <strong>to</strong> provide greater control. These allow a true 3-way logic in <strong>the</strong><br />
IF-THEN blocks. T.ELSE is also allowed; however it is only useful when M.THEN or F.THEN begins <strong>the</strong> block.<br />
Alternate names TELSE, FELSE and MELSE are recognized.<br />
There are some restrictions. GENERATE cannot be used within an IF block. The ELSE section, if used, must<br />
be <strong>the</strong> last section.<br />
Figure 5.11 illustrates <strong>the</strong> use of an IF-THEN block with F.ELSE and M.ELSE. The example illustrates a<br />
way of estimating a missing value from <strong>the</strong> mean of previous values. This is sometimes referred <strong>to</strong> as a hot deck<br />
approach. The results change with <strong>the</strong> data. For purposes of this example, it was decided <strong>to</strong> use <strong>the</strong> average of<br />
<strong>the</strong> previous 10 non-missing values as <strong>the</strong> substitute value. These values are s<strong>to</strong>red in <strong>the</strong> first 10 locations of <strong>the</strong><br />
P vec<strong>to</strong>r.<br />
Because we have chosen <strong>to</strong> use previous values, <strong>the</strong>re is a problem if any of <strong>the</strong> first ten cases has a missing<br />
value. Once ten non-missing cases have been read, <strong>the</strong> problem disappears. In this example we have decided <strong>to</strong><br />
use whatever information is available. Given <strong>the</strong> following data values for variable Age:<br />
33 9 15 20 73 - 44 23 18 54 62 29 - 50 82 19 - 29 39<br />
we use the 4 usable values preceding the first missing value (33, 15, 20 and 73; the 9 is rejected as too young)<br />
to produce a result value of 35. If there were no good values available, the result would<br />
be set to missing type 3.<br />
When <strong>the</strong> first case in <strong>the</strong> file is processed we set <strong>the</strong> 10 locations in <strong>the</strong> permanent vec<strong>to</strong>r that we are going<br />
<strong>to</strong> use <strong>to</strong> -1. This permits us <strong>to</strong> test for positive values when we calculate <strong>the</strong> substitute value.<br />
MODIFY Ages [ IF FIRST ( .FILE. )<br />
THEN;<br />
DO #j = 1, 10 ;<br />
SET P(#j) = -1;<br />
ENDDO;<br />
ENDIF;<br />
The 3-way logic is determined in <strong>the</strong> Age LT 10 test:<br />
1. True if Age is non-missing and less than 10. This is considered an error and a simple<br />
report with <strong>the</strong> case number is written.<br />
IF Age LT 10<br />
THEN;<br />
PUT .N. >;<br />
GO TO NEXT;<br />
2. False if Age is non-missing and greater than or equal to 10. The next location in the P vector is calculated<br />
and that value is replaced by the current good value of Age. Thus the contents of the P vector continually<br />
change as good values are processed. When 10 good values have been found, there are no<br />
longer any negative numbers (-1) remaining.<br />
F.ELSE;<br />
IF ##Ploc EQ 10, SET ##Ploc = 0;<br />
INCREASE ##Ploc;<br />
SET P(##Ploc) = Age;<br />
GO TO NEXT;<br />
__________________________________________________________________________<br />
Figure 5.11 IF-THEN with F.ELSE and M.ELSE in a Simple Hot Deck Example<br />
GEN ##Ploc = 10, GEN ##Total = 0, GEN ##N=0 $<br />
MODIFY Ages [ IF FIRST ( .FILE. )<br />
THEN;<br />
DO #j = 1, 10 ;<br />
SET P(#j) = -1;<br />
ENDDO;<br />
ENDIF;<br />
IF Age LT 10<br />
THEN;<br />
PUT .N. >;<br />
GO TO NEXT;<br />
F.ELSE;<br />
IF ##Ploc EQ 10, SET ##Ploc = 0;<br />
INCREASE ##Ploc;<br />
SET P(##Ploc) = Age;<br />
GO TO NEXT;<br />
M.ELSE;<br />
SET ##Total = 0, SET ##N = 0;<br />
DO #J = 1, 10;<br />
IF ( P(#j) LT 0 ) EXITDO;<br />
/* increase count of good P values */<br />
INCREASE ##N;<br />
/* increase totals of good P values */<br />
INCREASE ##Total BY P(#j);<br />
ENDDO;<br />
IF ##N GT 0 THEN;<br />
SET Age = INT ( ##Total / ##N ) ;<br />
PUT .N. > Age<br />
>;<br />
ELSE;<br />
SET Age = .M3.;<br />
ENDIF;<br />
ENDIF;<br />
NEXT: ], OUT Ages $<br />
__________________________________________________________________________<br />
3. Missing if Age is unknown. The current contents of <strong>the</strong> P vec<strong>to</strong>r are <strong>to</strong>taled and <strong>the</strong> average is calculated.<br />
This average is based on <strong>the</strong> number of good values currently available in <strong>the</strong> P vec<strong>to</strong>r. This<br />
average is substituted for Age in <strong>the</strong> file and a report is printed giving <strong>the</strong> case number and <strong>the</strong> new<br />
value.<br />
M.ELSE;<br />
SET ##Total = 0, SET ##N = 0;<br />
A DO loop is used to examine the 10 values currently in the P vector. If a negative number is found,<br />
the P vector is not yet stocked with the full complement of 10 values and we can exit from the DO<br />
loop with ##N set to the current number of good values and ##Total to the sum of those values.<br />
DO #J = 1, 10;<br />
IF ( P(#j) LT 0 ) EXITDO;<br />
INCREASE ##N;<br />
INCREASE ##Total BY P(#j);<br />
ENDDO;<br />
If <strong>the</strong>re is at least 1 good P value we can now calculate an average, set Age <strong>to</strong> that value and write<br />
<strong>the</strong> appropriate information in <strong>the</strong> report.<br />
IF ##N GT 0 THEN;<br />
SET Age = INT ( ##Total / ##N ) ;<br />
PUT .N. > Age<br />
>;<br />
If <strong>the</strong>re have been no good values as this case is processed, it is set <strong>to</strong> missing 3.<br />
ELSE;<br />
SET Age = .M3.;<br />
Given <strong>the</strong> following 19 values for variable Age:<br />
33 9 15 20 73 - 44 23 18 54 62 29 - 50 82 19 - 29 39<br />
<strong>the</strong> output file contains:<br />
33 9 15 20 73 35 44 23 18 54 62 29 37 50 82 19 45 29 39<br />
And <strong>the</strong> report is:<br />
Case 2 is <strong>to</strong>o young.<br />
Case 6 given 35 as <strong>the</strong> current hot deck value of Age.<br />
Case 13 given 37 as <strong>the</strong> current hot deck value of Age.<br />
Case 17 given 45 as <strong>the</strong> current hot deck value of Age.<br />
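The output file and report above can be reproduced with a short simulation. The Python sketch below follows the logic<br />
of Figure 5.11 as described in this section; it is not P-STAT code. None stands for a missing Age, the function name<br />
hot_deck and its parameters are hypothetical, and integer division is used in place of INT, which gives the same<br />
result for the non-negative totals that occur here.<br />

```python
def hot_deck(ages, window=10, minimum=10):
    """Replace missing ages (None) by the mean of up to `window` earlier good values."""
    p = [-1] * window   # P vector: -1 marks a slot that has not been filled yet
    ploc = window       # ##Ploc starts at 10, so the first good value goes to slot 1
    out, report = [], []
    for case, age in enumerate(ages, start=1):
        if age is None:                          # M.ELSE: Age is missing
            total = n = 0
            for value in p:
                if value < 0:                    # EXITDO at the first unfilled slot
                    break
                n += 1
                total += value
            if n > 0:
                age = total // n                 # INT ( ##Total / ##N )
                report.append(
                    f"Case {case} given {age} as the current hot deck value of Age.")
            # with no good values the age stays None, standing in for .M3.
        elif age < minimum:                      # THEN: treated as an error
            report.append(f"Case {case} is too young.")
        else:                                    # F.ELSE: store the good value
            if ploc == window:
                ploc = 0
            ploc += 1
            p[ploc - 1] = age
        out.append(age)
    return out, report


data = [33, 9, 15, 20, 73, None, 44, 23, 18, 54,
        62, 29, None, 50, 82, 19, None, 29, 39]
values, lines = hot_deck(data)
print(values)
print("\n".join(lines))
```

Substituted values are not written back into the P vector, so each estimate is based only on values actually observed.<br />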
In both Figures 5.11 and 5.13, <strong>the</strong> scratch variables that are needed are generated before <strong>the</strong> command<br />
in which <strong>the</strong>y are used. If all <strong>the</strong> scratch variables are predefined, <strong>the</strong>re is no need <strong>to</strong> worry about <strong>the</strong> restrictions<br />
on generating variables in ei<strong>the</strong>r an IF-THEN-ELSE block or a DO.<br />
5.11 IF-THEN-ELSE: Ano<strong>the</strong>r Example<br />
Figure 5.12 contains a small data set and <strong>the</strong> resulting report. Figure 5.13 contains <strong>the</strong> commands which produced<br />
<strong>the</strong> report. The commands contain IF-THEN-ELSE blocks within IF-THEN-ELSE blocks as well as a DO loop<br />
inside <strong>the</strong> blocks. The data set mimics a survey in which <strong>the</strong> respondents were asked about <strong>the</strong>ir computer hardware<br />
and software. The software questions Appl.1 through Appl.3 are coded 1 for an editor or report writer, 2 for a<br />
database and 3 for an analysis or statistics program. The character variables Wappl.1, Wappl.2 and Wappl.3<br />
contain the name of the program associated with the usage in the Appl? questions.<br />
__________________________________________________________________________<br />
Figure 5.12 IF-THEN-ELSE: The Data and <strong>the</strong> Report<br />
File Compute<br />
OS Chip Appl.1 Wappl.1 Appl.2 Wappl.2 Appl.3 Wappl.3<br />
DOS 386 1 Word Perfect 2 Dbase III -<br />
MVS 386 2 Excel 3 P-<strong>STAT</strong> 1 Kedit<br />
Unix Spark 1 P-<strong>STAT</strong> 2 Informix 3 P-<strong>STAT</strong><br />
The Report<br />
error on case 2<br />
PC users 1<br />
Unix users 1<br />
P-<strong>STAT</strong> users 1<br />
P-<strong>STAT</strong> usages 2<br />
__________________________________________________________________________<br />
The purpose of this possibly daunting example is <strong>to</strong> show <strong>the</strong> generality of use of IF-THEN-ELSE blocks and<br />
DO loops. When we say 'if #Puse EQ 1' within <strong>the</strong> DO loop we have:<br />
1. a simple IF<br />
2. within an IF-THEN block (it has no ELSE)<br />
3. within a DO loop<br />
4. within an IF-THEN-ELSE block<br />
5. within an IF-THEN-ELSE block.<br />
The report contains a count of the number of people using a PC operating system or a Unix operating system,<br />
a count of those using P-STAT for any single purpose and a count of the total times that P-STAT was cited. Before any totals<br />
are accumulated, the data are checked for validity and if the answers seem inappropriate the information in the case<br />
is not used in the report.<br />
The first step in generating <strong>the</strong> report is <strong>to</strong> set up a series of scratch variables. This is done as stand-alone <strong>PPL</strong>.<br />
GEN ##Error = 0, GEN ##Numuse = 0, GEN ##PCuse = 0,<br />
GEN ##Users = 0, GEN ##Aps = 0 $<br />
It could instead be included at <strong>the</strong> beginning of <strong>the</strong> PROCESS command:<br />
PROCESS Compute [ IF FIRST ( .FILE. ), GEN ##Error = 0,<br />
GEN ##Numuse = 0, GEN ##PCuse = 0, GEN ##Users = 0, GEN ##Aps = 0;<br />
After generating a single temporary scratch variable, <strong>the</strong> first step in <strong>the</strong> PROCESS command in Figure 5.13<br />
is <strong>to</strong> set up <strong>the</strong> major IF-THEN-ELSE block.<br />
IF OS MATCHES ' ( DOS | Windows | NT | OS/2 ) * '<br />
THEN;<br />
The MATCHES function is described in detail in <strong>the</strong> chapter “<strong>PPL</strong>: Modification of Character Variables”. Here<br />
it is used <strong>to</strong> see if <strong>the</strong> operating system is any of <strong>the</strong> common operating systems for Intel Chip machines. If <strong>the</strong> IF<br />
is true, a second IF-THEN-ELSE block is used <strong>to</strong> see if <strong>the</strong> computer chip is one of <strong>the</strong> Intel chips:<br />
__________________________________________________________________________<br />
Figure 5.13 IF-THEN-ELSE Block with Nested IF and a DO Loop<br />
GEN ##Error = 0, GEN ##Numuse = 0, GEN ##PCuse = 0,<br />
GEN ##Users = 0, GEN ##Aps = 0 $<br />
PROCESS Compute<br />
[ GEN #Puse = 0; IF OS MISSING OR Chip MISSING GOTO Err;<br />
IF OS MATCHES ' ( DOS | Windows | NT | OS/2 ) * '<br />
THEN;<br />
   if Chip AMONG ( '286' '386' '486' 'Pentium' )<br />
   then;<br />
      INCREASE ##PCuse;<br />
   else;<br />
      PUT Chip > OS<br />
      > .n. ;<br />
      SET ##Error = 1;<br />
      GO TO Err;<br />
   endif;<br />
ELSE;<br />
   if OS NE 'UNIX' THEN;<br />
      SET ##Error = 2;<br />
      GO TO Err;<br />
   else;<br />
      INCREASE ##Users;<br />
      DO #AP #N USING Appl?;<br />
         IF V(#AP) MISSING, NEXTDO;<br />
         IF Wappl?(#N) AMONG ( 'P-STAT' 'PSTAT' ) THEN;<br />
            INCREASE ##Aps, INCREASE #Puse;<br />
            IF #Puse EQ 1, INCREASE ##Numuse;<br />
         ENDIF;<br />
      ENDDO;<br />
   endif;<br />
ENDIF;<br />
GO TO Report;<br />
Err: PUT @SKIP2 .n. @NEXT ; GO TO Next;<br />
Report: IF LAST ( .FILE. )<br />
PUT @20 ##PCuse @next<br />
> @20 ##Users @next<br />
> @20 ##Numuse @next<br />
> @20 #Puse;<br />
Next: ] $<br />
__________________________________________________________________________<br />
if Chip AMONG ( '286' '386' '486' 'Pentium' )<br />
<strong>the</strong>n;<br />
INCREASE ##PCuse;<br />
If the replies to the questions about the operating system and the chip agree, the scratch variable ##PCuse is increased.<br />
If they do not agree, the PUT statement reports a possible error and control branches to the statement labelled<br />
“Err”.<br />
else;<br />
PUT Chip > OS<br />
> .n. ;<br />
SET ##Error = 1;<br />
GO TO Err;<br />
endif;<br />
The endif completes <strong>the</strong> nested IF-THEN-ELSE block and also <strong>the</strong> THEN portion of <strong>the</strong> major block.<br />
If <strong>the</strong> first IF-THEN is false and we have a computer that appears <strong>to</strong> be running an operating system o<strong>the</strong>r<br />
than <strong>the</strong> standard PC operating systems we will now process <strong>the</strong> “ELSE”.<br />
ELSE;<br />
if OS NE 'UNIX' THEN;<br />
SET ##Error = 2;<br />
GO TO Err;<br />
This starts a nested IF-THEN-ELSE block <strong>to</strong> eliminate and print an error report for any respondents who, like<br />
<strong>the</strong> second case in <strong>the</strong> data in Figure 5.12, are not using <strong>the</strong> UNIX operating system. The final section of <strong>the</strong> command<br />
is used <strong>to</strong> examine <strong>the</strong> applications for all cases like <strong>the</strong> third case in Figure 5.12 who are running UNIX.<br />
INCREASE ##Users;<br />
DO #AP #N USING Appl?;<br />
IF V(#AP) MISSING, NEXTDO;<br />
IF Wappl?(#N) AMONG ( 'P-<strong>STAT</strong>' 'P<strong>STAT</strong>' ) THEN;<br />
INCREASE ##Aps, INCREASE #Puse;<br />
IF #Puse EQ 1, INCREASE ##Numuse;<br />
ENDIF;<br />
ENDDO;
The scratch variable ##Users, our counter of UNIX users, is immediately incremented. Next a DO is used to examine<br />
the list of applications and increment the remaining counters that are needed for the report.<br />
DO #AP #N USING Appl?;<br />
Two scratch variables are created by <strong>the</strong> DO. #AP takes on <strong>the</strong> values of <strong>the</strong> positions of any variable beginning<br />
with <strong>the</strong> characters “Appl”. Thus <strong>the</strong> first time through <strong>the</strong> loop #AP = 3. The second time #AP = 5. The final<br />
time #AP = 7.<br />
IF V(#AP) MISSING, NEXTDO;<br />
If <strong>the</strong> value of <strong>the</strong> application variable is missing, <strong>the</strong> remaining steps in <strong>the</strong> DO are bypassed. If <strong>the</strong>re are no more<br />
loop iterations <strong>to</strong> be done control moves past <strong>the</strong> ENDDO statement. If this is not missing, <strong>the</strong> next test is done<br />
<strong>to</strong> determine if P-<strong>STAT</strong> is <strong>the</strong> name given for <strong>the</strong> application:<br />
IF Wappl?(#N) AMONG ( 'P-<strong>STAT</strong>' 'P<strong>STAT</strong>' ) THEN;<br />
#N in the DO loop takes on the values 1, 2, and 3 as the loop progresses. Using the wildcard to set up a dynamic<br />
vector means that Wappl?(#N) tests variable Wappl.1 when #N is 1, Wappl.2 when #N is 2 and<br />
Wappl.3 when #N is 3. The third case in the file contains:<br />
Unix Spark 1 P-<strong>STAT</strong> 2 Informix 3 P-<strong>STAT</strong><br />
The first time through the loop Wappl.1 has the value “P-STAT”; therefore, the IF is true and the rest of the<br />
four-line IF-ENDIF block is executed.<br />
We increase the permanent scratch variable ##Aps, which is used for a total of all P-STAT applications, and also<br />
increase #Puse, a temporary scratch variable that is reset to 0 as each case starts. Thus when “P-STAT” is found<br />
again in the third loop, #Puse becomes 2 and we do not increase ##Numuse a second time for the same case.<br />
The work is now all done. It is only necessary <strong>to</strong> end <strong>the</strong> open IF blocks and write out <strong>the</strong> error messages and<br />
<strong>the</strong> reports:<br />
endif;<br />
ENDIF;<br />
GO TO Report;<br />
Err: PUT @SKIP2 .n. @NEXT ; GO TO Next;<br />
Report: IF LAST ( .FILE. )<br />
PUT @20 ##PCuse @next<br />
> @20 ##Users @next<br />
> @20 ##Numuse @next<br />
> @20 #Puse;<br />
Next: ] $
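The control flow of Figure 5.13 can be traced with a small model outside P-STAT. The sketch below is Python, not PPL, and it rests on several assumptions that are not stated in the text: the MATCHES pattern is treated as a prefix match, the comparison with 'UNIX' is treated as case-insensitive (so that the "Unix" case in Figure 5.12 counts), and the "P-STAT usages" line is taken from the ##Aps total as the walkthrough describes. The names `CASES` and `build_report` are illustrative only.

```python
import re

# Cases from Figure 5.12: (OS, Chip, [(Appl.n, Wappl.n), ...]); None marks a missing value.
CASES = [
    ("DOS", "386", [(1, "Word Perfect"), (2, "Dbase III"), (None, None)]),
    ("MVS", "386", [(2, "Excel"), (3, "P-STAT"), (1, "Kedit")]),
    ("Unix", "Spark", [(1, "P-STAT"), (2, "Informix"), (3, "P-STAT")]),
]

PC_OS = re.compile(r"(DOS|Windows|NT|OS/2)")    # stands in for the MATCHES pattern (assumed prefix match)
PC_CHIPS = {"286", "386", "486", "Pentium"}
PSTAT_NAMES = {"P-STAT", "PSTAT"}


def build_report(cases):
    pc_use = users = num_use = aps = 0          # ##PCuse, ##Users, ##Numuse, ##Aps
    errors = []                                 # case numbers that branch to Err
    for n, (os_name, chip, apps) in enumerate(cases, start=1):
        p_use = 0                               # #Puse, reset as each case starts
        if os_name is None or chip is None:
            errors.append(n)
        elif PC_OS.match(os_name):              # major IF ... THEN
            if chip in PC_CHIPS:
                pc_use += 1
            else:
                errors.append(n)                # ##Error = 1
        elif os_name.upper() != "UNIX":         # major ELSE; case-insensitivity is an assumption
            errors.append(n)                    # ##Error = 2
        else:
            users += 1
            for appl, wappl in apps:            # DO #AP #N USING Appl?
                if appl is None:
                    continue                    # NEXTDO
                if wappl in PSTAT_NAMES:
                    aps += 1
                    p_use += 1
                    if p_use == 1:              # count each case once
                        num_use += 1
    return {"errors": errors, "PC users": pc_use, "Unix users": users,
            "P-STAT users": num_use, "P-STAT usages": aps}


print(build_report(CASES))
```

Run on the three cases of Figure 5.12, the model flags case 2 and yields the same four counts as the report in that figure.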
SUMMARY<br />
DO LOOPS and IF-THEN-ELSE BLOCKS<br />
DO #J USING vn TO vn;<br />
specifies a scratch variable and <strong>the</strong>n, after <strong>the</strong> USING, a list of variable names or positions. The list can<br />
also use TO, .ON. and wildcards like PRE? . The remaining <strong>PPL</strong> in <strong>the</strong> loop is executed once for each<br />
variable in <strong>the</strong> list. The scratch variable is set <strong>to</strong> <strong>the</strong> LOCATION of <strong>the</strong> current variable; <strong>the</strong>refore, <strong>the</strong><br />
scratch variable is different in each iteration.<br />
Thus, #L, in <strong>the</strong> loop below, is set <strong>to</strong> <strong>the</strong> location (not <strong>the</strong> value) of Test1 in <strong>the</strong> first iteration, and <strong>to</strong> <strong>the</strong><br />
location of Test10 in <strong>the</strong> last iteration.<br />
[ DO #L USING Test1 TO Test10;<br />
SET V(#L) = SQRT ( V(#L) ); ENDDO ]<br />
[ DO #J USING v(1) Height v(5) TO v(7);<br />
INCREASE V(#J); ENDDO ]<br />
The variables in the DO list can be tested to ensure that numeric operations are not performed on character<br />
variables or vice versa. The operators CHARACTER, NUMERIC, MISSING and GOOD can be<br />
used.<br />
[ DO #QQ USING SS.Number TO ZIP;<br />
IF V(#QQ) NUMERIC, NEXTDO;<br />
SET V(#QQ) = LEFT ( V(#QQ) ); ENDDO ]<br />
[ DO #I USING V(1) TO V(25) V(28);<br />
IF V(#I) CHARACTER OR V(#I) GOOD, NEXTDO;<br />
SET V(#I) = .M3.; ENDDO ]<br />
There is no limit <strong>to</strong> <strong>the</strong> number of <strong>PPL</strong> instructions that may be included in a DO loop. DO's may include<br />
o<strong>the</strong>r DO loops and IF-THEN-ELSE blocks.<br />
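The point that the scratch variable holds a location rather than a value can be illustrated with a short model. This is a Python sketch, not PPL; the file layout (`names`), the sample case and the helper `positions` are hypothetical, and positions are kept 1-based to mirror V(1), V(2) and so on.

```python
import math

names = ["ID", "Test1", "Test2", "Test3"]       # hypothetical file layout
case = [101, 16.0, 25.0, 9.0]                   # one case; V(1) is case[0]


def positions(first, last):
    """1-based positions of the variables from `first` TO `last`, inclusive."""
    lo, hi = names.index(first) + 1, names.index(last) + 1
    return range(lo, hi + 1)


# Models: DO #L USING Test1 TO Test3;  SET V(#L) = SQRT ( V(#L) );  ENDDO
for loc in positions("Test1", "Test3"):
    case[loc - 1] = math.sqrt(case[loc - 1])    # loc is a location, not a value

print(case)
```

The loop variable runs over 2, 3 and 4 (the locations), while the values stored at those locations are what the square root is applied to.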
DO #J = nn, nn, nn;<br />
specifies a scratch variable and then, after the '=', a start expression, an end expression and an optional<br />
stepsize expression. The scratch variable takes the values of the numbers from the start value through<br />
the end value as incremented by the stepsize. If the stepsize is not supplied, 1 is assumed. The scratch<br />
variable is usable in the PPL within the loop.<br />
[ DO #Vars = 1, 3 ;<br />
SET V(#Vars) = V(#Vars) / 12 ; ENDDO ]<br />
In this example, “#Vars” is the user-supplied scratch variable. It is used in the SET instruction as the<br />
subscript of V, the vector of variables in the file. Each of the first 3 variables in the file has its value<br />
divided by 12.<br />
EXITDO<br />
causes the DO loop to be exited immediately even if all the loop instructions have not been completed.<br />
vn=variable name nn=number exp=expression
NEXTDO<br />
causes a jump to the ENDDO statement where the DO loop counter is evaluated. If there are no more<br />
iterations, the loop terminates.<br />
ENDDO<br />
defines the end of the DO loop domain. When ENDDO is processed, the current DO value is evaluated.<br />
If the loop is not complete, the counters are incremented and the commands in the DO domain are executed<br />
with the new values.<br />
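The counting form of DO, together with NEXTDO and EXITDO, can be modeled as follows. This is a Python sketch, not PPL. It assumes a positive stepsize and an inclusive end value, as the description above suggests; the helper `do_values` and the sample case are hypothetical.

```python
def do_values(start, end, step=1):
    """Values taken by the scratch variable in DO #J = start, end, step."""
    # Assumes a positive step and an inclusive end value.
    values = []
    j = start
    while j <= end:
        values.append(j)
        j += step
    return values


case = [24, 36, 60, 7, 50, None, 99]            # hypothetical case; V(1) is case[0]

# Models: DO #Vars = 1, 3;  SET V(#Vars) = V(#Vars) / 12;  ENDDO
for j in do_values(1, 3):
    case[j - 1] = case[j - 1] / 12

# NEXTDO and EXITDO modeled with continue and break.
seen = []
for j in do_values(4, 7):
    if case[j - 1] is None:
        break                                   # EXITDO: leave the loop at once
    if case[j - 1] < 10:
        continue                                # NEXTDO: skip to the next iteration
    seen.append(j)

print(case, seen)
```

In the second loop, position 4 is skipped, position 5 is recorded, and the missing value at position 6 ends the loop before position 7 is reached.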
GENERATE Within a DO Loop<br />
A new variable can be generated in each iteration of a DO-GENERATE loop. When GENERATE is used<br />
in a DO loop, <strong>the</strong> loop can have no o<strong>the</strong>r statements; i.e., it can have only DO, GENERATE and ENDDO.<br />
The format of a GENERATE within a DO loop is one of <strong>the</strong> following:<br />
GENERATE ? = value;<br />
GENERATE ? (mask) = value;<br />
GENERATE V(#J) (mask) = value;<br />
GENERATE V(##K) (mask) = value;<br />
If <strong>the</strong> variables are character, <strong>the</strong> :C or :C20 or such directly follows <strong>the</strong> mask or, if <strong>the</strong>re is no mask, <strong>the</strong><br />
“?”. Masks are described below. The “= value” is optional; if not supplied, <strong>the</strong> variable is set <strong>to</strong> missing.<br />
If the file currently has 20 variables, the ? causes the name VAR21 to be created. It can then be<br />
masked. Use of V(#J) must be followed by a mask since that name already exists.<br />
RENAME Within a DO Loop<br />
A group of variables can be renamed in a DO loop. Each iteration renames a different variable. When<br />
RENAME is used in a DO loop, the loop can have no other statements; i.e., it can have only DO, RENAME<br />
and ENDDO.<br />
[ RENAME Social.S.Num TO SS.Number ]<br />
The format for RENAME within a DO loop is <strong>the</strong> following:<br />
1. RENAME<br />
2. a V(#J) usage. This identifies <strong>the</strong> variable <strong>to</strong> be renamed. It also provides its current name <strong>to</strong> <strong>the</strong><br />
mask.<br />
3. a mask in paren<strong>the</strong>ses which contains strings in quotes <strong>to</strong> be used exactly as entered. It also contains<br />
special characters such as <strong>the</strong> “&” which are used <strong>to</strong> select or omit letters from <strong>the</strong> input<br />
label and <strong>to</strong> supply numbers using <strong>the</strong> DO loop scratch variable.<br />
4. a semicolon, ending <strong>the</strong> statement.<br />
Examples of DO-RENAME loops with masks:<br />
[ DO #J USING Q1 TO Q23;<br />
RENAME V(#J) ( 'Survey.' & );<br />
ENDDO]<br />
“Survey.” is a prefix. Variables Q1 through Q23 will be renamed by prefixing <strong>the</strong>ir names with “Survey.”<br />
The new names will be “Survey.Q1”, “Survey.Q2”, and so on. This is an example of a simple mask:<br />
[ DO #j=21,35;<br />
RENAME V(#j) (XOOXX);<br />
ENDDO ]<br />
Here, a mask of (XOOXX) is supplied. The initial X says use <strong>the</strong> first input character, <strong>the</strong> OO says omit<br />
<strong>the</strong> next two characters, and <strong>the</strong> XX says use <strong>the</strong> next two (characters 4 and 5). This mask would rename<br />
VAR31 in<strong>to</strong> V31.<br />
MASKS for RENAME and GENERATE<br />
A mask is used <strong>to</strong> create a name for a variable, ei<strong>the</strong>r by modifying <strong>the</strong> ? or V(#J) name preceding it or<br />
by creating a <strong>to</strong>tally different name. The mask activity begins with a pointer on <strong>the</strong> initial character of<br />
<strong>the</strong> input name. The pointer is moved on<strong>to</strong> <strong>the</strong> next character after each usage of X, O, c or C. Fur<strong>the</strong>r<br />
use of X-O-c-C is ignored when <strong>the</strong> pointer is beyond <strong>the</strong> final input character.<br />
1. x or X takes <strong>the</strong> current input character, if usable.<br />
2. o or O omits <strong>the</strong> current input character. NOTE: <strong>the</strong> digit '0' is also usable.<br />
3. c takes <strong>the</strong> current input character, if usable, and, if it is a letter, puts it in<strong>to</strong> lower case. C takes<br />
<strong>the</strong> current input character, if usable, and, if it is a letter, puts it in<strong>to</strong> upper case.<br />
4. & takes all remaining usable input characters that can fit, starting at <strong>the</strong> current location of <strong>the</strong><br />
pointer.<br />
5. @4 places <strong>the</strong> pointer on<strong>to</strong> <strong>the</strong> 4th input character. (@4 xxx) and (ooo xxx) are identical.<br />
6. @-5 places <strong>the</strong> pointer on <strong>the</strong> 5th character from <strong>the</strong> right hand end.<br />
A character that has been <strong>the</strong> subject of any of <strong>the</strong> X-O-c-C-& opera<strong>to</strong>rs is no longer usable by any subsequent<br />
opera<strong>to</strong>r. As a result, ( @-4 OOOO @1 & 'post' ) could be used <strong>to</strong> inactivate <strong>the</strong> rightmost 4<br />
characters, take <strong>the</strong> rest, and add 'post' <strong>to</strong> it. In o<strong>the</strong>r words, <strong>the</strong> mask has replaced an existing 4-character<br />
suffix with a new one.<br />
Blanks are ignored in masks, which can markedly improve readability. For example:<br />
(XXXXXXXXX) and<br />
(XXX XXX XXX) are identical.<br />
O<strong>the</strong>r features of <strong>the</strong> mask are:<br />
1. 'ab.cde' moves <strong>the</strong> string contents in<strong>to</strong> <strong>the</strong> new name.<br />
2. D or d inserts the V subscript. This is based on the current value of the DO scratch variable. If<br />
V(#j) is used, 17 is inserted when #j=17. If V(#j+10) is used, 27 is inserted when #j=17. DD is<br />
like D, but forces 2 characters; 07 is used instead of 7. If DDD is used, three digits are inserted<br />
in the new name, so a 7 appears in the new name as 007.<br />
3. N or n inserts <strong>the</strong> current iteration count of <strong>the</strong> DO loop. If this is <strong>the</strong> third trip through <strong>the</strong> loop,<br />
3 is inserted. NN provides 2 digits, NNN 3 digits. You do not have <strong>to</strong> use a counter scratch<br />
variable in <strong>the</strong> DO statement in order <strong>to</strong> use 'N'.<br />
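The mask rules above can be checked against a small interpreter. The following is a Python sketch of the rules as this summary states them, not P-STAT's implementation. Several points are assumptions: an X or C applied to a character that is no longer usable emits nothing but still advances the pointer, the mask is passed without its surrounding parentheses, and the name-length limit (`max_len`) is a placeholder because the actual limit is not given here. The function name `apply_mask` is hypothetical.

```python
def apply_mask(mask, name, subscript=None, iteration=None, max_len=64):
    """Build a new variable name from `name` according to `mask`."""
    used = [False] * len(name)      # characters already consumed by X, O, c, C or &
    out = []
    pos = 0                         # pointer into the input name (0-based here)
    i = 0
    while i < len(mask):
        ch = mask[i]
        if ch == " ":                               # blanks are ignored
            i += 1
        elif ch == "'":                             # quoted text is copied exactly
            end = mask.index("'", i + 1)
            out.append(mask[i + 1:end])
            i = end + 1
        elif ch == "@":                             # @n or @-n moves the pointer
            j = i + 2 if mask[i + 1] == "-" else i + 1
            while j < len(mask) and mask[j].isdigit():
                j += 1
            n = int(mask[i + 1:j])
            pos = n - 1 if n > 0 else len(name) + n
            i = j
        elif ch in "xXcCoO0":                       # take or omit one character
            if pos < len(name):                     # ignored once past the end
                if ch in "xXcC" and not used[pos]:
                    c = name[pos]
                    out.append(c.lower() if ch == "c" else c.upper() if ch == "C" else c)
                used[pos] = True
                pos += 1
            i += 1
        elif ch == "&":                             # all remaining usable characters
            while pos < len(name):
                if not used[pos]:
                    out.append(name[pos])
                    used[pos] = True
                pos += 1
            i += 1
        elif ch in "dDnN":                          # D = V subscript, N = iteration count
            j = i
            while j < len(mask) and mask[j].upper() == ch.upper():
                j += 1
            number = subscript if ch.upper() == "D" else iteration
            out.append(str(number).zfill(j - i))    # DD and NN pad with zeros
            i = j
        else:
            raise ValueError("unsupported mask character: " + ch)
    return "".join(out)[:max_len]


print(apply_mask("XOOXX", "VAR31"))                       # the (XOOXX) example
print(apply_mask("'Survey.' &", "Q1"))                    # the prefix example
print(apply_mask("@-4 OOOO @1 & 'post'", "Score.pre"))    # suffix replacement
```

Under these assumptions the three calls reproduce the renamings described in the text: VAR31 becomes V31, Q1 becomes Survey.Q1, and a 4-character suffix is replaced by 'post'.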
IF-THEN-ELSE<br />
IF-THEN-ELSE blocks may include any <strong>PPL</strong> statements including o<strong>the</strong>r IF-THEN-ELSE blocks and DO<br />
LOOPS. GOTO may also be used as long as <strong>the</strong> target label is not in <strong>the</strong> middle of ano<strong>the</strong>r block.<br />
IF Age GE 14, THEN;<br />
ELSE;<br />
ENDIF;<br />
The IF of an IF-THEN can be complex (using OR and AND) but it can only be followed by 'THEN;'.<br />
The IF-THEN statement is followed by all <strong>the</strong> <strong>PPL</strong> statements <strong>to</strong> be executed when <strong>the</strong> IF clause is true.<br />
The directions of this logic can be changed by using <strong>the</strong> M/F prefixes. M.THEN would be followed by<br />
<strong>PPL</strong> statements <strong>to</strong> be executed if <strong>the</strong> IF statement result is missing. F.THEN is executed only if <strong>the</strong> IF<br />
statements evaluate <strong>to</strong> FALSE.<br />
[ IF Age GE 14, THEN;<br />
PUT 'over 14';<br />
ELSE;<br />
PUT 'failed';<br />
ENDIF; ]<br />
ELSE; is followed by all <strong>the</strong> <strong>PPL</strong> statements <strong>to</strong> be executed when <strong>the</strong> IF clause is not true. Not true includes<br />
results that are ei<strong>the</strong>r false or missing. ELSE is optional. M.ELSE or F.ELSE can also be<br />
specified.<br />
ENDIF is required <strong>to</strong> denote <strong>the</strong> end of <strong>the</strong> IF block.<br />
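The statement that "not true" covers both false and missing results can be made concrete with a three-valued test. This is a Python sketch, not PPL; `None` stands in for a missing value, and the helper names `ge` and `age_message` are hypothetical.

```python
def ge(value, threshold):
    """Three-valued comparison: True, False, or None when the value is missing."""
    return None if value is None else value >= threshold


def age_message(age):
    """Models: IF Age GE 14, THEN; PUT 'over 14'; ELSE; PUT 'failed'; ENDIF;"""
    result = ge(age, 14)
    if result is True:          # THEN runs only when the IF clause is true
        return "over 14"
    return "failed"             # ELSE runs when the result is false or missing


for age in (20, 10, None):
    print(age, age_message(age))
```

A missing Age therefore takes the ELSE branch here, just as a false result does.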
6<br />
<strong>PPL</strong>:<br />
Functions and System Variables<br />
This chapter explains P-<strong>STAT</strong> functions. Functions evaluate or transform one or more arguments and yield a numeric<br />
or character value. This chapter also contains a complete list and description of <strong>the</strong> P-<strong>STAT</strong> system<br />
variables. System variables are special variables whose values are set by P-<strong>STAT</strong>, but may be accessed by <strong>the</strong><br />
user.<br />
Numeric functions and functions that transform ei<strong>the</strong>r numeric or character arguments are explained in this<br />
chapter. The final <strong>PPL</strong> chapter covers character (string) functions. The prior <strong>PPL</strong> chapters cover <strong>the</strong> basics of<br />
<strong>PPL</strong> modification — case and variable selection, changing existing variables, creating new variables, logical selection,<br />
positional notation, DO loops, <strong>the</strong> two recoding functions, NCOT and RECODE, and IF-THEN-ELSE<br />
blocks.<br />
Most data modification is done on a single case. The case is retained, deleted, or modified, depending on <strong>the</strong><br />
values of variables found in that case or on <strong>the</strong> value of some system variable such as .N. , <strong>the</strong> case number. This<br />
is within-case modification. The functions and system values described in this chapter are primarily applicable <strong>to</strong><br />
modification of a single case. The next <strong>PPL</strong> chapter covers across-case modification, that is, modification of multiple<br />
cases grouped <strong>to</strong>ge<strong>the</strong>r because of a common relationship.<br />
6.1 ONE-EXPRESSION FUNCTIONS<br />
There are four basic types of functions in <strong>the</strong> P-<strong>STAT</strong> programming language:<br />
1. functions that evaluate a single expression;<br />
2. functions that evaluate a list of expressions;<br />
3. special functions that evaluate <strong>the</strong> first expression in <strong>the</strong> argument list, using <strong>the</strong> additional arguments<br />
<strong>to</strong> define <strong>the</strong> function more precisely; and<br />
4. distribution functions that give <strong>the</strong> probability of obtaining a random deviate less than a specified<br />
value.<br />
One-expression functions evaluate a single numeric expression enclosed in paren<strong>the</strong>ses. The expression may<br />
be a variable name or position, a constant, or a complex expression. Complex expressions are nested expressions,<br />
expressions containing arithmetic opera<strong>to</strong>rs, and combinations of both of <strong>the</strong>se.<br />
The function is used in a <strong>PPL</strong> clause containing an instruction or a logical test and its consequence:<br />
SET Age = INT (Age);<br />
IF <strong>Inc</strong>ome GOOD, SET <strong>Inc</strong>ome = ROUND (<strong>Inc</strong>ome);<br />
Paren<strong>the</strong>ses enclose <strong>the</strong> expression that <strong>the</strong> function evaluates. In <strong>the</strong> first example just given, <strong>the</strong> INT (i.e., integer)<br />
function evaluates Age and yields <strong>the</strong> integer portion of Age. Age is set <strong>to</strong> this integer value. In <strong>the</strong> second<br />
example, if <strong>Inc</strong>ome is GOOD (non-missing), <strong>the</strong> ROUND function evaluates <strong>Inc</strong>ome and yields a value rounded<br />
<strong>to</strong> <strong>the</strong> nearest whole number. <strong>Inc</strong>ome is set <strong>to</strong> this rounded value.<br />
Functions may be nested within functions. For example:<br />
SET Root.<strong>Inc</strong>ome = ROUND ( SQRT ( <strong>Inc</strong>ome ));<br />
The square-root of <strong>Inc</strong>ome is rounded and s<strong>to</strong>red in variable Root.<strong>Inc</strong>ome.
The functions that evaluate a single numeric expression are:<br />
ABS ( exp ) absolute value<br />
COS ( exp ) cosine<br />
ACOS ( exp ) arc cosine<br />
EXP ( exp ) exponential (e raised <strong>to</strong> this exponent)<br />
FACTORIAL (exp) <strong>the</strong> fac<strong>to</strong>rial value of <strong>the</strong> argument<br />
FRAC ( exp ) fractional part<br />
INT ( exp ) integer part<br />
LOC ( vn ) location (of a variable)<br />
LOG ( exp ) natural logarithm (base e)<br />
LOG10 ( exp ) common logarithm (base 10)<br />
ROUND ( exp ) rounds <strong>to</strong> nearest integer<br />
CEIL (exp) smallest integer greater than or equal <strong>to</strong> <strong>the</strong> input value<br />
FLOOR (exp) largest integer that is less than or equal <strong>to</strong> <strong>the</strong> input value<br />
SIN ( exp ) sine<br />
ASIN ( exp ) arc sine<br />
SQRT ( exp ) square root<br />
TAN ( exp ) tangent<br />
ATAN ( exp ) arc tangent<br />
6.2 Rounding Functions<br />
The FRAC, INT and ROUND functions yield rounded values (of sorts). The original signs of <strong>the</strong> numbers are preserved.<br />
The ABS (absolute value) function yields <strong>the</strong> original value of a number without any sign. Examples of<br />
<strong>the</strong>se functions, using <strong>the</strong> same value as <strong>the</strong> argument for each, highlight <strong>the</strong> differences among <strong>the</strong> functions:<br />
Function Result<br />
FRAC ( -621.87 ) -0.87<br />
INT ( -621.87 ) -621<br />
ROUND ( -621.87 ) -622<br />
ABS ( -621.87 ) 621.87<br />
6.3 Floor and Ceiling<br />
FLOOR is a function that takes a numeric input and produces <strong>the</strong> largest integer that is less than or equal <strong>to</strong> <strong>the</strong><br />
input value. Thus:<br />
FLOOR(-4.1) = -5<br />
FLOOR( 2 ) = 2<br />
FLOOR( 2.9) = 2<br />
CEIL is a function that takes a numeric input and produces <strong>the</strong> smallest integer that is greater than or equal <strong>to</strong> <strong>the</strong><br />
input value. Thus:<br />
CEIL (-4.7) = -4<br />
CEIL ( 2 ) = 2<br />
CEIL ( 2.1) = 3
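The difference between INT, ROUND, FLOOR and CEIL shows up most clearly on negative values. The sketch below uses Python's standard `math` module as a stand-in for the P-STAT functions; it assumes that INT truncates toward zero and that FRAC keeps the sign of its argument, as the examples above indicate. How P-STAT's ROUND treats exact halves is not stated here, so no tie case is shown.

```python
import math


def frac(x):
    """Fractional part, keeping the sign of x (models FRAC)."""
    return math.modf(x)[0]


def int_part(x):
    """Integer part, truncating toward zero (models INT)."""
    return math.trunc(x)


for x in (-621.87, -4.1, -4.7, 2.1, 2.9):
    print(x, int_part(x), round(x), math.floor(x), math.ceil(x))
```

For -4.1, the integer part is -4 while the floor is -5; for positive values the integer part and the floor agree.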
6.4 Exponential and Trigonometric Functions<br />
The SQRT function yields <strong>the</strong> square root of a number. The square of a number is obtained using <strong>the</strong> numeric<br />
opera<strong>to</strong>r ** (see <strong>the</strong> first <strong>PPL</strong> chapter). The LOG and LOG10 functions yield <strong>the</strong> natural and common logarithms<br />
of values, <strong>to</strong> base e and base 10, respectively. The EXP (exponential) function raises e <strong>to</strong> <strong>the</strong> value given as its<br />
argument (“undoing” <strong>the</strong> effect of <strong>the</strong> LOG function). Similarly, raising 10 <strong>to</strong> <strong>the</strong> value produced by LOG10 “undoes”<br />
that function.<br />
Function Result<br />
LOG ( 12094.5 ) 9.40051<br />
EXP ( 9.40051 ) 12094.5<br />
LOG10 ( 12094.5 ) 4.08259<br />
10 ** 4.08259 12094.5<br />
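The inverse relationships in the table above can be confirmed numerically. This is a Python sketch using the standard `math` module in place of the P-STAT functions; only the value 12094.5 comes from the text.

```python
import math

x = 12094.5
natural = math.log(x)       # models LOG, base e
common = math.log10(x)      # models LOG10, base 10

print(round(natural, 5), round(common, 5))
print(math.exp(natural))    # EXP undoes LOG
print(10 ** common)         # 10 ** value undoes LOG10
```

Both round trips return the original value to within floating-point precision.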
The SIN, COS and TAN functions yield <strong>the</strong> sine, cosine and tangent of <strong>the</strong>ir numeric argument. The ASIN,<br />
ACOS and ATAN functions yield <strong>the</strong> arc sine, arc cosine and arc tangent. Using <strong>the</strong>se functions in conjunction<br />
with <strong>the</strong> numeric opera<strong>to</strong>rs permits calculation of a variety of trigonometric expressions.<br />
6.5 The Fac<strong>to</strong>rial Function<br />
The FACTORIAL function yields <strong>the</strong> fac<strong>to</strong>rial value of <strong>the</strong> argument. This is often shown as N!. The argument<br />
should be a non-negative integer. If <strong>the</strong> argument is zero, <strong>the</strong> result is one. If <strong>the</strong> argument is an integer from 1<br />
through 169 or so, <strong>the</strong> result is <strong>the</strong> product of integers from one through that argument.<br />
Function Result<br />
FACTORIAL (0) 1<br />
FACTORIAL (5) 120<br />
FACTORIAL (169) 0.4269068E305<br />
FACTORIAL (200) Missing 1 (<strong>the</strong> result would be <strong>to</strong>o large)<br />
FACTORIAL (-12) Missing 3 (argument is negative)<br />
FACTORIAL (3.5) Missing 3 (argument not an integer)<br />
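The argument checks in the table can be expressed as a short model. This is a Python sketch, not P-STAT code. The strings used for the missing results and the cut-off of 169 are assumptions: the text says only "169 or so", so the limit is exposed as a parameter (`largest`). The function name `pstat_factorial` is hypothetical.

```python
import math

MISSING_1 = "missing 1"     # placeholder for P-STAT's Missing 1
MISSING_3 = "missing 3"     # placeholder for P-STAT's Missing 3


def pstat_factorial(value, largest=169):
    """Factorial with the argument checks described in the table above."""
    if value < 0 or value != int(value):
        return MISSING_3                    # negative or not an integer
    if value > largest:
        return MISSING_1                    # result would be too large
    return float(math.factorial(int(value)))


for arg in (0, 5, 169, 200, -12, 3.5):
    print(arg, pstat_factorial(arg))
```

The value printed for 169 is about 4.269068E304, which is the same number the table writes as 0.4269068E305.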
6.6 Creating Dummy Variables with <strong>the</strong> LOC Function<br />
The LOC function yields <strong>the</strong> location of a variable. Thus, it is slightly different from <strong>the</strong> o<strong>the</strong>r simple functions<br />
because it is not purely a numeric function. The value it returns is numeric, but <strong>the</strong> variable given as its argument<br />
may be a character or numeric one. See <strong>the</strong> explanation for EXPAND later in this chapter for ano<strong>the</strong>r way <strong>to</strong> generate<br />
several variables from one or more input variables.<br />
Function Result<br />
LOC ( Name ) 6 (when Name is <strong>the</strong> 6th variable)<br />
LOC ( Age ) 10 (when Age is <strong>the</strong> 10th variable)<br />
LOC is often used when <strong>the</strong> location of a variable, referenced by position, is not known:<br />
SET V ( LOC ( North.East ) + 1 ) = 100 ;<br />
The location of <strong>the</strong> variable named “North.East” plus 1 defines a value; if North.East is <strong>the</strong> fourth variable in <strong>the</strong><br />
file, that value is 5. This value is <strong>the</strong> subscript or index of V (<strong>the</strong> vec<strong>to</strong>r of variables in <strong>the</strong> file) and V(5) is set <strong>to</strong><br />
100.<br />
In Figure 6.1, four variables are created, one for each of the possible values of Region. These variables are<br />
set to 0 or 1 depending on the value of Region for that case. This is sometimes referred to as creating dummy<br />
variables, a technique often used in setting up data for regression or analysis of variance. With only four variables,<br />
it may be easier to understand what is being done if you use:<br />
[ IF Region EQ 1, SET North.East = 1 ;<br />
IF Region EQ 2, SET North.West = 1 ;<br />
IF Region EQ 3, SET South.East = 1 ;<br />
IF Region EQ 4, SET South.West = 1 ]<br />
rather than:<br />
SET V ( LOC ( North.East ) + Region - 1 ) = 1;<br />
However, with many more variables and corresponding IF statements, <strong>the</strong> use of this calculated expression becomes<br />
more desirable.<br />
When a variable is created with GENERATE, it takes <strong>the</strong> next position at <strong>the</strong> right end of <strong>the</strong> file. Thus, it is<br />
easy <strong>to</strong> calculate <strong>the</strong> location of each variable in turn, if <strong>the</strong> location of <strong>the</strong> first new one is known. Since <strong>the</strong>re are<br />
three variables in file Regional, <strong>the</strong> variable North.East will be in position four. The LOC function returns <strong>the</strong><br />
location of a variable. Therefore LOC ( North.East ) has a value of 4:<br />
Region LOC (North.East) V (LOC (North.East) + Region-1)<br />
1 4 4 + 1 - 1 = 4<br />
2 4 4 + 2 - 1 = 5<br />
3 4 4 + 3 - 1 = 6<br />
4 4 4 + 4 - 1 = 7<br />
When Region is 4, <strong>the</strong>n <strong>the</strong> variable in position 7, South.West, is set <strong>to</strong> 1.<br />
__________________________________________________________________________<br />
Figure 6.1 Calculating Variable Positions<br />
FILE Regional:<br />
Age Sex Region<br />
52 1 1<br />
31 2 2<br />
65 1 3<br />
27 2 4<br />
LIST Regional<br />
[ GENERATE North.East = 0, GENERATE North.West = 0,<br />
GENERATE South.East = 0, GENERATE South.West = 0;<br />
SET V ( LOC ( North.East ) + Region - 1 ) = 1 ] $<br />
North North South South<br />
Age Sex Region East West East West<br />
52 1 1 1 0 0 0<br />
31 2 2 0 1 0 0<br />
65 1 3 0 0 1 0<br />
27 2 4 0 0 0 1<br />
__________________________________________________________________________
If the value of Region is outside the expected range, an error condition could occur, or the value of some other<br />
existing variable could be changed. The use of an AMONG test ensures that the value of Region will be used only<br />
if it is non-missing and between 1 and 4:<br />
IF Region AMONG (1 TO 4),<br />
SET V ( LOC ( North.East ) + Region - 1 ) = 1;<br />
When a calculation might produce a value o<strong>the</strong>r than an integer, <strong>the</strong> INT or ROUND function may be used:<br />
IF Region AMONG (1 TO 4),<br />
SET V ( LOC ( North.East ) + INT ( Region ) - 1 ) = 1;<br />
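The position arithmetic and the range guard can be modelled outside P-STAT. The following is a minimal Python sketch, not PPL; the names NAMES, loc and add_dummies are hypothetical, and None stands in for a missing value.<br />

```python
# Python model of SET V(LOC(North.East) + Region - 1) = 1 (not P-STAT code).
# Positions are 1-based, as in the text; each case is a list of values.

NAMES = ["Age", "Sex", "Region",
         "North.East", "North.West", "South.East", "South.West"]


def loc(name):
    """Return the 1-based position of a variable, like LOC."""
    return NAMES.index(name) + 1


def add_dummies(age, sex, region):
    """Build one case with four dummy variables set from Region."""
    case = [age, sex, region, 0, 0, 0, 0]
    # Guard equivalent to IF Region AMONG (1 TO 4): skip missing or
    # out-of-range values so that no other variable is overwritten.
    if region is not None and 1 <= region <= 4:
        position = loc("North.East") + int(region) - 1
        case[position - 1] = 1  # convert the 1-based position to a list index
    return case


for row in [(52, 1, 1), (31, 2, 2), (65, 1, 3), (27, 2, 4), (40, 1, 9)]:
    print(add_dummies(*row))
```

The last row, with Region 9, is an invented out-of-range case; the guard leaves all four dummy variables at zero.<br />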
The creation of dummy variables may be simplified if <strong>the</strong> original order of <strong>the</strong> variables does not need <strong>to</strong> be<br />
preserved. KEEP rearranges <strong>the</strong> new variables after <strong>the</strong>y are created:<br />
GENERATE North.East = 0, GENERATE North.West = 0,<br />
GENERATE South.East = 0, GENERATE South.West = 0;<br />
KEEP .NEW. Age Sex Region;<br />
SET V (Region) = 1;<br />
When Region is 3, V(Region) is equivalent to V(3). Because KEEP moves the four new variables to positions 1 through 4,<br />
the value of Region addresses the matching dummy variable directly. Note the use of ".NEW.", a system variable that<br />
refers to all the variables created in the current command.<br />
6.7 Creating a Single Variable from Dummy Variables<br />
Sometimes the data are already entered as a series of variables coded 0 and 1, and you would like to, in effect,<br />
“undummy” them; that is, you would like to create a new variable whose value is based on the location of the<br />
one variable in the series that has a value of 1. Given a file with cases such as this one:<br />
North North South South<br />
Age Sex East West East West<br />
52 1 1 0 0 0<br />
These <strong>PPL</strong> clauses create <strong>the</strong> new variable Region:<br />
GENERATE Region = .M1.;<br />
DO #J USING North.East TO South.West;<br />
IF V(#J) EQ 1, SET Region = #J + 1 - LOC(North.East);<br />
ENDDO;<br />
The DO loop scratch variable is #J. #J takes on the values 3, 4, 5 and 6 as the DO loop is processed. The<br />
LOC of North.East always has a value of 3. If V(3) is 1 when #J = 3, the new variable Region is set to 1:<br />
3 (#J) + 1 (a constant) - 3 (location of North.East) = 1<br />
If V(4) = 1 when #J = 4, Region is set to 2, which is:<br />
4 (#J) + 1 (a constant) - 3 (location of North.East) = 2<br />
and so on.<br />
Again, <strong>the</strong> calculations of position may be simplified by using a second scratch variable in <strong>the</strong> DO loop:<br />
GENERATE Region = .M1.;<br />
DO #J #N USING North.East TO South.West;<br />
IF V(#J) EQ 1, SET Region = #N;<br />
ENDDO;<br />
#N takes on a value which corresponds <strong>to</strong> <strong>the</strong> number of times through <strong>the</strong> loop. Thus it will be a 1 when <strong>the</strong><br />
current DO is positioned at North.East and a 4 when it is positioned at South.West.
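<br />
The loop with a second scratch variable can be modelled in a few lines. This is a Python sketch of the logic only, not PPL; the function name undummy is hypothetical, and None plays the role of missing type 1.<br />

```python
# Python model of the DO loop that "undummies" 0/1 variables (not P-STAT code).

def undummy(dummies):
    """Return the 1-based index of the dummy equal to 1, or None if none is."""
    region = None  # plays the role of GENERATE Region = .M1.
    # enumerate(..., start=1) plays the role of the second scratch
    # variable #N, which counts the trips through the loop.
    for n, value in enumerate(dummies, start=1):
        if value == 1:
            region = n
    return region


print(undummy([1, 0, 0, 0]))  # North.East is set
print(undummy([0, 0, 0, 1]))  # South.West is set
```

As in the PPL version, if more than one dummy variable is 1 the last one found determines the result.<br />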
6.8 LIST FUNCTIONS<br />
These functions evaluate a list of variables given as <strong>the</strong>ir arguments. The variables may be referenced by names<br />
or positions, or a combination of both. Ranges of variables and wildcards may be included in <strong>the</strong> list. The numeric<br />
list functions are:<br />
MAX ( vnp list ) maximum value of variables<br />
MAX.GOOD ( vnp list )<br />
MEAN ( vnp list ) mean of variables<br />
MEAN.GOOD ( vnp list )<br />
MIN ( vnp list ) minimum value of variables<br />
MIN.GOOD ( vnp list )<br />
SDEV ( vnp list ) standard deviation of variables<br />
SDEV.GOOD ( vnp list )<br />
SUM ( vnp list ) sum of variables<br />
SUM.GOOD ( vnp list )<br />
The list functions that evaluate ei<strong>the</strong>r numeric or character arguments are:<br />
COUNT.GOOD ( vnp list ) number of non-missing values<br />
FIRST.GOOD ( vnp list ) value of first non-missing var<br />
LAST.GOOD ( vnp list ) value of last non-missing var<br />
These functions can be used quite generally:<br />
GENERATE Check = 0;<br />
IF MIN ( Test1 TO Test8 ) EQ MAX ( Test1 TO Test8 ),<br />
SET Check = 1;<br />
GENERATE Average = MEAN.GOOD ( Test1 TO Test8 );<br />
6.9 Numeric List Functions<br />
The arguments for <strong>the</strong> numeric list functions are enclosed in paren<strong>the</strong>ses. Individual variable names and positions,<br />
wildcards, and ranges of variables may be specified.<br />
The numeric list functions may be suffixed with “.GOOD” <strong>to</strong> specify that <strong>the</strong>y apply only <strong>to</strong> good (non-missing)<br />
values. “.GOOD” may be abbreviated <strong>to</strong> “.G” if desired. The difference between <strong>the</strong> function MEAN and<br />
MEAN.GOOD is that MEAN gives <strong>the</strong> mean of all <strong>the</strong> variables in <strong>the</strong> list, whereas MEAN.GOOD gives <strong>the</strong><br />
mean of only <strong>the</strong> good variables in <strong>the</strong> list. If MEAN is used and any one of <strong>the</strong> variables in <strong>the</strong> list is missing,<br />
<strong>the</strong> result is missing. If MEAN.GOOD is used and any of <strong>the</strong> variables in <strong>the</strong> list is missing, <strong>the</strong> mean is computed<br />
using only whatever good values are available.<br />
A teacher computing final grades could use <strong>the</strong> function MEAN and give students who have not completed<br />
all tests a missing or incomplete grade. Given this file,<br />
FILE Students:<br />
MidTerm.1 Final.1 MidTerm.2 Final.2<br />
2 3 4 4<br />
3 - 2 1<br />
<strong>the</strong>se instructions compute both <strong>the</strong> mean of all values and <strong>the</strong> mean of non-missing values:
LIST Students [<br />
GENERATE Average.Good =<br />
MEAN.GOOD ( MidTerm.1 TO Final.2 );<br />
GENERATE Average.All =<br />
MEAN ( MidTerm.1 TO Final.2 )] $<br />
MidTerm Final MidTerm Final Average Average<br />
.1 .1 .2 .2 Good All<br />
2 3 4 4 3.25 3.25<br />
3 - 2 1 2.00 -<br />
A doc<strong>to</strong>r looking at average blood pressure readings for his or her patients might use MEAN.GOOD, which uses<br />
only <strong>the</strong> available good information. SUM.GOOD, MAX.GOOD, MIN.GOOD, and SDEV.GOOD all use only<br />
<strong>the</strong> good data and ignore <strong>the</strong> missing values:<br />
GENERATE Low.Score = MIN.GOOD ( Test10 TO Test15, V(33) ) ;<br />
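The difference between MEAN and MEAN.GOOD can be checked against the Students listing with a small Python model. This is a sketch, not PPL: None stands in for a missing value, and returning None from mean_good when every value is missing is an assumption, since the text does not cover that case.<br />

```python
# Python model of MEAN versus MEAN.GOOD (not P-STAT code).
# None stands in for a missing value.

def mean(values):
    """Mean of all values; missing if any one value is missing."""
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)


def mean_good(values):
    """Mean of the non-missing values only (None if there are none: assumed)."""
    good = [v for v in values if v is not None]
    if not good:
        return None
    return sum(good) / len(good)


students = [[2, 3, 4, 4], [3, None, 2, 1]]
for scores in students:
    print(scores, mean_good(scores), mean(scores))
```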
6.10 Character and Numeric List Functions<br />
The COUNT.GOOD, FIRST.GOOD and LAST.GOOD functions detect or count non-missing data. The arguments<br />
for <strong>the</strong>se functions may be character or numeric variable name and position lists. However, numeric and<br />
character values cannot be combined in one list.<br />
COUNT.GOOD yields a numeric value:<br />
IF COUNT.GOOD ( Course.1 TO Course.8 )<br />
NOTAMONG ( 4 TO 6 ), SET Special = 1;<br />
FIRST.GOOD and LAST.GOOD yield <strong>the</strong> value of ei<strong>the</strong>r <strong>the</strong> first or last non-missing variable in <strong>the</strong> argument<br />
list. This may be ei<strong>the</strong>r a character or numeric value:<br />
GEN Last.Course:C =<br />
LAST.GOOD ( Course.1 TO Course.8 );<br />
Thus, when a variable is being generated or recoded, its data type must agree with that of <strong>the</strong> value returned by<br />
LAST.GOOD.<br />
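The three functions can be modelled in Python as follows. This is a sketch, not PPL: None stands in for missing, the course codes are invented, and returning None when every value is missing is an assumption.<br />

```python
# Python model of COUNT.GOOD, FIRST.GOOD and LAST.GOOD (not P-STAT code).

def count_good(values):
    """Number of non-missing values in the list."""
    return sum(1 for v in values if v is not None)


def first_good(values):
    """First non-missing value, or None if all are missing (assumed)."""
    for v in values:
        if v is not None:
            return v
    return None


def last_good(values):
    """Last non-missing value, or None if all are missing (assumed)."""
    return first_good(list(reversed(values)))


courses = ["MATH", None, "ART", None]  # hypothetical character data
print(count_good(courses), first_good(courses), last_good(courses))
```

The result of first_good and last_good has the same type as the list elements, which mirrors the requirement that the receiving variable agree in type.<br />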
The FIRST and LAST functions access ei<strong>the</strong>r <strong>the</strong> first or last cases in a file, or <strong>the</strong> first and last cases in subgroups.<br />
These across-case functions are explained in <strong>the</strong> next <strong>PPL</strong> chapter, and <strong>the</strong>y are also described briefly in<br />
<strong>the</strong> summary in this chapter.<br />
6.11 SPECIAL FUNCTIONS<br />
Most of the special functions require two arguments. The first is the actual argument for the function. This is followed<br />
by a second argument that provides extra information and controls how <strong>the</strong> function operates. The special<br />
functions and <strong>the</strong>ir arguments are:<br />
CHAREX ( expression, mask )<br />
COMBINATIONS ( expression, expression )<br />
DIF ( expression, constant )<br />
LAG ( expression, constant )<br />
MOD ( expression, constant )<br />
NCOT ( expression, instructions )
NUMEX ( expression, mask )<br />
PLACES ( expression, constant )<br />
RECODE ( expression, instructions)<br />
NCOT and RECODE, which are discussed in <strong>the</strong> second <strong>PPL</strong> chapter, are typical of <strong>the</strong>se functions:<br />
SET Age = RECODE ( Age, 91 TO 99 = 90 );<br />
GENERATE Coded.Age = NCOT ( Age, 20, 90/5 );<br />
The first argument may be a simple or complex expression which, when resolved, is a value. The second argument<br />
provides additional instructions for evaluating <strong>the</strong> function.<br />
6.12 The LAG and DIF Functions<br />
LAG and DIF access a variable value in a prior case. These functions are used in econometrics, as well as in o<strong>the</strong>r<br />
fields. The LAG function “lags” back a specified number of cases <strong>to</strong> obtain a value of a given variable <strong>to</strong> use in<br />
<strong>the</strong> current case. The variable name and <strong>the</strong> number of cases <strong>to</strong> lag back are necessary:<br />
GENERATE Gross.Last.Month = LAG (Gross.Profit, 1);<br />
In this example, <strong>the</strong> variable Gross.Last.Month is generated from <strong>the</strong> variable Gross.Profit one case back. Each<br />
case represents a month’s values here.<br />
__________________________________________________________________________<br />
Figure 6.2 Using LAG and DIF<br />
TITLE 'Gross Profit (in Thousands of Dollars)' $<br />
LIST Acct84 [<br />
GENERATE Gross.Last.Month = LAG (Gross.Profit, 1);<br />
GENERATE Difference.1 = DIF (Gross.Profit, 1);<br />
GENERATE Difference.2 = DIF (Gross.Profit, 2);<br />
GENERATE Two.Month.Gross = LAG (Gross.Profit, 1) + Gross.Profit;<br />
KEEP Month Gross.Profit .NEW. ], MAX.PLACES 1 $<br />
Gross Profit (in Thousands of Dollars)<br />
Gross Gross Last Difference Difference Two Month<br />
Month Profit Month .1 .2 Gross<br />
1 4.8 - - - -<br />
2 5.1 4.8 0.3 - 9.9<br />
3 4.9 5.1 -0.2 0.1 10.0<br />
4 5.7 4.9 0.8 0.6 10.6<br />
5 6.2 5.7 0.5 1.3 11.9<br />
6 5.6 6.2 -0.6 -0.1 11.8<br />
__________________________________________________________________________<br />
The LAG function’s arguments are: 1) a name of a numeric variable or an expression that provides <strong>the</strong> location<br />
of a numeric variable, and 2) a positive integer constant (not exceeding 500) that indicates <strong>the</strong> number of cases<br />
<strong>to</strong> lag back. The new variable’s values in <strong>the</strong> initial cases in <strong>the</strong> file are set <strong>to</strong> missing type one. To do a lag on a<br />
character variable use <strong>the</strong> CLAG function described in <strong>the</strong> chapter “Modification of Character Variables”.
The DIF function finds <strong>the</strong> difference between a variable’s value in <strong>the</strong> current case and that variable’s value<br />
in a prior case. The variable name (or expression) and <strong>the</strong> number of cases back <strong>to</strong> find <strong>the</strong> comparison variable<br />
value are required:<br />
GENERATE Difference.1 = DIF (Gross.Profit, 1);<br />
GENERATE Difference.2 = DIF (Gross.Profit, 2);<br />
Here, the variables Difference.1 and Difference.2 are generated. Difference.1 is the difference between Gross.Profit this<br />
month (the current case) and last month (one case back); Difference.2 is the difference between this month and the<br />
month before last (two cases back).<br />
The DIF function’s arguments are: 1) a variable name or expression, and 2) a positive integer constant (not<br />
exceeding 500) that indicates <strong>the</strong> number of cases back in which <strong>to</strong> find <strong>the</strong> comparison value. The new variable’s<br />
values in <strong>the</strong> initial cases are set <strong>to</strong> missing type 1. Thus, DIF is very similar in operation <strong>to</strong> LAG. Figure 6.2<br />
illustrates <strong>the</strong> results obtained in various usages of LAG and DIF. Note that it is also easy <strong>to</strong> get sums, products,<br />
quotients, and so on, by using LAG and DIF in conjunction with o<strong>the</strong>r arithmetic operations.<br />
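The results in Figure 6.2 can be reproduced with a short Python model of the two functions. This is a sketch, not PPL: None stands in for missing type 1, the names lag, dif and two_month are hypothetical, and rounding to one place imitates MAX.PLACES 1 for display only.<br />

```python
# Python model of LAG and DIF on one column of cases (not P-STAT code).

def lag(values, k):
    """Value k cases back; missing (None) for the first k cases."""
    return [values[i - k] if i >= k else None for i in range(len(values))]


def dif(values, k):
    """Current value minus the value k cases back; None where no lag exists."""
    return [None if prev is None else round(cur - prev, 1)
            for cur, prev in zip(values, lag(values, k))]


def two_month(values):
    """LAG(x, 1) + x, as in the Two.Month.Gross variable."""
    return [None if prev is None else round(prev + cur, 1)
            for cur, prev in zip(values, lag(values, 1))]


gross = [4.8, 5.1, 4.9, 5.7, 6.2, 5.6]
print(lag(gross, 1))
print(dif(gross, 1))
print(dif(gross, 2))
print(two_month(gross))
```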
LAG and DIF work with the cases they get from any preceding PPL. This means that you will almost never<br />
use these functions within an IF. Only the cases which have a true value on the IF in Figure 6.3 will be input to<br />
the LAG/DIF function. Thus variable If.1 is only set when the IF statement is true. The first time that the IF is<br />
true there is nothing in the lag buffer, so variable If.1 is set to missing and the lag buffer is set to the current value<br />
of var2, a 3. The second time that the IF is true, variable If.1 is set to 3, the value stored in the lag buffer. The<br />
lag buffer now contains the value 5, which is used the next time there is a true result for the IF.<br />
___________________________________________________________________________<br />
Figure 6.3 Interaction of LAG and IF<br />
File Work<br />
var1 var2<br />
1 3<br />
2 4<br />
1 5<br />
1 6<br />
2 7<br />
MODIFY work [ GEN If.1; GEN No.If;<br />
IF var1 = 1 SET If.1 = LAG ( var2, 1 );<br />
SET No.If = LAG ( var2, 1 ); ],<br />
OUT work2 $<br />
File Work2<br />
No<br />
var1 var2 If.1 If<br />
1 3 - -<br />
2 4 - 3<br />
1 5 3 4<br />
1 6 5 5<br />
2 7 - 6<br />
___________________________________________________________________________<br />
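<br />
The buffer behaviour in Figure 6.3 can be modelled in Python by giving each LAG call its own buffer that is updated only when the call is actually reached. This is a sketch of the behaviour described in the text, not of P-STAT internals; the class LagBuffer and the function run are hypothetical.<br />

```python
# Python model of LAG inside and outside an IF (not P-STAT code).

class LagBuffer:
    """Remembers the values passed to one LAG call, in order."""

    def __init__(self, k):
        self.k = k
        self.seen = []

    def __call__(self, value):
        # Return the value seen k calls ago, or None if there is none yet.
        result = self.seen[-self.k] if len(self.seen) >= self.k else None
        self.seen.append(value)
        return result


def run(cases):
    """Return the If.1 and No.If columns for (var1, var2) cases."""
    lag_in_if = LagBuffer(1)
    lag_always = LagBuffer(1)
    if_1, no_if = [], []
    for var1, var2 in cases:
        # The buffer inside the IF only sees cases where var1 is 1.
        if_1.append(lag_in_if(var2) if var1 == 1 else None)
        no_if.append(lag_always(var2))
    return if_1, no_if


work = [(1, 3), (2, 4), (1, 5), (1, 6), (2, 7)]
print(run(work))
```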
6.13 Modular (Remainder) Arithmetic<br />
MOD is a function that returns <strong>the</strong> remainder after a constant has been divided in<strong>to</strong> <strong>the</strong> value of <strong>the</strong> first expression.<br />
(This is often referred <strong>to</strong> as modular arithmetic.) The first expression usually points <strong>to</strong> a numeric variable; <strong>the</strong>
second argument must be numeric. If Age is 25, <strong>the</strong>n MOD (Age, 7) is 4, <strong>the</strong> remainder after all <strong>the</strong> possible 7's<br />
are removed.<br />
The following examples illustrate <strong>the</strong> results returned by <strong>the</strong> MOD function:<br />
Function Result<br />
MOD ( .75, .5 ) .25<br />
MOD ( 1, .3 ) .1<br />
MOD ( 6, 1 ) .0<br />
MOD ( 6, 2 ) .0<br />
MOD ( 6, 4 ) 2.0<br />
MOD ( 12, 7 ) 5.0<br />
MOD ( .M., 3 ) -<br />
MOD may be used <strong>to</strong> construct patterns for retaining cases:<br />
IF MOD ( .N., 3 ) EQ 1 OR<br />
MOD ( .N., 7 ) EQ 1, RETAIN;<br />
This instruction tests <strong>the</strong> case number (.N.) and, if it is 1, 4, 7, 8, 10, 13, 15, 16, and so on, retains <strong>the</strong> case.<br />
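The retained case numbers can be listed with a short Python check. This is a sketch, not PPL; the function name retained is hypothetical, and the Python % operator is used only on positive integers, where it gives the ordinary remainder.<br />

```python
# Python check of the retention pattern MOD(.N., 3) EQ 1 OR MOD(.N., 7) EQ 1.

def retained(last_case):
    """Case numbers from 1 to last_case that the IF would retain."""
    return [n for n in range(1, last_case + 1)
            if n % 3 == 1 or n % 7 == 1]


print(retained(16))
```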
6.14 Setting PLACES in Specific Variables<br />
The function PLACES requests a specific number of decimal places for specified numeric variables. The function<br />
sets <strong>the</strong> selected variable <strong>to</strong> <strong>the</strong> desired number of places before <strong>the</strong> file is passed <strong>to</strong> any commands, such as <strong>the</strong><br />
LIST command. Thus, <strong>the</strong> function PLACES may operate on one particular variable, and <strong>the</strong> subsequent use of<br />
LIST identifiers, such as MIN.PLACES or MAX.PLACES, may <strong>the</strong>n affect all of <strong>the</strong> variables in <strong>the</strong> listing (including<br />
<strong>the</strong> one already modified by <strong>the</strong> PLACES function). The following example, which uses <strong>the</strong> output file<br />
from <strong>the</strong> command T.TEST with <strong>the</strong> PLACES function and <strong>the</strong> LIST identifier MAX.PLACES, illustrates this.<br />
The number of decimal places of <strong>the</strong> variable named T.Prob is set <strong>to</strong> 2, and <strong>the</strong>n <strong>the</strong> file is passed <strong>to</strong> <strong>the</strong> LIST<br />
command:<br />
LIST TTests<br />
[ SET T.Prob = PLACES (T.Prob, 2) ], MAX.PLACES 3 $<br />
The identifier MAX.PLACES requests that <strong>the</strong> maximum number of decimal places for all of <strong>the</strong> variables be limited<br />
<strong>to</strong> 3. The listing produced will have three decimal places (if <strong>the</strong> data has that many places) for all of <strong>the</strong><br />
variables, except for T.Prob, which will have only two places.<br />
The PLACES function requires two expressions in paren<strong>the</strong>ses: 1) <strong>the</strong> argument that is <strong>the</strong> name of <strong>the</strong> variable<br />
whose places are <strong>to</strong> be set, and 2) an integer from 0 <strong>to</strong> 9 that specifies <strong>the</strong> number of decimal places in <strong>the</strong><br />
fractional portion of <strong>the</strong> number, counting from <strong>the</strong> decimal point.<br />
Note: <strong>the</strong> result of a PLACES function is usually less accurate than <strong>the</strong> input value, because information beyond<br />
<strong>the</strong> requested number of places has been dropped.<br />
6.15 Extracting Digits Using NUMEX<br />
Specific digits may be extracted from numeric variables <strong>to</strong> yield a new numeric value. The NUMEX function<br />
operates only on <strong>the</strong> integer portion of a numeric value; any sign and fraction portion are ignored.<br />
NUMEX requires two arguments, a numeric expression and a character string mask composed only of X's<br />
and 0's and enclosed in quotes:<br />
GEN Engine.Num = NUMEX ( Serial.Num, 'X0XXX0' );
The selection mask is made up of X and 0 (zero) characters and may be up <strong>to</strong> nine characters in length. An X<br />
retains (extracts) a digit and a 0 drops (ignores) a digit. The mask is aligned with <strong>the</strong> right-most digit of <strong>the</strong> numeric<br />
value.<br />
Leading zeros are not retained in the output numeric value. The following examples illustrate NUMEX:<br />
Function Result<br />
NUMEX ( 984601, 'XXX' ) 601<br />
NUMEX ( 80742 , 'XXXX' ) 742<br />
NUMEX ( 10065 , 'X00X' ) 5<br />
The CHAREX function is similar <strong>to</strong> NUMEX. It extracts specific digits from a numeric value, but CHAREX<br />
yields a character representation of <strong>the</strong> digits, in which lead zeros are preserved. CHAREX is explained fur<strong>the</strong>r<br />
in <strong>the</strong> final <strong>PPL</strong> chapter.<br />
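The mask alignment can be modelled in Python. This is a sketch, not PPL: the helper names are hypothetical, padding a short number with zeros on the left is an assumption, and charex only illustrates the idea of keeping leading zeros (the actual CHAREX rules are given in the final PPL chapter).<br />

```python
# Python model of NUMEX-style digit extraction (not P-STAT code).

def select_digits(value, mask):
    """Return the digits of value that line up with an X in the mask."""
    if not mask or len(mask) > 9 or set(mask) - {"X", "0"}:
        raise ValueError("mask must be 1 to 9 characters of X and 0")
    digits = str(abs(int(value)))          # sign and fraction are ignored
    digits = digits.rjust(len(mask), "0")  # assumed: pad short numbers with zeros
    aligned = digits[-len(mask):]          # mask lines up with the right-most digit
    return "".join(d for d, m in zip(aligned, mask) if m == "X")


def numex(value, mask):
    """Numeric result: leading zeros disappear."""
    picked = select_digits(value, mask)
    return int(picked) if picked else 0


def charex(value, mask):
    """Character result: leading zeros are kept."""
    return select_digits(value, mask)


print(numex(984601, "XXX"), numex(80742, "XXXX"), numex(10065, "X00X"))
print(charex(80742, "XXXX"))
```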
6.16 COMBINATIONS of N things, K at a time<br />
COMBINATIONS (n,k) returns <strong>the</strong> number of different ways that K things can be taken from N things; i.e., N<br />
things K at a time. For example, combinations (5,2) is 10, namely, 1:2, 1:3, 1:4, 1:5, 2:3, 2:4, 2:5, 3:4, 3:5 and 4:5.<br />
N should be an integer from 1 <strong>to</strong> 6,000. K should be an integer from 0 <strong>to</strong> 60, but not more than N. If <strong>the</strong> result<br />
would be <strong>to</strong>o large, missing 1 is returned. If an argument is invalid, missing 3 is returned.<br />
The function is defined as N! divided by the product of K! and (N-K)!. However, the actual computation is<br />
done by a series of integer divisions, cancelling out terms, until the denominator is all ones. The result is the product<br />
of the remaining values in the numerator.<br />
Function Result<br />
COMBINATIONS( 6,0 ) 1<br />
COMBINATIONS( 6,1 ) 6<br />
COMBINATIONS( 6,3 ) 20<br />
COMBINATIONS( 6,6 ) 1<br />
COMBINATIONS( 46,6 ) 9,366,819 (<strong>the</strong> NJ lottery odds)<br />
COMBINATIONS( 6000,60) 0.4368755E145<br />
COMBINATIONS( 20,.7) Missing 3 (invalid argument)<br />
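The table entries can be checked with a Python sketch that avoids large factorials by multiplying and dividing one term at a time. This is not the P-STAT routine, only a common way to get the same values; the string "M3" stands in for missing 3, and the overflow case that returns missing 1 is not modelled because the text gives no limit.<br />

```python
# Python sketch of COMBINATIONS(n, k) using exact integer arithmetic.

def combinations(n, k):
    """Number of ways to take k things from n, or "M3" for bad arguments."""
    if not isinstance(n, int) or not isinstance(k, int):
        return "M3"  # non-integer argument
    if not (1 <= n <= 6000 and 0 <= k <= 60 and k <= n):
        return "M3"  # outside the documented ranges
    result = 1
    for i in range(1, k + 1):
        # After step i, result equals C(n, i), so the division is exact.
        result = result * (n - i + 1) // i
    return result


for n, k in [(6, 0), (6, 1), (6, 3), (6, 6), (46, 6), (20, 0.7)]:
    print(n, k, combinations(n, k))
```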
6.17 EXPAND ONE OR MORE VARIABLES<br />
EXPAND is a PPL statement that projects the values of one or more input variables into a set of new variables,<br />
each associated with a specified value in the input variables. In its simplest usage, EXPAND uses one input variable<br />
to create a group of new zero/one variables. Each case begins with the new variables set to zero. Then, if the<br />
value on the input variable is one of the specified values, the associated new variable is set to one.<br />
The input variable can be numeric or character. The output variables are always numeric. These new variables<br />
are sometimes called “dummy” variables. Several input variables can be expanded together. The output variables<br />
can then be set to:<br />
1. one if ANY of <strong>the</strong> input variables has <strong>the</strong> associated value. This is <strong>the</strong> default.<br />
2. <strong>the</strong> NUMBER of input variables that have <strong>the</strong> associated value.<br />
3. a one (1) if <strong>the</strong> first input variable has <strong>the</strong> associated value, o<strong>the</strong>rwise a two (2) if <strong>the</strong> second input<br />
variable has <strong>the</strong> value, and so on. In o<strong>the</strong>r words, <strong>the</strong> RANK of <strong>the</strong> value.
Suppose variable BREAD is coded 1 through 4, with 1 meaning rye, 2 meaning wheat, 3 meaning raisin and 4<br />
meaning white. The <strong>PPL</strong> statement<br />
[ expand bread, values 1:4, gen rye wheat raisin white ]<br />
will generate four new variables named RYE, WHEAT, RAISIN and WHITE. If a given case has a 1 on BREAD,<br />
<strong>the</strong> value for RYE for that case will be set <strong>to</strong> one and <strong>the</strong> o<strong>the</strong>r three <strong>to</strong> zero. If BREAD is two, <strong>the</strong> second new<br />
variable, WHEAT, is set <strong>to</strong> one and <strong>the</strong> rest <strong>to</strong> zero, and so forth.<br />
The new variables are placed after the last current variable. In the VALUES phrase, either 1:4 or 1 TO 4 could<br />
have been used; they mean the same thing.<br />
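The single-variable case can be modelled in Python as a lookup that sets exactly one new variable to one. This is a sketch, not PPL; the function name expand_one is hypothetical, and missing input values are not handled here.<br />

```python
# Python model of EXPAND with one numeric input variable (not P-STAT code).

def expand_one(value, values, names):
    """Return the new 0/1 variables for one case, keyed by name."""
    return {name: int(value == v) for name, v in zip(names, values)}


names = ["rye", "wheat", "raisin", "white"]
for bread in [1, 2, 4]:
    print(bread, expand_one(bread, [1, 2, 3, 4], names))
```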
6.18 Overall Syntax of a <strong>PPL</strong> EXPAND Statement<br />
An EXPAND statement consists of phrases, separated by commas. Some phrases are just a single word, o<strong>the</strong>rs<br />
are more extensive. Three of <strong>the</strong>se phrases are required. They start with EXPAND, VALUES and GENERATE.<br />
EXPAND is followed by the names of the variables to be expanded. There is usually just one variable, but there<br />
can be more. If several, they must be either all numeric or all character. The EXPAND phrase comes first; the<br />
order of the rest of the phrases doesn’t matter.<br />
[ EXPAND crust, ...<br />
[ EXPAND first.<strong>to</strong>p second.<strong>to</strong>p third.<strong>to</strong>p, ...<br />
VALUES is followed by integers if the EXPAND variables are numeric, or by quoted character strings if the<br />
input is character.<br />
6.19 Numeric Input Values<br />
If the input variables are numeric, the values to be tested should be integers from 0 to 9999. Ranges can be used;<br />
they are indicated by TO or a colon (:). If some values and/or ranges are placed within parentheses, they will all<br />
be mapped into a single output variable.<br />
An output variable is created:<br />
1. for EACH integer outside of paren<strong>the</strong>ses, and<br />
2. for each paren<strong>the</strong>sis structure.<br />
VALUES 1 3:6 9, makes six output variables.<br />
VALUES 1 9, makes two output variables.<br />
VALUES 1 TO 9, makes nine output variables.<br />
VALUES 0:9, makes ten output variables.<br />
VALUES 1 (3 5:8) 4, makes three output variables.<br />
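The counting rule can be checked with a Python sketch. The representation is an assumption made for illustration only: a VALUES phrase is written as a Python list in which a plain integer is a single value, to(a, b) is a range, and a tuple is a parenthesis structure.<br />

```python
# Python model of how a VALUES phrase maps to output variables (not P-STAT code).

def to(first, last):
    """Inclusive range, like 3:6 or 3 TO 6."""
    return range(first, last + 1)


def output_groups(spec):
    """Return one list of test values per output variable."""
    groups = []
    for item in spec:
        if isinstance(item, tuple):
            # A parenthesis structure maps to a single output variable.
            merged = []
            for part in item:
                merged.extend(part if isinstance(part, range) else [part])
            groups.append(merged)
        elif isinstance(item, range):
            # Outside parentheses, each integer gets its own output variable.
            groups.extend([v] for v in item)
        else:
            groups.append([item])
    return groups


print(len(output_groups([1, to(3, 6), 9])))
print(output_groups([1, (3, to(5, 8)), 4]))
```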
6.20 Character Input Values<br />
If <strong>the</strong> input variables are character, <strong>the</strong> values <strong>to</strong> be tested should come in quotes, ei<strong>the</strong>r ‘xxx’ or “xxx”. The default<br />
is <strong>to</strong> ignore leading blanks, trailing blanks and case. Thus ‘ Ohio ‘ is equivalent <strong>to</strong> ‘ohio’.<br />
VALUES (‘nj’ ‘new jersey’) ‘ohio’ ‘virginia’,<br />
The above example creates 3 output variables, since <strong>the</strong>re is one set in paren<strong>the</strong>ses, and 2 standalone values. The<br />
first output variable is set <strong>to</strong> 1 when EITHER ‘nj’ or ‘new jersey’ is found.<br />
6.21 The GENERATE or GEN phrase<br />
[ EXPAND <strong>to</strong>pping.1 <strong>to</strong>pping.2, VALUES 1:5 9, GEN <strong>to</strong>p.* ]<br />
[ EXPAND region, VALUES 1:4, GEN east west north south]<br />
GENERATE provides <strong>the</strong> names for <strong>the</strong> variables being created. This can be done in two ways, prefix or full<br />
names.
GENERATE prefix.*<br />
GENERATE name name name<br />
A prefix like crust.* can be provided. Given<br />
[EXPAND varname, VALUES 1:3, GENERATE vvv.* ]<br />
<strong>the</strong> new variables will be named vvv.1, vvv.2, and vvv.3 . Given<br />
[EXPAND varname, VALUES 7 5 2, GENERATE vvv.* ]<br />
<strong>the</strong> new variables will be named vvv.7, vvv.5, and vvv.2 .<br />
A prefix can be used in a character expand. The quoted values are used <strong>to</strong> complete <strong>the</strong> names of <strong>the</strong> new<br />
variables. If (‘nj’ ‘new jersey’) or such is supplied, <strong>the</strong> first element is used, in this case ‘nj’.<br />
Alternatively, a name can be supplied for each value. Given<br />
[EXPAND varname, VALUES 1:3, GENERATE aaa bbb ccc]<br />
the new variables will be named aaa, bbb, and ccc, with aaa representing the value 1 and so<br />
forth. Given<br />
[EXPAND varname, VALUES 7 5 2, GENERATE aaa bbb ccc]<br />
<strong>the</strong> new variables will be named aaa, bbb, and ccc, with aaa representing <strong>the</strong> value 7 (because 7 was <strong>the</strong> first value,<br />
and aaa was <strong>the</strong> first name). In a character expand, <strong>the</strong> first test value is associated with <strong>the</strong> first output name, and<br />
so on.<br />
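The two naming styles can be modelled in Python. This is a sketch, not PPL: the function name generated_names is hypothetical, each output variable is represented by the list of test values that map to it, and the prefix "state" in the usage line is invented.<br />

```python
# Python model of GENERATE prefix.* versus GENERATE name name ... (not P-STAT code).

def generated_names(groups, prefix=None, names=None):
    """Return one output name per group of test values."""
    if names is not None:
        if len(names) != len(groups):
            raise ValueError("one name is needed for each output variable")
        return list(names)  # first name goes with the first value, and so on
    # With a prefix, the first element of each group completes the name.
    return [f"{prefix}.{group[0]}" for group in groups]


print(generated_names([[7], [5], [2]], prefix="vvv"))
print(generated_names([["nj", "new jersey"], ["ohio"]], prefix="state"))
print(generated_names([[7], [5], [2]], names=["aaa", "bbb", "ccc"]))
```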
6.22 Options With Several Input Variables<br />
The default, when two input variables have the same value, is to simply set the associated output variable to 1.<br />
1. ADD, causes an output variable <strong>to</strong> show <strong>the</strong> NUMBER of input variables that have that value.<br />
2. RANK, causes an output variable <strong>to</strong> show <strong>the</strong> ORDER of <strong>the</strong> input variable that is <strong>the</strong> first <strong>to</strong> have<br />
that value. By “order” we mean its position in <strong>the</strong> EXPAND phrase. Consider<br />
[EXPAND var1 var2 var3, values 1:5, gen xxx.*].<br />
If a case has a 3 on both var1 and var2, <strong>the</strong> default is <strong>to</strong> simply set xxx.3 <strong>to</strong> 1. If ADD is in use, xxx.3<br />
would be 2, <strong>the</strong> count of input variables that have that value.<br />
Suppose RANK is in use and a case has 2, 4 and 5 on input variables var1, var2 and var3, having<br />
used VALUES 1:5. That would cause xxx.2 <strong>to</strong> be set <strong>to</strong> 1, xxx.4 <strong>to</strong> 2, and xxx.5 <strong>to</strong> 3. Why is xxx.5<br />
set <strong>to</strong> 3 ? Because <strong>the</strong> initial 5 was found in <strong>the</strong> third input variable.<br />
3. NEED 2, This sets <strong>the</strong> number of non-missing input values that are needed for <strong>the</strong> output variables<br />
<strong>to</strong> be non-missing. The default is one. Thus, if all of <strong>the</strong> expand input is missing for a case, <strong>the</strong> default<br />
is for <strong>the</strong> result variables <strong>to</strong> be set <strong>to</strong> missing for that case.<br />
“NEED 0” can be used. This causes <strong>the</strong> result variables <strong>to</strong> be non-missing, no matter what <strong>the</strong> input<br />
is. “NEED 0” can be used when <strong>the</strong>re is just one input variable.<br />
Suppose “NEED 2” is used when there are 3 input variables. Any case with fewer than 2 non-missing<br />
input values will be given missing result values.<br />
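The default, ADD, RANK and NEED behaviours can be compared side by side in a Python model. This is a sketch of the rules as stated above, not PPL; the function name expand and the mode strings are hypothetical, and None stands in for a missing value.<br />

```python
# Python model of EXPAND with several input variables (not P-STAT code).

def expand(inputs, values, mode="ANY", need=1):
    """Return the output variables for one case, one per test value."""
    good = [v for v in inputs if v is not None]
    if len(good) < need:
        # Too few non-missing inputs: every output variable is missing.
        return [None] * len(values)
    out = []
    for target in values:
        # 1-based positions of the input variables holding this value.
        hits = [pos for pos, v in enumerate(inputs, start=1) if v == target]
        if mode == "ADD":
            out.append(len(hits))                # how many inputs have it
        elif mode == "RANK":
            out.append(hits[0] if hits else 0)   # first input to have it
        else:
            out.append(1 if hits else 0)         # default: any input has it
    return out


values = [1, 2, 3, 4, 5]
print(expand([3, 3, None], values))
print(expand([3, 3, None], values, mode="ADD"))
print(expand([2, 4, 5], values, mode="RANK"))
print(expand([None, None, None], values))
```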
6.23 Options When <strong>the</strong> Input Variables Are Character<br />
[ EXPAND region, VALUES ‘north’ ‘south’ ‘east’ ‘west’,<br />
GEN region.*, EXACT, NO TRIM ]<br />
The default is <strong>to</strong> ignore leading blanks, trailing blanks and case. Thus ‘ Ohio ‘ is equivalent <strong>to</strong> ‘ohio’.
1. EXACT, Causes <strong>the</strong> case used in <strong>the</strong> VALUES quoted constants <strong>to</strong> be matched exactly. If VAL-<br />
UES ‘East’ ‘South’ ‘West’ were used, a case with ‘east’ would by default match <strong>the</strong> first value.<br />
However, if EXACT is in use, only ‘East’ would match it.<br />
2. NO TRIM, Causes lead and trailing blanks used in <strong>the</strong> VALUES quoted constants <strong>to</strong> be matched<br />
exactly.<br />
By default, comparisons use left-justified copies of both the test values (from the VALUES phrase)<br />
and the data from the EXPAND variables in the current case. In other words, leading blanks are trimmed before comparing.<br />
If NO TRIM is used, the leading blanks are meaningful. Trailing blanks don’t matter in any event.<br />
__________________________________________________________________________<br />
Figure 6.4 EXPAND Example<br />
File xxx has one variable and four cases.<br />
crust<br />
1<br />
3<br />
9<br />
--<br />
LIST xxx [ EXPAND crust, VALUES 1:5, GEN crust.* ]$<br />
produces<br />
crust crust.1 crust.2 crust.3 crust.4 crust.5<br />
1 1 0 0 0 0<br />
3 0 0 1 0 0<br />
9 0 0 0 0 0<br />
-- - - - - -<br />
___________________________________________________________________________<br />
In Figure 6.4, note <strong>the</strong> 9 in case 3. A non-missing input value is ignored when it does not match anything in <strong>the</strong><br />
VALUES phrase. Suppose <strong>the</strong>re is only one input variable. If a case has a value of zero on that variable when<br />
VALUES 1 TO 7 was used, <strong>the</strong> new variables for that case are all zero.<br />
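As a rough model of what EXPAND does with a single numeric input, the Python sketch below rebuilds the rows of Figure 6.4. The function name expand is hypothetical, and None stands in for a missing value.<br />

```python
def expand(value, values):
    """Model EXPAND for one numeric input: one 0/1 flag per VALUES entry."""
    if value is None:
        # A missing input gives missing results for every new variable.
        return [None] * len(values)
    # A good value that matches nothing yields all zeros.
    return [1 if value == v else 0 for v in values]


crust = [1, 3, 9, None]                      # the four cases of file xxx
rows = [expand(c, list(range(1, 6))) for c in crust]
for c, row in zip(crust, rows):
    print(c, row)
```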
6.24 SYSTEM VARIABLES<br />
System variables are defined and set by P-<strong>STAT</strong> as a run is processed. Typically, <strong>the</strong> values of system variables<br />
may not be changed by users, but <strong>the</strong>y may be accessed, tested and assigned <strong>to</strong> o<strong>the</strong>r variables. The names of<br />
system variables are surrounded by decimal points. This distinguishes system variables from user variables,<br />
whose names must begin with a letter.<br />
General system variables are discussed in <strong>the</strong> following sections. Numeric constants (.e., .PI.) are in <strong>the</strong> summary<br />
at <strong>the</strong> end of <strong>the</strong> chapter. O<strong>the</strong>r system variables used in across-case modifications are described elsewhere<br />
in this manual.<br />
6.25 Referencing Good and Missing Data<br />
.G. is <strong>the</strong> system variable for good data and .M. is <strong>the</strong> system variable for missing data of any type. .M1. indicates<br />
missing type 1, .M2. indicates missing type 2, and .M3. indicates missing type 3. Combinations such as .M13. for<br />
types 1 and 3 can also be used.<br />
The system variable .G. tests for good (non-missing) data:<br />
IF Name EQ .G., RETAIN;<br />
The system variables for missing data are used both <strong>to</strong> test for missing and <strong>to</strong> set values <strong>to</strong> one of <strong>the</strong> types of<br />
missing:<br />
IF Age EQ .M., SET Age = .M2. ;<br />
IF Age LT 3 , SET Age = .M3. ;<br />
When .M. is specified as a consequence, it is treated as if it were .M1. (missing type 1). When .M. is specified<br />
as a test, it is treated as if it were any of <strong>the</strong> three types of missing. In this example, a case is deleted if <strong>the</strong> value<br />
of variable Age is any of <strong>the</strong> three types of missing:<br />
IF Age EQ .M., DELETE ;<br />
Note that when an IF clause tests for missing or good values, it produces only a true or false result.<br />
The system variable .M. and <strong>the</strong> equal-sign opera<strong>to</strong>r can be combined in<strong>to</strong> <strong>the</strong> opera<strong>to</strong>r MISSING. Both of<br />
<strong>the</strong>se instructions produce <strong>the</strong> same results:<br />
IF Age MISSING, DELETE ;<br />
IF Age EQ .M. , DELETE ;<br />
Similarly, .G. and <strong>the</strong> equal-sign may be combined in<strong>to</strong> <strong>the</strong> opera<strong>to</strong>r GOOD. These are <strong>the</strong> same:<br />
IF <strong>Inc</strong>ome GOOD, RETAIN;<br />
IF <strong>Inc</strong>ome EQ .G., RETAIN;<br />
Note that <strong>the</strong> system variables .G. and .M. are values, and thus may be used with <strong>the</strong> equal-sign opera<strong>to</strong>r, but that<br />
GOOD and MISSING are opera<strong>to</strong>rs already.<br />
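The two roles of .M. (any missing type in a test, missing type 1 in a consequence) can be pictured with a small Python model. Everything named here (Missing, is_missing, is_good, set_missing, recode_age) is hypothetical, and the guard that skips the LT test for missing values is an assumption, not documented P-STAT behavior.<br />

```python
from enum import Enum


class Missing(Enum):
    """The three missing types, standing in for .M1., .M2. and .M3."""
    M1 = 1
    M2 = 2
    M3 = 3


ALL_TYPES = (Missing.M1, Missing.M2, Missing.M3)


def is_missing(value, kinds=ALL_TYPES):
    """Test form of .M.: true for any missing type unless kinds is narrowed."""
    return isinstance(value, Missing) and value in kinds


def is_good(value):
    """Test form of .G.: true for any non-missing value."""
    return not isinstance(value, Missing)


def set_missing(kind=Missing.M1):
    """Consequence form: a plain .M. stores missing type 1."""
    return kind


def recode_age(age):
    # IF Age EQ .M., SET Age = .M2. ;
    if is_missing(age):
        age = set_missing(Missing.M2)
    # IF Age LT 3, SET Age = .M3. ;  (assumed to apply to good values only)
    if is_good(age) and age < 3:
        age = set_missing(Missing.M3)
    return age
```

Narrowing kinds to (Missing.M1, Missing.M3) plays the part of a combination such as .M13.<br />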
6.26 Selecting Variables with .NEW. and .OTHERS.<br />
.NEW. and .OTHERS. reference variables concisely in variable selection clauses. .NEW. is used after KEEP and<br />
DROP <strong>to</strong> refer <strong>to</strong> all new variables created within a command:<br />
GENERATE Medicare.Amt = .80 * Approved.Amt ;<br />
GENERATE Patient.Amt = Approved.Amt - Medicare.Amt;<br />
KEEP Patient.ID TO Approved.Amt .NEW. ;<br />
Only <strong>the</strong> specified variables and <strong>the</strong> two new variables are kept. .NEW. may be used in KEEP and DROP clauses<br />
with or without o<strong>the</strong>r variable names.<br />
.OTHERS. is used in KEEP clauses as a shortcut <strong>to</strong> rearranging variables. .OTHERS. refers <strong>to</strong> all o<strong>the</strong>r variables<br />
in <strong>the</strong> file not explicitly specified in <strong>the</strong> KEEP clause:<br />
KEEP Patient.ID Code.Num .OTHERS.<br />
Billed.Amt TO Approved.Amt ;<br />
This clause keeps all of <strong>the</strong> variables in <strong>the</strong> file, but reorders <strong>the</strong>m as specified.<br />
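One way to picture how .OTHERS. is resolved is the Python sketch below. The helper keep and the file layout are hypothetical, and the range Billed.Amt TO Approved.Amt is assumed to have been expanded into explicit names already.<br />

```python
def keep(selection, file_vars):
    """Resolve a KEEP list that may contain the '.OTHERS.' placeholder."""
    named = [v for v in selection if v != ".OTHERS."]
    # .OTHERS. is every variable not named explicitly, in file order.
    others = [v for v in file_vars if v not in named]
    result = []
    for item in selection:
        result.extend(others if item == ".OTHERS." else [item])
    return result


# Hypothetical file layout, for illustration only.
file_vars = ["Code.Num", "Visit.Date", "Patient.ID",
             "Billed.Amt", "Allowed.Amt", "Approved.Amt"]
order = keep(["Patient.ID", "Code.Num", ".OTHERS.",
              "Billed.Amt", "Allowed.Amt", "Approved.Amt"], file_vars)
print(order)
```

Every variable survives; only the order changes.<br />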
6.27 Referencing <strong>the</strong> Number of Variables in <strong>the</strong> File<br />
.NV. is <strong>the</strong> system variable for <strong>the</strong> number of variables in <strong>the</strong> file at a given time. This value changes as KEEP,<br />
DROP, GENERATE, SPLIT and COLLECT statements are processed. The following example illustrates ano<strong>the</strong>r<br />
solution <strong>to</strong> <strong>the</strong> problem of creating a series of dummy variables, discussed earlier in this chapter.<br />
GENERATE Number.Vars = .NV.,<br />
GENERATE North.East = 0, GENERATE North.West = 0,<br />
GENERATE South.East = 0, GENERATE South.West = 0;<br />
SET V (Number.Vars + Region) = 1;<br />
As each case is read, a new variable Number.Vars is generated equal <strong>to</strong> <strong>the</strong> number of variables in <strong>the</strong> file. This<br />
number includes <strong>the</strong> variable being created:<br />
XA XB XC Region Number.Vars North.East North.West South.East South.West<br />
1 2 2 2 5 0 1 0 0<br />
2 - 3 4 5 0 0 0 1<br />
Number.Vars is 5. Thus, V ( Number.Vars + Region ) is V(7) or North.West when variable Region is 2, and V(9)<br />
or South.West when variable Region is 4.<br />
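The position arithmetic can be checked with a short Python model of the two cases shown. The function flag_region is hypothetical, and None stands in for the missing value of XB.<br />

```python
NAMES = ["XA", "XB", "XC", "Region", "Number.Vars",
         "North.East", "North.West", "South.East", "South.West"]


def flag_region(xa, xb, xc, region):
    """Build one case and set the dummy variable chosen by position."""
    case = [xa, xb, xc, region]
    case.append(len(case) + 1)        # Number.Vars = .NV., counting itself
    case.extend([0, 0, 0, 0])         # the four dummy variables
    number_vars = case[4]
    position = number_vars + region   # V(Number.Vars + Region), 1-based
    case[position - 1] = 1            # Python lists are 0-based
    return dict(zip(NAMES, case))


print(flag_region(1, 2, 2, 2))
print(flag_region(2, None, 3, 4))
```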
6.28 Referencing <strong>the</strong> Current Case Number<br />
.N., .HERE. and .USED. are system variables that refer <strong>to</strong> case numbers. .N. is equal <strong>to</strong> <strong>the</strong> current input case<br />
number (after any case selection). This value is increased every time a case is read, even though that case may be<br />
deleted and not passed <strong>to</strong> <strong>the</strong> current command. .HERE. is <strong>the</strong> number of cases that have been retained — that<br />
have actually been passed <strong>to</strong> <strong>the</strong> command up <strong>to</strong> <strong>the</strong> point when .HERE. is processed. .USED. is <strong>the</strong> number of<br />
cases that have been used after <strong>the</strong> completion of all <strong>PPL</strong>. These are cases that are passed <strong>to</strong> <strong>the</strong> current command<br />
preceding <strong>the</strong> <strong>PPL</strong>. The three values are <strong>the</strong> same when no cases have been deleted.<br />
.N. provides an easy way <strong>to</strong> delete individual cases by using <strong>the</strong>ir positions in <strong>the</strong> file:<br />
IF .N. AMONG ( 31, 100 TO 105, 399 ), DELETE;<br />
The next instruction retains <strong>the</strong> first 98 cases in <strong>the</strong> file and makes <strong>the</strong>m available <strong>to</strong> any subsequent <strong>PPL</strong> clauses<br />
and <strong>to</strong> <strong>the</strong> current command:<br />
IF .N. LT 99, RETAIN;<br />
However, <strong>the</strong> case reader continues <strong>to</strong> read through <strong>the</strong> rest of <strong>the</strong> file, testing each case against <strong>the</strong> value of .N.<br />
Thus, case selection is more economical:<br />
CASES 1 TO 98;<br />
The diagonal elements of a square matrix may be set <strong>to</strong> 1 easily with .N. and a DO loop:<br />
DO #J USING 1 .ON.;<br />
IF .N. EQ #J, SET V(#J) = 1;<br />
ENDDO;<br />
This is often useful when working with matrices.<br />
.HERE. is set <strong>to</strong> <strong>the</strong> number of cases that have been processed by <strong>the</strong> <strong>PPL</strong> clause in which .HERE. is found.<br />
If no cases have been deleted prior <strong>to</strong> that <strong>PPL</strong> clause, .HERE. is <strong>the</strong> same as .N. , <strong>the</strong> current input case number.<br />
.USED., which may be abbreviated <strong>to</strong> .U., is set after all cases are processed by all <strong>PPL</strong> clauses. It is <strong>the</strong> count of<br />
all cases not deleted by any logical tests. .USED. is <strong>the</strong> same as .N. when no cases have been deleted in any of <strong>the</strong><br />
<strong>PPL</strong> clauses.<br />
Figure 6.5 illustrates the differences between .N., .HERE. and .USED. Each case in the output file contains<br />
three new variables. Input.Case.No is the sequence number in the input file, Student.No is the sequence number<br />
of students, and Output.Case.No is the sequence number in the output file. Input.Case.No and Student.No have<br />
gaps in the number sequence: Input.Case.No skips every deleted case, and Student.No skips the student who was<br />
deleted for a missing test. Output.Case.No has no gaps.<br />
__________________________________________________________________________<br />
Figure 6.5 Showing <strong>the</strong> Differences Between .N., .HERE. and .USED.<br />
File ABU:<br />
Test Test Test<br />
Status .1 .2 .3<br />
student 95 99 94<br />
student 87 81 93<br />
non-mat 78 86 89<br />
student 67 - 69<br />
student 87 88 90<br />
LIST ABU<br />
[ GENERATE Input.Case.No = .N. ;<br />
GENERATE Output.Case.No;<br />
GENERATE Student.No ]<br />
[<br />
IF Status NE 'Student', DELETE ;<br />
SET Student.No = .HERE. ;<br />
IF ANY ( Test? ) MISSING, DELETE ;<br />
SET Output.Case.No = .USED. ] $<br />
Input Output<br />
Test Test Test Case Student Case<br />
Status .1 .2 .3 No No No<br />
student 95 99 94 1 1 1<br />
student 87 81 93 2 2 2<br />
student 87 88 90 5 4 3<br />
__________________________________________________________________________<br />
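The three counters in Figure 6.5 can be replayed in Python. This is a sketch under stated assumptions: the function list_abu is hypothetical, the status comparison is taken to ignore case, None stands in for a missing test score, and .USED. is modeled as a running count of surviving cases.<br />

```python
def list_abu(cases):
    """Replay the second PPL clause of Figure 6.5 over (status, tests) cases."""
    out, here, used = [], 0, 0
    for n, (status, tests) in enumerate(cases, start=1):   # n plays .N.
        if status.lower() != "student":
            continue                  # deleted before .HERE. is reached
        here += 1                     # .HERE.: cases that got this far
        if any(t is None for t in tests):
            continue                  # deleted after Student.No was set
        used += 1                     # .USED.: cases surviving all PPL
        out.append((n, here, used))
    return out


abu = [("student", (95, 99, 94)), ("student", (87, 81, 93)),
       ("non-mat", (78, 86, 89)), ("student", (67, None, 69)),
       ("student", (87, 88, 90))]
for row in list_abu(abu):
    print(row)   # (Input.Case.No, Student.No, Output.Case.No)
```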
When files are concatenated on-<strong>the</strong>-fly and <strong>the</strong> same modifications are applied ( * ), case counting (.N.,<br />
.HERE. and .USED.) continues as if <strong>the</strong> files were a single file:<br />
MODIFY File1<br />
[ GENERATE Input.Case.No = .N.; GENERATE Output.Case.No;<br />
IF Test4 MISSING, DELETE ;<br />
SET Output.Case.No = .HERE.]<br />
+ File2 [ * ], OUT File12 $<br />
6.29 Referencing Numeric and Character Variables<br />
.NUMERIC. is <strong>the</strong> list of all numeric variables in a file. Similarly, .CHARACTER. is <strong>the</strong> list of all character variables<br />
in <strong>the</strong> file. Both of <strong>the</strong>se system variables are used in KEEP or DROP clauses.<br />
A selection of numeric variables may be appropriate for recoding and for input to some commands:<br />
SURVEY Streams<br />
[ KEEP .NUMERIC. ;<br />
DO #J USING Stream1 TO Stream7;<br />
IF V(#J) MISSING THEN;<br />
SET V(#J) = 0;<br />
ELSE;<br />
SET V(#J) = NCOT ( V(#J), 25, 50, 75 );<br />
ENDIF;<br />
ENDDO ] ;<br />
STUBS Stream1 TO Stream7 $<br />
A selection of character variables may be useful in mapping (recoding) character values.<br />
MAP River<br />
[KEEP .CHARACTER. ], VAR Station1 TO Station7, OUT RiverMap $<br />
Ei<strong>the</strong>r system variable reorders variables:<br />
MODIFY Class88<br />
[ KEEP ID Name .NUMERIC. .OTHERS. ], OUT Class88 $<br />
6.30 Accessing <strong>the</strong> PUT Counter<br />
Each time that a case is read, .PUT. is set to 0. If the PUT or PUTL instructions are invoked (see the prior chapter),<br />
.PUT. is increased. After all <strong>the</strong> modifications for a given case are done, .PUT. can be tested <strong>to</strong> see whe<strong>the</strong>r <strong>the</strong><br />
PUT logic produced printed text. In this example, several checks for mutually inconsistent data are made. If inconsistencies<br />
are found, an explana<strong>to</strong>ry statement is printed:<br />
MODIFY InFile<br />
[ IF Age LT 18 AND Veteran GT 0,<br />
PUT '<strong>Inc</strong>onsistent values of Age and Veteran for '<br />
First.Name Last.Name ;<br />
IF Age LT 15 AND Married GT 0,<br />
PUT '<strong>Inc</strong>onsistent values of Age and Married for '<br />
First.Name Last.Name ;<br />
IF .PUT. GT 0, RETAIN ],<br />
OUT To.Check $<br />
The PUT counter .PUT. is increased, and those records with inconsistencies are retained for fur<strong>the</strong>r examination<br />
in a new file, To.Check.<br />
6.31 File, Date, Page and Line References<br />
.FILE. is <strong>the</strong> system variable that refers <strong>to</strong> <strong>the</strong> current P-<strong>STAT</strong> system file. It can be used <strong>to</strong> pass <strong>the</strong> filename <strong>to</strong><br />
<strong>the</strong> TITLE command or as an argument in <strong>the</strong> FIRST and LAST functions. FIRST and LAST are usually used for<br />
processing groups of related cases. When FIRST and LAST are used with .FILE. as <strong>the</strong> argument, <strong>the</strong>y test for<br />
<strong>the</strong> beginning and end of <strong>the</strong> file:<br />
[ IF FIRST (.FILE.), GENERATE #Children = 0 ;<br />
IF Age LT 16, INCREASE #Children ;<br />
IF LAST (.FILE.), RETAIN ;<br />
KEEP School.District #Children ]<br />
The statement “IF FIRST (.FILE.)” is true only when <strong>the</strong> first case of a file is processed. Similarly <strong>the</strong> statement<br />
“IF LAST (.FILE.)” is true only when <strong>the</strong> last case of a file is processed. FIRST, LAST, and .FILE. have extensive<br />
uses in across-case data modification and are discussed in detail in <strong>the</strong> chapter “<strong>PPL</strong>: Across Case Modifications”.<br />
.DATE. is <strong>the</strong> current date. Its value is set when <strong>the</strong> current command begins. .DATE. is in character form,<br />
and thus a variable generated or set <strong>to</strong> .DATE. must be of character data type:<br />
[ GENERATE Today:C = .DATE. ]<br />
The exact string produced by .DATE. depends upon <strong>the</strong> computer on which P-<strong>STAT</strong> is running. Using .NDATE.<br />
requests <strong>the</strong> numeric form of <strong>the</strong> date. Note: .NDATE. returns a 4 digit year.<br />
.PAGE. and .TIME. reference <strong>the</strong> current page number since <strong>the</strong> command began and <strong>the</strong> current time when<br />
<strong>the</strong> command began. .CPAGE. resets <strong>the</strong> page number at each command ra<strong>the</strong>r than at each run. .RPAGE. sets <strong>the</strong><br />
page number within a run. The page value is numeric. The time value is character in <strong>the</strong> form “11:34:05” (hours:<br />
minutes: seconds). .NTIME. requests <strong>the</strong> numeric values of time (without colons). These four system variables<br />
are often used in <strong>to</strong>p and bot<strong>to</strong>m titles. Exact, run and command values, as well as numeric and character values,<br />
may also be requested.<br />
Manipulating date and time values is covered in a separate chapter. It describes 40 functions and 10 commands<br />
for formatting date/time values, finding the difference between date/time values, etc.<br />
<strong>PPL</strong><br />
SUMMARY<br />
Functions are part of <strong>the</strong> P-<strong>STAT</strong> programming language. Function arguments are enclosed in<br />
paren<strong>the</strong>ses:<br />
LIST File109<br />
[ SET Usage = LOG ( Usage ) ;<br />
SET CPU.Time = PLACES ( CPU.Time, 2 ) ] $<br />
<strong>PPL</strong> Functions: Numeric — Single Expression<br />
The following functions require a single numeric expression as an argument:<br />
ABS (exp)<br />
gives <strong>the</strong> absolute value of <strong>the</strong> expression.<br />
COS (exp)<br />
gives <strong>the</strong> cosine of <strong>the</strong> expression.<br />
ACOS (exp)<br />
gives <strong>the</strong> arc cosine of <strong>the</strong> expression.<br />
EXP (exp)<br />
raises e <strong>to</strong> <strong>the</strong> exponent which is <strong>the</strong> value of <strong>the</strong> expression.<br />
FACTORIAL (exp)<br />
The FACTORIAL function yields <strong>the</strong> fac<strong>to</strong>rial value of <strong>the</strong> argument. This is often shown as N!.<br />
FRAC (exp)<br />
gives <strong>the</strong> fractional part of <strong>the</strong> numerical expression.<br />
INT (exp)<br />
gives <strong>the</strong> integer part of <strong>the</strong> numerical expression.<br />
LOC (exp)<br />
gives <strong>the</strong> location of <strong>the</strong> variable specified in <strong>the</strong> expression. The location is <strong>the</strong> position of <strong>the</strong> variable<br />
in <strong>the</strong> file, counting from <strong>the</strong> left.<br />
LOG (exp)<br />
gives <strong>the</strong> natural log (base e) of <strong>the</strong> numerical expression.<br />
LOG10 (exp)<br />
gives <strong>the</strong> common log (base 10) of <strong>the</strong> numerical expression.<br />
vnp=var name/position nn=number vn=variable name exp=expression<br />
ROUND (exp)<br />
rounds <strong>the</strong> numerical expression <strong>to</strong> <strong>the</strong> nearest integer.<br />
SIN (exp)<br />
gives <strong>the</strong> sine of <strong>the</strong> numerical expression.<br />
ASIN (exp)<br />
gives <strong>the</strong> arc sine of <strong>the</strong> numerical expression.<br />
SQRT (exp)<br />
gives <strong>the</strong> square root of <strong>the</strong> numerical expression.<br />
TAN (exp)<br />
gives <strong>the</strong> tangent of <strong>the</strong> numerical expression.<br />
ATAN (exp)<br />
gives <strong>the</strong> arc tangent of <strong>the</strong> numerical expression.<br />
<strong>PPL</strong> Functions: Numeric — List<br />
The following functions operate on a list of numeric variables, which may be referenced by name, position,<br />
ranges, and wildcards. Functions will return missing if any variable is missing unless “.GOOD” is<br />
<strong>the</strong> suffix. When that is <strong>the</strong> case, <strong>the</strong> result will be based on all non-missing (good) values. At least one<br />
variable name or position (vnp) is required in <strong>the</strong> list.<br />
MAX (vnp list)<br />
gives <strong>the</strong> maximum value of <strong>the</strong> variables in <strong>the</strong> list:<br />
[ GEN Larger = MAX ( Length Girth ) ]<br />
MAX.GOOD (vnp list)<br />
gives <strong>the</strong> maximum value of <strong>the</strong> non-missing variables in <strong>the</strong> list.<br />
MEAN (vnp list)<br />
gives <strong>the</strong> arithmetic mean of <strong>the</strong> variables in <strong>the</strong> list:<br />
[ GEN Mean.Weight = MEAN ( V(1) .ON. ) ]<br />
MEAN.GOOD (vnp list)<br />
gives <strong>the</strong> arithmetic mean of <strong>the</strong> non-missing variables in <strong>the</strong> list.<br />
MIN (vnp list)<br />
gives <strong>the</strong> minimum value of <strong>the</strong> variables in <strong>the</strong> list.<br />
MIN.GOOD (vnp list)<br />
gives <strong>the</strong> minimum value of <strong>the</strong> non-missing variables in <strong>the</strong> list.<br />
SDEV (vnp list)<br />
gives <strong>the</strong> standard deviation of <strong>the</strong> variables in <strong>the</strong> list.<br />
SDEV.GOOD (vnp list)<br />
gives <strong>the</strong> standard deviation of <strong>the</strong> non-missing variables in <strong>the</strong> list.<br />
SUM (vnp list)<br />
gives <strong>the</strong> sum of <strong>the</strong> variables in <strong>the</strong> list:<br />
GEN Score = SUM ( Test? ) / 3 ;<br />
SUM.GOOD (vnp list)<br />
gives <strong>the</strong> sum of <strong>the</strong> non-missing variables in <strong>the</strong> list.<br />
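The difference between a list function and its .GOOD form can be sketched in Python, with None standing in for a missing value. The names mean and mean_good are hypothetical, and returning missing when every input is missing is an assumption.<br />

```python
def mean(values):
    """MEAN: missing (None) if any input is missing."""
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)


def mean_good(values):
    """MEAN.GOOD: average only the non-missing inputs."""
    good = [v for v in values if v is not None]
    # Assumption: with no good values at all, the result is missing.
    return sum(good) / len(good) if good else None


tests = [67, None, 69]
print(mean(tests), mean_good(tests))
```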
<strong>PPL</strong> Functions: Numeric — Special<br />
The following special functions require one expression that describes <strong>the</strong> domain of <strong>the</strong> function (usually<br />
a variable) and one or more extra arguments, depending on <strong>the</strong> particular function.<br />
COMBINATIONS (exp, exp )<br />
COMBINATIONS (n,k) returns <strong>the</strong> number of different ways that K things can be taken from N things;<br />
i.e., N things K at a time. For example, combinations(5,2) is 10, namely, 1:2, 1:3, 1:4, 1:5, 2:3, 2:4, 2:5,<br />
3:4, 3:5 and 4:5.<br />
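The count in this example can be checked in Python with the standard library (math.comb needs Python 3.8 or later). This is an illustration only, not P-STAT code.<br />

```python
from itertools import combinations
from math import comb

# Enumerate every way of taking 2 things from the 5 things numbered 1 to 5.
pairs = [f"{a}:{b}" for a, b in combinations(range(1, 6), 2)]
print(comb(5, 2), pairs)
```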
DIF (exp, nn)<br />
gives <strong>the</strong> difference between <strong>the</strong> current value of <strong>the</strong> numeric variable designated in <strong>the</strong> expression and<br />
<strong>the</strong> value of that variable nn cases back:<br />
GEN Difference.2 = DIF ( Gross.Profit, 2 ) ;<br />
The number nn must be a positive integer constant not exceeding 500.<br />
LAG (exp, nn)<br />
gives <strong>the</strong> value of <strong>the</strong> numeric variable, designated in <strong>the</strong> expression, nn cases back:<br />
GEN Gross.Last.Yr = LAG ( Gross.Profit, 1 ) ;<br />
The number nn, <strong>the</strong> number of cases <strong>to</strong> “lag” back, must be a positive integer constant not exceeding 500.<br />
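LAG and DIF can be pictured as operations on a column of values. In the Python sketch below the function names are hypothetical, and giving a missing result (None) for the first nn cases is an assumption, since no earlier case exists to look back to.<br />

```python
def lag(series, nn):
    """Value of the series nn cases back; None where no such case exists."""
    if not 1 <= nn <= 500:
        raise ValueError("nn must be a positive integer not exceeding 500")
    return [series[i - nn] if i >= nn else None for i in range(len(series))]


def dif(series, nn):
    """Current value minus the value nn cases back."""
    return [None if cur is None or back is None else cur - back
            for cur, back in zip(series, lag(series, nn))]


gross_profit = [100, 120, 90, 150]
print(lag(gross_profit, 1))
print(dif(gross_profit, 2))
```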
MOD (exp, nn)<br />
gives <strong>the</strong> remainder after <strong>the</strong> numeric expression has been divided by <strong>the</strong> positive constant (nn):<br />
SET Time.Hours = MOD ( Ship.Time, 12 ) ;<br />
This is sometimes called modular arithmetic.<br />
NCOT (exp, n-chotomization instructions)<br />
recodes <strong>the</strong> numeric variable specified in <strong>the</strong> expression according <strong>to</strong> <strong>the</strong> instructions given in <strong>the</strong> second<br />
argument:<br />
GEN Age = NCOT ( Age, 10, 20, 30, 40 ) ; or<br />
GEN Age = NCOT ( Age, 10, 40/10 ) ;<br />
Both <strong>the</strong> preceding instructions do an N-way dicho<strong>to</strong>mization or division of <strong>the</strong> variable values. All values<br />
of age less than or equal <strong>to</strong> 10 become 1, those less than or equal <strong>to</strong> 20 become 2, and so on up <strong>to</strong><br />
values of 40, which become 4. Above 40 becomes a 5.<br />
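The cut-point rule can be expressed compactly in Python. The sketch assumes that the shorthand 10, 40/10 means "from 10 to 40 in steps of 10", which is consistent with the statement that both instructions give the same result. The names ncot and cut_range are hypothetical, and passing a missing value through unchanged is an assumption.<br />

```python
from bisect import bisect_left


def cut_range(start, stop, step):
    """Expand a start, stop/step shorthand into explicit cut points."""
    return list(range(start, stop + 1, step))


def ncot(value, cutpoints):
    """Return 1 for value <= first cut point, 2 for <= second, and so on."""
    if value is None:
        return None
    # bisect_left counts the cut points strictly below the value.
    return bisect_left(cutpoints, value) + 1


cuts = cut_range(10, 40, 10)
print([ncot(age, cuts) for age in (5, 10, 11, 40, 41)])
```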
NUMEX (exp, 'XX00')<br />
extracts specific digits from a numeric variable value and yields a numeric representation of those digits.<br />
NUMEX operates only on <strong>the</strong> integer portion of <strong>the</strong> number — any fractional portion and sign are ignored.<br />
The two required arguments are a numeric expression and a character string mask enclosed in<br />
quotes:<br />
GEN Month = NUMEX (Date, 'XX00' ) ;<br />
The selection mask is composed of X and 0 (zero) characters and may be up <strong>to</strong> nine characters in length.<br />
An X retains a digit and a 0 drops a digit. The selection mask is aligned with <strong>the</strong> right-most digit of <strong>the</strong><br />
numeric value. Lead zeros are not retained in <strong>the</strong> output number. Thus, <strong>the</strong> selection mask “XX00X”<br />
applied <strong>to</strong> “156” yields <strong>the</strong> number 6. The character function CHAREX may be used if lead zeros are<br />
needed in <strong>the</strong> result.<br />
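The mask alignment is easier to see in a small model. The Python function below is a hypothetical re-implementation of the rules just described, not P-STAT code.<br />

```python
def numex(value, mask):
    """Keep the digits under each X of the mask, aligned to the right."""
    if not 1 <= len(mask) <= 9 or set(mask) - {"X", "0"}:
        raise ValueError("mask must be 1 to 9 characters of X and 0")
    # Only the integer portion is used; sign and fraction are ignored.
    digits = str(abs(int(value)))
    # Align with the right-most digit: pad on the left, keep the right end.
    digits = digits.zfill(len(mask))[-len(mask):]
    kept = "".join(d for d, m in zip(digits, mask) if m == "X")
    # Leading zeros vanish when the kept digits are read as a number.
    return int(kept) if kept else 0


print(numex(156, "XX00X"))
print(numex(1231, "XX00"))
```

Applied to 156, the mask XX00X keeps the digits 0, 0 and 6, which read as the number 6.<br />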
PLACES (exp, nn)<br />
sets <strong>the</strong> variable specified in <strong>the</strong> numeric expression <strong>to</strong> <strong>the</strong> number of places specified by <strong>the</strong> second argument,<br />
which must be a positive integer not greater than 9.<br />
GEN ##N = 1.2345 $<br />
PUT ##N > ( PLACES ( ##N, 1 )) > (PLACES ( ##N,3 )) $<br />
produces <strong>the</strong> following line:<br />
1.2345 1.2 1.235<br />
<strong>PPL</strong> Functions: Character and Numeric<br />
COUNT.GOOD (vnp, vnp)<br />
gives <strong>the</strong> number of non-missing values in <strong>the</strong> list of expressions. Only variable names or positions may<br />
be in <strong>the</strong> list.<br />
FIRST.GOOD (vnp, vnp)<br />
gives <strong>the</strong> value of <strong>the</strong> first non-missing variable in <strong>the</strong> list of expressions. Only variable names or positions<br />
may be in <strong>the</strong> list.<br />
GEN Date = FIRST.GOOD (Date.1 TO Date.4) ;<br />
LAST.GOOD (vnp, vnp)<br />
gives <strong>the</strong> value of <strong>the</strong> last non-missing variable in <strong>the</strong> list of expressions. Only variable names or positions<br />
may be in <strong>the</strong> list.<br />
FIRST (.FILE. or vn)<br />
is evaluated as true if it is <strong>the</strong> first case in <strong>the</strong> subgroup specified in <strong>the</strong> expression, and false if it is not<br />
<strong>the</strong> first case. The required expression is a variable name (vn) or a list of up <strong>to</strong> 5 variables, or <strong>the</strong> system<br />
value .FILE. (meaning <strong>the</strong> current file):<br />
IF FIRST (Grade, Sex), INC #Counter ;<br />
Changing values of <strong>the</strong> variable or variables define different subgroups.<br />
LAST (.FILE. or vn)<br />
is evaluated as true if it is <strong>the</strong> last case in <strong>the</strong> subgroup specified in <strong>the</strong> expression, and false if it is not<br />
<strong>the</strong> last case. The required expression is a variable name (vn) or a list of up <strong>to</strong> 5 variables, or <strong>the</strong> system<br />
value .FILE. (meaning <strong>the</strong> current file). Changing values of <strong>the</strong> variable or variables define different<br />
subgroups.<br />
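FIRST and LAST can be thought of as comparing each case with its neighbors. The Python sketch below is a hypothetical model in which adjacent cases with the same values of the grouping variables form one subgroup; an empty list of grouping variables plays the part of .FILE.<br />

```python
def first_last_flags(cases, keys):
    """Return a (FIRST, LAST) pair of booleans for every case."""
    groups = [tuple(case[k] for k in keys) for case in cases]
    flags = []
    for i, group in enumerate(groups):
        # FIRST: no previous case, or the previous case is in another group.
        first = i == 0 or groups[i - 1] != group
        # LAST: no next case, or the next case is in another group.
        last = i == len(groups) - 1 or groups[i + 1] != group
        flags.append((first, last))
    return flags


cases = [{"Grade": 1, "Sex": "F"}, {"Grade": 1, "Sex": "F"},
         {"Grade": 1, "Sex": "M"}, {"Grade": 2, "Sex": "M"}]
print(first_last_flags(cases, ["Grade", "Sex"]))
print(first_last_flags(cases, []))
```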
RECODE (exp, recode instructions )<br />
recodes <strong>the</strong> character or numeric variable specified in <strong>the</strong> expression according <strong>to</strong> <strong>the</strong> instructions given<br />
in <strong>the</strong> second argument:<br />
SET Height =<br />
RECODE ( Height, 0 TO 65 = 1, 65.1 TO 100 = 2, G = 3) ;<br />
All values of height from 0 through 65 become 1, and all values from 65.1 through 100 become 2. Any<br />
o<strong>the</strong>r GOOD values become 3. (See <strong>the</strong> fourth <strong>PPL</strong> chapter for a full explanation of RECODE.)<br />
<strong>PPL</strong> System Variables<br />
System variables are variables that are defined and set by P-<strong>STAT</strong>. Their names are enclosed between<br />
decimal points <strong>to</strong> distinguish <strong>the</strong>m from user-defined variables. P-<strong>STAT</strong> au<strong>to</strong>matically sets <strong>the</strong> values of<br />
<strong>the</strong> system variables as a run progresses. Usually <strong>the</strong> values may not be changed by users, but <strong>the</strong>y may<br />
be accessed, tested and assigned <strong>to</strong> o<strong>the</strong>r variables. (System variables used especially in titles are fur<strong>the</strong>r<br />
described in <strong>the</strong> TITLES chapter.)<br />
.CHARACTER.<br />
is the list of character variables in a file. .CHARACTER. is used in KEEP and DROP selections:<br />
DROP .CHARACTER.;<br />
.DATE.<br />
is the current date. Its value is set when the current command begins, and it is in character form:<br />
GENERATE Today:C = .DATE. ;<br />
It is equivalent to .CDATE. (the command date).<br />
.e.<br />
is the system value for e, the base of natural logs. It equals 2.718281828.<br />
.FILE.<br />
is the current P-STAT system file. Its value is the name of that file. It is used as the argument for the<br />
functions FIRST and LAST, and also in titles.<br />
.G.<br />
is a good or non-missing variable value. It tests whether good data is present in an expression:<br />
IF Test.Score EQ .G., RETAIN;<br />
.HERE.<br />
is the count of the number of cases actually processed thus far by the current PPL clause.<br />
.M.<br />
is a missing or non-good variable value. It is used to test whether missing data is present in an expression.<br />
.M. refers collectively to all three types of missing; it is the opposite of .G. (above).<br />
.M1., .M2., .M3.<br />
are missing variable values of three types: MISSING1, MISSING2 and MISSING3. .M1., .M2. and .M3.<br />
are used for logical testing within an IF phrase and for recoding.<br />
.N.<br />
is the case counter. Its value is the current case number after case (row) selection.<br />
.NEW.<br />
are all variables newly generated in all PPL clauses in the current phrase. Its value is all of the names of<br />
these new variables. .NEW. is used in KEEP and DROP selections:<br />
GEN Average = MEAN.GOOD ( Value? ) ;<br />
GEN Total = SUM.GOOD ( Value? ) ;<br />
KEEP ID .NEW. .OTHERS. ;<br />
.NUMERIC.<br />
is the list of numeric variables in a file. .NUMERIC. is used in KEEP and DROP selections:<br />
KEEP .NUMERIC. ;<br />
.NV.<br />
is the current number of variables in the file.<br />
.ON.<br />
is used in case and variable selection and in DO loops to indicate from here onward through the last case<br />
or variable:<br />
DO #J USING 1 .ON. ;<br />
IF V(#J) GOOD, SET V(#J) = V(#J) / 10 ;<br />
ENDDO;<br />
.OTHERS.<br />
are all variables other than those explicitly referenced in a KEEP or DROP selection. It is used in reordering<br />
variables:<br />
KEEP SS.Number Department .OTHERS. Final.Grade;<br />
.PAGE.<br />
is the current page number since the command began. It is equivalent to .CPAGE. .RPAGE. is the current<br />
page number since the run or P-STAT session began.<br />
.PI.<br />
is the system value for pi. It equals 3.141592654.<br />
.PUT.<br />
is the PUT counter. Its value is the number of times PUT was invoked in the current case.<br />
.TIME.<br />
is the current time. Its value is set when the current command begins, and it is in character form:<br />
GENERATE Time:C = .TIME.;<br />
It is equivalent to .CTIME. (the command time).<br />
.USED.<br />
is <strong>the</strong> number of cases used after all <strong>PPL</strong> clauses are processed. The count does not include cases that<br />
are deleted because of logical tests.<br />
O<strong>the</strong>r Date and Time System Variables.<br />
The system variables .DATE. and .TIME. may be prefaced with N, X, R or C:<br />
.NDATE. .NTIME.<br />
.XDATE. .NXDATE. .XTIME. .NXTIME.<br />
.RDATE. .NRDATE. .RTIME. .NRTIME.<br />
.CDATE. .NCDATE. .CTIME. .NCTIME.<br />
The N specifies <strong>the</strong> numeric form of <strong>the</strong> date or time, ra<strong>the</strong>r than <strong>the</strong> character form. The X specifies <strong>the</strong><br />
exact date or time when <strong>the</strong> system variable is processed. The R specifies <strong>the</strong> run date or time — when<br />
<strong>the</strong> current run began. The C specifies <strong>the</strong> command date or time — when <strong>the</strong> current command began.<br />
The numeric form of exact, run, and command dates or times may also be specified. The dates and times<br />
are printed as <strong>the</strong>y are represented in <strong>the</strong> computer system on which P-<strong>STAT</strong> is being used.<br />
Note: The numeric forms of <strong>the</strong> date now all have <strong>the</strong> year returned as 4 digits in preparation for <strong>the</strong> year<br />
2000.<br />
7<br />
Random Number and<br />
Distribution Functions<br />
This chapter covers three different groups of functions: random number functions, distribution functions, and functions<br />
which can be used to handle the “fuzzy equals” problem.<br />
7.1 RANDOM NUMBER FUNCTIONS<br />
The <strong>PPL</strong> functions, RANNORM, RANUNI, RANBIN and RANTABLE, generate random (“pseudo” random)<br />
numbers from, respectively, <strong>the</strong> normal distribution, <strong>the</strong> uniform distribution, <strong>the</strong> binomial distribution and a user's<br />
tabled distribution. The random numbers may be used for many purposes, such as generating random data,<br />
selecting a random subset of cases from a file or assigning cases <strong>to</strong> ei<strong>the</strong>r a control or experimental treatment. Examples<br />
illustrating <strong>the</strong>se tasks follow <strong>the</strong> basic explanations.<br />
In a normal distribution, <strong>the</strong> random numbers are normal deviates (“standard scores”) that range from -6<br />
through +6 and <strong>the</strong> probability of obtaining specific values depends on <strong>the</strong> area under <strong>the</strong> normal curve. In a uniform<br />
or rectangular distribution, <strong>the</strong> random numbers range from zero through one and <strong>the</strong> probability of obtaining<br />
any value equals <strong>the</strong> probability of obtaining any o<strong>the</strong>r value. (The random numbers do not include <strong>the</strong> exact values<br />
zero and one.)<br />
In a binomial distribution, <strong>the</strong> random numbers are observations from a binomial distribution with <strong>the</strong> specified<br />
order — that is, <strong>the</strong>y are integers that range from 0 <strong>to</strong> <strong>the</strong> order of <strong>the</strong> binomial distribution. The probability<br />
depends on <strong>the</strong> likelihood of <strong>the</strong> possible observations and <strong>the</strong> probability of a single event ( a “win”), which is<br />
assumed <strong>to</strong> be .5 unless ano<strong>the</strong>r probability is supplied. In a user's tabled distribution, <strong>the</strong> random numbers are<br />
observations (integers) that range from one <strong>to</strong> <strong>the</strong> order of <strong>the</strong> distribution specified by <strong>the</strong> user. The probability<br />
of <strong>the</strong> various observations is also specified by <strong>the</strong> user.<br />
The arguments for any of the random number functions are: 1) an initial seed control argument, 2) three optional<br />
scratch variables, and 3) any function specific arguments. The initial argument controls how the seed<br />
values that prime the random number generator are obtained. NOTE: the arguments are initialized at the beginning<br />
of the command. Except for the PPL command, this is when the first case is processed. A BRANCH in<br />
a macro back to a location outside of the command which is generating the numbers causes the arguments to be<br />
re-initialized. Possible first argument values are:<br />
0 different seed values obtained from <strong>the</strong> current date and time are used<br />
-1 same default seed values are used every time <strong>the</strong> function is used<br />
-3 three seed values are supplied by <strong>the</strong> user as <strong>the</strong> next three arguments<br />
When 0 is specified, three seed values obtained from <strong>the</strong> current date and time are used <strong>to</strong> start <strong>the</strong> number<br />
genera<strong>to</strong>r. The seed values and <strong>the</strong> random numbers <strong>the</strong>y generate differ each time:<br />
RANNORM ( 0 )<br />
When -1 is specified, three default seed values are used — <strong>the</strong>y are <strong>the</strong> same each time one of <strong>the</strong> random number<br />
functions is used:<br />
RANUNI ( -1 )<br />
The argument -1 is used only when <strong>the</strong> same “random” values are desired. This may be <strong>the</strong> case when a specific<br />
procedure involving random numbers must be repeated exactly. When -3 is specified as <strong>the</strong> first argument, three
seed values should be supplied as <strong>the</strong> next three arguments. The values should be three constants that are integers<br />
between 1 and 30,000:<br />
RANNORM ( -3, 912, 4508, 7 )<br />
Three scratch variables may be given as <strong>the</strong> next arguments for any of <strong>the</strong> random number functions:<br />
RANUNI ( 0, #S1, #S2, #S3 )<br />
When three scratch variables are supplied, <strong>the</strong> final seed values are saved as <strong>the</strong> values of <strong>the</strong> scratch variables.<br />
Thus, a subsequent run can use <strong>the</strong>se values as starting seeds and continue a progression. The scratch variables<br />
should be generated prior <strong>to</strong> using <strong>the</strong>m. (See <strong>the</strong> second RANTABLE example in <strong>the</strong> final paragraph of this section.)<br />
Finally, any function specific arguments follow — only RANBIN and RANTABLE require <strong>the</strong>se. Here,<br />
<strong>the</strong> “2” is <strong>the</strong> order of <strong>the</strong> binomial distribution:<br />
RANBIN ( -1, 2 )<br />
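The role of the three seed values and of the saved final seeds can be illustrated outside P-STAT. The Python sketch below uses the published Wichmann-Hill three-seed generator purely as a stand-in; it is an assumption that this resembles the P-STAT generator, and the names WichmannHill and draw are hypothetical:<br />

```python
class WichmannHill:
    """Three-seed uniform generator (Wichmann-Hill), used only as an illustration."""

    def __init__(self, s1, s2, s3):
        # Each seed is expected to be an integer between 1 and 30,000.
        self.seeds = (s1, s2, s3)

    def next_uniform(self):
        s1, s2, s3 = self.seeds
        s1 = (171 * s1) % 30269
        s2 = (172 * s2) % 30307
        s3 = (170 * s3) % 30323
        self.seeds = (s1, s2, s3)  # the "final seed values" after this draw
        return (s1 / 30269 + s2 / 30307 + s3 / 30323) % 1.0


def draw(seeds, count):
    """Return count uniform values plus the seeds left behind for a later run."""
    gen = WichmannHill(*seeds)
    values = [gen.next_uniform() for _ in range(count)]
    return values, gen.seeds


# First run: start from user-supplied seeds and save the final seeds.
first, saved = draw((912, 4508, 7), 5)
# Later run: start from the saved seeds to continue the same progression.
second, _ = draw(saved, 5)
```

Starting a second run from the saved seeds produces the same values as one uninterrupted run, which is the purpose of supplying the three scratch variables.<br />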
7.2 Normal and Uniform Distributions<br />
The RANNORM function may be used <strong>to</strong> generate a file of random numbers with a specific mean and standard<br />
deviation. First, a file with one case is built:<br />
MAKE Random, VAR Random.Number ;<br />
- $<br />
Then, that file is modified <strong>to</strong> produce <strong>the</strong> desired number of cases and <strong>to</strong> set <strong>the</strong> values <strong>to</strong> random numbers. The<br />
REPEAT instruction repeats <strong>the</strong> one case 100 times:<br />
MOD Random [<br />
REPEAT 100 ;<br />
SET Random.Number = ( RANNORM (0) * 2.8 ) + 24 ],<br />
OUT RandomX $<br />
The RANNORM function generates a standardized random number, a “Z-score” with mean 0 and standard deviation<br />
1. That number is multiplied by <strong>the</strong> desired standard deviation and <strong>the</strong>n added <strong>to</strong> <strong>the</strong> desired mean.<br />
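The scaling step can be checked with a short Python sketch (not PPL). The function name scaled_normals and the seed are hypothetical, and Python's own generator stands in for RANNORM:<br />

```python
import random
import statistics


def scaled_normals(count, mean, sd, seed):
    """Draw standard normal values and rescale them to the desired mean and sd."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) * sd + mean for _ in range(count)]


# 100 values with mean 24 and standard deviation 2.8, as in the MOD example.
values = scaled_normals(100, 24.0, 2.8, seed=12345)
sample_mean = statistics.mean(values)
sample_sd = statistics.stdev(values)
```

With only 100 cases the sample mean and standard deviation will be close to, but not exactly, 24 and 2.8.<br />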
The RANUNI function is often used <strong>to</strong> select a random sample of cases from a file. This command selects a<br />
random subset of one third of <strong>the</strong> original cases in file Subjects:<br />
MOD Subjects [<br />
GEN #Temp EQ RANUNI (0) ;<br />
IF #Temp LT .333334, RETAIN ], OUT Sub.3 $<br />
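The same selection rule, sketched in Python for illustration only (random_subset is a hypothetical name): each case receives its own uniform random number and is retained when that number falls below the desired fraction.<br />

```python
import random


def random_subset(cases, fraction, seed):
    """Keep each case independently with probability equal to fraction."""
    rng = random.Random(seed)
    return [case for case in cases if rng.random() < fraction]


subjects = list(range(1, 301))
subset = random_subset(subjects, 1 / 3, seed=2013)
```

Because each case is tested independently, the subset holds roughly, not exactly, one third of the cases.<br />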
These instructions do sampling with replacement — <strong>the</strong>y select a random sample of five cases from a file of 100<br />
cases (<strong>the</strong> same case could be selected more than once):<br />
MOD FileA [<br />
GEN #N = MOD (.N., 100) + 1 ;<br />
IF #N EQ 2, GEN #R = ( RANUNI (0) * 100 ) + 1 ;<br />
IF #N EQ INT (#R), RETAIN ]<br />
+ FileA (*) + FileA (*) + FileA (*) + FileA (*),<br />
OUT FileB $<br />
The scratch variable #N (a pseudo case number) is generated equal <strong>to</strong> <strong>the</strong> MOD of <strong>the</strong> case number plus one <strong>to</strong><br />
get numbers running from 2 <strong>to</strong> 100 followed by 1. (The actual case numbers run from 1 <strong>to</strong> 500 when <strong>the</strong> files are<br />
concatenated using <strong>the</strong> “+” opera<strong>to</strong>r. After <strong>the</strong> MOD function, <strong>the</strong>y run from 1 <strong>to</strong> 99 followed by 0.)<br />
When the first case of the file is processed (that is, when #N = 2), a random number between zero and one is<br />
generated. It is multiplied by 100 and one is added to it. (Random numbers exactly equal to zero or one are not<br />
generated, so the result lies between 1 and 101 and its integer part, INT (#R), runs from 1 through 100.)<br />
If #N equals the integer value of the random scratch variable, the case is selected. The file is read<br />
four more times and <strong>the</strong> same instructions are executed each time.<br />
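The logic of the five passes can be sketched in Python (an illustration, not PPL; sample_with_replacement is a hypothetical name). Each pass draws one target position from 1 through the file size and keeps the case at that position:<br />

```python
import random


def sample_with_replacement(cases, draws, seed):
    """Select one case per pass; the same case may be chosen more than once."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(draws):  # one pass over the file per draw
        # random() is at least 0 and below 1, so target runs from 1 through len(cases)
        target = int(rng.random() * len(cases)) + 1
        chosen.append(cases[target - 1])
    return chosen


file_a = list(range(1, 101))  # stand-in for a file of 100 cases
file_b = sample_with_replacement(file_a, 5, seed=99)
```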
7.3 Binomial and User's Tabled Distributions<br />
The RANBIN function could be used <strong>to</strong> assign cases <strong>to</strong> ei<strong>the</strong>r a control or an experimental treatment group. This<br />
command does this:<br />
MOD Expermt5 [<br />
GEN #Bin = RANBIN ( 0, 2 ) ;<br />
IF #Bin EQ 1, SET Group = 'C', F.SET Group = 'E' ],<br />
OUT Expermt5 $<br />
#Bin is generated equal to a random observation from an order 2 binomial distribution, that is, from a binomial<br />
distribution that contains the integers 0, 1 and 2 in the proportions .25, .5 and .25. (You could think of this as<br />
the distribution obtained when tossing two coins. Zero heads are observed 25% of the time, one head 50% of the<br />
time and two heads 25% of the time, when the probability of obtaining a head in a single toss is .5.) When RANBIN<br />
returns a 1, which it does half the time, a case is assigned to the control group; when it returns a 0 or 2, the case is<br />
assigned to the experimental group.<br />
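The proportions quoted above can be verified, and the assignment rule imitated, with a brief Python sketch. The function names are hypothetical and Python's generator stands in for RANBIN:<br />

```python
import random
from math import comb


def binomial_pmf(k, order, p=0.5):
    """Probability of exactly k wins in a binomial distribution of the given order."""
    return comb(order, k) * p**k * (1 - p) ** (order - k)


def random_binomial(rng, order, p=0.5):
    """One binomial observation: the number of wins in `order` independent trials."""
    return sum(1 for _ in range(order) if rng.random() < p)


def assign_group(rng):
    """Control ('C') when the order 2 observation is 1, otherwise experimental ('E')."""
    return 'C' if random_binomial(rng, 2) == 1 else 'E'


proportions = [binomial_pmf(k, 2) for k in range(3)]  # 0, 1 and 2 heads
rng = random.Random(5)
groups = [assign_group(rng) for _ in range(20)]
```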
The RANTABLE function is similar <strong>to</strong> RANBIN, except that <strong>the</strong> probabilities are set by <strong>the</strong> user. This<br />
command:<br />
MOD Expermt6<br />
[ SET Group = RANTABLE ( 0, 1, 2, 2 ) ], OUT Expermt6 $<br />
assigns cases to one of three groups, with the probability of assignment to group one being 1/5, group two 2/5 and<br />
group three 2/5. The arguments for RANTABLE after the initial seed control argument give the number of values<br />
in the distribution and the proportions in which they are observed. In this example, there are three function arguments<br />
(1, 2, 2), so there are three values in the distribution (1, 2 and 3). The value of a single argument divided by the<br />
sum of the arguments gives the proportion of that value in the distribution. For example, 1 / (1 + 2 + 2) is<br />
1/5, which is the proportion of the total observations that are ones.<br />
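The proportion arithmetic can be sketched in Python (illustrative only; table_proportions and random_from_table are hypothetical names, and random.choices stands in for RANTABLE):<br />

```python
import random


def table_proportions(weights):
    """Each weight divided by the sum of all weights."""
    total = sum(weights)
    return [w / total for w in weights]


def random_from_table(rng, weights):
    """Draw one of the values 1..len(weights) in proportion to the weights."""
    values = range(1, len(weights) + 1)
    return rng.choices(values, weights=weights, k=1)[0]


shares = table_proportions([1, 2, 2])  # group one 1/5, groups two and three 2/5 each
rng = random.Random(42)
group = random_from_table(rng, [1, 2, 2])
```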
This command does <strong>the</strong> same task <strong>the</strong> prior command does, but it sets <strong>the</strong> seed values with <strong>the</strong> three constants<br />
following <strong>the</strong> -3 and saves <strong>the</strong>m in <strong>the</strong> supplied scratch variables:<br />
MOD Expermt6 [<br />
GEN #A = .M., GEN #B = .M., GEN #C = .M. ;<br />
SET Group =<br />
RANTABLE ( -3, 657, 1469, 20078, #A, #B, #C, 1, 2, 2 ) ;<br />
IF LAST ( .FILE. ),<br />
PUT #A > #B > #C ],<br />
OUT Expermt6 $<br />
The initial argument of -3 for RANTABLE specifies that the initial seed values are supplied as three constants.<br />
The three scratch variables follow. The constants and scratch variables come directly after the initial seed control<br />
argument and before the function specific arguments. Alternatively, the three scratch variables could be generated<br />
equal to the three initial seed values and those constants could be omitted from the RANTABLE arguments.<br />
7.4 DISTRIBUTION FUNCTIONS<br />
Distribution or probability functions return the area under a distribution from the lower tail of the distribution to<br />
the specified critical value. The area is the probability that a random value falls below this critical value. Subtracting<br />
this value from one yields the significance level for a one-tailed test, that is, the proportion of the<br />
distribution in the upper tail. To obtain the significance level for a two-tailed test, subtract the probability from<br />
one and multiply by two. For a t distribution with df degrees of freedom:<br />
( 1 - PROBT ( ABS (nn), df ) ) * 2<br />
Inverse probability functions return <strong>the</strong> critical value corresponding <strong>to</strong> <strong>the</strong> probability or area under <strong>the</strong> distribution<br />
that is supplied as <strong>the</strong> function argument. The critical value is <strong>the</strong> value that must be obtained for<br />
significance at one minus <strong>the</strong> supplied probability.<br />
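The relation between a lower-tail area and one-tailed or two-tailed significance can be sketched for the normal distribution in Python. This is an illustration only: normal_cdf plays the role of PROBNORM and the other names are hypothetical.<br />

```python
import math


def normal_cdf(z):
    """Area under the standard normal curve from the lower tail up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_tailed(z):
    """Upper-tail area beyond |z|."""
    return 1.0 - normal_cdf(abs(z))


def two_tailed(z):
    """Area in both tails beyond |z|."""
    return 2.0 * one_tailed(z)


lower_area = normal_cdf(-1.96)   # about .025
significance = two_tailed(1.96)  # about .05
```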
7.5 Probability Distributions<br />
The probability functions may have expressions as <strong>the</strong>ir arguments. The expressions should reduce <strong>to</strong> one or more<br />
arguments appropriate for <strong>the</strong> function. These are <strong>the</strong> probability functions and <strong>the</strong>ir arguments:<br />
1. PROBBIN ( nn, n, p ) Binomial Distribution<br />
computes <strong>the</strong> probability that a variable from a binomial (Bernoulli) distribution with probability p<br />
and size or degree n is less than or equal <strong>to</strong> <strong>the</strong> first argument nn:<br />
PROBBIN ( 4, 10, .5 ) = .376953125<br />
This is <strong>the</strong> probability of getting four or fewer tails in ten <strong>to</strong>sses of a coin. (This is <strong>the</strong> same as <strong>the</strong><br />
probability of getting six or more heads.) The probability of a single value is <strong>the</strong> difference between<br />
two successive values:<br />
PROBBIN ( 4, 10, .5 ) - PROBBIN ( 3, 10, .5 ) = .205078125<br />
This is <strong>the</strong> probability of getting exactly four tails in ten <strong>to</strong>sses of a coin. (This is <strong>the</strong> same as <strong>the</strong><br />
probability of getting exactly six heads.)<br />
2. PROBCHI ( nn, df ) Chi-square Distribution<br />
computes <strong>the</strong> probability that a random variable from a chi-square distribution with degrees of freedom<br />
df is less than <strong>the</strong> specified argument:<br />
PROBCHI ( 31.264, 11 ) = .999<br />
Degrees of freedom must be an integer.<br />
3. PROBF ( nn, df1, df2 ) F Distribution<br />
computes <strong>the</strong> probability that a variable from an F distribution with numera<strong>to</strong>r degrees of freedom<br />
df1 and denomina<strong>to</strong>r degrees of freedom df2 is less than <strong>the</strong> specified argument:<br />
PROBF ( 3.32, 2, 30 ) = .950170464<br />
Degrees of freedom may be a whole or fractional number.<br />
4. PROBNORM ( nn ) Normal Distribution<br />
computes <strong>the</strong> probability that a random variable from a normal distribution is less than <strong>the</strong> specified<br />
argument:<br />
PROBNORM ( -1.96 ) = .02499789530314<br />
The critical value -1.96 is significant at the .025 level for a one-tail test and at the .05 level for a two-tail<br />
or non-directional test.<br />
The argument for PROBNORM should be a deviate from a normal distribution with a mean of zero<br />
and standard deviation of one — that is, a standard score between -6 and +6.<br />
5. PROBPOIS ( nn, lambda ) Poisson Distribution<br />
computes <strong>the</strong> probability that a variable from a Poisson distribution is less than or equal <strong>to</strong> <strong>the</strong> first<br />
argument. Lambda is <strong>the</strong> mean of <strong>the</strong> distribution. The mean in this example is 1.12 — it is <strong>the</strong>
number of defects per length of material:<br />
PROBPOIS ( 2, 1.12 ) = .896355852<br />
PROBPOIS ( 2, 1.12 ) - PROBPOIS (1, 1.12 ) = .204642687<br />
.8964 is <strong>the</strong> probability of finding two or fewer defects in a length of material. .2046 is <strong>the</strong> probability<br />
of finding exactly 2 defects.<br />
6. PROBT ( nn, df ) t Distribution<br />
computes <strong>the</strong> probability that a random variable from a t distribution is less than <strong>the</strong> first argument<br />
nn when degrees of freedom equal <strong>the</strong> second argument df. This is <strong>the</strong> probability that a random<br />
variable is less than 2.179:<br />
PROBT ( 2.179, 12 ) = .975008377<br />
The significance level for a two-tail test is 1 minus the probability, multiplied by 2:<br />
( 1 - PROBT ( 2.179, 12 ) ) * 2 = .049983245959<br />
A critical value of 2.179 is significant at <strong>the</strong> .025 level for a one-tail test and at <strong>the</strong> .05 level for a<br />
two-tail test (.025 in each tail) when <strong>the</strong> degrees of freedom are 12.<br />
The first argument for PROBT should be a deviate or critical value from Student's t distribution, which is<br />
centered at zero. The degrees of freedom may be a whole or fractional<br />
number.<br />
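The binomial and Poisson values quoted above can be reproduced from the definitions of the two distributions. The Python sketch below is an independent check, not the P-STAT implementation; binomial_cdf and poisson_cdf are hypothetical names:<br />

```python
from math import comb, exp, factorial


def binomial_cdf(k, n, p):
    """Probability of k or fewer events in n trials with event probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def poisson_cdf(k, lam):
    """Probability of k or fewer events when the mean count is lam."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


four_or_fewer = binomial_cdf(4, 10, 0.5)
exactly_four = binomial_cdf(4, 10, 0.5) - binomial_cdf(3, 10, 0.5)
two_or_fewer = poisson_cdf(2, 1.12)
exactly_two = poisson_cdf(2, 1.12) - poisson_cdf(1, 1.12)
```

As in the text, the probability of a single value is the difference between two successive cumulative values.<br />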
7.6 Inverse Probability Distributions<br />
The inverse probability functions may have expressions as <strong>the</strong>ir arguments. However, <strong>the</strong> expressions should reduce<br />
<strong>to</strong> one or more arguments appropriate for <strong>the</strong> function. These are <strong>the</strong> inverse probability functions and <strong>the</strong>ir<br />
arguments:<br />
1. INVBIN ( nn, n, p ) Inverse Binomial Distribution<br />
INVBIN.RT ( nn, n, p ) Inverse Binomial Distribution — Right Tail<br />
returns <strong>the</strong> observation from <strong>the</strong> binomial distribution with probability p and size or degree n whose<br />
area is nn:<br />
INVBIN ( .38, 10, .5 ) = 4<br />
INVBIN.RT ( .38, 10, .5 ) = 6<br />
Approximately 38% of <strong>the</strong> time, when <strong>to</strong>ssing 10 coins, you will get 4 or fewer tails and 6 or more<br />
heads. INVBIN.RT returns an observation from <strong>the</strong> right tail of <strong>the</strong> binomial distribution. INVBIN<br />
is <strong>the</strong> inverse of <strong>the</strong> PROBBIN function.<br />
2. INVCHI ( nn, df ) Inverse Chi-Square Distribution<br />
returns <strong>the</strong> critical value from <strong>the</strong> chi-square distribution with degrees of freedom df and whose area<br />
is <strong>the</strong> argument nn:<br />
INVCHI ( .999, 11 ) = 31.2641339<br />
INVCHI is <strong>the</strong> inverse of <strong>the</strong> PROBCHI function.<br />
3. INVF ( nn, df1, df2 ) Inverse F Distribution<br />
returns <strong>the</strong> critical value from <strong>the</strong> F distribution with degrees of freedom df1 and df2 whose area is<br />
the argument nn:<br />
INVF ( .95, 2, 30 ) = 3.315829544<br />
INVF is <strong>the</strong> inverse of <strong>the</strong> PROBF function.<br />
4. INVNORM ( nn ) Inverse Normal or Probit Distribution<br />
returns <strong>the</strong> deviate or critical value from <strong>the</strong> normal distribution whose area is <strong>the</strong> specified argument.<br />
PROBIT is a synonym:<br />
PROBIT ( 0.025 ) = -1.959964<br />
A critical value of -1.96 or less is required for a one-tail test with a significance level of .025, or a<br />
value of 1.96 or greater is required if a difference in the opposite direction is expected. For a two-tail<br />
or non-directional test with a significance level of .05, a critical value of -1.96 or less or 1.96 or<br />
more is required (.025 in each of <strong>the</strong> two tails).<br />
The argument for INVNORM is an area, measured from <strong>the</strong> lower tail of <strong>the</strong> normal distribution,<br />
that is <strong>the</strong> probability of obtaining a value less than <strong>the</strong> calculated deviate. It should be a number<br />
between 0 and 1. This function is <strong>the</strong> inverse of PROBNORM.<br />
5. INVPOIS ( nn, lambda ) Inverse Poisson Distribution<br />
INVPOIS.RT ( nn, lambda ) Inverse Poisson Distribution — Right Tail<br />
returns <strong>the</strong> observation from <strong>the</strong> Poisson distribution with mean lambda whose area is <strong>the</strong> argument<br />
nn:<br />
INVPOIS ( .9, 1.12 ) = 2<br />
Approximately 90% of <strong>the</strong> time, 2 or fewer defects will be found in a unit length of material with<br />
1.12 defects per unit. INVPOIS.RT returns an observation from <strong>the</strong> right tail of <strong>the</strong> Poisson distribution.<br />
INVPOIS is <strong>the</strong> inverse of <strong>the</strong> PROBPOIS function.<br />
6. INVT ( nn, df ) Inverse t Distribution<br />
returns <strong>the</strong> critical value from <strong>the</strong> t distribution with degrees of freedom df whose area is <strong>the</strong> argument<br />
nn:<br />
INVT ( .975, 12 ) = 2.178812725<br />
A critical value of 2.179 is required for a one-tail test with a significance level of .025 or a two-tail<br />
test with a significance level of .05. INVT is <strong>the</strong> inverse of <strong>the</strong> PROBT function.<br />
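The inverse relationship can be sketched in Python. For the normal distribution the standard library supplies an inverse directly. For the binomial distribution the rule used below (the largest observation whose cumulative probability does not exceed the supplied area) is only an assumption, chosen because it reproduces the INVBIN example above; the rule P-STAT actually applies is not stated in this section. All function names are hypothetical.<br />

```python
from math import comb
from statistics import NormalDist


def inverse_normal(area):
    """Deviate whose lower-tail area under the standard normal curve is `area`."""
    return NormalDist().inv_cdf(area)


def binomial_cdf(k, n, p):
    """Probability of k or fewer events in n trials with event probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def inverse_binomial(area, n, p):
    """Assumed rule: largest k with cumulative probability not exceeding area."""
    best = 0
    for k in range(n + 1):
        if binomial_cdf(k, n, p) <= area:
            best = k
    return best


critical = inverse_normal(0.025)              # about -1.96
round_trip = NormalDist().cdf(critical)       # back to about .025
observation = inverse_binomial(0.38, 10, 0.5)
```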
7.7 THE FUZZY EQUALS PROBLEM<br />
The internal representation of fractional decimal numbers in a binary computer can be exact for numbers (like .5<br />
or .75) that can be expressed as sums of reciprocals of powers of two. This is true only up to a point: .5 + 1/2**53 is<br />
represented exactly on a Pentium chip (which uses 53 bits to represent the fractional part), but .5 + 1/2**54 and beyond would<br />
not be represented exactly.<br />
Most fractional numbers however cannot be represented accurately. Computation involving <strong>the</strong>m is consequently<br />
approximate. It is quite possible for two different sequences of calculation that ‘should’ produce <strong>the</strong> same result<br />
<strong>to</strong> instead produce results that differ slightly, perhaps by one bit, sometimes by several.<br />
For example, consider this P-<strong>STAT</strong> statement.<br />
IF .1 + .2 EQ .3, PUT 'YES', F.PUT 'NO' $<br />
This ought <strong>to</strong> say YES, but on a Pentium PC it says NO because <strong>the</strong>y are not quite <strong>the</strong> same: a HEX display of <strong>the</strong><br />
result of adding .1 and .2 is one bit different from a HEX display of .3, and a one-bit difference prevents an equal<br />
result. This is not a P-STAT effect: exactly the same thing occurs in a trivial C or Fortran 95 program.<br />
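The same effect can be seen in Python, which also uses 64-bit double-precision numbers. In this sketch hex_bits is a hypothetical helper that shows the 16 HEX characters of a value:<br />

```python
import struct


def hex_bits(x):
    """The 64-bit internal form of x as 16 HEX characters."""
    return struct.pack('>d', x).hex().upper()


total = 0.1 + 0.2
same = (total == 0.3)        # an exact compare
total_hex = hex_bits(total)
three_tenths_hex = hex_bits(0.3)
```

The two HEX strings differ only in the final character, which is why the exact compare fails.<br />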
There may be situations when a FUZZY compare ra<strong>the</strong>r than an EXACT compare is appropriate. An exact compare<br />
returns equal only when <strong>the</strong> two numbers being compared are exactly <strong>the</strong> same. A fuzzy compare would<br />
accept as equal two numbers that are VERY close. The question is: how close ?<br />
Logical opera<strong>to</strong>rs like EQ and GT now have optional extensions like EQ.2 or GT.5 which cause <strong>the</strong> compare <strong>to</strong><br />
be fuzzy. For example, using EQ.2 ra<strong>the</strong>r than just EQ will treat two numbers as equal if <strong>the</strong>y are no more than<br />
two steps apart.<br />
We use ‘step’ <strong>to</strong> mean moving from a given 64-bit double-precision floating-point number <strong>to</strong> <strong>the</strong> next representable<br />
number. An upwards step from 0.1 is slightly more than 0.1, a downwards step is slightly less.<br />
A step can best be seen by using HEX notation. The HEX representation of <strong>the</strong> 64-bit value 0.1 is 3FB9 9999<br />
9999 999A. Each HEX character represents 4 bits; <strong>the</strong> characters 0-9 and A-F are used <strong>to</strong> show <strong>the</strong> 16 possible<br />
forms of 4 bits. Note: <strong>the</strong> actual 64-bit internal representation of 0.1 may differ slightly on computers using differing<br />
chips and compilers.<br />
The ending ‘A’ shows that the last 4 bits of 0.1 are 1010. The value one STEP.UP from 0.1 would be one bit<br />
greater; in this case it would have the same initial 15 HEX characters, and the final character would be B (1011), one bit more. A<br />
step affects the 15th or 16th significant digit on a Pentium type of chip. For example, it takes 2 steps to go<br />
from 30.11122233344411<br />
to 30.11122233344412 which differs in the 16th significant digit.<br />
7.8 The Fuzzy Functions<br />
Four new functions have been added <strong>to</strong> manipulate such numbers.<br />
1. HEX ( number ) produces <strong>the</strong> HEX representation of <strong>the</strong> input in a character*16<br />
result.<br />
2. STEP.UP ( number, n ) produces <strong>the</strong> number that is N steps up from <strong>the</strong> input value. The<br />
second argument, <strong>the</strong> number of steps, can be from zero <strong>to</strong> 9999. If<br />
omitted, it defaults <strong>to</strong> one.<br />
3. STEP.DOWN( number, n ) produces <strong>the</strong> number that is N steps down from <strong>the</strong> input value. The<br />
second argument, <strong>the</strong> number of steps, can be from zero <strong>to</strong> 9999. If<br />
omitted, it defaults <strong>to</strong> one.<br />
4. STEPS ( nn1, nn2 ) produces <strong>the</strong> number of steps from <strong>the</strong> smaller of NN1 and NN2 <strong>to</strong><br />
<strong>the</strong> larger. Missing 3 is returned if more than one million steps separate<br />
<strong>the</strong> arguments.<br />
put ( HEX( .1 ))$ is ‘3FB999999999999A’<br />
put ( HEX( STEP.UP(.1 )))$ is ‘3FB999999999999B’<br />
put ( HEX( STEP.UP(.1, 2)))$ is ‘3FB999999999999C’<br />
put ( STEPS ( STEP.DOWN(.1), STEP.UP(.1) ))$ is 2<br />
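The behaviour of these four functions can be imitated in Python by treating the 64-bit pattern of a number as an integer. This is a sketch under the assumption of IEEE double-precision storage; the helper names are hypothetical, and missing values, infinities and step limits are not handled:<br />

```python
import struct

SIGN = 1 << 63


def _to_ordered(x):
    """Map a double to an integer so that adjacent doubles differ by exactly 1."""
    bits = struct.unpack('>Q', struct.pack('>d', x))[0]
    return -(bits & (SIGN - 1)) if bits & SIGN else bits


def _from_ordered(n):
    """Inverse of _to_ordered."""
    bits = ((-n) | SIGN) if n < 0 else n
    return struct.unpack('>d', struct.pack('>Q', bits))[0]


def hex_bits(x):
    """The 64-bit internal form of x as 16 HEX characters."""
    return struct.pack('>d', x).hex().upper()


def step_up(x, n=1):
    """The number n representable steps above x."""
    return _from_ordered(_to_ordered(x) + n)


def step_down(x, n=1):
    """The number n representable steps below x."""
    return _from_ordered(_to_ordered(x) - n)


def steps(a, b):
    """Number of representable steps between a and b."""
    return abs(_to_ordered(a) - _to_ordered(b))
```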
7.9 Fuzzy Logical Opera<strong>to</strong>rs<br />
There are 6 logical opera<strong>to</strong>rs: GT, GE, EQ, NE, LE and LT. GT means greater than, EQ means equals, and<br />
so forth.<br />
There are also 6 eXact versions: XGT, XGE, XEQ, XNE, XLE and XLT. XEQ causes <strong>the</strong> compare of character<br />
values <strong>to</strong> be case-specific, whereas EQ is case-independent. For numeric compares, EQ and XEQ will by default<br />
do exact (non-fuzzy) compares. However, <strong>the</strong> EQ and GT type of opera<strong>to</strong>rs can be directed <strong>to</strong> do fuzzy compares.<br />
For numeric compares, <strong>the</strong> EQ opera<strong>to</strong>rs can be made <strong>to</strong> do fuzzy compares in two ways.<br />
1. EQ.2 or GT.5 or such can be used <strong>to</strong> cause a fuzzy compare of that many steps. The step part can<br />
be from 0 <strong>to</strong> 99, with 0 meaning no steps. EQ.2 is treated as a simple EQ when <strong>the</strong> compare involves<br />
character values.<br />
2. FUZZ 5 $ is a new command that causes later use of <strong>the</strong> EQ type of logical opera<strong>to</strong>rs <strong>to</strong> use that<br />
many steps. It is ignored for character compares, and does not affect <strong>the</strong> XEQ type of opera<strong>to</strong>rs. It<br />
is also ignored for an opera<strong>to</strong>r like EQ.3 that already has a specific stepsize.<br />
In o<strong>the</strong>r words, <strong>the</strong> step count of 3 in EQ.3 has precedence over any current FUZZ command setting. Fuzz 0 $<br />
would turn it off.<br />
7.10 How Fuzzy Opera<strong>to</strong>rs Work<br />
Consider<br />
IF aaa EQ.2 bbb.<br />
The above test will be true whenever AAA is ei<strong>the</strong>r equal <strong>to</strong> BBB or within 2 steps of BBB (it does not matter<br />
which is <strong>the</strong> larger).<br />
The following 5 lines would do exactly the same thing:<br />
IF STEP.DOWN(aaa, 2) XEQ bbb or<br />
STEP.DOWN(aaa ) XEQ bbb or<br />
aaa XEQ bbb or<br />
STEP.UP (aaa ) XEQ bbb or<br />
STEP.UP (aaa, 2) XEQ bbb<br />
Consider<br />
IF aaa GT.5 bbb.<br />
It is first determined if AAA and BBB are ‘equal’, which in this case means no more than 5 steps apart in ei<strong>the</strong>r<br />
direction. Since <strong>the</strong> GT test is true only when (1) AAA is greater and (2) <strong>the</strong>y are not equal, AAA must be more<br />
than 5 steps greater than BBB for a true result <strong>to</strong> occur.<br />
In <strong>the</strong> first of <strong>the</strong>se next two statements, <strong>the</strong> values being compared are not equal, so a GT result can be true. In<br />
<strong>the</strong> second, <strong>the</strong> GT.1 test has enough fuzz <strong>to</strong> cause <strong>the</strong> two values <strong>to</strong> be considered <strong>to</strong> be equal, so one cannot be<br />
greater.<br />
IF STEP.UP( 999 ) GT.0 999 will be true,<br />
IF STEP.UP( 999 ) GT.1 999 will be false.<br />
Thus, aaa GT.5 bbb asks if AAA is more than 5 steps greater than BBB. The o<strong>the</strong>r opera<strong>to</strong>rs work in a similar<br />
manner.<br />
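The rules described in this section can be sketched in Python. The sketch assumes IEEE double-precision storage and Python 3.9 or later (for math.nextafter); fuzzy_eq, fuzzy_gt and steps_apart are hypothetical names, not P-STAT functions:<br />

```python
import math
import struct

SIGN = 1 << 63


def _ordered(x):
    """Map a double to an integer so that adjacent doubles differ by exactly 1."""
    bits = struct.unpack('>Q', struct.pack('>d', x))[0]
    return -(bits & (SIGN - 1)) if bits & SIGN else bits


def steps_apart(a, b):
    """Number of representable steps between a and b."""
    return abs(_ordered(a) - _ordered(b))


def fuzzy_eq(a, b, n):
    """Like EQ.n: equal when a and b are no more than n steps apart."""
    return steps_apart(a, b) <= n


def fuzzy_gt(a, b, n):
    """Like GT.n: a is greater and the two are not fuzzy-equal."""
    return a > b and not fuzzy_eq(a, b, n)


one_step_above = math.nextafter(999.0, math.inf)  # plays the role of STEP.UP( 999 )
gt0 = fuzzy_gt(one_step_above, 999.0, 0)
gt1 = fuzzy_gt(one_step_above, 999.0, 1)
```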
7.11 FUZZY Summary<br />
The GT, GE, EQ, NE, LE and LT logical opera<strong>to</strong>rs have always done exact compares on numeric values; <strong>the</strong> default<br />
has not changed.<br />
These 6 opera<strong>to</strong>rs have been extended: EQ.3 for example will return an equal result if <strong>the</strong> two values being compared<br />
are separated by no more than 3 steps. A step is <strong>the</strong> distance from one internally representable number <strong>to</strong><br />
<strong>the</strong> next one.<br />
The step part (<strong>the</strong> .3 in EQ.3) can be from 0 <strong>to</strong> 99. Using 5 steps should be sufficient <strong>to</strong> cover random differences.<br />
NEAR is supported as a more readable form of EQ.5 . Similarly, NOTNEAR means NE.5 .<br />
A new command, FUZZ 3 $ or such, causes subsequent use of EQ, etc. <strong>to</strong> do fuzzy compares of that many steps<br />
au<strong>to</strong>matically. However, this does NOT change an explicitly supplied step like GT.0 or EQ.1 .<br />
Using FUZZ 2 $ or such might be useful when pages of <strong>PPL</strong> are involved and you want <strong>to</strong> quickly see if fuzz<br />
makes a difference.<br />
XGT, XGE, XEQ, XNE, XLE and XLT can still be used in numeric compares. They always do an exact (nonfuzzy)<br />
compare. In o<strong>the</strong>r words, XEQ and EQ.0 are <strong>the</strong> same.
Random Number and Distribution Functions 7.9<br />
SUMMARY<br />
<strong>PPL</strong> Functions: Numeric — Random Numbers<br />
The number of arguments for <strong>the</strong> random number functions depends on how <strong>the</strong>y are used. There may<br />
be from one <strong>to</strong> three types of arguments: 1) a required initial seed control argument, 2) three optional<br />
scratch variables, and 3) any function specific arguments. The initial seed control argument is one of<br />
<strong>the</strong>se constants: 0, -1 or -3. When it is 0, three seed values from <strong>the</strong> current date and time are used <strong>to</strong><br />
start <strong>the</strong> random number genera<strong>to</strong>r. When it is -1, three default seed values that are <strong>the</strong> same every time<br />
are used. When it is -3, three constants <strong>to</strong> be used as <strong>the</strong> initial seed values should follow.<br />
Three scratch variables may be supplied next. When <strong>the</strong>y are supplied, <strong>the</strong> final seed values are saved<br />
as <strong>the</strong> values of <strong>the</strong> scratch variables. They may be used as initial seeds at a future time <strong>to</strong> continue a<br />
progression. Any function specific arguments come last.<br />
RANBIN (nn, nn, nn, nn, #vn, #vn, #vn, nn, p)<br />
generates random observations from a binomial distribution with <strong>the</strong> order and probability specified as<br />
<strong>the</strong> right-most arguments. When <strong>the</strong> probability is .5, it need not be given:<br />
[ GEN Obs = RANBIN (0, 2) ;<br />
IF Obs EQ 1, SET Group = 1, F.SET Group = 2 ]<br />
The GEN instruction generates observations from a binomial distribution of order 2 and probability .5 —<br />
that is, with <strong>the</strong> integers 0, 1 and 2 in <strong>the</strong> proportions .25, .5 and .25. For example, this is <strong>the</strong> distribution<br />
of heads (or tails) obtained when <strong>to</strong>ssing two coins. The IF statement tests <strong>the</strong> value of <strong>the</strong> random number<br />
and assigns group membership, with 50% in each group.<br />
RANNORM (nn, nn, nn, nn, #vn, #vn, #vn)<br />
generates random numbers from <strong>the</strong> normal distribution:<br />
GEN Random = (RANNORM (0) * 2.5) + 43.6 ;<br />
The random numbers are standard scores that range from -6 through +6 and <strong>the</strong> probability of obtaining<br />
specific values depends on <strong>the</strong> area under <strong>the</strong> normal curve. The example above generates random numbers<br />
with a standard deviation of 2.5 and a mean of 43.6.<br />
RANTABLE (nn, nn, nn, nn, #vn, #vn, #vn, nn, nn, nn)<br />
generates random observations from a user's tabled distribution. The values and <strong>the</strong> probabilities of each<br />
are given as <strong>the</strong> right-most arguments:<br />
GEN Section = RANTABLE (0, 15, 5, 10, 20) ;<br />
This instruction generates <strong>the</strong> random section numbers 1, 2, 3 and 4 because four arguments are supplied<br />
(not counting <strong>the</strong> initial seed control argument). They are generated in <strong>the</strong> following proportions: 15/50<br />
= .3, 5/50 = .1, 10/50 = .2 and 20/50 = .4. (The arguments are summed <strong>to</strong> get <strong>the</strong> <strong>to</strong>tal, and <strong>the</strong> value of<br />
each argument is <strong>the</strong> proportion of <strong>the</strong> <strong>to</strong>tal desired for that value.)<br />
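The proportion arithmetic for a tabled distribution can be checked with a short Python sketch. This is an illustration outside P-STAT, not PPL; random.choices is Python's own weighted sampler and the seed 12345 is an arbitrary assumption, so the generated stream will not match RANTABLE's.<br />

```python
import random

# The four weights from the RANTABLE example; they define sections 1 to 4.
weights = [15, 5, 10, 20]
total = sum(weights)

# Each weight divided by the total is the proportion for that section.
proportions = [w / total for w in weights]
print(proportions)

# Draw 1000 section numbers in those proportions (seed chosen arbitrarily).
rng = random.Random(12345)
sections = rng.choices([1, 2, 3, 4], weights=weights, k=1000)
print({s: sections.count(s) for s in (1, 2, 3, 4)})
```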
RANUNI (nn, nn, nn, nn, #vn, #vn, #vn)<br />
generates random numbers from a uniform distribution. The random numbers range from zero <strong>to</strong> one<br />
and <strong>the</strong> probability of obtaining any value equals <strong>the</strong> probability of obtaining any o<strong>the</strong>r value. The result<br />
can be multiplied by a constant <strong>to</strong> change <strong>the</strong> range of <strong>the</strong> generated values. A random subset of cases<br />
may be selected using RANUNI:<br />
GEN #Random = RANUNI (-1) ;<br />
IF #Random LE .7, RETAIN ;<br />
These instructions do <strong>the</strong> same things as <strong>the</strong> previous ones, but <strong>the</strong>y also set and save <strong>the</strong> seed values:<br />
GEN #A = .M., GEN #B = .M., GEN #C = .M. ;<br />
GEN #Random = RANUNI (-3, 257,25,8004, #A,#B,#C ) ;<br />
IF #Random LE .7, RETAIN ;<br />
IF LAST (.FILE.), PUT #A ' ' #B ' ' #C ;<br />
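The idea of saving the final seed values so that a later run can continue the same progression can be sketched in Python. This is an analogy only, not PPL: Python's generator keeps one opaque state object rather than three seed values, and the seed 257 and the .7 cutoff are borrowed from the example above purely for illustration.<br />

```python
import random

# Start from a fixed seed, as the -3 seed control argument does in PPL.
rng = random.Random(257)

# Select roughly 70% of ten hypothetical cases.
retained = [case for case in range(1, 11) if rng.random() <= 0.7]

# Save the generator state at the end of the run (the role of #A, #B, #C).
saved_state = rng.getstate()
continued = [rng.random() for _ in range(3)]

# A later run restores the saved state and continues the same progression.
resumed = random.Random()
resumed.setstate(saved_state)
resumed_values = [resumed.random() for _ in range(3)]

print(resumed_values == continued)
```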
<strong>PPL</strong> Functions: Numeric — Probability<br />
The following probability functions require one or more expressions as <strong>the</strong>ir arguments. Each expression<br />
should reduce <strong>to</strong> <strong>the</strong> argument appropriate for <strong>the</strong> function.<br />
PROBBIN (nn, n, p)<br />
computes <strong>the</strong> probability that a variable from a binomial (Bernoulli) distribution with probability p and<br />
size or degree n is less than or equal <strong>to</strong> <strong>the</strong> first argument nn (that is, has nn or fewer successes in n trials).<br />
The probability of a single value is <strong>the</strong> difference between two successive values:<br />
PROBBIN ( 4, 10, .5 ) - PROBBIN ( 3, 10, .5 ) = .205078125<br />
.205 is <strong>the</strong> probability of getting exactly four tails in ten <strong>to</strong>sses of a coin.<br />
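The difference of two cumulative binomial probabilities can be verified independently. The Python sketch below is an illustration outside P-STAT; prob_bin is a hypothetical stand-in that follows the definition given for PROBBIN (probability of nn or fewer successes in n trials), not P-STAT's own algorithm.<br />

```python
from math import comb


def prob_bin(k, n, p):
    """Cumulative binomial probability: k or fewer successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Probability of exactly four tails in ten tosses of a fair coin.
exactly_four = prob_bin(4, 10, 0.5) - prob_bin(3, 10, 0.5)
print(exactly_four)
```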
PROBCHI (nn, df)<br />
computes <strong>the</strong> probability that a random variable from a chi-square distribution with degrees of freedom<br />
df is less than <strong>the</strong> specified argument. Degrees of freedom must be an integer.<br />
PROBF (nn, df1, df2)<br />
computes <strong>the</strong> probability that a variable from an F distribution with numera<strong>to</strong>r degrees of freedom df1<br />
and denomina<strong>to</strong>r degrees of freedom df2 is less than <strong>the</strong> specified argument.<br />
PROBNORM (nn)<br />
computes <strong>the</strong> probability that a random variable is less than <strong>the</strong> specified argument. The argument should<br />
be a deviate from a normal distribution with a mean of zero and standard deviation of one — that is, it<br />
should be a standard score between -6 and +6. This is <strong>the</strong> probability that a random variable from a normal<br />
distribution is less than 1.96:<br />
PROBNORM ( 1.96 ) = .975002105<br />
For <strong>the</strong> significance level of a two-tail test, multiply 1 minus <strong>the</strong> probability of <strong>the</strong> absolute value of <strong>the</strong><br />
deviate by 2:<br />
( 1 - PROBNORM ( ABS ( -1.96 ) ) ) * 2 = .04999579060628<br />
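The same two quantities can be reproduced with the error function. This Python sketch is an illustration outside P-STAT; prob_norm is a hypothetical stand-in for PROBNORM, and because it uses a different numerical method the trailing digits may differ slightly from the values printed above.<br />

```python
import math


def prob_norm(z):
    """Lower-tail area of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Probability that a standard normal variable is less than 1.96.
lower_tail = prob_norm(1.96)

# Two-tail significance level for a deviate of -1.96.
two_tail = (1.0 - prob_norm(abs(-1.96))) * 2.0

print(round(lower_tail, 6), round(two_tail, 6))
```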
PROBPOIS (nn, lambda)<br />
computes <strong>the</strong> probability that a variable from a Poisson distribution is less than or equal <strong>to</strong> <strong>the</strong> specified<br />
argument. Lambda is <strong>the</strong> mean of <strong>the</strong> distribution.<br />
PROBT (nn, df)<br />
computes <strong>the</strong> probability that a random variable is less than <strong>the</strong> first argument when degrees of freedom<br />
equal <strong>the</strong> second argument. The first argument should be a deviate from student's t distribution with a<br />
mean of zero and standard deviation of one. The degrees of freedom may be a whole or fractional<br />
number.<br />
<strong>PPL</strong> Functions: Numeric — Inverse Probability<br />
The following inverse probability functions require one or more expressions as <strong>the</strong>ir arguments. Each<br />
expression should reduce <strong>to</strong> <strong>the</strong> argument appropriate for <strong>the</strong> function.<br />
INVBIN (nn, n, p)<br />
returns <strong>the</strong> observation from <strong>the</strong> binomial distribution with probability p and size or degree n whose area<br />
is nn:<br />
INVBIN ( .38, 10, .5 ) = 4<br />
Approximately 38% of <strong>the</strong> time, when <strong>to</strong>ssing 10 coins, 4 or fewer will be tails. INVBIN.RT returns an<br />
observation from <strong>the</strong> right tail of <strong>the</strong> distribution. INVBIN is <strong>the</strong> inverse of <strong>the</strong> PROBBIN function.<br />
INVCHI (nn, df)<br />
returns <strong>the</strong> critical value from <strong>the</strong> chi-square distribution with degrees of freedom df whose area is nn.<br />
INVCHI is <strong>the</strong> inverse of <strong>the</strong> PROBCHI function.<br />
INVF (nn, df1, df2)<br />
returns <strong>the</strong> critical value from <strong>the</strong> F distribution with degrees of freedom df1 and df2 whose area is <strong>the</strong><br />
argument nn. INVF is <strong>the</strong> inverse of <strong>the</strong> PROBF function.<br />
INVNORM (nn)<br />
returns <strong>the</strong> deviate or critical value from <strong>the</strong> normal distribution whose area is <strong>the</strong> specified argument.<br />
The area, measured from <strong>the</strong> lower tail of <strong>the</strong> normal distribution, is a number between 0 and 1 that is <strong>the</strong><br />
probability of obtaining a value less than <strong>the</strong> calculated deviate. PROBIT is a synonym for INVNORM:<br />
PROBIT ( .95 ) = 1.644853628<br />
A critical value of approximately 1.64 is required for a significance level of 5% for a one-tail test. This<br />
function is <strong>the</strong> inverse of PROBNORM.<br />
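The inverse relationship between the two normal functions can be checked with Python's standard library. This is an illustration outside P-STAT, not PPL; statistics.NormalDist uses its own algorithm, so the last printed digit may differ from the value shown above.<br />

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Critical value whose lower-tail area is .95 (the role of INVNORM / PROBIT).
critical = standard_normal.inv_cdf(0.95)

# Feeding the critical value back through the cumulative function
# (the role of PROBNORM) should recover the original area.
area = standard_normal.cdf(critical)

print(round(critical, 6), round(area, 6))
```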
INVPOIS (nn, lambda)<br />
returns <strong>the</strong> observation from <strong>the</strong> Poisson distribution with mean lambda whose area is nn. INVPOIS is<br />
<strong>the</strong> inverse of <strong>the</strong> PROBPOIS function. INVPOIS.RT returns an observation from <strong>the</strong> right tail of <strong>the</strong><br />
distribution.<br />
INVT (nn, df)<br />
returns <strong>the</strong> critical value from <strong>the</strong> t distribution with degrees of freedom df whose area is nn. INVT is<br />
<strong>the</strong> inverse of <strong>the</strong> PROBT function.<br />
<strong>PPL</strong> Functions: Fuzzy Numeric<br />
HEX ( nn )<br />
produces <strong>the</strong> HEX representation of <strong>the</strong> input in a character*16 result.<br />
STEP.UP ( nn, n )<br />
produces <strong>the</strong> number that is N steps up from <strong>the</strong> input value. The second argument, <strong>the</strong> number of steps,<br />
can be from zero to 9999. If omitted, it defaults to one.<br />
STEP.DOWN ( nn, n )<br />
produces <strong>the</strong> number that is N steps down from <strong>the</strong> input value. The second argument, <strong>the</strong> number of<br />
steps, can be from zero <strong>to</strong> 9999. If omitted, it defaults <strong>to</strong> one.<br />
STEPS ( nn1, nn2 )<br />
produces the number of steps from the smaller of NN1 and NN2 to the larger. Missing 3 is returned if<br />
more than one million steps separate the arguments.<br />
EQ / NE / LT / LE / GT / GE<br />
EQ.2 or GT.5 or such can be used <strong>to</strong> cause a fuzzy compare of that many steps. The step part can be from<br />
0 <strong>to</strong> 99, with 0 meaning no steps. EQ.2 is treated as a simple EQ when <strong>the</strong> compare involves character<br />
values.<br />
FUZZ 5 $ is a new command that causes later use of <strong>the</strong> EQ type of logical opera<strong>to</strong>rs <strong>to</strong> use that many<br />
steps. It is ignored for character compares, and does not affect <strong>the</strong> XEQ type of opera<strong>to</strong>rs. It is also ignored<br />
for an opera<strong>to</strong>r like EQ.3 that already has a specific step size.<br />
NEAR is a synonym for EQ. NOTNEAR is a synonym for NE. NEAR and NOTNEAR can be used<br />
after <strong>the</strong> FUZZ command has set a fuzz level.
8 PPL: Across-Case Modifications<br />
Changes and summary statistics on groups of related cases are produced by data modification and aggregation<br />
across cases. Related cases are groups of cases in a file that is ordered by one or more variables defining group<br />
membership. For example, cases having <strong>the</strong> same value of a key variable such as Household.Number could be<br />
grouped <strong>to</strong>ge<strong>the</strong>r in <strong>the</strong> file. They are related by <strong>the</strong>ir common values of Household.Number. Across-case modifications<br />
use:<br />
• variables that exist across cases <strong>to</strong> hold accumulated values, and<br />
• functions <strong>to</strong> identify particular cases within a group of related cases.<br />
Scratch variables and the permanent vector permit the incrementing and saving of variables across cases.<br />
Scratch variables hold either numeric or character values. The permanent vector is referenced with a P(J) notation<br />
allowing for calculation of the index value. It is created at the beginning of a run and can be used to pass values<br />
between commands as well as between cases. The P vector holds only numeric values. Multi-dimensional user-defined<br />
arrays are easier to use when an array is intrinsically multi-dimensional and can be defined to hold either<br />
character or numeric data.<br />
<strong>PPL</strong>, <strong>the</strong> P-<strong>STAT</strong> <strong>Programming</strong> <strong>Language</strong>, provides functions that identify or manipulate particular cases in<br />
a subgroup for modification and aggregation:<br />
• FIRST is true when <strong>the</strong> current case is <strong>the</strong> first case in a group.<br />
• LAST is true when <strong>the</strong> current case is <strong>the</strong> last case in a group.<br />
• SPLIT splits a case in<strong>to</strong> a number of new cases.<br />
• COLLECT collects a number of cases in<strong>to</strong> one large case.<br />
Splitting single cases into multiple cases reorganizes data for plotting, t tests, or analysis of variance. For example,<br />
monthly water flow measurements for multiple years may each be split into 12 separate cases, and new<br />
variables showing the month and year may be created. Splitting also “undoes” collecting. Family or patient cases<br />
that are collected for modification may be split back into their original cases afterwards.<br />
Collecting related cases permits subsequent modification of those cases: a family telephone number may be<br />
corrected for all members of a family, or a diagnosis may be added <strong>to</strong> all of a patient’s visit records. Collecting<br />
cases also permits <strong>the</strong> calculation of statistics that summarize <strong>the</strong> related cases, such as counts, means, <strong>to</strong>tals and<br />
o<strong>the</strong>rs. Total sales may be tallied for all <strong>the</strong> salesmen in each department, or mean income may be calculated for<br />
voters in each district.<br />
Several P-STAT commands also perform data modification and aggregation across related cases. The<br />
AGGREGATE and DUPLICATES commands both produce files that contain summary records of a file or subgroups<br />
within a file. Aggregation and modification using <strong>the</strong>se commands are most appropriate when a file of summary<br />
information is <strong>the</strong> desired result. However, if <strong>the</strong> goal is <strong>to</strong> join summary information back on<strong>to</strong> <strong>the</strong> cases of <strong>the</strong><br />
original data file, COLLATE or LOOKUP must <strong>the</strong>n be used <strong>to</strong> do a hierarchical join. Using <strong>PPL</strong> for across-case<br />
modification and aggregation, as <strong>the</strong> file is read by a command such as MODIFY or LIST, often saves some extra<br />
steps.<br />
8.1 BASIC ACROSS-CASE AGGREGATION<br />
The FIRST and LAST functions are generally used with scratch variables or <strong>the</strong> permanent vec<strong>to</strong>r P for most basic<br />
types of aggregation. <strong>PPL</strong> instructions (such as GENERATE, SET and INCREASE), opera<strong>to</strong>rs (such as + , * and<br />
CONTAINS), and functions (such as MEAN, SQRT and TRIM) perform <strong>the</strong> actual modifications and<br />
calculations.<br />
8.2 Accessing FIRST and LAST Cases<br />
The FIRST and LAST functions determine whe<strong>the</strong>r a case is <strong>the</strong> first or last case in a file or, for cases ordered by<br />
subgroups, whe<strong>the</strong>r a case is <strong>the</strong> first or last case in <strong>the</strong> subgroup. FIRST and LAST are used with <strong>the</strong> system<br />
variable .FILE.<br />
<strong>to</strong> test for <strong>the</strong> beginning and ending cases of a file. (Any case selection is done before any o<strong>the</strong>r <strong>PPL</strong>, including<br />
testing for FIRST and LAST cases.) Figure 8.1 illustrates <strong>the</strong> use of FIRST and LAST.<br />
__________________________________________________________________________<br />
Figure 8.1 FIRST and LAST with Subgroups<br />
Given these six cases:<br />
Case Age Sex<br />
1 12 1<br />
2 12 2<br />
3 12 2<br />
4 13 2<br />
5 14 1<br />
6 14 1<br />
The following statements are true for these case numbers:<br />
IF FIRST (.FILE. ), ... 1<br />
IF LAST (.FILE. ), ... 6<br />
IF FIRST ( Age ), ... 1, 4, 5<br />
IF FIRST ( Age, Sex ), ... 1, 2, 4, 5<br />
IF LAST ( Age ), ... 3, 4, 6<br />
IF LAST ( Age, Sex ), ... 1, 3, 4, 6<br />
__________________________________________________________________________<br />
The statement:<br />
IF FIRST ( .FILE. ),<br />
is true only when <strong>the</strong> first case of <strong>the</strong> file is processed. Similarly,<br />
IF LAST ( .FILE. ),<br />
is true only when <strong>the</strong> last case of <strong>the</strong> file is processed.<br />
If a file is ordered or sorted by one or more variables, <strong>the</strong> FIRST and LAST functions determine if a given<br />
case is <strong>the</strong> first or last member of <strong>the</strong> subgroup defined by those variables:<br />
IF FIRST (Division), GEN #Counter = 0 ;<br />
The first case of each division satisfies this test. Each time the first case in a division is processed, this IF statement<br />
is true and the scratch variable #Counter is set to zero. The LAST function is similar except that only the last case of a<br />
subgroup satisfies a LAST test.<br />
The FIRST and LAST functions are shown accessing cases in subgroups in Figure 8.1. The statement:<br />
IF FIRST ( Age, Sex ),<br />
is true for cases 1, 2, 4, and 5 — each time <strong>the</strong> value of Age changes or Sex within an Age group changes. (Notice<br />
that a comma separates <strong>the</strong> variables defining <strong>the</strong> subgroups.)<br />
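The subgroup logic of Figure 8.1 can be reproduced with a short Python sketch. This is an illustration outside P-STAT, not PPL; first_cases and last_cases are hypothetical helpers that treat a case as "first" when its grouping values differ from the previous case and "last" when they differ from the next case.<br />

```python
# The six cases of Figure 8.1 as (Age, Sex) pairs, in file order.
cases = [(12, 1), (12, 2), (12, 2), (13, 2), (14, 1), (14, 1)]


def first_cases(rows, key):
    """Case numbers (1-based) that begin a new group under the given key."""
    return [i + 1 for i, row in enumerate(rows)
            if i == 0 or key(rows[i - 1]) != key(row)]


def last_cases(rows, key):
    """Case numbers (1-based) that end a group under the given key."""
    return [i + 1 for i, row in enumerate(rows)
            if i == len(rows) - 1 or key(rows[i + 1]) != key(row)]


def whole_file(row):
    return None                 # one group: the entire file (.FILE.)


def by_age(row):
    return row[0]


def by_age_sex(row):
    return (row[0], row[1])


print(first_cases(cases, whole_file), last_cases(cases, whole_file))
print(first_cases(cases, by_age), last_cases(cases, by_age))
print(first_cases(cases, by_age_sex), last_cases(cases, by_age_sex))
```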
The FIRST and LAST functions are used primarily with scratch variables and <strong>the</strong> permanent vec<strong>to</strong>r, both of<br />
which pass information between cases. Scratch variables contain ei<strong>the</strong>r numeric or character values. Depending<br />
on how <strong>the</strong>y are created, <strong>the</strong>y may be temporary or permanent. A temporary scratch variable exists only during<br />
<strong>the</strong> current command or macro. A permanent scratch variable retains its values across commands. The permanent<br />
vector exists for the duration of a P-STAT run. The permanent (P) vector contains only numeric values.<br />
8.3 Scratch Variables<br />
Scratch variables may contain numeric or character values. They are created with GENERATE. A “#” (crosshatch<br />
or pound sign) is <strong>the</strong> first character in <strong>the</strong> scratch variable name. A scratch variable that starts with a single crosshatch<br />
exists for <strong>the</strong> duration of a single command or macro. A scratch variable that is created with two<br />
crosshatches exists for <strong>the</strong> duration of <strong>the</strong> P-<strong>STAT</strong> run. A scratch variable that is <strong>to</strong> contain character information<br />
must be defined as a character variable when it is generated. Its length, if greater than 40, must be cited:<br />
GENERATE #Name:C50 = Department.Name;<br />
GENERATE ##NAME:C50 = Department.Name;<br />
A scratch variable is not associated with a case; <strong>the</strong>refore, it has no position in <strong>the</strong> file. Scratch variables may<br />
not be used after ANY or ALL, or in list functions such as MEAN, SUM, MIN, MAX and SDEV.<br />
Scratch variables may be used within a command <strong>to</strong> hold temporary information:<br />
GENERATE #Temp = Rdg1 * Rdg2 + SQRT ( F.Fac<strong>to</strong>r ),<br />
GENERATE Result = ROUND ( #Temp / 10 );<br />
The scratch variable #Temp breaks up a complex calculation in<strong>to</strong> simpler components, without creating a new<br />
variable in the file. This calculation could be done in one PPL clause with nested functions. However, several<br />
simple statements are more apt <strong>to</strong> be written correctly than a single complicated statement.<br />
The major use of scratch variables is across-case modification and aggregation. The scratch variable does not<br />
au<strong>to</strong>matically change when a new case is read, but only when it is explicitly changed. It is this property that makes<br />
it useful for passing information across cases in <strong>the</strong> file. Ano<strong>the</strong>r frequent use of scratch variables is in <strong>the</strong> TITLES<br />
command.<br />
TITLES 'Study Number #Study.Number'<br />
The following is an example which uses FIRST and LAST to count the number of cases which have the value<br />
of 'male' on variable Sex:<br />
[ IF FIRST ( .FILE. ), GENERATE #Total.Males = 0;<br />
IF SEX EQ 'male', INCREASE #Total.Males;<br />
IF LAST ( .FILE. ), RETAIN ;<br />
KEEP .OTHERS. #Total.Males ]<br />
The scratch variable #Total.Males is generated and set equal <strong>to</strong> zero when <strong>the</strong> first case in <strong>the</strong> file is processed. A<br />
scratch variable remains zero until it is explicitly changed, typically with INCREASE or SET. The IF Sex EQ test<br />
is done for every case that is processed. When <strong>the</strong> result of <strong>the</strong> IF test is true, <strong>the</strong> value of #Total.Males is increased<br />
by 1. Each case is tested <strong>to</strong> see if it is <strong>the</strong> last case in <strong>the</strong> file. If it is not, <strong>the</strong> next case is read. The last case in<br />
the file is the only case that is retained.<br />
The last case contains all <strong>the</strong> original variables for that case plus <strong>the</strong> scratch variable Total.Males. When a<br />
scratch variable is used in a KEEP instruction, a regular variable with a position in <strong>the</strong> file is created. The # or ##<br />
is removed from <strong>the</strong> variable name and <strong>the</strong> variable can be referred <strong>to</strong> in subsequent <strong>PPL</strong> as Total.Males.<br />
__________________________________________________________________________<br />
Figure 8.2 Using Scratch Variables and FIRST<br />
File Staff:<br />
Rank Name Division Salary<br />
3 Sulley 12 22000<br />
2 De Jong 13 34300<br />
2 Swartz 13 27700<br />
5 Bryan 14 19500<br />
2 Fernald 12 26500<br />
3 Widmer 13 25000<br />
4 Williams 14 21300<br />
SORT Staff, BY Division Rank, OUT StaffSor $<br />
LIST StaffSor<br />
[ IF FIRST (Division), GEN #Cum.Salary = 0;<br />
INCREASE #Cum.Salary BY Salary;<br />
KEEP .OTHERS. #Cum.Salary ] ,<br />
CONTROL Division $<br />
Rank Name Division Salary Cum.Salary<br />
2 Fernald 12 26500 26500<br />
3 Sulley 12 22000 48500<br />
2 De Jong 13 34300 34300<br />
2 Swartz 13 27700 62000<br />
3 Widmer 13 25000 87000<br />
4 Williams 14 21300 21300<br />
5 Bryan 14 19500 40800<br />
__________________________________________________________________________<br />
Figure 8.2 illustrates <strong>the</strong> use of scratch variables with FIRST <strong>to</strong> get cumulative salary <strong>to</strong>tals. The file first<br />
must be sorted or ordered by <strong>the</strong> variables defining subgroup membership. It may also be sorted by o<strong>the</strong>r variables<br />
of interest. Fernald is <strong>the</strong> first member of Division 12 after <strong>the</strong> sort. Thus #Cum.Salary is generated equal <strong>to</strong> zero<br />
when his case is read:<br />
[ IF FIRST (Division), GEN #Cum.Salary = 0;<br />
#Cum.Salary is <strong>the</strong>n increased by 26,500, <strong>the</strong> value of Fernald’s Salary:<br />
INCREASE #Cum.Salary BY Salary;<br />
The KEEP instruction creates a new variable Cum.Salary. For Fernald’s case, Cum.Salary is set <strong>to</strong> 26,500, <strong>the</strong><br />
current value of <strong>the</strong> scratch variable #Cum.Salary:<br />
KEEP .OTHERS. #Cum.Salary ]<br />
Sulley, <strong>the</strong> next case after <strong>the</strong> sort, is also a member of Division 12. Therefore, #Cum.Salary is not reset <strong>to</strong><br />
zero, but it is increased by <strong>the</strong> value of Sulley’s salary <strong>to</strong> 48,500. This value, 48,500, is <strong>the</strong>n moved in<strong>to</strong> <strong>the</strong><br />
Cum.Salary variable for Sulley. De Jong is <strong>the</strong> first case in <strong>the</strong> next division, so #Cum.Salary is reset <strong>to</strong> zero, and<br />
<strong>the</strong> procedure is repeated.<br />
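The walk-through of Figure 8.2 can be mirrored in Python to confirm the cumulative totals. This sketch is an illustration outside P-STAT, not PPL; the variable cum_salary plays the role of the scratch variable #Cum.Salary, and Python's stable sort is assumed to leave tied cases in their original order.<br />

```python
# File Staff from Figure 8.2: (Rank, Name, Division, Salary).
staff = [
    (3, "Sulley", 12, 22000),
    (2, "De Jong", 13, 34300),
    (2, "Swartz", 13, 27700),
    (5, "Bryan", 14, 19500),
    (2, "Fernald", 12, 26500),
    (3, "Widmer", 13, 25000),
    (4, "Williams", 14, 21300),
]

# SORT ... BY Division Rank
staff_sorted = sorted(staff, key=lambda row: (row[2], row[0]))

listing = []
cum_salary = 0
previous_division = None
for rank, name, division, salary in staff_sorted:
    if division != previous_division:   # IF FIRST (Division): reset to zero
        cum_salary = 0
    cum_salary += salary                # INCREASE #Cum.Salary BY Salary
    listing.append((rank, name, division, salary, cum_salary))  # KEEP
    previous_division = division

for row in listing:
    print(row)
```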
__________________________________________________________________________<br />
Figure 8.3 Creating a Summary Case with FIRST and LAST<br />
File Depts:<br />
Name Department Age Sex Position<br />
John Jones Hardware 33 m sales<br />
Jim Smith Hardware 42 m clerk<br />
Sara Clark Hardware 25 f service<br />
Gerry Walker Hardware 52 m manager<br />
Arlene Burns Personnel 29 f secretary<br />
George Dun Personnel 32 m clerk<br />
Jane Mason Personnel 43 f manager<br />
LIST Depts<br />
[ IF FIRST (Department), GENERATE #Employees = 0 ]<br />
[ INCREASE #Employees ]<br />
[ IF LAST (Department), RETAIN ;<br />
KEEP Department #Employees ] $<br />
Department Employees<br />
Hardware 4<br />
Personnel 3<br />
__________________________________________________________________________<br />
Figure 8.3 also illustrates aggregation using scratch variables. However, only a single summary case is retained<br />
for each department. The input file is already ordered by Department, so it need not be sorted.<br />
Each time <strong>the</strong> FIRST test is true, #Employees is initialized. It is increased as each case is processed. When <strong>the</strong><br />
LAST test is not true, the case is deleted, and processing of the next case begins immediately. When the LAST<br />
test is true, KEEP is used <strong>to</strong> select variables for <strong>the</strong> summary case. The result is a report with one line per department<br />
containing <strong>the</strong> variables Department and Employees.<br />
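The one-summary-case-per-group pattern of Figure 8.3 can be sketched in Python. This is an illustration outside P-STAT, not PPL; the variable employees stands in for the scratch variable #Employees, and only the last case of each department contributes a row to the result.<br />

```python
# File Depts from Figure 8.3, already ordered by Department: (Name, Department).
depts = [
    ("John Jones", "Hardware"),
    ("Jim Smith", "Hardware"),
    ("Sara Clark", "Hardware"),
    ("Gerry Walker", "Hardware"),
    ("Arlene Burns", "Personnel"),
    ("George Dun", "Personnel"),
    ("Jane Mason", "Personnel"),
]

summary = []
employees = 0
for i, (name, department) in enumerate(depts):
    is_first = i == 0 or depts[i - 1][1] != department
    is_last = i == len(depts) - 1 or depts[i + 1][1] != department
    if is_first:
        employees = 0            # IF FIRST (Department), GENERATE ... = 0
    employees += 1               # INCREASE #Employees
    if is_last:                  # IF LAST (Department), RETAIN
        summary.append((department, employees))

print(summary)
```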
8.4 The Permanent Vec<strong>to</strong>r<br />
The permanent (P) vec<strong>to</strong>r holds double-precision numeric values. The length of <strong>the</strong> P vec<strong>to</strong>r, like that of <strong>the</strong> V<br />
vec<strong>to</strong>r, is <strong>the</strong> maximum number of variables possible in a P-<strong>STAT</strong> system file. When a run begins, P-<strong>STAT</strong> generates<br />
<strong>the</strong> P vec<strong>to</strong>r with all <strong>the</strong> values set <strong>to</strong> missing type 1. New values are placed in <strong>the</strong> P vec<strong>to</strong>r using SET (using<br />
GENERATE will cause an error), and <strong>the</strong>y remain <strong>the</strong>re until <strong>the</strong>y are changed. The P vec<strong>to</strong>r is not re-initialized<br />
when a new P-<strong>STAT</strong> command begins.<br />
The values in <strong>the</strong> P vec<strong>to</strong>r are referenced by position number — P(5) refers <strong>to</strong> <strong>the</strong> value in <strong>the</strong> fifth location<br />
of <strong>the</strong> vec<strong>to</strong>r. This “subscript” notation permits <strong>the</strong> use of a variable or an expression as <strong>the</strong> index. Thus, locations<br />
in the P vector may be referenced with a DO loop:<br />
DO #J = 1, 8; SET P(#J) = V(#J) ; ENDDO;<br />
This will set <strong>the</strong> first eight values in <strong>the</strong> permanent vec<strong>to</strong>r <strong>to</strong> <strong>the</strong> first eight values in <strong>the</strong> current case. The contents<br />
of <strong>the</strong> expression denoting which P variable is <strong>to</strong> be used may be calculated:<br />
DO #L = 1, 6;<br />
SET P(#L) = V(#L);<br />
SET P(#L+6) = SQRT( V(#L) );<br />
ENDDO;<br />
However, <strong>the</strong> result of #L+6 must be an integer between 1 and <strong>the</strong> maximum number of variables in a file.<br />
Figure 8.4 illustrates the use of the permanent vector to move information between files. The initial<br />
PROCESS command is used as a vehicle for PPL; there is no output file. The locations P(1) and P(2) are set to zero<br />
when the first case in the file is processed. Then, for each case (including the first), if Income is not missing, P(1)<br />
is increased by 1 and P(2) is increased by the value of Income. Thus, P(1) has the count of cases with good values<br />
for Income and P(2) contains total Income for those cases. P(1) and P(2) are available to the second command, MODIFY,<br />
permitting the calculation of #Mean.Income. As each case is processed, a simple subtraction produces the<br />
difference between Income for that case and #Mean.Income.<br />
__________________________________________________________________________<br />
Figure 8.4 Moving Values Between Files with <strong>the</strong> P Vec<strong>to</strong>r<br />
C 'Get number of people and income <strong>to</strong>tals.' $<br />
PROCESS File1<br />
[ IF FIRST (.FILE.),<br />
SET P(1) = 0, SET P(2) = 0 ]<br />
[ IF <strong>Inc</strong>ome GOOD,<br />
INCREASE P(1),<br />
INCREASE P(2) BY <strong>Inc</strong>ome ] $<br />
C 'Now get mean income and income differences.' $<br />
MODIFY File1<br />
[ IF FIRST (.FILE.),<br />
GENERATE #Mean.<strong>Inc</strong>ome = P(2) / P(1) ]<br />
[ GENERATE Difference = <strong>Inc</strong>ome - #Mean.<strong>Inc</strong>ome ],<br />
OUT File2 $<br />
___________________________________________________________________________<br />
The permanent vec<strong>to</strong>r is also useful for passing a large number of numeric variables across cases within a<br />
command. Given some number of tests, each with a variable name beginning with “Test”, <strong>the</strong>se instructions will<br />
get class <strong>to</strong>tals for all <strong>the</strong> tests:<br />
[ IF FIRST ( Class ),<br />
DO #J USING Test?; SET P(#J) = 0; ENDDO;<br />
DO #J USING Test?; INC P(#J) BY V(#J); ENDDO;<br />
IF LAST ( Class ), RETAIN;<br />
DO #J USING Test?; SET V(#J) = P(#J); ENDDO ]<br />
Given any number of tests in any locations in <strong>the</strong> file, <strong>the</strong> P values in corresponding locations are initialized when<br />
the first case of a class is processed. (The index #J uses the positions of all the Test? variables in the file as its<br />
values.) Each of <strong>the</strong> P values is increased by <strong>the</strong> associated test value.<br />
Each case is <strong>the</strong>n evaluated <strong>to</strong> determine if it is <strong>the</strong> last case for a class. If <strong>the</strong> test result is false, that case is<br />
not retained, <strong>the</strong> next case is read and processing resumes with <strong>the</strong> first <strong>PPL</strong> statement. If <strong>the</strong> test result is true,
<strong>the</strong> <strong>PPL</strong> continues and <strong>the</strong> test values are set <strong>to</strong> <strong>the</strong> accumulated <strong>to</strong>tals. This final case for <strong>the</strong> class is <strong>the</strong> only case<br />
that is seen by <strong>the</strong> P-<strong>STAT</strong> command.<br />
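The class-total logic above can be modeled outside P-STAT for readers who want to trace it step by step. The following Python sketch is an illustration only, not PPL: the function name `class_totals`, the dictionary layout of a case, and the sample data are assumptions made for this example, and missing values are ignored. It assumes the cases are already sorted so that each class forms a consecutive run.

```python
from itertools import groupby


def class_totals(cases, class_key, test_prefix="Test"):
    """Return one case per class holding the summed Test? values.

    Hypothetical model: each case is a dict, and cases of the same
    class are assumed to be consecutive in the input.
    """
    result = []
    for _, group in groupby(cases, key=lambda c: c[class_key]):
        group = list(group)
        last = dict(group[-1])  # the case that RETAIN would keep
        for name in last:
            if name.startswith(test_prefix):
                # plays the role of P(#J): zeroed at the first case of
                # the class, then increased by each case's test value
                last[name] = sum(c[name] for c in group)
        result.append(last)
    return result


sample = [
    {"Class": 1, "Test1": 10, "Test2": 20},
    {"Class": 1, "Test1": 5, "Test2": 1},
    {"Class": 2, "Test1": 7, "Test2": 3},
]
print(class_totals(sample, "Class"))
```

Only the last case of each class survives, carrying the totals, which mirrors the effect of RETAIN on LAST ( Class ).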
The choice of whe<strong>the</strong>r <strong>to</strong> use <strong>the</strong> P vec<strong>to</strong>r or scratch variables depends on <strong>the</strong> number of variables involved,<br />
whe<strong>the</strong>r any are character variables, and <strong>the</strong> desired tasks. The P vec<strong>to</strong>r is usually easier <strong>to</strong> use when many variables<br />
are treated <strong>the</strong> same way, as in initialization:<br />
DO #Q = 1 TO 16; SET P(#Q) = 0 ; ENDDO;<br />
Scratch variables may be more convenient when only a few variables or character data are involved:<br />
GENERATE #T1 = 0, GENERATE #NN:C = ' ' ;<br />
8.5 User-defined Arrays<br />
Like scratch variables, arrays are defined during a run and used in PPL statements. An array can have up to<br />
7 dimensions, and can be character or numeric. Array names have two characters, <strong>the</strong> second being <strong>the</strong> same as<br />
<strong>the</strong> first, like XX or cc or Zz. Case doesn’t matter. There can be up <strong>to</strong> 26 active arrays.<br />
How do arrays compare <strong>to</strong> <strong>the</strong> P vec<strong>to</strong>r? The P vec<strong>to</strong>r allows N numeric values, where N is <strong>the</strong> maximum<br />
number of variables in a file in a given version of P-STAT. This is usually 6,000. The P vector is one-dimensional;<br />
P(1) through P(N) can be used. Arrays are an improvement over <strong>the</strong> P vec<strong>to</strong>r in 3 ways:<br />
1. allowing character as well as numeric arrays.<br />
2. allowing dimensioning like XX(2,1,5).<br />
3. providing an array buffer (where arrays are placed) that is 3 times larger than <strong>the</strong> P vec<strong>to</strong>r.<br />
Arrays are defined by using <strong>the</strong> DEFINE.ARRAY command.<br />
DEFINE.ARRAY xx (10,30) TO 0 $<br />
This command defines XX as a numeric array with 2 dimensions. The first subscript will be 1 <strong>to</strong> 10, <strong>the</strong> second<br />
1 <strong>to</strong> 30. The 300 array values are initialized <strong>to</strong> zero. Initialization (<strong>the</strong> “TO 0” part) is optional; if it is not used<br />
<strong>the</strong> values are set <strong>to</strong> missing type 1.<br />
Dimensions that use zero or negative subscripts are allowed; for example:<br />
DEFINE.ARRAY aa (-4:4, 0:10, -100:0) $<br />
A character array is defined by adding a numeric value, which is the length of each character value in the<br />
array:<br />
DEFINE.ARRAY KK:12 (2, 10, 101:140) TO ' ' $<br />
This defines KK as a character array with 3 dimensions. Each value can hold 12 characters. This is shown by <strong>the</strong><br />
KK:12. The first subscript can be 1 or 2, <strong>the</strong> second 1 <strong>to</strong> 10, and <strong>the</strong> third 101 <strong>to</strong> 140. The 800 array values are<br />
initialized <strong>to</strong> blank. The maximum size of a character value is 50,000 characters.<br />
Each character value has a status word (<strong>to</strong> indicate missing or good), followed by <strong>the</strong> characters of <strong>the</strong> value.<br />
A character:4 value uses one array buffer element (as does a numeric value). A character:12 value needs 2 array<br />
buffer elements, a character:20 value needs 3 elements, and so on. The array buffer is very large and even <strong>the</strong><br />
Whopper II size can hold an array of 6000 C20 elements.<br />
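The buffer arithmetic can be checked with a short calculation. The Python sketch below is not part of P-STAT. It assumes a 4-byte status word and 8-byte buffer elements; these sizes are inferred from the three examples in the text (character:4, character:12, character:20), not stated by the manual, and the function names are hypothetical.

```python
import math

STATUS_BYTES = 4   # assumed size of the status word
ELEMENT_BYTES = 8  # assumed size of one array buffer element


def elements_per_value(char_length=None):
    """Buffer elements used by one value; None means a numeric value."""
    if char_length is None:
        return 1
    return math.ceil((STATUS_BYTES + char_length) / ELEMENT_BYTES)


def array_elements(dim_sizes, char_length=None):
    """Total buffer elements used by an array with the given dimension sizes."""
    return math.prod(dim_sizes) * elements_per_value(char_length)


# The arrays of Figure 8.5: xx ( 0:3, 5 ) and cc:12 ( 44, 2, 2 )
xx = array_elements([4, 5])
cc = array_elements([44, 2, 2], char_length=12)
print(xx, cc, 18000 - xx - cc)
```

Under these assumptions the totals agree with the counts reported in Figure 8.5 (20 and 352 elements in use, 17,628 available), and 6000 values of length 20 exactly fill an 18,000-element buffer.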
SHOW.ARRAYS $<br />
The SHOW.ARRAYS command displays the name, size (if character), number of dimensions, and defined subscript<br />
range for each array.<br />
DROP.ARRAY aa zz pp $<br />
This command ends <strong>the</strong> definition of <strong>the</strong> indicated array or arrays and releases <strong>the</strong> array buffer space, making it<br />
available for o<strong>the</strong>r definitions.
DROP.P.VECTOR $<br />
This command takes <strong>the</strong> P vec<strong>to</strong>r space and adds it <strong>to</strong> <strong>the</strong> array buffer. This allows larger arrays, but ends<br />
any use of <strong>the</strong> P vec<strong>to</strong>r in <strong>the</strong> run.<br />
Suppose we used 'DEFINE.ARRAY xx(3,5)$' and set #n <strong>to</strong> 2. These 3 (unrelated) standalone <strong>PPL</strong> statements<br />
would be valid, as would <strong>the</strong> nested DO loop.<br />
SET XX (2, #n ) = 77 $<br />
PUT XX (#n, #n-1 ) $<br />
IF XX (1, 3 ) LT XX(2,4), SET XX(1,3) <strong>to</strong> .m1. $<br />
DO #j = 1,3;<br />
DO #k = 1,5;<br />
SET XX( #j, #k ) = #j * 10 + #k;<br />
ENDDO;<br />
ENDDO $<br />
__________________________________________________________________________<br />
Figure 8.5 DEFINE.ARRAY and SHOW.ARRAYS<br />
DEFINE.ARRAY xx ( 0:3, 5 ) $<br />
DEFINE.ARRAY cc:12 ( 44, 2,2 ) $<br />
SHOW.ARRAYS $<br />
---------Numeric array xx has been defined---------<br />
It has 20 values, organized in<strong>to</strong> 2 dimensions.<br />
The array buffer now has 17,980 unused elements.<br />
---------------------------------------------------<br />
-------Character array cc:12 has been defined-------<br />
It has 176 values, organized in<strong>to</strong> 3 dimensions.<br />
The array buffer now has 17,628 unused elements.<br />
----------------------------------------------------<br />
---------------array summary---------------<br />
There are 2 user-defined arrays:<br />
cc:12 ( 44, 2, 2 )<br />
xx ( 0:3, 5 )<br />
The array buffer contains 18,000 elements.<br />
372 are in use by existing arrays.<br />
17,628 are available for array definition.<br />
-------------------------------------------<br />
__________________________________________________________________________<br />
Once an array is defined it can be used by any command in <strong>the</strong> same way that <strong>the</strong> P vec<strong>to</strong>r is used. Given a<br />
file containing at least <strong>the</strong> following 3 variables:<br />
Gender coded 1=male<br />
2=female<br />
Age coded 1=le 30<br />
2=31 - 40<br />
3=Over 40<br />
<strong>Inc</strong>ome coded in dollar amounts.
The task is to produce the following report, where Group is one of the 6 possible gender/age groups:<br />
Group n had <strong>the</strong> highest average income of $xx,xxx.xx<br />
This type of question can be solved in a variety of ways. Because <strong>the</strong>re are 6 groups and it is necessary <strong>to</strong><br />
save both <strong>the</strong> number of cases in each group and <strong>the</strong> <strong>to</strong>tal income of each group across all <strong>the</strong> cases, arrays provide<br />
an easy way <strong>to</strong> handle <strong>the</strong> data collection.<br />
__________________________________________________________________________<br />
Figure 8.6 One-dimensional Arrays<br />
DEFINE.ARRAY gg (6) <strong>to</strong> 0 $<br />
DEFINE.ARRAY tt (6) <strong>to</strong> 0 $<br />
GEN ##High = 0; GEN ##Group $<br />
PROCESS px1298a [<br />
GEN #N = 0;<br />
DO #A = 1, 3;<br />
DO #G = 1, 2;<br />
INC #N;<br />
IF Age = #A and Gender = #G,<br />
INCREASE GG(#N),<br />
INCREASE TT(#N) BY <strong>Inc</strong>ome;<br />
IF LAST ( .FILE. ) AND TT(#N) / GG(#N) GT ##High,<br />
SET ##High = TT(#N) / GG(#N),<br />
SET ##GROUP = #n;<br />
ENDDO;<br />
ENDDO;<br />
] $<br />
PUT "Group " ##Group<br />
" had <strong>the</strong> highest average income of $"<br />
@COMMAS @PLACES2 ##High $<br />
__________________________________________________________________________<br />
The first two commands in Figure 8.6 define two arrays with 6 elements in each. Array gg is used <strong>to</strong> accumulate<br />
<strong>the</strong> cases for each group while tt is used <strong>to</strong> accumulate <strong>to</strong>tal income. Figure 8.7 illustrates <strong>the</strong> same solution<br />
using two-dimensional arrays. In this particular example, <strong>the</strong> use of <strong>the</strong> one-dimensional arrays is somewhat easier<br />
<strong>to</strong> follow and <strong>the</strong>re is little difference in <strong>the</strong> amount of code required.<br />
It would be possible <strong>to</strong> use a single three dimensional array (3,2,2) <strong>to</strong> hold all twelve of <strong>the</strong> values that are<br />
needed for this particular problem. Such complexity might serve as an exercise in nesting do loops and handling<br />
scratch variables but would only complicate <strong>the</strong> solution of a fairly simple problem.<br />
Multiple dimensions are most useful when the contents of the cells are similar. For example, sales data for<br />
5 divisions over 12 months from 6 regions of the country might best be processed if stored in a single (5,12,6)<br />
array.
__________________________________________________________________________<br />
Figure 8.7 Two-dimensional Arrays<br />
DEFINE.ARRAY gg (3,2) <strong>to</strong> 0 $<br />
DEFINE.ARRAY tt (3,2) <strong>to</strong> 0 $<br />
GEN ##High=0; GEN ##Group $<br />
PROCESS Myfile [<br />
DO #A = 1, 3;<br />
DO #G = 1, 2;<br />
IF Age = #A and Gender = #G,<br />
INCREASE GG(#A,#G),<br />
INCREASE TT(#A,#G) BY <strong>Inc</strong>ome;<br />
IF LAST (.FILE.) AND TT(#A,#G) / GG(#A,#G) GT ##HIGH,<br />
SET ##High = TT(#A,#G) / GG(#A,#G),<br />
SET ##Group = #A + (#G-1) * 3;<br />
ENDDO;<br />
ENDDO;<br />
] $<br />
PUT "Group " ##Group<br />
" had <strong>the</strong> highest average income of $"<br />
@COMMAS @PLACES2 ##High $<br />
__________________________________________________________________________<br />
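The accumulation performed in Figures 8.6 and 8.7 can be expressed compactly in a general-purpose language. The Python sketch below is a model of the idea, not P-STAT code: the function name, the dictionary layout of a case, and the sample data are assumptions. It uses the group numbering of Figure 8.7, `Age + (Gender - 1) * 3`. Unlike the PPL, it skips empty groups rather than dividing by a zero count, and it does not handle missing Income values.

```python
def highest_average_income(cases):
    """Return (group number, mean income) for the group with the highest mean."""
    counts = {}  # plays the role of array gg
    totals = {}  # plays the role of array tt
    for case in cases:
        key = (case["Age"], case["Gender"])
        counts[key] = counts.get(key, 0) + 1
        totals[key] = totals.get(key, 0.0) + case["Income"]
    # compare the group means only after every case has been accumulated
    best = max(counts, key=lambda k: totals[k] / counts[k])
    age, gender = best
    group = age + (gender - 1) * 3  # numbering used in Figure 8.7
    return group, totals[best] / counts[best]


sample = [
    {"Age": 1, "Gender": 1, "Income": 30000},
    {"Age": 1, "Gender": 1, "Income": 50000},
    {"Age": 3, "Gender": 2, "Income": 45000},
]
group, mean = highest_average_income(sample)
print(f"Group {group} had the highest average income of ${mean:,.2f}")
```

Note that Figure 8.6 numbers the groups differently (the counter #N advances with gender varying fastest), so the two figures can report different group numbers for the same gender/age combination.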
8.6 Interaction of FIRST, LAST and O<strong>the</strong>r <strong>PPL</strong><br />
The FIRST and LAST functions interact with o<strong>the</strong>r <strong>PPL</strong> instructions in <strong>the</strong> following manner:<br />
1. Case selection, such as CASES 11 TO 30, is done first. The rest of the PPL sees only those 20 cases<br />
and has no idea that they came from a larger file.<br />
2. An internal FIRST/NOTFIRST and LAST/NOTLAST flag is set for each FIRST or LAST test used<br />
in <strong>the</strong> <strong>PPL</strong>. This is done as soon as <strong>the</strong> case passes <strong>the</strong> CASES filter. FIRST(.FILE.), for example,<br />
is true for <strong>the</strong> first case processed. That case may have been <strong>the</strong> eleventh case of <strong>the</strong> original input<br />
file.<br />
3. The <strong>PPL</strong> for <strong>the</strong> current case is done now. Because <strong>the</strong> FIRST and LAST settings for a case are determined<br />
before o<strong>the</strong>r <strong>PPL</strong> begins, FIRST and LAST testing cannot be done on newly generated<br />
variables. Also, recoding a FIRST or LAST variable has no effect on <strong>the</strong> FIRST and LAST settings,<br />
since those settings are done before recodes occur.<br />
4. The DELETE and RETAIN instructions should not be used until all FIRST and LAST tests are<br />
complete.<br />
To summarize, FIRST and LAST logic is based on <strong>the</strong> pre-<strong>PPL</strong> values of only those cases that remain after any<br />
case selection.<br />
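The ordering described in this list can be modeled as a two-stage pipeline: select cases, then compute the flags from the untouched values of the selected cases. The Python sketch below is a hypothetical illustration, not P-STAT code. The function name, the flag names, and the rule that a group starts or ends wherever the `by` value changes are assumptions made for the example.

```python
def flag_cases(cases, first_case, last_case, by):
    """Select cases first_case..last_case (1-based, inclusive), then flag them.

    The flags are computed before any further per-case processing, so
    later changes to the `by` variable cannot affect them.
    """
    selected = cases[first_case - 1:last_case]
    flagged = []
    for i, case in enumerate(selected):
        is_first_in_file = i == 0
        is_last_in_file = i == len(selected) - 1
        flagged.append({
            "case": case,
            "first_file": is_first_in_file,
            "last_file": is_last_in_file,
            # assumed rule: a group boundary is a change in the `by` value
            "first_by": is_first_in_file or selected[i - 1][by] != case[by],
            "last_by": is_last_in_file or selected[i + 1][by] != case[by],
        })
    return flagged


cases = [{"Id": n, "Class": c} for n, c in enumerate([1, 1, 2, 2, 2], start=1)]
for entry in flag_cases(cases, 2, 4, "Class"):
    print(entry)
```

In this model the case with Id 2 is flagged as the first case of the file even though it was the second case of the original input, which is the behavior described in item 2.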
The last portion of this section on basic across-case modification gives two detailed examples that use scratch<br />
variables, <strong>the</strong> P vec<strong>to</strong>r and <strong>the</strong> FIRST function, along with IF tests, DO loops, <strong>the</strong> PUT instruction and <strong>the</strong> PUT<br />
counter (.PUT.). The examples integrate <strong>the</strong> various <strong>PPL</strong> procedures covered thus far in handling realistic problems<br />
encountered in data modification.
8.7 Example: Checking a List of Variables<br />
In creating a new variable from a series of dummy variables, it may be sensible <strong>to</strong> check that only one of <strong>the</strong> dummy<br />
variables contains <strong>the</strong> value 1 and that <strong>the</strong> rest are zero.<br />
The instructions shown in Figure 8.8 test that only one of <strong>the</strong> dummy variables has been coded 1 and <strong>the</strong>y<br />
create <strong>the</strong> new variable Region. Scratch variables are used <strong>to</strong> contain <strong>the</strong> results of <strong>the</strong> IF test. The scratch variables<br />
#Test1 and #Test2 are generated equal <strong>to</strong> 0 in all cases.<br />
#Test1 is incremented each time <strong>the</strong> DO loop test is true. #Test1 is 0 at <strong>the</strong> end of <strong>the</strong> DO loop if none of <strong>the</strong><br />
variables contained a 1. #Test1 is greater than 1 if more than one of <strong>the</strong> variables contained a 1. #Test2 is incremented<br />
by <strong>the</strong> value of <strong>the</strong> variable in <strong>the</strong> DO loop each time <strong>the</strong> IF test is false or missing. If #Test2 is missing<br />
at <strong>the</strong> end of <strong>the</strong> loop, one of <strong>the</strong> dummy variable values is missing. If #Test2 is greater than 0 at <strong>the</strong> end of <strong>the</strong><br />
loop, one or more of <strong>the</strong> dummy variables was some value o<strong>the</strong>r than 0 or 1.<br />
__________________________________________________________________________<br />
Figure 8.8 Checking Variables Using PUT and Scratch Variables<br />
MODIFY Regional<br />
[ KEEP North.East TO South.West Age Sex ;<br />
GENERATE Region = .M1. ;<br />
GEN #Test1 = 0, GEN #Test2 = 0 ]<br />
[ DO #J = 1 TO 4;<br />
IF V(#J) EQ 1, SET Region = #J,<br />
T.INCREASE #Test1, FM.INCREASE #Test2 BY V(#J) ;<br />
ENDDO;<br />
IF #Test1 EQ 0, PUT .N. ;<br />
IF #Test1 GT 1, PUT .N. ,<br />
SET Region = .M2.;<br />
IF #Test2 MISSING OR #Test2 GT 0,<br />
PUT .N. ,<br />
SET Region = .M3. ;<br />
IF .PUT. GT 0, RETAIN ],<br />
OUT Errors $<br />
__________________________________________________________________________<br />
The PUT instruction reports any error conditions. Region remains missing type 1 when no variable is coded 1, is set to missing type 2<br />
when more than one variable is coded 1, and is set to missing type 3 when a value is missing or is some value other than 0 or 1. Finally,<br />
if <strong>the</strong>re were errors, <strong>the</strong> case is retained and written <strong>to</strong> an error file — <strong>the</strong> output file named “Errors”. .PUT. is a<br />
system variable which is reset <strong>to</strong> 0 as each new case is read. It is incremented each time that a PUT instruction is<br />
issued. In Figure 8.8 <strong>the</strong> PUT instructions are only made <strong>to</strong> report errors and any case with a .PUT. value of 0 is<br />
error free.<br />
Consider the difference between:<br />
[ IF FIRST ( .FILE. ), GEN #n = 0 ] and<br />
[ GEN #n = 0 ]<br />
The GENERATE by itself zeros #n whenever a new case is read. The GENERATE conditioned on FIRST (.FILE.) zeros #n only<br />
when the initial case is read.
8.8 Example: Selecting a Block of Cases<br />
Selecting a block of cases, such as all cases from <strong>the</strong> one with <strong>the</strong> value “Jones” on Last.Name up <strong>to</strong> (but not including)<br />
<strong>the</strong> case with <strong>the</strong> value “Smith”, is a bit more complicated than selecting cases by <strong>the</strong>ir position in <strong>the</strong> file<br />
(.N.) or by specific values of a variable. Values in <strong>the</strong> permanent vec<strong>to</strong>r may be used <strong>to</strong> delineate <strong>the</strong> block of<br />
cases:<br />
[ IF FIRST ( .FILE. ), SET P(1) = 0 ;<br />
IF Last.Name EQ 'Jones', SET P(1) = 1 ;<br />
IF P(1) = 0, DELETE ;<br />
IF Last.Name EQ 'Smith', QUITFILE ]<br />
When <strong>the</strong> first case in <strong>the</strong> file is processed, P(1) is set equal <strong>to</strong> 0. It is reset <strong>to</strong> 1 when <strong>the</strong> Jones case is processed.<br />
Any cases with values of 0 for P(1) are deleted from fur<strong>the</strong>r processing. Thus, cases prior <strong>to</strong> Jones are deleted.<br />
When <strong>the</strong> Smith case is found, processing of <strong>the</strong> file s<strong>to</strong>ps (without using this case).<br />
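The same state-flag technique can be traced in a general-purpose language. The Python sketch below is an illustration under stated assumptions, not P-STAT code: the function name and the dictionary layout of a case are hypothetical, and it assumes that DELETE ends processing of the current case, so a Smith case that appears before any Jones case does not stop the file.

```python
def select_block(cases, start_name="Jones", stop_name="Smith", key="Last.Name"):
    """Keep cases from the start name up to, but not including, the stop name."""
    in_block = False  # plays the role of P(1)
    kept = []
    for case in cases:
        if case[key] == start_name:
            in_block = True   # SET P(1) = 1
        if not in_block:
            continue          # DELETE: skip cases before the block
        if case[key] == stop_name:
            break             # QUITFILE: stop without using this case
        kept.append(case)
    return kept


names = ["Adams", "Jones", "King", "Smith", "Young"]
cases = [{"Last.Name": n} for n in names]
print([c["Last.Name"] for c in select_block(cases)])
```

The flag survives from case to case in the same way that P(1) does, which is what lets the selection depend on cases seen earlier.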
8.9 THE SPLIT FUNCTION<br />
The SPLIT function divides a case in<strong>to</strong> multiple cases. When data are collected with related information in<br />
a single case, reorganization in<strong>to</strong> multiple cases may be necessary for various commands. For example, a household<br />
survey may have both household information and information for several household members in <strong>the</strong> same<br />
case. A medical study may have several patient visits or lab results in <strong>the</strong> same case. This organization is often<br />
inappropriate for many statistical analyses such as TTEST or ANOVA, which require explicit grouping variables<br />
or indices.<br />
Special forms of <strong>the</strong> SPLIT function:<br />
[ SPLIT ] or [ SPLIT * ]<br />
reverse <strong>the</strong> effects of COLLECT, <strong>the</strong> function which ga<strong>the</strong>rs multiple cases in<strong>to</strong> one case. They are described after<br />
<strong>the</strong> discussion of COLLECT and in <strong>the</strong> summary ending this chapter.<br />
8.10 Splitting a Case<br />
The simplest usage of SPLIT divides each case in<strong>to</strong> a designated number of cases. This file has two cases and<br />
each case has two variables:<br />
Test1 Test2<br />
16 12<br />
17 11<br />
Suppose you wanted <strong>to</strong> split each case in<strong>to</strong> 2 cases. A command such as:<br />
LIST X [ SPLIT INTO 2 ] $<br />
receives only <strong>the</strong> newly created cases. When SPLIT INTO 2 is encountered, each case in <strong>the</strong> file is converted in<strong>to</strong><br />
two cases:<br />
Test1<br />
16<br />
12<br />
17<br />
11<br />
There is an error message if <strong>the</strong> number of variables being split is not a multiple of <strong>the</strong> SPLIT argument. For example,<br />
you cannot do a simple SPLIT INTO 3 when <strong>the</strong>re are 5 variables, but you can when <strong>the</strong>re are 3, 6, 9, 12,<br />
etc.
Ei<strong>the</strong>r:<br />
SPLIT INTO N or SPLIT N<br />
may be used; <strong>the</strong> word INTO is optional. N, <strong>the</strong> number of new cases, must be an integer. It can be an integer<br />
constant, a permanent scratch variable (##n), or (in a macro) a temporary scratch variable (#n). For example, when the<br />
input case has 40 variables, SPLIT INTO 4 yields ten variables in each of <strong>the</strong> four new cases. The first ten variable<br />
names are used. Case one has values 1 <strong>to</strong> 10, case two has values 11 <strong>to</strong> 20, and so on. There is an error message<br />
if <strong>the</strong> variables are split in<strong>to</strong> cases such that a numeric and a character variable would be combined in<strong>to</strong> a single<br />
variable (would be in <strong>the</strong> same column).<br />
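The rules for a simple SPLIT (equal-sized chunks, names taken from the first chunk, and the two error conditions) can be summarized in a small model. The Python sketch below is hypothetical and is not how P-STAT implements SPLIT; the function name and the use of Python strings to stand for character values are assumptions made for the example.

```python
def split_case(names, values, n):
    """Split one case into n new cases; return (new names, list of new cases)."""
    if len(names) != len(values):
        raise ValueError("each value needs a variable name")
    if len(values) % n != 0:
        raise ValueError("number of variables is not a multiple of the SPLIT argument")
    width = len(values) // n
    rows = [values[i * width:(i + 1) * width] for i in range(n)]
    # a column may not mix character (str) and numeric values
    for col in range(width):
        kinds = {isinstance(row[col], str) for row in rows}
        if len(kinds) > 1:
            raise ValueError("numeric and character values would share a column")
    # the new cases use the names of the first `width` variables
    return names[:width], rows


print(split_case(["Test1", "Test2"], [16, 12], 2))
```

With 40 variables and `n = 4`, the model returns ten names and four rows of ten values each, matching the description above.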
The variables present in <strong>the</strong> new cases are also determined by additional options used with <strong>the</strong> SPLIT function.<br />
These options can occur in any order, as often as needed. Their order determines <strong>the</strong> order of <strong>the</strong> variables<br />
in <strong>the</strong> new cases. SPLIT itself must precede any options.<br />
8.11 CARRYing Identifying Variables<br />
CARRY is an optional instruction that specifies one or more variables <strong>to</strong> be carried in every case formed by <strong>the</strong><br />
SPLIT. CARRY requires one or more variables as its argument:<br />
CARRY Name, or CARRY Name Age Sex,<br />
Figure 8.9 illustrates <strong>the</strong> results of a SPLIT where CARRY is used <strong>to</strong> position <strong>the</strong> variables Name, Age and<br />
Sex in each of <strong>the</strong> new cases. Only <strong>the</strong> variables not mentioned in <strong>the</strong> CARRY instruction are split. (STUB may<br />
be used with LIST <strong>to</strong> highlight <strong>the</strong> hierarchical relationship between <strong>the</strong> carried variables and <strong>the</strong> split variables.)<br />
__________________________________________________________________________<br />
Figure 8.9 Using CARRY in <strong>the</strong> SPLIT Function<br />
FILE Students:<br />
Name Age Sex Test1 Test2<br />
Smith, Jason 11 1 16 12<br />
Wilson, Ann 14 2 17 11<br />
LIST Students [ SPLIT INTO 2, CARRY Name Age Sex ] $<br />
Name Age Sex Test1<br />
Smith, Jason 11 1 16<br />
Smith, Jason 11 1 12<br />
Wilson, Ann 14 2 17<br />
Wilson, Ann 14 2 11<br />
LIST Students [ SPLIT INTO 2, CARRY Name Age Sex ],<br />
STUB Name Age Sex $<br />
Name Age Sex Test1<br />
Smith, Jason 11 1 16<br />
12<br />
Wilson, Ann 14 2 17<br />
11<br />
_________________________________________________________________________
8.12 Selecting Variables To USE<br />
The variables <strong>to</strong> be used in <strong>the</strong> SPLIT may first be selected by using KEEP in a separate modification clause,<br />
[ KEEP Test1 Test2 ;<br />
SPLIT INTO 2 ]<br />
or <strong>the</strong>y can be specified as part of <strong>the</strong> SPLIT function with <strong>the</strong> USE option:<br />
SPLIT INTO 2, USE Test1 Test2 ;<br />
Figure 8.10 shows <strong>the</strong> results of a USE selection.<br />
__________________________________________________________________________<br />
Figure 8.10 Selecting Variables for SPLIT with USE<br />
FILE Students:<br />
Name Age Sex Test1 Test2<br />
Smith, Jason 11 1 16 12<br />
Wilson, Ann 14 2 17 11<br />
LIST Students [ SPLIT INTO 2, USE Test1 Test2 ] $<br />
Test1<br />
16<br />
12<br />
17<br />
11<br />
__________________________________________________________________________<br />
USE normally takes one or more variable names as its argument. The number of variables must be a multiple<br />
of <strong>the</strong> SPLIT argument. If SPLIT INTO 6 is used and USE specifies 18 variables, <strong>the</strong> 18 variables are split in<strong>to</strong><br />
six output cases with three variables each. The variable names are those of <strong>the</strong> first three variables specified after<br />
USE. The USE variables can include ranges:<br />
USE Test1 TO Test9 Test99<br />
The USE option may be used solely <strong>to</strong> reorder <strong>the</strong> variables that are in <strong>the</strong> output cases:<br />
SPLIT INTO 2, USE Test2 Test1 ;<br />
When all variables are <strong>to</strong> be used, USE is not necessary — <strong>the</strong>se two instructions are equivalent:<br />
SPLIT INTO 2;<br />
SPLIT INTO 2, USE V(1) .ON. ;<br />
USE is also not necessary when o<strong>the</strong>r options, such as CARRY, are present and <strong>the</strong> number of variables not being<br />
carried is a multiple of <strong>the</strong> SPLIT argument.<br />
8.13 Defining New Variables with CREATE<br />
When <strong>the</strong> cases in a file are split, <strong>the</strong> names of <strong>the</strong> variables are those of <strong>the</strong> variables present in <strong>the</strong> first new case.<br />
Test1 and Test2 may be appropriate names before <strong>the</strong> SPLIT, when <strong>the</strong> variables are in one case. However, when<br />
<strong>the</strong> case is SPLIT, a variable name such as Test.Score may be more appropriate for all <strong>the</strong> Test? variables. The<br />
CREATE option gives an output variable a new name and also specifies just which variable values are <strong>to</strong> be used<br />
for that variable. CREATE takes <strong>the</strong> place of USE.
The first argument for CREATE is <strong>the</strong> new variable name. The subsequent arguments are <strong>the</strong> existing variables<br />
whose values will be those of <strong>the</strong> new variable:<br />
CREATE Test.Score Test1 Test2<br />
The new variable created is “Test.Score”. The first new case output from SPLIT gets <strong>the</strong> value of <strong>the</strong> variable<br />
Test1, <strong>the</strong> first variable in <strong>the</strong> current input case <strong>to</strong> be used, for <strong>the</strong> new variable Test.Score. The second new case<br />
gets <strong>the</strong> value of Test2, <strong>the</strong> second variable in <strong>the</strong> current input case, for <strong>the</strong> same new variable, and so on. Figure<br />
8.11 shows <strong>the</strong> effect of CREATE. Note that <strong>the</strong> variables produced by <strong>the</strong> SPLIT are in <strong>the</strong> order in which <strong>the</strong>y<br />
are mentioned.<br />
__________________________________________________________________________<br />
Figure 8.11 Naming <strong>the</strong> New Variables with CREATE<br />
FILE Students:<br />
Name Age Sex Test1 Test2<br />
Smith, Jason 11 1 16 12<br />
Wilson, Ann 14 2 17 11<br />
LIST Students<br />
[ SPLIT 2,<br />
CREATE Test.Score Test1 Test2 ,<br />
CARRY Name ] $<br />
Test<br />
Score Name<br />
16 Smith, Jason<br />
12 Smith, Jason<br />
17 Wilson, Ann<br />
11 Wilson, Ann<br />
__________________________________________________________________________<br />
The number of variables in the list that follows CREATE and the name of the created variable must equal<br />
the number of new cases being produced. Several CREATE instructions may follow a SPLIT, and this rule applies<br />
to each CREATE list. Figure 8.12 shows<br />
three new cases produced from each existing case. Thus, three variables are in each CREATE list. Two new variables,<br />
each using three existing variables, account for six variables, which is the number<br />
of variables in the original case to be split.<br />
When CREATE is used, any variables not cited in <strong>the</strong> CARRY or CREATE instructions are omitted from <strong>the</strong><br />
SPLIT unless USE is also included. When USE is included without a variable list, all <strong>the</strong> remaining variables are<br />
included in <strong>the</strong> SPLIT. The variable names for <strong>the</strong>se additional variables are those of <strong>the</strong> variables in <strong>the</strong> first case<br />
of <strong>the</strong> output file.<br />
8.14 Wildcard Notation and Masks<br />
Often cases which contain <strong>the</strong> type of data that is appropriate for splitting have variable names in which part of<br />
<strong>the</strong> name is a prefix and <strong>the</strong> rest is a counter or additional text <strong>to</strong> distinguish <strong>the</strong> values. When this situation exists,<br />
<strong>the</strong> ? wildcard notation can be used.
The ? ei<strong>the</strong>r follows a prefix <strong>to</strong> indicate all variables starting with that prefix, or it precedes a suffix <strong>to</strong> indicate<br />
all variables ending with that suffix. Both of the following produce the same result:<br />
SPLIT 2, CREATE Test.Score Test1 Test2 ;<br />
SPLIT 2, CREATE Test.Score Test? ;<br />
Ano<strong>the</strong>r way <strong>to</strong> select certain variables is <strong>to</strong> use a mask after a range:<br />
USE Test1 TO Test8 (MASK 1001),<br />
is <strong>the</strong> same as saying:<br />
USE Test1 Test4 Test5 Test8,<br />
given of course that Test1 through Test8 are consecutive variables in <strong>the</strong> file.<br />
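The mask example suggests that the pattern is applied repeatedly along the range: 1001 keeps the first and fourth variable of every group of four. The Python sketch below models that reading. It is an inference from the single example above, not a statement of how P-STAT evaluates MASK, and the function name is hypothetical.

```python
from itertools import cycle


def apply_mask(variables, mask):
    """Keep each variable whose position lines up with a '1' in the repeating mask."""
    return [name for name, bit in zip(variables, cycle(mask)) if bit == "1"]


tests = [f"Test{i}" for i in range(1, 9)]
print(apply_mask(tests, "1001"))
```

Under this assumption the result for Test1 through Test8 is Test1, Test4, Test5 and Test8, the same list given in the text.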
__________________________________________________________________________<br />
Figure 8.12 Multiple CREATE Lists<br />
File Field121:<br />
Crop Date Y1 Y2 Y3 Y4 Y5 Y6<br />
Alfalfa 8/24/83 181 179 182 195 192 198<br />
Alfalfa 8/30/82 179 177 176 192 190 199<br />
LIST Field121<br />
[ SPLIT INTO 3, CARRY Crop Date,<br />
CREATE Plot.1 Y1 TO Y3, CREATE Plot.2 Y4 TO Y6 ],<br />
STUB Crop Date $<br />
Crop Date Plot.1 Plot.2<br />
Alfalfa 8/24/83 181 195<br />
179 192<br />
182 198<br />
8/30/82 179 192<br />
177 190<br />
176 199<br />
__________________________________________________________________________<br />
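The combination of CARRY and several CREATE lists in Figure 8.12 can be modeled as follows. The Python sketch is a hypothetical illustration, not P-STAT code: the function name and the dictionary layout are assumptions, and the model always places carried variables first rather than following the order in which the options are written.

```python
def split_create(case, n, carry, creates):
    """Split one case into n cases using CARRY and CREATE lists.

    `creates` maps each new variable name to the list of n existing
    variables that supply its values, one per new case.
    """
    for new_name, sources in creates.items():
        if len(sources) != n:
            raise ValueError(f"CREATE list for {new_name} must name {n} variables")
    rows = []
    for i in range(n):
        row = {name: case[name] for name in carry}  # repeated in every new case
        for new_name, sources in creates.items():
            row[new_name] = case[sources[i]]        # i-th source feeds the i-th case
        rows.append(row)
    return rows


field = {"Crop": "Alfalfa", "Date": "8/24/83",
         "Y1": 181, "Y2": 179, "Y3": 182, "Y4": 195, "Y5": 192, "Y6": 198}
for row in split_create(field, 3, ["Crop", "Date"],
                        {"Plot.1": ["Y1", "Y2", "Y3"], "Plot.2": ["Y4", "Y5", "Y6"]}):
    print(row)
```

For the first case of file Field121 the model yields Plot.1 values 181, 179, 182 and Plot.2 values 195, 192, 198, matching the listing in the figure.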
8.15 INDEXing Cases<br />
The INDEX option sequences <strong>the</strong> cases created by SPLIT. Several different indices may be built at <strong>the</strong> same time.<br />
Multiple indices are most useful when SPLIT is used <strong>to</strong> reorganize data for analysis of variance.<br />
INDEX requires a name for <strong>the</strong> new variable:<br />
SPLIT 2, INDEX Treatment ;<br />
A new variable named “Treatment” is created. It has <strong>the</strong> value 1 in <strong>the</strong> first case created by SPLIT and <strong>the</strong> value<br />
2 in <strong>the</strong> second case created by SPLIT. Figure 8.13 illustrates <strong>the</strong> use of INDEX.<br />
Multiple indices may also be created:<br />
INDEX Plot 2 Subplot 3,<br />
This creates two new variables named “Plot” and “Subplot”. Plot has <strong>the</strong> values 1 and 2. Subplot has <strong>the</strong> values<br />
1, 2, and 3. The first index moves more slowly than the second index, so that Plot remains 1 as Subplot is successively<br />
1, 2, and 3. Then Plot becomes 2, and Subplot is successively 1, 2, and 3. This means that there must be<br />
six cases created by <strong>the</strong> SPLIT.<br />
When <strong>the</strong> right-most index value is omitted, <strong>the</strong> appropriate value is assumed. INDEX A 2 B is equivalent <strong>to</strong><br />
INDEX A 2 B 3 when SPLIT INTO 6 has been used, because <strong>the</strong> product of <strong>the</strong> INDEX values equals <strong>the</strong> SPLIT<br />
argument.<br />
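The way index values advance, and the way an omitted right-most value is inferred, can be modeled in a few lines of Python. This is a sketch of the rule as described above, not P-STAT code; the function `index_values` and its argument layout are assumptions made for illustration.

```python
from itertools import product
from math import prod


def index_values(split_n, spec):
    """Build the index values for each case produced by a split.

    spec is a list of (name, size) pairs; the last size may be None,
    in which case it is inferred from split_n.
    """
    names = [name for name, _ in spec]
    sizes = [size for _, size in spec]
    if sizes[-1] is None:
        sizes[-1] = split_n // prod(sizes[:-1])
    if prod(sizes) != split_n:
        raise ValueError("index sizes must multiply to the SPLIT argument")
    # product() varies the right-most index fastest, the left-most slowest.
    ranges = [range(1, size + 1) for size in sizes]
    return [dict(zip(names, combo)) for combo in product(*ranges)]


explicit = index_values(6, [("A", 2), ("B", 3)])      # INDEX A 2 B 3
inferred = index_values(6, [("A", 2), ("B", None)])   # INDEX A 2 B
single = index_values(2, [("Treatment", None)])       # INDEX Treatment
for row in explicit:
    print(row)
```

Under this model the explicit and inferred forms give identical index values when six cases are created.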
__________________________________________________________________________<br />
Figure 8.13 Producing an Index Variable<br />
FILE Students:<br />
Name Age Sex Test1 Test2<br />
Smith, Jason 11 1 16 12<br />
Wilson, Ann 14 2 17 11<br />
LIST Students<br />
( SPLIT 2, CARRY Age Sex,<br />
INDEX Seq, CREATE Test.Score Test? ) $<br />
Test<br />
Age Sex Seq Score<br />
11 1 1 16<br />
11 1 2 12<br />
14 2 1 17<br />
14 2 2 11<br />
__________________________________________________________________________<br />
8.16 Ordering Variables with STEP and CYCLE<br />
The order of <strong>the</strong> variables in a file is sometimes not <strong>the</strong> desired one. Variables may be rearranged by using <strong>the</strong><br />
<strong>PPL</strong> instruction KEEP with variable selection and possibly a MASK, or within a SPLIT, by using CREATE and<br />
USE with lists of variables. In addition, if <strong>the</strong> variables are arranged in a regular pattern, <strong>the</strong>y may be ordered<br />
using <strong>the</strong> STEP and CYCLE options, which permit more concise specification when <strong>the</strong>re are many variables.<br />
The STEP option selects every second variable when its argument is two, every third variable when its argument<br />
is three, and so on. For example, given a file with 26 variables named A <strong>to</strong> Z, this:<br />
SPLIT 13, USE ( A TO Z) STEP 2;<br />
selects every second variable between A and Z, beginning with A.<br />
STEP moves through <strong>the</strong> list of variables selecting <strong>the</strong> first (A), advancing <strong>the</strong> step size (2), selecting <strong>the</strong> designated<br />
variable (C), and so on, until <strong>the</strong> list is exhausted. The number of variables selected by <strong>the</strong> STEP<br />
procedure must be a multiple of <strong>the</strong> number of variables required by <strong>the</strong> SPLIT function. In <strong>the</strong> prior example, 13<br />
variables are selected from <strong>the</strong> 26 variables in <strong>the</strong> USE list, and <strong>the</strong>se are divided in<strong>to</strong> 13 cases. There is one variable<br />
per case. The USE list should be specified “B TO Z” if every o<strong>the</strong>r variable beginning with B is <strong>to</strong> be selected.<br />
When STEP or CYCLE is used, the variable list following USE or CREATE must be enclosed in parentheses.<br />
CYCLE works in a similar manner, except that when <strong>the</strong> variable list is exhausted, CYCLE goes back <strong>to</strong> <strong>the</strong> beginning<br />
of <strong>the</strong> list and begins selecting from <strong>the</strong> unused variables. (STEP does not return <strong>to</strong> <strong>the</strong> start of <strong>the</strong> list.)
Because <strong>the</strong> initial starting place in <strong>the</strong> list changes when CYCLE is used, different variables are selected in each<br />
iteration. The number of iterations depends on <strong>the</strong> CYCLE argument and <strong>the</strong> number of variables in <strong>the</strong> USE list:<br />
SPLIT 6, USE ( V(1) TO V(12) ) CYCLE 3 ;<br />
The CYCLE instruction selects variables 1, 4, 7, 10; 2, 5, 8, 11; and 3, 6, 9, 12; in that order.<br />
The selection order is a result of <strong>the</strong> initial variable in <strong>the</strong> USE list, <strong>the</strong> number of variables in <strong>the</strong> USE list,<br />
and <strong>the</strong> CYCLE argument. Ultimately, all <strong>the</strong> variables in <strong>the</strong> USE list are selected. Thus, CYCLE differs from<br />
STEP, where only a fraction of <strong>the</strong> variables in <strong>the</strong> USE or CREATE list are selected. Note that <strong>the</strong> number of<br />
variables in <strong>the</strong> variable list (12) must be a multiple of <strong>the</strong> number of cases in<strong>to</strong> which <strong>the</strong> current case is being<br />
SPLIT (6). This is true for both <strong>the</strong> STEP and CYCLE procedures. Also, both STEP and CYCLE must follow<br />
ei<strong>the</strong>r USE or CREATE and must not have a comma preceding <strong>the</strong>m.<br />
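The two selection orders can be compared with a short Python model. It is a sketch of the behavior described in this section, not PPL, and the function names `step_select` and `cycle_select` are hypothetical.

```python
def step_select(variables, k):
    """STEP k: take the first variable, then every k-th one after it."""
    return variables[::k]


def cycle_select(variables, k):
    """CYCLE k: like STEP, but restart at the first unused variable
    until every variable has been selected."""
    order = []
    for start in range(k):
        order.extend(variables[start::k])
    return order


letters = [chr(c) for c in range(ord("A"), ord("Z") + 1)]
stepped = step_select(letters, 2)              # USE ( A TO Z ) STEP 2
cycled = cycle_select(list(range(1, 13)), 3)   # USE ( V(1) TO V(12) ) CYCLE 3
print(stepped)
print(cycled)
```

STEP returns only a fraction of the list, while CYCLE returns every variable in a new order.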
Figure 8.14 shows <strong>the</strong> differing results that depend on whe<strong>the</strong>r STEP or CYCLE is used. STEP moves<br />
through <strong>the</strong> entire USE list, beginning with <strong>the</strong> first variable (Q2) and selecting every o<strong>the</strong>r variable. Four variables<br />
are selected and that meets <strong>the</strong> requirement of this SPLIT that a multiple of 4 be chosen. Four variables split<br />
in<strong>to</strong> four cases yield one variable per case.<br />
__________________________________________________________________________<br />
Figure 8.14 Using STEP and CYCLE<br />
File F:<br />
Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9<br />
1 2 3 4 5 6 7 8 9<br />
11 12 13 14 15 16 17 18 19<br />
LIST F<br />
[ SPLIT 4, INDEX A 2 B, INDEX C, CARRY Q1,<br />
USE ( Q2 TO Q9 ) xxxx 2 ] $<br />
produces: if xxxx = STEP if xxxx = CYCLE<br />
A B C Q1 Q2 Q2 Q4<br />
1 1 1 1 2 2 4<br />
1 2 2 1 4 6 8<br />
2 1 3 1 6 3 5<br />
2 2 4 1 8 7 9<br />
1 1 1 11 12 12 14<br />
1 2 2 11 14 16 18<br />
2 1 3 11 16 13 15<br />
2 2 4 11 18 17 19<br />
__________________________________________________________________________<br />
CYCLE moves through <strong>the</strong> entire USE list, beginning with <strong>the</strong> first variable and selecting every o<strong>the</strong>r variable<br />
also. However, since eight variables are cited in <strong>the</strong> USE list, CYCLE returns <strong>to</strong> <strong>the</strong> first unused variable<br />
(Q3) in <strong>the</strong> USE list, and begins selecting again. It keeps cycling until all of <strong>the</strong> variables in <strong>the</strong> USE list are selected.<br />
Eight variables are chosen, meeting <strong>the</strong> requirement of this SPLIT that a multiple of 4 variables be chosen.<br />
These are split in<strong>to</strong> four cases, yielding two variables per case.<br />
SPLIT 1 and CYCLE can be used to rearrange the variables in a file so that the variables in the second half of<br />
the file are interleaved with the variables in the first half of the file. For example,<br />
A1 A2 A3 A4 B1 B2 B3 B4<br />
becomes<br />
A1 B1 A2 B2 A3 B3 A4 B4<br />
with the following PPL:<br />
LIST AB [ SPLIT 1, USE ( V(1) .ON. ) CYCLE 4 ] $<br />
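The division of the selected variables among the new cases can also be modeled. The Python sketch below reproduces the CYCLE columns of Figure 8.14 and the interleaving example; it is an illustration under the assumption that the selected variables are dealt out to the cases in consecutive, equal-sized runs. The names `cycle_order` and `split_into_cases` are hypothetical.

```python
def cycle_order(names, k):
    """Order in which CYCLE k visits the names."""
    return [name for start in range(k) for name in names[start::k]]


def split_into_cases(selected, n_cases):
    """Deal the selected variables out to n_cases in equal runs."""
    per_case, left_over = divmod(len(selected), n_cases)
    if left_over:
        raise ValueError("selection must be a multiple of the SPLIT argument")
    return [selected[i * per_case:(i + 1) * per_case] for i in range(n_cases)]


# Figure 8.14: SPLIT 4, USE ( Q2 TO Q9 ) CYCLE 2
q_vars = [f"Q{i}" for i in range(2, 10)]
fig_cases = split_into_cases(cycle_order(q_vars, 2), 4)

# Interleaving: SPLIT 1, USE ( V(1) .ON. ) CYCLE 4
ab = ["A1", "A2", "A3", "A4", "B1", "B2", "B3", "B4"]
interleaved = split_into_cases(cycle_order(ab, 4), 1)[0]
print(fig_cases)
print(interleaved)
```

With SPLIT 1 there is only one output case, so the whole reordered list becomes the new variable order.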
8.17 How SPLIT Interacts With O<strong>the</strong>r <strong>PPL</strong><br />
There can be only one SPLIT function per command. Normal <strong>PPL</strong> can precede or follow SPLIT. Using <strong>PPL</strong> first<br />
allows selection and modification of cases in <strong>the</strong> usual manner before SPLIT is used.<br />
The interaction of <strong>PPL</strong> with SPLIT is as follows:<br />
1. The first case passes from <strong>the</strong> first <strong>PPL</strong> phrase <strong>to</strong> <strong>the</strong> next, and it is modified and retained or deleted<br />
in <strong>the</strong> usual manner. If retained, it reaches <strong>the</strong> SPLIT instruction.<br />
2. When SPLIT receives <strong>the</strong> case, <strong>the</strong> current number of variables and <strong>the</strong>ir names change as <strong>the</strong> original<br />
case is split in<strong>to</strong> a number of new cases.<br />
3. The first of <strong>the</strong>se new cases passes <strong>to</strong> subsequent <strong>PPL</strong> phrases, one after ano<strong>the</strong>r, until it is deleted<br />
or retained and received by <strong>the</strong> command in use. The second new case <strong>the</strong>n passes <strong>to</strong> <strong>the</strong> <strong>PPL</strong> phrases<br />
following <strong>the</strong> SPLIT instruction, and so on, until all of <strong>the</strong> new cases resulting from <strong>the</strong> split of <strong>the</strong><br />
first original case have passed through.<br />
4. The second original case is now processed. It passes <strong>to</strong> <strong>the</strong> <strong>PPL</strong> preceding <strong>the</strong> SPLIT, and <strong>the</strong>n it<br />
passes <strong>to</strong> <strong>the</strong> SPLIT instruction. It is split in<strong>to</strong> multiple new cases, which each pass in turn <strong>to</strong> <strong>the</strong><br />
<strong>PPL</strong> following <strong>the</strong> SPLIT. When all <strong>the</strong> new cases resulting from splitting <strong>the</strong> second case have been<br />
processed, <strong>the</strong> process begins again with <strong>the</strong> third original case.<br />
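The four steps above amount to a nested loop: the outer loop runs over original cases, the inner loop over the cases produced by SPLIT. A minimal Python model follows, using the Students file of Figure 8.13. The two filters are hypothetical and were chosen only to show where each one acts; nothing here is P-STAT code.

```python
def run_pipeline(cases, before, split, after):
    """Yield the cases that reach the command, in processing order."""
    for case in cases:                      # one original case at a time
        case = before(case)                 # PPL ahead of SPLIT
        if case is None:                    # deleted: never reaches SPLIT
            continue
        for new_case in split(case):        # SPLIT makes several new cases
            new_case = after(new_case)      # PPL after SPLIT, case by case
            if new_case is not None:
                yield new_case


def split_scores(case):
    """Model of: SPLIT 2, CARRY Age Sex, INDEX Seq, CREATE Test.Score Test?"""
    for seq, name in enumerate(["Test1", "Test2"], start=1):
        yield {"Age": case["Age"], "Sex": case["Sex"],
               "Seq": seq, "Test.Score": case[name]}


students = [
    {"Name": "Smith, Jason", "Age": 11, "Sex": 1, "Test1": 16, "Test2": 12},
    {"Name": "Wilson, Ann", "Age": 14, "Sex": 2, "Test1": 17, "Test2": 11},
]
# Hypothetical filters: one acts before the split, one after it.
older = lambda c: c if c["Age"] >= 12 else None
high = lambda c: c if c["Test.Score"] > 11 else None

result = list(run_pipeline(students, older, split_scores, high))
print(result)
```

A case deleted before the split produces no new cases at all, while a filter after the split removes new cases one at a time.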
__________________________________________________________________________<br />
Figure 8.15 A Simple COLLECT<br />
File MyFile:<br />
Id Age Sex<br />
1 29 M<br />
2 26 F<br />
3 42 F<br />
4 - M<br />
LIST MyFile [ COLLECT 2 ] $<br />
Age Sex Age Sex<br />
Id.1 .1 .1 Id.2 .2 .2<br />
1 29 M 2 26 F<br />
3 42 F 4 - M<br />
__________________________________________________________________________<br />
8.18 THE COLLECT FUNCTION<br />
COLLECT is used <strong>to</strong> ga<strong>the</strong>r two or more adjacent cases in<strong>to</strong> a single larger case. This larger case can be used<br />
with <strong>PPL</strong> <strong>to</strong> modify related variables or <strong>to</strong> generate across-case statistics. After <strong>the</strong> <strong>PPL</strong>, SPLIT can be used <strong>to</strong>
break <strong>the</strong> collected case back up in<strong>to</strong> its original cases with any new variables appended. Additional <strong>PPL</strong> can precede<br />
or follow <strong>the</strong> COLLECT function.<br />
COLLECT is always followed by an integer argument which indicates how many cases <strong>to</strong> collect:<br />
COLLECT 4<br />
This integer is <strong>the</strong> “COLLECT counter”. It specifies <strong>the</strong> maximum number of cases <strong>to</strong> collect. It can be an integer<br />
constant, a permanent scratch variable (#nn), or (in a macro) a temporary scratch variable (#n).<br />
Figure 8.15 illustrates a simple COLLECT. As <strong>the</strong> input file is processed, every two cases are collected in<strong>to</strong><br />
a single case. Because variable names in a P-<strong>STAT</strong> file must be unique, <strong>the</strong> variable names in <strong>the</strong> collected case<br />
have a suffix added <strong>to</strong> <strong>the</strong> original variable name. The maximum suffix value is equal <strong>to</strong> <strong>the</strong> COLLECT counter.<br />
Variables in <strong>the</strong> first case get a suffix of .1, variables in <strong>the</strong> second case get a suffix of .2, and so on. When <strong>the</strong><br />
number of cases in <strong>the</strong> file is not a multiple of <strong>the</strong> COLLECT counter, missing data are generated <strong>to</strong> fill <strong>the</strong> remaining<br />
variables in <strong>the</strong> final case.<br />
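The naming and padding rules can be modeled in Python. This sketch uses the data of Figure 8.15 and represents missing values as `None`; the function `collect` is a hypothetical stand-in for the behavior described above, not the P-STAT implementation.

```python
def collect(cases, n):
    """Gather every n adjacent cases into one wide case."""
    wide_cases = []
    for start in range(0, len(cases), n):
        group = cases[start:start + n]
        wide = {}
        for pos in range(1, n + 1):
            for name in cases[0]:
                # Pad with None (missing) when the file runs out of cases.
                value = group[pos - 1][name] if pos <= len(group) else None
                wide[f"{name}.{pos}"] = value
        wide_cases.append(wide)
    return wide_cases


my_file = [
    {"Id": 1, "Age": 29, "Sex": "M"},
    {"Id": 2, "Age": 26, "Sex": "F"},
    {"Id": 3, "Age": 42, "Sex": "F"},
    {"Id": 4, "Age": None, "Sex": "M"},
]
pairs = collect(my_file, 2)          # four cases become two
padded = collect(my_file[:3], 2)     # three cases: the last is padded
print(pairs)
print(padded)
```

In the padded run, the second collected case holds the third input case in the .1 variables and missing values in all of the .2 variables.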
8.19 Collecting BY Groups<br />
Usually, in a COLLECT situation, the number of cases to be collected is not a constant; for example, not all households are<br />
the same size. The BY option may be used with COLLECT to specify the variable or variables indicating group<br />
membership. When BY is used, <strong>the</strong> COLLECT counter indicates a maximum number of cases <strong>to</strong> collect, ra<strong>the</strong>r<br />
than an absolute number of cases. A maximum of 999 cases may be collected at once. However, if <strong>the</strong>re are a<br />
great many variables in <strong>the</strong> file, <strong>the</strong> actual maximum will be less due <strong>to</strong> memory size limitations.<br />
Figure 8.16 illustrates using COLLECT with BY. When <strong>the</strong> value of <strong>the</strong> variable House.Id changes, <strong>the</strong> end<br />
of a group is signaled and <strong>the</strong> current COLLECT is considered complete. When COLLECT 4 is specified and a<br />
household has only three members, missing data are generated in that collected case for the variables with the<br />
suffix .4. When a household has only two members, missing data are generated for all <strong>the</strong> .3 and .4 variables. A<br />
household with more than four members causes an error because <strong>the</strong> COLLECT counter is a maximum value. Notice<br />
that <strong>the</strong> variable defining group membership, House.Id, is carried only once in each collected case.<br />
__________________________________________________________________________<br />
Figure 8.16 Collecting BY Group Membership<br />
File Caseload:<br />
House<br />
Id Sex Age<br />
1001 M 43<br />
1001 F 44<br />
1001 M 19<br />
1002 F 23<br />
1002 M 29<br />
LIST Caseload ( COLLECT 4, BY House.Id ) $<br />
House Sex Age Sex Age Sex Age Sex Age<br />
Id .1 .1 .2 .2 .3 .3 .4 .4<br />
1001 M 43 F 44 M 19 - -<br />
1002 F 23 M 29 - - - -<br />
__________________________________________________________________________
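Collecting BY a group variable can be modeled with Python's `itertools.groupby`, which, like the BY option as described, ends a group whenever the key value changes between adjacent cases. The sketch below uses the Caseload data of Figure 8.16; `collect_by` is a hypothetical name, and raising `ValueError` stands in for the P-STAT error on an oversized group.

```python
from itertools import groupby


def collect_by(cases, by, n):
    """Collect adjacent cases sharing the same BY value, at most n per group."""
    wide_cases = []
    for key, members in groupby(cases, key=lambda case: case[by]):
        group = list(members)
        if len(group) > n:
            raise ValueError(f"group {key} has more than {n} cases")
        wide = {by: key}                      # the BY variable appears once
        others = [name for name in group[0] if name != by]
        for pos in range(1, n + 1):
            for name in others:
                value = group[pos - 1][name] if pos <= len(group) else None
                wide[f"{name}.{pos}"] = value
        wide_cases.append(wide)
    return wide_cases


caseload = [
    {"House.Id": 1001, "Sex": "M", "Age": 43},
    {"House.Id": 1001, "Sex": "F", "Age": 44},
    {"House.Id": 1001, "Sex": "M", "Age": 19},
    {"House.Id": 1002, "Sex": "F", "Age": 23},
    {"House.Id": 1002, "Sex": "M", "Age": 29},
]
households = collect_by(caseload, "House.Id", 4)
for household in households:
    print(household)
```

Calling `collect_by(caseload, "House.Id", 2)` raises an error in this model, because household 1001 has three members.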
8.20 CARRYing Common Information<br />
There may be several variables in a group of related cases that do not define group membership, but that are usually<br />
<strong>the</strong> same for all <strong>the</strong> cases in <strong>the</strong> group. For example, in <strong>the</strong> file Caseload, each case might have Address as a<br />
variable. This would normally be collected as Address.1, Address.2, and so on. This is reasonable when Address<br />
is expected <strong>to</strong> be different for each case. However, when Address has <strong>the</strong> same value for each case in a household<br />
group, it is more reasonable <strong>to</strong> have Address as a single variable in <strong>the</strong> collected case. Using CARRY followed<br />
by one or more variables:<br />
CARRY Address<br />
causes <strong>the</strong>se variables <strong>to</strong> be placed in <strong>the</strong> collected case only once.<br />
A CARRY variable may be missing for some of <strong>the</strong> cases. However, if it is not missing, it must be <strong>the</strong> same<br />
for <strong>the</strong> entire group of collected cases unless ei<strong>the</strong>r FIRST or LAST is also used as a collect option:<br />
CARRY Address, FIRST<br />
FIRST requests that <strong>the</strong> first non-missing value of <strong>the</strong> CARRY variable be used. LAST requests that <strong>the</strong> last nonmissing<br />
value be used. (Note that FIRST and LAST are not <strong>the</strong> previously described logical functions.)<br />
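The rule for resolving a CARRY variable can be stated compactly in Python. This is a sketch of the rule as described above, with missing values represented as `None`; the function `carry_value`, its `mode` argument, and the address strings are hypothetical.

```python
def carry_value(values, mode=None):
    """Pick the single value a CARRY variable takes in the collected case.

    mode is None, "FIRST", or "LAST".
    """
    present = [value for value in values if value is not None]
    if not present:
        return None                       # missing in every case
    if mode == "FIRST":
        return present[0]
    if mode == "LAST":
        return present[-1]
    if len(set(present)) > 1:             # conflicting values, no option given
        raise ValueError("CARRY variable differs within the group")
    return present[0]


same = carry_value(["12 Elm St", None, "12 Elm St"])
first = carry_value(["12 Elm St", "9 Oak Ave"], mode="FIRST")
last = carry_value(["12 Elm St", "9 Oak Ave"], mode="LAST")
print(same, first, last)
```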
8.21 Ordering Cases with INDEX and SORT<br />
INDEX and SORT are COLLECT options which permit <strong>the</strong> individual cases <strong>to</strong> be placed in <strong>the</strong> collected case in<br />
a different order. INDEX may only be used when <strong>the</strong> BY option is also used.<br />
The use of INDEX is illustrated in Figure 8.17. The variable Visit indicates <strong>the</strong> order that each case should<br />
have in <strong>the</strong> collected case. The first patient has three visits in 1, 3, 2 order. When <strong>the</strong>y are collected with Visit as<br />
<strong>the</strong> INDEX variable, <strong>the</strong> second case is placed in <strong>the</strong> .3 position because <strong>the</strong> value of Visit is 3. The third case for<br />
that patient is placed in the .2 position because the value of Visit is 2.<br />
__________________________________________________________________________<br />
Figure 8.17 Collecting Cases in a Specified Order<br />
File Patients:<br />
Id Visit WBC<br />
1354 1 98<br />
1354 3 72<br />
1354 2 70<br />
4211 2 83<br />
4211 3 85<br />
LIST Patients<br />
[ COLLECT 3, BY Id, INDEX Visit ] $<br />
Visit WBC Visit WBC Visit WBC<br />
Id .1 .1 .2 .2 .3 .3<br />
1354 1 98 2 70 3 72<br />
4211 - - 2 83 3 85<br />
__________________________________________________________________________<br />
An INDEX value may not be missing, but <strong>the</strong> set of index values for a given collect need not be complete.<br />
For example, in Figure 8.17, <strong>the</strong> patient with Id 4211 has values 2 and 3 for <strong>the</strong> INDEX variable Visit but no value<br />
1. INDEX values are assumed <strong>to</strong> be both: 1) within <strong>the</strong> range of <strong>the</strong> COLLECT counter, and 2) unique integers.
If INDEX values are out of range or repeated, one of the options WARN, IGNORE, FIRST, or LAST may<br />
be used to prevent an error message and indicate what to do. When WARN is used, a warning message is printed.<br />
When IGNORE is used, out-of-range or repeated INDEX values are ignored. When FIRST is used, the first of the<br />
cases with the repeated index is collected. When LAST is used, the last of those cases is collected. The other<br />
cases are ignored. WARN and IGNORE may be used only with BY and INDEX.<br />
SORT is ano<strong>the</strong>r way of rearranging <strong>the</strong> cases in <strong>the</strong> collected case. SORT is followed by one or more variables<br />
giving <strong>the</strong> sort order in which <strong>the</strong> collected cases should be arranged:<br />
SORT WBC<br />
The sort direction can be controlled:<br />
SORT WBC (D)<br />
by specifying a direction. An upwards (U) or downwards (D) sort may be specified. When a direction is not specified,<br />
an upwards sort is assumed. Figure 8.18 shows <strong>the</strong> results produced by SORT.<br />
SORT variables may not be BY or CARRY variables. The use of both INDEX and SORT is redundant — indexing<br />
is done before sorting. Thus, sorting may “undo” indexing.<br />
__________________________________________________________________________<br />
Figure 8.18 Sorting <strong>the</strong> Collected Case<br />
File Patients:<br />
Id Visit WBC<br />
1354 1 98<br />
1354 3 72<br />
1354 2 70<br />
4211 2 83<br />
4211 3 85<br />
LIST Patients [ COLLECT 3, BY Id, SORT WBC ] $<br />
Visit WBC Visit WBC Visit WBC<br />
Id .1 .1 .2 .2 .3 .3<br />
1354 2 70 3 72 1 98<br />
4211 2 83 3 85 - -<br />
__________________________________________________________________________<br />
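The difference between INDEX and SORT is in how each case finds its position within the collected case. The Python sketch below models both for one BY group, using the Patients data; it assumes the default behavior (an error on out-of-range or repeated index values, and an upwards sort). The function names are hypothetical.

```python
def place_by_index(group, index, n):
    """INDEX: each case goes to the slot named by its index value."""
    slots = [None] * n
    for case in group:
        pos = case[index]
        if not 1 <= pos <= n or slots[pos - 1] is not None:
            raise ValueError(f"index value {pos} is out of range or repeated")
        slots[pos - 1] = case
    return slots


def place_by_sort(group, key, n, descending=False):
    """SORT: cases fill the slots from .1 onward in sorted order."""
    ordered = sorted(group, key=lambda case: case[key], reverse=descending)
    return ordered + [None] * (n - len(ordered))


p1354 = [{"Visit": 1, "WBC": 98}, {"Visit": 3, "WBC": 72}, {"Visit": 2, "WBC": 70}]
p4211 = [{"Visit": 2, "WBC": 83}, {"Visit": 3, "WBC": 85}]

indexed = place_by_index(p4211, "Visit", 3)    # slot .1 stays empty
sorted_up = place_by_sort(p1354, "WBC", 3)     # SORT WBC
sorted_down = place_by_sort(p1354, "WBC", 3, descending=True)   # SORT WBC (D)
print(indexed)
print(sorted_up)
```

With INDEX an empty slot can appear anywhere; with SORT the empty slots always come last.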
8.22 Complex Modification Using COLLECT<br />
Usually when a case is collected, <strong>the</strong>re are additional <strong>PPL</strong> instructions <strong>to</strong> calculate summary statistics or <strong>to</strong> do<br />
cross-case comparisons or aggregations. Because <strong>the</strong> collected case has variables with <strong>the</strong> same prefix followed<br />
by .1, .2, and so on, <strong>the</strong> use of wildcards and DO loops is helpful in specifying <strong>the</strong> <strong>PPL</strong> instructions.<br />
Figure 8.19 illustrates a complex modification problem — locating all <strong>the</strong> salesmen in a department who earn<br />
more than <strong>the</strong>ir manager. It illustrates COLLECT, additional <strong>PPL</strong>, and finally a SPLIT <strong>to</strong> break <strong>the</strong> collected case<br />
back up in<strong>to</strong> individual cases.<br />
In Figure 8.19, a new variable, Total.Pay, and a scratch variable, #Mgr.Total.Pay, are generated. #Mgr.Total.Pay<br />
is initialized. The COLLECT counter is set <strong>to</strong> 20, and cases are collected by Department. The five<br />
variables, Name, Position, Salary, Commission and Total.Pay, are each represented 20 times in <strong>the</strong> collected case<br />
as variables of <strong>the</strong> same names but with suffixes .1 <strong>to</strong> .20. The BY variable, Department, is present only once.
When a complete department is collected in<strong>to</strong> one case, <strong>the</strong>re are 101 variables (1, plus 5 times 20) in <strong>the</strong> collected<br />
case even though a given department may have fewer than 20 members.<br />
When <strong>the</strong> manager’s <strong>to</strong>tal pay is located, <strong>the</strong> scratch variable #Mgr.Total.Pay is set equal <strong>to</strong> it. The value of<br />
#Mgr.Total.Pay is compared with Total.Pay for each salesman.<br />
__________________________________________________________________________<br />
Figure 8.19 A Complex Modification Problem<br />
FILE Staff:<br />
Department Name Position Salary Commission<br />
Furniture Adams Manager 20540 2875.25<br />
Furniture Brown Sales 17000 7230.80<br />
Hardware Mason Sales 16000 952.65<br />
Hardware Smith Manager 20300 862.95<br />
Hardware Green Sales 17000 4495.50<br />
LIST Staff<br />
[ GENERATE Total.Pay = Salary + Commission;<br />
GENERATE #Mgr.Total.Pay = 0 ;<br />
COLLECT 20, BY Department ]<br />
[ DO #J = 1, 20;<br />
IF Position?(#J) EQ 'Manager',<br />
SET #Mgr.Total.Pay = Total.Pay?(#J);<br />
ENDDO;<br />
DO #J = 1, 20;<br />
IF Total.Pay?(#J) LE #Mgr.Total.Pay,<br />
SET Total.Pay?(#J) = 0;<br />
ENDDO ]<br />
[ SPLIT ;<br />
IF Total.Pay GT 0, RETAIN ],<br />
COMMAS, MIN.PLACES 2 $<br />
Total<br />
Department Name Position Salary Commission Pay<br />
Furniture Brown Sales 17,000 7,230.80 24,230.80<br />
Hardware Green Sales 17,000 4,495.50 21,495.50<br />
__________________________________________________________________________<br />
The first DO loop goes from 1 <strong>to</strong> 20 <strong>to</strong> match <strong>the</strong> maximum possible size of <strong>the</strong> COLLECT. Note: instead of<br />
<strong>the</strong> constant 20 we could use <strong>the</strong> system variable .COLLECTSIZE., which is <strong>the</strong> number of cases found in <strong>the</strong><br />
most recent collect. A powerful attribute of wildcard notation is illustrated:<br />
[ DO #J = 1, 20 ;<br />
IF Position?(#J) EQ 'Manager',<br />
SET #Mgr.Total.Pay = Total.Pay?(#J);<br />
ENDDO ]
One PPL instruction using wildcard notation takes the place of twenty instructions without it. Using a wildcard<br />
creates a vector of all the variables which begin with the wildcard prefix or end with the wildcard suffix. The DO loop scratch<br />
variable #J accesses specified locations in this vector, just as it accesses locations in the V and P vectors.<br />
In Figure 8.19, each of <strong>the</strong> 20 variables which begin with “Position” is tested in turn <strong>to</strong> find <strong>the</strong> one with <strong>the</strong><br />
value “Manager”. If “Manager” is found in <strong>the</strong> fourteenth Position? variable, <strong>the</strong>n #Mgr.Total.Pay is set equal <strong>to</strong><br />
<strong>the</strong> fourteenth Total.Pay? variable.<br />
The second DO loop examines <strong>the</strong> Total.Pay of each salesman. Again <strong>the</strong> wildcard notation and loop scratch<br />
variable simplify <strong>the</strong> procedure:<br />
DO #J = 1, 20 ;<br />
IF Total.Pay?(#J) LE #Mgr.Total.Pay,<br />
SET Total.Pay?(#J) = 0 ; ENDDO;<br />
Each value in <strong>the</strong> vec<strong>to</strong>r of Total.Pay variables is tested, and any value that is less than or equal <strong>to</strong> #Mgr.Total.Pay<br />
is set <strong>to</strong> zero.<br />
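The two loops can be traced in Python for one department. This sketch models a wildcard as the ordered list of variable names sharing a prefix and uses a COLLECT size of 3 rather than 20 to stay short; both choices are assumptions for illustration, and `wildcard` is a hypothetical helper, not PPL.

```python
def wildcard(case, prefix):
    """Names in the case that begin with prefix, in file order."""
    return [name for name in case if name.startswith(prefix)]


# The Hardware department after COLLECT (Total.Pay already generated).
case = {
    "Department": "Hardware",
    "Name.1": "Mason", "Position.1": "Sales", "Total.Pay.1": 16952.65,
    "Name.2": "Smith", "Position.2": "Manager", "Total.Pay.2": 21162.95,
    "Name.3": "Green", "Position.3": "Sales", "Total.Pay.3": 21495.50,
}
positions = wildcard(case, "Position")      # Position?
totals = wildcard(case, "Total.Pay")        # Total.Pay?

mgr_total_pay = 0
for j in range(len(positions)):             # first DO loop
    if case[positions[j]] == "Manager":
        mgr_total_pay = case[totals[j]]
for j in range(len(totals)):                # second DO loop
    if case[totals[j]] <= mgr_total_pay:
        case[totals[j]] = 0

print(mgr_total_pay, [case[name] for name in totals])
```

Only Green keeps a non-zero Total.Pay, which matches the Hardware row of the listing in Figure 8.19.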
The instruction:<br />
SPLIT ;<br />
is a special form of <strong>the</strong> SPLIT function that res<strong>to</strong>res collected cases and variable names <strong>to</strong> <strong>the</strong>ir original form. It<br />
uses <strong>the</strong> .1, .2 suffixes <strong>to</strong> ascertain <strong>the</strong> original number of cases and variable names. The order of <strong>the</strong> split cases<br />
may be somewhat different if SORT or INDEX is used in <strong>the</strong> COLLECT or if new variables are generated. The<br />
instruction:<br />
SPLIT *;<br />
produces all possible cases resulting from a COLLECT, even if no such cases existed before <strong>the</strong> COLLECT.<br />
With SPLIT *, some cases may have missing values for all of the suffixed variables, whereas with plain SPLIT,<br />
only cases with at least one non-missing value of a suffixed variable are produced.<br />
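The contrast between the two forms can be modeled in Python with the second household of Figure 8.16. The sketch assumes that a variable belongs to position k when its name ends in ".k", and that unsuffixed variables are copied to every case; the function `split_back` and its `keep_empty` flag (standing in for the *) are hypothetical.

```python
def split_back(wide, n, keep_empty=False):
    """Break a collected case back into individual cases by suffix."""
    shared = {}
    parts = {pos: {} for pos in range(1, n + 1)}
    for name, value in wide.items():
        base, _, suffix = name.rpartition(".")
        if base and suffix.isdigit() and 1 <= int(suffix) <= n:
            parts[int(suffix)][base] = value
        else:
            shared[name] = value              # e.g. the BY variable
    cases = []
    for pos in range(1, n + 1):
        has_data = any(value is not None for value in parts[pos].values())
        if keep_empty or has_data:
            cases.append({**shared, **parts[pos]})
    return cases


wide = {
    "House.Id": 1002,
    "Sex.1": "F", "Age.1": 23,
    "Sex.2": "M", "Age.2": 29,
    "Sex.3": None, "Age.3": None,
    "Sex.4": None, "Age.4": None,
}
plain = split_back(wide, 4)                     # like SPLIT ;
every = split_back(wide, 4, keep_empty=True)    # like SPLIT * ;
print(len(plain), len(every))
```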
In Figure 8.19, when <strong>the</strong> collected case is split back up, each Department splits back in<strong>to</strong> its original cases.<br />
The final <strong>PPL</strong> instruction:<br />
IF Total.Pay GT 0, RETAIN ]<br />
only retains cases where Total.Pay is greater than zero.<br />
In <strong>PPL</strong> where COLLECT has been used, a DO loop with a scratch variable and <strong>the</strong> wildcard notation are very<br />
convenient for referring <strong>to</strong> <strong>the</strong> collected variables. This is <strong>the</strong> case in <strong>the</strong> prior example in Figure 8.19. It is important,<br />
however, that <strong>the</strong> variable name prefixes (<strong>the</strong> part preceding <strong>the</strong> ?) be unique. This example gives<br />
unexpected results:<br />
[ KEEP ID Policy.No Agent Amount Age Class;<br />
COLLECT 4, BY ID;<br />
DO #J = 1, 4;<br />
IF Age?(#J) LT 18, SET Class?(#J) = 0;<br />
ENDDO;<br />
After <strong>the</strong> COLLECT takes place, all <strong>the</strong> variables (except <strong>the</strong> BY variable) have names such as Policy.No.1,<br />
Policy.No.2, Policy.No.3, Policy.No.4, Agent.1, Agent.2, and so on. The notation “Class?(#J)” in the prior example<br />
refers to the #J-th variable beginning with “Class” (one of the first four, since #J takes on the values 1 to 4). Thus,<br />
when #J = 1, if <strong>the</strong> result of <strong>the</strong> IF test is true, <strong>the</strong> variable Class.1 is set <strong>to</strong> 0. When #J = 2, if <strong>the</strong> IF test is true,<br />
Class.2 is set <strong>to</strong> 0, and so on.<br />
Similarly, the notation “Age?(#J)” refers to the #J-th variable beginning with “Age”. These are Agent.1,<br />
Agent.2, Agent.3 and Agent.4, and not the intended variables Age.1, Age.2, Age.3 and Age.4. This is because the<br />
Agent variables precede the Age variables. “Age?(#J)” is not specific enough to refer to just the Age variables;<br />
“Age.?(#J)” (with the dot) is unique, because it matches only Age.1, Age.2, Age.3 and<br />
Age.4.<br />
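The prefix collision is easy to reproduce with plain string matching. In the Python sketch below, the order of the collected names is an assumption taken from the description above (all Agent variables ahead of all Age variables); `wildcard` is a hypothetical helper that mimics prefix matching and is not PPL.

```python
def wildcard(names, prefix):
    """Names that begin with prefix, in their existing order."""
    return [name for name in names if name.startswith(prefix)]


kept = ["Policy.No", "Agent", "Amount", "Age", "Class"]
# Assumed order: each variable's four copies together, as the text lists them.
collected = ["ID"] + [f"{name}.{i}" for name in kept for i in range(1, 5)]

loose = wildcard(collected, "Age")     # Age?  also matches Agent.1 ...
strict = wildcard(collected, "Age.")   # Age.? matches only Age.1 ... Age.4
print(loose[:4])
print(strict)
```

Whatever the actual variable order, the looser prefix matches eight names instead of four, so the index #J no longer lines up with the intended variable.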
__________________________________________________________________________<br />
Figure 8.20 A Second Complex Problem<br />
File Patients:<br />
Last First<br />
ID Name Name Date Diagnosis Description Charges<br />
12425 Adams John 831105 - Room Fee 35.95<br />
12425 Adams John 831104 - Lab Tests 182.45<br />
12425 Adams John 831106 Ulcer Diagnosis -<br />
15743 Blair Sally 831221 - Blood Tests 36.00<br />
15743 Blair Sally 831222 Kidney S<strong>to</strong>nes Diagnosis -<br />
15743 Blair Sally 831221 - Room Fee 35.00<br />
15743 Blair Sally 831222 - Surgery 745.25<br />
15743 Blair Sally 831222 - Room Fee 35.00<br />
15743 Blair Sally 831223 - Room Fee 35.00<br />
15743 Blair Sally 831223 - Blood Tests 45.00<br />
12269 Knox Tom 840304 - Lab Tests 69.50<br />
12269 Knox Tom 840304 - Room Fee 35.00<br />
12269 Knox Tom 840305 - Cat Scan 545.00<br />
12269 Knox Tom 840306 - Room Fee 35.00<br />
12269 Knox Tom 840306 Brain Tumor Diagnosis -<br />
12269 Knox Tom 840305 - Room Fee 35.00<br />
LIST Patients<br />
[ COLLECT 10, BY ID,<br />
CARRY Last.Name First.Name, SORT Date ;<br />
GENERATE Diagnosis:C32 = FIRST.GOOD (Diagnosis?) ;<br />
GENERATE Total.Charges = SUM.GOOD (Charges?) ;<br />
GENERATE Admit.Date = Date.1 ;<br />
GENERATE Discharge.Date = LAST.GOOD (Date?) ;<br />
KEEP Last.Name First.Name .NEW. ] $<br />
Resulting Listing:<br />
Last First Total Admit Discharge<br />
Name Name Diagnosis Charges Date Date<br />
Adams John Ulcer 218.40 110483 110683<br />
Blair Sally Kidney S<strong>to</strong>nes 931.25 122183 122383<br />
Knox Tom Brain Tumor 719.50 30484 30684<br />
__________________________________________________________________________
Another example using COLLECT is illustrated in Figure 8.20. Whereas the problem in Figure 8.19 could<br />
be solved in o<strong>the</strong>r, perhaps simpler, ways, <strong>the</strong> report produced in Figure 8.20 would be extremely difficult <strong>to</strong> do<br />
without a function such as COLLECT, and it would require several steps using <strong>the</strong> SORT and COLLATE commands.<br />
With COLLECT, often only a single command is needed <strong>to</strong> produce a complex report.<br />
Figure 8.21 shows <strong>the</strong> variables and values for a single patient immediately after <strong>the</strong> COLLECT step:<br />
COLLECT 10, BY ID, CARRY Last.Name First.Name, SORT Date ;<br />
Each input case has seven variables. A collected case has 43 variables: the BY variable and the 2 CARRY variables, plus 10<br />
times the 4 remaining variables. Because the COLLECT is done with the SORT option, a patient’s cases are<br />
rearranged by Date so that <strong>the</strong> case with earliest date will be in <strong>the</strong> .1 position. The case with <strong>the</strong> next earliest date<br />
will be in <strong>the</strong> .2 position, and so on. The case with <strong>the</strong> last date can be located by looking for <strong>the</strong> last non-missing<br />
value for a Date? variable.<br />
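The collect step described above can be pictured with a rough Python emulation. This is only a sketch of the described behaviour, not P-STAT code: the function name `collect`, its parameters, and the use of `None` for a missing value are assumptions made for illustration.

```python
# Rough emulation of: COLLECT 10, BY ID, CARRY Last.Name First.Name, SORT Date
# Illustrative only. None stands in for a missing value.

def collect(cases, n, by, carry, sort):
    """Fold the cases of one BY group into a single wide case with n slots."""
    first = cases[0]
    once = by + carry
    wide = {name: first[name] for name in once}        # these appear only once
    rest = [name for name in first if name not in once]
    ordered = sorted(cases, key=lambda c: c[sort])      # earliest goes to slot .1
    for slot in range(1, n + 1):
        source = ordered[slot - 1] if slot <= len(ordered) else None
        for name in rest:
            wide[f"{name}.{slot}"] = source[name] if source else None
    return wide

# John Adams' three input cases, in their original file order.
adams = [
    {"ID": 12425, "Last.Name": "Adams", "First.Name": "John", "Date": 831105,
     "Diagnosis": None, "Description": "Room Fee", "Charges": 35.95},
    {"ID": 12425, "Last.Name": "Adams", "First.Name": "John", "Date": 831104,
     "Diagnosis": None, "Description": "Lab Tests", "Charges": 182.45},
    {"ID": 12425, "Last.Name": "Adams", "First.Name": "John", "Date": 831106,
     "Diagnosis": "Ulcer", "Description": "Diagnosis", "Charges": None},
]

wide = collect(adams, 10, ["ID"], ["Last.Name", "First.Name"], "Date")
print(len(wide))          # 3 + 10 * 4 = 43
print(wide["Date.1"])     # 831104, the earliest date
```

Slots 4 through 10 are filled with missing values because this patient has only three cases.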
__________________________________________________________________________<br />
Figure 8.21 Before and After COLLECT<br />
John Adams' Records Before and After COLLECT:<br />
BEFORE: There are 7 variables in each case.<br />
Last First<br />
ID Name Name Date Diagnosis Description Charges<br />
12425 Adams John 831105 - Room Fee 35.95<br />
12425 Adams John 831104 - Lab Tests 182.45<br />
12425 Adams John 831106 Ulcer Diagnosis -<br />
AFTER: There are 43 variables in <strong>the</strong> collected case.<br />
Id Last.Name First.Name<br />
12425 Adams John<br />
Date.1 Diagnosis.1 Description.1 Charges.1<br />
831104 - Lab Tests 182.45<br />
Date.2 Diagnosis.2 Description.2 Charges.2<br />
831105 - Room Fee 35.95<br />
Date.3 Diagnosis.3 Description.3 Charges.3<br />
831106 Ulcer Diagnosis -<br />
Date.4 Diagnosis.4 Description.4 Charges.4<br />
- - - -<br />
- - - -<br />
- - - -<br />
Date.10 Diagnosis.10 Description.10 Charges.10<br />
__________________________________________________________________________
Because a suffix is appended on<strong>to</strong> each of <strong>the</strong> collected variables, Diagnosis is now Diagnosis.1 <strong>to</strong> Diagnosis.10.<br />
Therefore, this instruction (in Figure 8.20):<br />
GENERATE Diagnosis:C32 = FIRST.GOOD (Diagnosis?) ;<br />
does not cause a variable name conflict. The new variable Diagnosis is given <strong>the</strong> value of <strong>the</strong> first non-missing<br />
value of any variable beginning with “Diagnosis”. For John Adams, Diagnosis.1 and Diagnosis.2 are missing, but<br />
Diagnosis.3 is not missing. Its value, “Ulcer”, is used as <strong>the</strong> value of <strong>the</strong> newly generated variable Diagnosis.<br />
Creation of <strong>the</strong> o<strong>the</strong>r three new variables is similar. Total.Charges is <strong>the</strong> sum of all <strong>the</strong> non-missing values of any<br />
variable which begins with <strong>the</strong> prefix “Charges”:<br />
GENERATE Total.Charges = SUM.GOOD (Charges?) ;<br />
Because of <strong>the</strong> sort order, Admit.Date is Date.1 :<br />
GENERATE Admit.Date = Date.1 ;<br />
Even though it is not known how many cases were collected, it is easy <strong>to</strong> locate Discharge.Date with <strong>the</strong><br />
LAST.GOOD function:<br />
GENERATE Discharge.Date = LAST.GOOD (Date?) ;<br />
The last good (non-missing) value of any variable which begins with “Date” becomes <strong>the</strong> Discharge.Date. For<br />
John Adams, this is 831106, <strong>the</strong> value of Date.3.<br />
The final <strong>PPL</strong> in Figure 8.20 is a KEEP <strong>to</strong> select <strong>the</strong> variables Last.Name, First.Name and all <strong>the</strong> new (.NEW.)<br />
variables generated by <strong>the</strong> <strong>PPL</strong>.<br />
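As a hedged illustration of how the four new values are derived for John Adams, the following Python sketch imitates FIRST.GOOD, LAST.GOOD and SUM.GOOD over the ten collected slots. The helper names are hypothetical, `None` stands in for a missing value, and returning 0 for a sum with no good values is an assumption rather than documented P-STAT behaviour.

```python
def first_good(values):
    """Return the first non-missing value, or None if every value is missing."""
    return next((v for v in values if v is not None), None)

def last_good(values):
    """Return the last non-missing value, or None if every value is missing."""
    return first_good(reversed(values))

def sum_good(values):
    """Sum the non-missing values (assumed to be 0 when none are present)."""
    return sum(v for v in values if v is not None)

# The suffixed variables of John Adams' collected case, slots .1 to .10.
dates = [831104, 831105, 831106] + [None] * 7
diagnoses = [None, None, "Ulcer"] + [None] * 7
charges = [182.45, 35.95, None] + [None] * 7

diagnosis = first_good(diagnoses)         # Diagnosis.3 is the first good value
total_charges = sum_good(charges)         # 182.45 + 35.95
admit_date = dates[0]                     # Date.1, because of the sort order
discharge_date = last_good(dates)         # Date.3 is the last good value

print(diagnosis, round(total_charges, 2), admit_date, discharge_date)
```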
8.22 COLLECT System Variables<br />
COLLECT sets 5 system variables.<br />
1. .COLLECTSIZE. The number of cases in <strong>the</strong> most recent collect<br />
2. .COLLECTMIN. The size of <strong>the</strong> smallest collected group so far<br />
3. .COLLECTMAX. The size of <strong>the</strong> largest collected group so far<br />
4. .COLLECTIONS. The total number of collects that occurred<br />
5. .COLLECTSUM. The number of cases that have been collected.<br />
For example, if a file is collected by household number:<br />
1. .COLLECTSIZE. The size of <strong>the</strong> household most recently collected (<strong>the</strong> current household)<br />
2. .COLLECTMIN. The smallest household<br />
3. .COLLECTMAX. The largest household<br />
4. .COLLECTIONS. The <strong>to</strong>tal number of households<br />
5. .COLLECTSUM. The number of people in all collected households<br />
These variables are reset as each new COLLECT occurs. Thus, if a file has 312 households, <strong>the</strong> 5 variables are<br />
reset 312 times. As a result, .COLLECTSIZE., for example, can be used in the PPL following a collect:<br />
LIST House<br />
( COLLECT 10, BY household )<br />
( DO #j = 1, .COLLECTSIZE. ) etc....<br />
The final settings remain until some later COLLECT begins reading cases anew.
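To make the running nature of these five values concrete, here is a small Python sketch that recomputes them after each household is collected. It is an emulation under stated assumptions, not P-STAT code: the function `collect_stats` is hypothetical, the input is assumed to be already grouped by household, and the COLLECT maximum (10 in the example above) is not modelled.

```python
from itertools import groupby

def collect_stats(household_ids):
    """Yield the five counters as they stand after each group is collected."""
    smallest = largest = None
    collections = total = 0
    for _, members in groupby(household_ids):   # adjacent equal ids form a group
        size = sum(1 for _ in members)
        smallest = size if smallest is None else min(smallest, size)
        largest = size if largest is None else max(largest, size)
        collections += 1
        total += size
        yield {"COLLECTSIZE": size, "COLLECTMIN": smallest,
               "COLLECTMAX": largest, "COLLECTIONS": collections,
               "COLLECTSUM": total}

# Three households with 3, 1 and 2 people.
ids = [101, 101, 101, 102, 103, 103]
snapshots = list(collect_stats(ids))
for snap in snapshots:
    print(snap)
```

The last snapshot corresponds to the final settings that remain after the file has been read.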
SUMMARY<br />
PPL<br />
Across-case modification and aggregation are facilitated by:<br />
• Scratch Variables,<br />
• <strong>the</strong> Permanent Vec<strong>to</strong>r,<br />
• user-defined multi-dimensional arrays, and<br />
• <strong>the</strong> programming language functions FIRST, LAST, SPLIT and COLLECT.<br />
Scratch Variables have no position in a file. They are created using GENERATE followed by a name<br />
starting with one or two pound signs (#). The values of scratch variables created with one pound sign<br />
remain only for <strong>the</strong> duration of a command or macro. The values of scratch variables created with two<br />
pound signs remain for <strong>the</strong> duration of <strong>the</strong> run. They are explicitly changed with SET.<br />
The Permanent Vec<strong>to</strong>r is similar in behavior <strong>to</strong> scratch variables except that it has <strong>the</strong> name P assigned<br />
<strong>to</strong> it, and individual positions in P are located by subscript. The subscripts can be calculated. Permanent<br />
variables are set with SET. The P vec<strong>to</strong>r may only contain numeric values but <strong>the</strong>se values may be passed<br />
not only across cases of a file, but between commands.<br />
The wildcard character ? may be used <strong>to</strong> reference <strong>the</strong> suffixed variables created by COLLECT (as well<br />
as any o<strong>the</strong>r variables with a common prefix or suffix):<br />
[ COLLECT 20, BY Department;<br />
DO #J = 1, 20 ;<br />
IF Position?(#J) EQ 'Manager',<br />
SET #Mgr.Total.Pay = Total.Pay?(#J) ;<br />
ENDDO ]<br />
“Position?(#J)” refers in turn <strong>to</strong> <strong>the</strong> first #J variables (20 in <strong>the</strong> above example) that begin with “Position”.<br />
ARRAY Commands<br />
An array can have up <strong>to</strong> 7 dimensions, and can be character or numeric. Array names have two characters,<br />
<strong>the</strong> second being <strong>the</strong> same as <strong>the</strong> first, like XX or cc or Zz. Case doesn’t matter. There can be up <strong>to</strong><br />
26 active arrays.<br />
DEFINE.ARRAY cc ( n,n,...)<br />
defines <strong>the</strong> array and (optionally) initializes it. Character arrays are declared by adding <strong>the</strong> desired character<br />
length immediately following <strong>the</strong> array name and a colon, i.e. AA:20 .<br />
DEFINE.ARRAY AA (5,8) TO 0 $<br />
DEFINE.ARRAY CC:20 (13,3) <strong>to</strong> ' ' $<br />
SHOW.ARRAYS<br />
reports on <strong>the</strong> status of all <strong>the</strong> defined arrays<br />
DROP.ARRAY aa zz<br />
requests that <strong>the</strong> listed arrays be dropped so that <strong>the</strong> space can be reused.<br />
nn=number list=variable list vn=variable name
DROP.P.VECTOR<br />
releases <strong>the</strong> space normally used for <strong>the</strong> P vec<strong>to</strong>r and makes it available for use in arrays.<br />
<strong>PPL</strong> Functions: Across-Case<br />
FIRST (vn or .FILE.)<br />
is evaluated as true if this is <strong>the</strong> first case in <strong>the</strong> subgroup specified in <strong>the</strong> expression, and false if it is not<br />
<strong>the</strong> first case. The required expression is a variable name (vn) or a list of variables, or <strong>the</strong> system value<br />
.FILE. (meaning <strong>the</strong> entire file):<br />
IF FIRST (District, Department), SET P(1) = 0;<br />
Changing values of <strong>the</strong> variable or variables in an ordered file define different subgroups.<br />
LAST (vn or .FILE.)<br />
is evaluated as true if this is <strong>the</strong> last case in <strong>the</strong> subgroup specified in <strong>the</strong> expression, and false if it is not<br />
<strong>the</strong> last case. The required expression is a variable name (vn) or a list of variables, or <strong>the</strong> system value<br />
.FILE. (meaning <strong>the</strong> entire file):<br />
IF LAST (.FILE.), RETAIN;<br />
Changing values of <strong>the</strong> variable or variables define different subgroups.<br />
COLLECT nn<br />
specifies <strong>the</strong> number of adjacent cases <strong>to</strong> collect in<strong>to</strong> one case. Additional <strong>PPL</strong> may precede or follow<br />
<strong>the</strong> COLLECT function. <strong>PPL</strong> which follows COLLECT operates on <strong>the</strong> new longer case. A common<br />
usage is <strong>to</strong> COLLECT cases, do modifications, and <strong>the</strong>n SPLIT <strong>the</strong> long case back in<strong>to</strong> <strong>the</strong> original number<br />
of cases. Using (SPLIT) or (SPLIT *) “undoes” COLLECT. A number of additional options may be<br />
used. They must follow <strong>the</strong> COLLECT:<br />
LIST Patients [ COLLECT 4, BY Id, INDEX Visit ] $<br />
The options in <strong>the</strong> following list specify <strong>the</strong> cases <strong>to</strong> be collected and <strong>the</strong> variables in <strong>the</strong> new case.<br />
1. BY vn or list<br />
specifies one or more character and/or numeric variables that identify <strong>the</strong> cases that belong<br />
<strong>to</strong> a subgroup. The input file should be grouped or sorted by <strong>the</strong>se variables.<br />
Those cases with <strong>the</strong> same values of <strong>the</strong> BY variables, that is, members of <strong>the</strong> same<br />
subgroup, are collected in<strong>to</strong> one case. Values of missing1, missing2 and missing3 also<br />
define membership in different subgroups.<br />
When BY is used, <strong>the</strong> number of cases <strong>to</strong> be collected must still be specified. That<br />
number defines <strong>the</strong> maximum number of members of a subgroup. BY variables appear<br />
(are carried) only once in <strong>the</strong> new case.<br />
2. CARRY vn or list<br />
implies that <strong>the</strong> values of <strong>the</strong> specified variables are <strong>the</strong> same for all members of a subgroup,<br />
and that those variables should appear only once in <strong>the</strong> new case produced by<br />
COLLECT. If a value is missing, the first non-missing value for a CARRY variable is<br />
used. If the values differ, an error occurs unless FIRST or LAST is used.<br />
3. FIRST<br />
specifies that <strong>the</strong> first case be selected for collection if values of <strong>the</strong> INDEX variable<br />
are repeated or if values of <strong>the</strong> CARRY variable differ.<br />
4. IGNORE<br />
specifies that any case with a value of <strong>the</strong> index variable that is repeated or out of range<br />
should be ignored. IGNORE can only be used with BY and INDEX.<br />
5. INDEX vn<br />
specifies a numeric variable whose values determine <strong>the</strong> order that <strong>the</strong> cases in a subgroup<br />
take in <strong>the</strong> collected case. INDEX values may not be missing or exceed <strong>the</strong><br />
COLLECT counter (<strong>the</strong> number of cases <strong>to</strong> be collected), without <strong>the</strong> use of IGNORE,<br />
WARN, FIRST, or LAST as well. INDEX may not be used without BY.<br />
6. LAST<br />
specifies that <strong>the</strong> last case be selected for collection if values of <strong>the</strong> INDEX variable<br />
are repeated or if values of <strong>the</strong> CARRY variable differ.<br />
7. SORT vn or list<br />
requests that <strong>the</strong> collected cases be sorted by <strong>the</strong> specified variables before being<br />
placed in <strong>the</strong> new, long case. SORT variables may not be BY or CARRY variables.<br />
The use of both INDEX and SORT is redundant, and since sorting is done after indexing,<br />
SORT “undoes” INDEX.<br />
An upward sort (U) or a downward sort (D) may be specified:<br />
[ COLLECT 5, BY Household,<br />
CARRY Last.Name, SORT Age (D) ]<br />
An upward sort is assumed when sort order is not explicitly specified.<br />
8. WARN<br />
requests that a warning be printed if a case has a value on the index variable that is repeated<br />
or out of range. WARN may not be used without BY and INDEX.<br />
COLLECT System Variables<br />
1. .COLLECTSIZE. The number of cases in the most recent collect<br />
2. .COLLECTMIN. The size of the smallest collected group so far<br />
3. .COLLECTMAX. The size of the largest collected group so far<br />
4. .COLLECTIONS. The total number of collects that occurred<br />
5. .COLLECTSUM. The number of cases that have been collected.<br />
SPLIT INTO nn<br />
requests that each current case be split in<strong>to</strong> <strong>the</strong> specified number of cases. (The word INTO is optional.)<br />
Additional <strong>PPL</strong> may precede or follow <strong>the</strong> SPLIT function, but <strong>the</strong>re may be only one SPLIT per command.<br />
A number of options may be used. They must follow <strong>the</strong> SPLIT:<br />
LIST Filename<br />
[ SPLIT INTO 2, CARRY Name, INDEX Term 2,<br />
CREATE Grade (Grade.2 TO Grade.4) STEP 2 ] $<br />
The following options control <strong>the</strong> order in which <strong>the</strong> variables in SPLIT cases are placed in<strong>to</strong> <strong>the</strong> output<br />
cases and <strong>the</strong> naming of <strong>the</strong> variables which are created:<br />
1. CARRY vn or list<br />
specifies one or more variables whose values are <strong>to</strong> be carried in every case created by<br />
SPLIT.<br />
2. CYCLE nn<br />
specifies the size of steps to be taken in selecting variables to be used in the SPLIT.<br />
CYCLE follows a USE or CREATE variable list without a comma preceding it. The<br />
first variable is used, the variable “nn” away from the first is used next, and so on. Multiple<br />
passes or cycles are made through the variable list until all of the variables in the<br />
list are used.<br />
3. CREATE new.vn vn or new.vn list<br />
provides a new variable name and gives the current variables whose values are to be<br />
used in the split cases. They will be the values of the new variable. The number of<br />
variables to be used must be the same as “nn” (the number of cases into which the current<br />
case is to be SPLIT).<br />
4. INDEX new.vn nn<br />
specifies that a new variable be present in each case created by SPLIT. That variable<br />
is an index with values going from 1 <strong>to</strong> “nn”. Multiple indices may be created, but <strong>the</strong><br />
product of <strong>the</strong> index values (<strong>the</strong> “nn’s”) must be equal <strong>to</strong> <strong>the</strong> number of cases created<br />
by SPLIT.<br />
5. STEP nn<br />
specifies <strong>the</strong> size of steps <strong>to</strong> be taken in selecting variables <strong>to</strong> be used in <strong>the</strong> SPLIT.<br />
STEP follows a USE or CREATE list without a comma preceding it. The first variable<br />
is used, <strong>the</strong> variable “nn” away from <strong>the</strong> first is used next, and so on. Only one pass<br />
through <strong>the</strong> variable list is made.<br />
6. USE vn or list<br />
specifies <strong>the</strong> variables <strong>to</strong> be used in <strong>the</strong> split case. They must be a multiple of “nn” (<strong>the</strong><br />
number of cases in<strong>to</strong> which <strong>the</strong> current case is <strong>to</strong> be SPLIT).<br />
SPLIT and SPLIT *<br />
These are special versions of SPLIT that “uncollect” a case created by COLLECT:<br />
[ SPLIT ] or [ SPLIT * ]<br />
SPLIT produces only those cases that have at least one non-missing value of a suffixed variable, whereas<br />
SPLIT * produces all possible cases from a COLLECT, even if no such cases existed before the COLLECT.<br />
For example, if COLLECT 10 has been used, SPLIT * results in ten cases. SPLIT, on the other<br />
hand, produces 10 cases only if there are some non-missing values of the suffixed variables (Test.10,<br />
Age.10 and so on).<br />
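The difference between the two forms can be sketched in Python as follows. This is an emulation of the description above, not P-STAT code: the function `split`, its `star` flag and the `None` convention for missing values are assumptions, and the treatment of an all-missing slot that sits between two non-missing slots is a guess.

```python
def split(wide, n, carried, star=False):
    """Rebuild one case per slot from the suffixed variables of a collected case."""
    # Variable stems are recovered from the slot-1 names, e.g. "Test.1" -> "Test".
    stems = [name[:-2] for name in wide
             if name.endswith(".1") and name not in carried]
    cases = []
    for slot in range(1, n + 1):
        values = {stem: wide.get(f"{stem}.{slot}") for stem in stems}
        has_data = any(v is not None for v in values.values())
        if star or has_data:        # SPLIT * keeps every slot, SPLIT only slots with data
            case = {name: wide[name] for name in carried}
            case.update(values)
            cases.append(case)
    return cases

# A case collected with room for 3 members, of which only 2 carry any data.
wide = {"Id": 7,
        "Test.1": 80, "Age.1": 30,
        "Test.2": None, "Age.2": 31,
        "Test.3": None, "Age.3": None}

print(len(split(wide, 3, ["Id"])))             # 2 cases
print(len(split(wide, 3, ["Id"], star=True)))  # 3 cases
```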
9<br />
<strong>PPL</strong>:<br />
Modification of Character Variables<br />
Character variables may be modified in many of <strong>the</strong> same ways that numeric variables are modified. However,<br />
since character and numeric variables have different properties, <strong>the</strong>re are several opera<strong>to</strong>rs and a number of functions<br />
that are specific <strong>to</strong> character variables.<br />
This chapter briefly discusses basic character procedures — <strong>the</strong> recoding of existing character variables, <strong>the</strong><br />
generation of new character variables and <strong>the</strong> logical testing of character values. The major portion of <strong>the</strong> chapter<br />
deals with special character opera<strong>to</strong>rs and functions that:<br />
• Test character variables;<br />
• Trim and pad character strings;<br />
• Left and right justify or center strings;<br />
• Extract substrings and access words within character strings;<br />
• Change character strings into numeric values and vice-versa;<br />
• Concatenate character strings.<br />
9.1 BASIC CHARACTER PROCEDURES<br />
Data may be entered in a P-<strong>STAT</strong> system file as ei<strong>the</strong>r character strings — mixtures of letters, digits and o<strong>the</strong>r<br />
characters — or as numbers. Generally, it is clear which way a variable’s value should be entered. A person’s<br />
name is entered as a character variable, whereas his or her age is entered as a numeric variable. One can find a<br />
substring of Name and <strong>the</strong> mean Age, but <strong>the</strong> substring of Age and <strong>the</strong> mean Name do not make sense.<br />
Often <strong>the</strong>re is not a <strong>to</strong>tally clear-cut line between character and numeric data. There are situations in which<br />
a variable is coded with a character string when it really has some numeric attributes. The variable Sex, for example,<br />
may be coded with <strong>the</strong> numbers 1 and 2, or with <strong>the</strong> character strings “M” and “F”, or “Male” and<br />
“Female”. If such a variable is <strong>to</strong> be used in a listing, <strong>the</strong> character representation is preferred. If <strong>the</strong> variable is<br />
<strong>to</strong> be given <strong>to</strong> a correlation program, <strong>the</strong> numeric representation is necessary. In <strong>the</strong>se situations, functions are<br />
used <strong>to</strong> convert character representations in<strong>to</strong> numeric values, and numeric values in<strong>to</strong> character strings.<br />
P-<strong>STAT</strong> distinguishes between character and numeric values by using single or double quotes <strong>to</strong> enclose character<br />
strings. Numeric values are not enclosed in quotes. Thus, ’Sam Davis’ and ’924’ are character strings,<br />
whereas 924 is a numeric value.<br />
9.2 Generating New Character Variables<br />
Character variables are generated much as numeric ones are. However, when a character variable is created, it is<br />
necessary <strong>to</strong> specify that it is a character variable and, if it is o<strong>the</strong>r than 40 characters long, <strong>to</strong> specify its size. This<br />
instruction:<br />
[ GENERATE New.Name:C32 = Name ]<br />
creates a new variable named “New.Name”, that is 32 characters long (defined size 32) and equal <strong>to</strong> <strong>the</strong> value of<br />
<strong>the</strong> existing character variable named “Name”. If <strong>the</strong> variable Name has a length less than 32, <strong>the</strong> new variable<br />
New.Name will be padded with blanks on <strong>the</strong> right end until it is 32 characters long. If Name is longer than 32,<br />
characters will be truncated from <strong>the</strong> right end until only 32 characters remain.
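The padding and truncation rule can be imitated in a few lines of Python. This is a sketch of the behaviour described above, not P-STAT code; the helper name `fit` and the sample strings are made up for illustration.

```python
def fit(value, size=40):
    """Pad with blanks on the right, or drop characters from the right, to reach size."""
    return value.ljust(size)[:size]

short = fit("Sam Davis", 32)                       # padded with 23 blanks
long_name = "Bartholomew Fitzgerald-Montgomery III"   # 37 characters
clipped = fit(long_name, 32)                       # last 5 characters are dropped

print(repr(short))
print(repr(clipped))
```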
Character variables may be generated equal <strong>to</strong> a specific value:<br />
[ GENERATE City:C = 'Hous<strong>to</strong>n' ]<br />
The character variable named City, of size 40, is created and set equal to the string or value “Houston”, followed<br />
by 33 blanks. The size of City is 40, since a size was not specified. Character variables may be generated with<br />
default names:<br />
[ DO #J = 1, 6; GEN ?:C = CHARACTER ( V(#J) ); ENDDO ]<br />
A character variable may be up <strong>to</strong> 50,000 characters long.<br />
9.3 Modifying Existing Character Variables<br />
Character variables are modified in much <strong>the</strong> same manner as numeric variables. The modification may be <strong>the</strong><br />
result of some logical test or may be an instruction by itself. A variable set equal <strong>to</strong> a character string must be a<br />
character variable. This instruction sets <strong>the</strong> variable State <strong>to</strong> <strong>the</strong> value “Iowa”:<br />
[ SET State = 'Iowa' ]<br />
This instruction sets <strong>the</strong> variable State <strong>to</strong> <strong>the</strong> value of <strong>the</strong> variable State.Name:<br />
[ SET State = State.Name ]<br />
Modification may occur as <strong>the</strong> result of a logical test:<br />
[ IF .N. EQ 10, SET Name = 'John Jones' ]<br />
The system variable .N., <strong>the</strong> case number, is tested as each case is processed. On <strong>the</strong> tenth case, <strong>the</strong> variable Name<br />
is set <strong>to</strong> <strong>the</strong> character string “John Jones”.<br />
9.4 Logical Selection of Character Variables<br />
The two logical opera<strong>to</strong>rs that operate in an identical fashion for both numeric and character data are equal (EQ)<br />
and not equal (NE). The o<strong>the</strong>r logical opera<strong>to</strong>rs work in a somewhat different manner. A character string being<br />
tested must be enclosed in quotes:<br />
[ IF Name EQ 'Jones', DELETE ]<br />
The concepts of less than and greater than are different. In P-<strong>STAT</strong>, <strong>the</strong>se opera<strong>to</strong>rs are honored for character<br />
data as a function of <strong>the</strong> sort sequence of each computing environment. The results may be different on different<br />
machines if <strong>the</strong> sort sequence of <strong>the</strong> characters is different.<br />
On computers using <strong>the</strong> ASCII character set (such as PC and SUN), <strong>the</strong> low <strong>to</strong> high order is numbers, uppercase<br />
letters, and lowercase letters. Most of <strong>the</strong> special characters, in particular, blank, are “lower” than ei<strong>the</strong>r<br />
letters or numbers.<br />
P-<strong>STAT</strong>, in its character comparisons, treats uppercase and lowercase letters as identical characters, that is,<br />
“A” = “a”, unless exact comparisons are specified. Such case-respecting comparisons are specified by prefacing<br />
logical opera<strong>to</strong>rs with “X” for eXact character comparisons:<br />
[ IF Grade XEQ 'f', SET Grade = 'I' ]<br />
The logical opera<strong>to</strong>rs that may be prefaced with “X” when <strong>the</strong>y are applied <strong>to</strong> character variables and values are:<br />
EQ, NE, LT, LE, GT, GE, AMONG and NOTAMONG.<br />
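The contrast between the default and the exact comparisons can be illustrated with a short Python sketch. The function names are hypothetical stand-ins for EQ, XEQ, AMONG and XAMONG, and the handling of trailing blanks is not modelled here.

```python
def eq(a, b):
    """EQ-style test: upper- and lowercase letters count as the same character."""
    return a.upper() == b.upper()

def xeq(a, b):
    """XEQ-style test: the strings must agree exactly, including letter case."""
    return a == b

def among(value, candidates):
    """AMONG-style test built on the case-insensitive comparison."""
    return any(eq(value, c) for c in candidates)

def xamong(value, candidates):
    """XAMONG-style test built on the exact comparison."""
    return any(xeq(value, c) for c in candidates)

names = ["Jones", "Smith", "Wills"]
print(eq("F", "f"), xeq("F", "f"))                     # True False
print(among("jones", names), xamong("jones", names))   # True False
```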
Logical opera<strong>to</strong>rs that have a list as <strong>the</strong>ir argument may be used with character strings in <strong>the</strong> same way that<br />
<strong>the</strong>y are used with numeric values:<br />
[ IF Name AMONG ( 'Jones' 'Smith' 'Wills' ), RETAIN ]
Where the operation of an operator depends on the concept of less than or greater than, the result will depend on<br />
the sort sequence of the individual computing environment. This example will continue all the cases that fall in<br />
the sort sequence between AAAA and AZZZ, that is, all the A’s:<br />
IF Name AMONG ( 'AAAA' TO 'AZZZ' ), RETAIN;<br />
The character opera<strong>to</strong>rs that parallel numeric opera<strong>to</strong>rs and that may be used in logical selection are:<br />
EQ NE LT LE GT GE<br />
XEQ XNE XLT XLE XGT XGE<br />
AMONG NOTAMONG GOOD MISSING<br />
XAMONG XNOTAMONG<br />
These operators are discussed in the first PPL chapter. XEQ, the most useful of the exact character operators,<br />
is further explained later in this chapter in the section on character operators.<br />
In addition <strong>to</strong> <strong>the</strong>se opera<strong>to</strong>rs, <strong>the</strong> opera<strong>to</strong>rs CONTAINS and MATCHES, which are specifically for character<br />
data, may be used in logical selection. They test if a character value contains a specific character string and if a<br />
character value matches a character string/wildcard combination. CONTAINS and MATCHES are explained in<br />
<strong>the</strong> section on character opera<strong>to</strong>rs.<br />
9.5 Locating Non-Missing Character Data<br />
The functions COUNT.GOOD, FIRST.GOOD and LAST.GOOD count or locate non-missing data values. These<br />
functions are used with character data <strong>the</strong> same way that <strong>the</strong>y are used with numeric data. The arguments for <strong>the</strong>se<br />
functions are lists of variable names or positions.<br />
The COUNT.GOOD function yields <strong>the</strong> number of non-missing (“good”) values of <strong>the</strong> variables specified in<br />
<strong>the</strong> list:<br />
[ GENERATE Count = COUNT.GOOD ( Midterm Final ) ]<br />
The variable Count is generated and set equal <strong>to</strong> <strong>the</strong> number of non-missing test scores of <strong>the</strong> variables Midterm<br />
and Final. Count is a numeric variable, even though Midterm and Final may be character variables. For each case<br />
in this example, <strong>the</strong> maximum possible value of Count is 2.<br />
The FIRST.GOOD and LAST.GOOD functions yield <strong>the</strong> first or last non-missing value of <strong>the</strong> variables specified<br />
in <strong>the</strong> list:<br />
[ GENERATE Name:C =<br />
FIRST.GOOD ( Last.Name First.Name Middle.Name ) ]<br />
The character variable Name is generated and set equal <strong>to</strong> <strong>the</strong> first non-missing value of <strong>the</strong> character variables<br />
Last.Name, First.Name and Middle.Name. The variables referenced in <strong>the</strong> function list can be referenced by name<br />
or position. The word TO, meaning all <strong>the</strong> variables from <strong>the</strong> first mentioned variable through <strong>the</strong> last mentioned<br />
variable, or <strong>the</strong> system variable .ON., meaning all <strong>the</strong> variables from <strong>the</strong> one mentioned through <strong>the</strong> last variable<br />
in <strong>the</strong> file, may be used in <strong>the</strong> function list. Ei<strong>the</strong>r of <strong>the</strong>se instructions:<br />
[ SET Last.Guess = LAST.GOOD ( V(1) TO V(9) ) ]<br />
[ SET Last.Guess = LAST.GOOD ( V(1) .ON. ) ]<br />
sets <strong>the</strong> variable Last.Guess <strong>to</strong> <strong>the</strong> value of <strong>the</strong> last non-missing variable in <strong>the</strong> list.<br />
9.6 CHARACTER OPERATORS<br />
Character opera<strong>to</strong>rs and character functions modify character expressions, variables and strings. Opera<strong>to</strong>rs generally<br />
have two operands, one before and one after <strong>the</strong> opera<strong>to</strong>r. Functions have single or multiple arguments that<br />
follow <strong>the</strong> function and are contained in paren<strong>the</strong>ses.
9.7 The CONTAINS and XCONTAINS Opera<strong>to</strong>rs<br />
The opera<strong>to</strong>r CONTAINS tests if a character string is contained within <strong>the</strong> value of a character variable:<br />
[ IF Address CONTAINS 'NJ', RETAIN ]<br />
If <strong>the</strong> string “NJ” is contained anywhere within <strong>the</strong> variable Address, <strong>the</strong> case is continued.<br />
CONTAINS tests for <strong>the</strong> presence or absence of a string; <strong>the</strong> location of <strong>the</strong> string may be anywhere within<br />
<strong>the</strong> specified variable. To test for <strong>the</strong> absence of a string, preface <strong>the</strong> consequence with “F.” <strong>to</strong> indicate that it is<br />
done only when <strong>the</strong> IF test is false — that is, only when <strong>the</strong> string is not contained in <strong>the</strong> variable:<br />
[ IF Address CONTAINS 'NJ', F.RETAIN ]<br />
Alternatively, DELETE could be used instead of F.RETAIN in this situation.<br />
The XCONTAINS opera<strong>to</strong>r specifies case-respecting tests. The argument string, exactly as specified, must<br />
be contained within <strong>the</strong> value of <strong>the</strong> character variable:<br />
[ IF Comment XCONTAINS '<strong>STAT</strong>', SET Code = 1 ]<br />
CONTAINS and XCONTAINS are useful in locating cases with certain value strings when you do not know <strong>the</strong><br />
complete string or when <strong>the</strong> remainder of <strong>the</strong> string differs from case-<strong>to</strong>-case.<br />
9.8 The Concatenate Opera<strong>to</strong>r<br />
Character strings can be joined using <strong>the</strong> concatenate opera<strong>to</strong>r //. This opera<strong>to</strong>r abuts <strong>the</strong> value of one variable <strong>to</strong><br />
that of ano<strong>the</strong>r:<br />
[ GENERATE Name:C32 = First.Name // Last.Name ]<br />
If First.Name is 16 characters and Last.Name is 16 characters — for example:<br />
First Name        Last Name<br />
Abe               Adams<br />
Millicent         Murphy<br />
Sharon Elizabeth  Johnson-Mayfield<br />
the variable Name, created by the concatenation of the two strings, produces the following results:<br />
Name<br />
Abe             Adams<br />
Millicent       Murphy<br />
Sharon ElizabethJohnson-Mayfield<br />
The concatenate opera<strong>to</strong>r joins <strong>the</strong> input strings in <strong>the</strong>ir entirety. The shorter first names may incorporate<br />
more than <strong>the</strong> desired number of blanks, and <strong>the</strong> longer names may have no intervening blanks. A blank could be<br />
included in <strong>the</strong> concatenation:<br />
[ GEN Name:C32 = First.Name // ' ' // Last.Name ]<br />
This instruction joins <strong>to</strong>ge<strong>the</strong>r <strong>the</strong> three strings, First.Name, “ ” (a blank), and Last.Name. There will be at least<br />
one blank between <strong>the</strong> first and last names. The following results would be obtained:<br />
Name<br />
Abe              Adams<br />
Millicent        Murphy<br />
Sharon Elizabeth Johnson-Mayfiel
Note <strong>the</strong> truncation that resulted because <strong>the</strong> variable Name has a defined size of 32 — <strong>the</strong> final letter on <strong>the</strong> right<br />
is missing. (The trim concatenate operator, discussed next, is more appropriate for an operation of this type.)<br />
Any number of strings may be concatenated. If the total length of the concatenated strings exceeds that of the<br />
target variable, <strong>the</strong> output is truncated on <strong>the</strong> right. The things which may be joined include character variables,<br />
literals in quotes and character expressions. Character variables are items such as Name and Telephone. Literals<br />
are character strings such as “ ” (a blank), “Susan” and “945-5600”. Character expressions are <strong>the</strong> results of functions<br />
such as LAST.GOOD ( V(1) .ON. ). For example, this instruction:<br />
[ GENERATE Telephone:C11 =<br />
'1' // Area.Code // FIRST.GOOD ( Phone, Alt.Phone ) ]<br />
illustrates a literal concatenated with a variable concatenated with an expression.<br />
A character expression on <strong>the</strong> right of <strong>the</strong> equal sign, however simple or complex, produces a result whose<br />
width can range from 0 (a null string) <strong>to</strong> 50,000 characters. Only when <strong>the</strong> result is moved across <strong>the</strong> equal sign<br />
in<strong>to</strong> <strong>the</strong> target variable does a Procrustean stretching (with blanks) or truncation take place.<br />
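The padding and truncation that occur at the equal sign can be illustrated with a small Python sketch. This is a model of the behavior described above, not P-STAT code; the helper name assign is hypothetical, and the fixed widths (16 and 32) are taken from the name example:<br />

```python
def assign(value, width):
    """Fit a result into a fixed-width target: truncate on the right, then pad with blanks."""
    return value[:width].ljust(width)


# First.Name and Last.Name modeled as 16-character fields.
first = assign("Sharon Elizabeth", 16)
last = assign("Johnson-Mayfield", 16)

plain = assign(first + last, 32)             # no intervening blank
with_blank = assign(first + " " + last, 32)  # 33 characters, so the last one is lost

short = assign(assign("Abe", 16) + " " + assign("Adams", 16), 32)

print(repr(plain))
print(repr(with_blank))
print(repr(short))
```

The intermediate result may be any width; only the final move into the 32-character target forces the cut.<br />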
9.9 The Trim Concatenate Opera<strong>to</strong>r<br />
The trim concatenate opera<strong>to</strong>r /// joins strings by trimming out all leading and trailing blanks in each of <strong>the</strong> strings<br />
and inserting a single blank between <strong>the</strong> strings. The concatenation of first and last names, illustrated previously<br />
using <strong>the</strong> regular concatenate opera<strong>to</strong>r, produces different results when <strong>the</strong> trim concatenate opera<strong>to</strong>r is used. This<br />
instruction,<br />
[ GENERATE Name:C32 = First.Name /// Last.Name ]<br />
yields this result:<br />
Name<br />
Abe Adams<br />
Millicent Murphy<br />
Sharon Elizabeth Johnson-Mayfiel<br />
The leading (<strong>the</strong>re were none) and trailing blanks of each name have been trimmed out, and one blank has<br />
been inserted between <strong>the</strong> names. The variable Name could be defined as :C with no specified length which provides<br />
a default value of 40. In o<strong>the</strong>r aspects, <strong>the</strong> /// opera<strong>to</strong>r works just like <strong>the</strong> // opera<strong>to</strong>r.<br />
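As a rough model of the trim concatenate behavior, the Python sketch below strips blanks from each piece and joins the pieces with one blank. The name trim_concat is hypothetical, and how P-STAT treats an all-blank operand is not covered by the text, so that case is left out:<br />

```python
def trim_concat(*parts):
    """Trim leading and trailing blanks from each part, then join with a single blank."""
    return " ".join(part.strip(" ") for part in parts)


first_names = ["Abe".ljust(16), "Millicent".ljust(16), "Sharon Elizabeth"]
last_names = ["Adams".ljust(16), "Murphy".ljust(16), "Johnson-Mayfield"]

# Truncate to 32 characters to mimic the move into Name:C32.
names = [trim_concat(f, l)[:32] for f, l in zip(first_names, last_names)]
for name in names:
    print(repr(name))
```
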
9.10 Exactly Equal Opera<strong>to</strong>r<br />
The XEQ opera<strong>to</strong>r tests whe<strong>the</strong>r two character strings are exactly equal. The strings must be identical in case, as<br />
well as in specific characters. Normally, comparisons in P-<strong>STAT</strong> are case-independent — “BILL” equals “Bill”<br />
or “biLl”. This is useful in most situations:<br />
[ IF Last EQ 'Smi<strong>the</strong>y' AND First EQ 'Bill',<br />
SET Dependents = 1 ]<br />
Occasionally, however, a comparison that respects case is required. The XEQ opera<strong>to</strong>r is used in those situations.<br />
It is functionally similar <strong>to</strong> IVAL, described in a subsequent section.<br />
Figure 9.1 illustrates using <strong>the</strong> XEQ opera<strong>to</strong>r. Character strings, containing information about logon and logoff<br />
activity on a mainframe computer, existed in a P-<strong>STAT</strong> file. Separate counts of logons and logoffs were<br />
desired. However, <strong>the</strong> logon and logoff instructions were typically abbreviated, and <strong>the</strong> abbreviations were differentiated<br />
only by case. The XEQ opera<strong>to</strong>r specifies a test of exact equality — one that respects <strong>the</strong> case of <strong>the</strong><br />
character string. The opera<strong>to</strong>rs XNE, XLT, XLE, XGT and XGE function similarly.
__________________________________________________________________________<br />
Figure 9.1 The XEQ Opera<strong>to</strong>r for Tests that Respect Case<br />
FILE Filelog:<br />
Text<br />
L Fred Smith<br />
L Will Roys<br />
l<br />
disc<br />
L William<br />
l<br />
L Penelope Rt<br />
PROCESS Filelog<br />
[ IF FIRST (.FILE.), GEN #LogOn = 0, GEN #LogOff = 0 ;<br />
IF TOKEN (Text) XEQ 'L', INC #LogOn ;<br />
IF TOKEN (Text) XEQ 'l', INC #LogOff;<br />
IF LAST (.FILE.),<br />
PUT #LogOn ,><br />
#LogOff > ] $<br />
There were 4 logons and 2 logoffs.<br />
__________________________________________________________________________<br />
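The counting logic of Figure 9.1 can be sketched in Python, where string comparison with == already respects case. This is an illustration using the figure's seven Text values, not P-STAT code; the helper first_token is a hypothetical stand-in for TOKEN:<br />

```python
lines = ["L Fred Smith", "L Will Roys", "l", "disc", "L William", "l", "L Penelope Rt"]


def first_token(text):
    """Return the first blank-delimited word, or None when there is none."""
    words = text.split()
    return words[0] if words else None


# Case-respecting tests, as with XEQ.
logons = sum(1 for text in lines if first_token(text) == "L")
logoffs = sum(1 for text in lines if first_token(text) == "l")

# A case-independent test, as with EQ, cannot tell the two apart.
either = sum(1 for text in lines if (first_token(text) or "").lower() == "l")

print(logons, logoffs, either)
```
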
9.11 CHARACTER FUNCTIONS<br />
There are a number of functions that are used only with character values. These functions, in alphabetical order,<br />
and <strong>the</strong> tasks <strong>the</strong>y perform, are:<br />
1. BLANK Blank out specified characters within a string.<br />
2. XBLANK Like BLANK, but case respecting.<br />
3. CAPS Capitalize <strong>the</strong> first character of each <strong>to</strong>ken.<br />
4. CENTER Center a character string.<br />
5. CHANGE Correct a substring within a string.<br />
6. CLAG Perform a lag on a character argument.<br />
7. XCHANGE Like CHANGE, but case respecting.<br />
8. CHAREX Create a character value from digits in a number.<br />
9. CHARACTER Convert a number <strong>to</strong> a character string.<br />
10. COMPRESS Squeeze out specified characters.<br />
11. CVAL Give character equivalent of specified number.<br />
12. IVAL Give number equivalent of specified character.<br />
13. LEFT Left justify a character string.<br />
14. LENGTH Locate <strong>the</strong> last non-blank character in a string.<br />
15. LOWER Convert characters <strong>to</strong> lowercase equivalents.<br />
16. NUMBER Convert a character string <strong>to</strong> a number.<br />
17. PAD Pad a character string with specified characters.<br />
18. POSITION Give <strong>the</strong> position of one string within ano<strong>the</strong>r.
19. XPOSITION Like POSITION, but case respecting.<br />
20. RIGHT Right justify a character string.<br />
21. SIZE Determine <strong>the</strong> defined size of a character variable.<br />
22. SUBSTRING Extract substrings from character strings.<br />
23. TOKEN Access “words” within character strings.<br />
24. TRIM Trim specified characters from strings.<br />
25. UPPER Convert characters <strong>to</strong> uppercase equivalents.<br />
26. VARNAME Convert a variable name <strong>to</strong> a character value.<br />
27. VERIFY Test for unexpected characters in a string.<br />
The function name is followed by paren<strong>the</strong>ses containing one or more expressions or constants. All expressions<br />
may be complex and can consist of variable names, literals and o<strong>the</strong>r functions. Expressions may be nested<br />
within o<strong>the</strong>r expressions. The mode and number of <strong>the</strong> arguments permitted depend on <strong>the</strong> individual function.<br />
Any character expression on <strong>the</strong> right of <strong>the</strong> equal sign, no matter how simple or complex, produces a result<br />
whose width can range from 0 (a null string) <strong>to</strong> 50,000 characters. Only when <strong>the</strong> result is moved across <strong>the</strong> equal<br />
sign in<strong>to</strong> <strong>the</strong> target variable does a padding (with blanks) or truncation take place, if necessary.<br />
9.12 Centering and Justifying Strings<br />
The functions CENTER, LEFT and RIGHT affect <strong>the</strong> position of a character string within its defined field. The<br />
CENTER function centers <strong>the</strong> string. The LEFT and RIGHT functions, respectively, left and right justify <strong>the</strong><br />
strings within <strong>the</strong>ir fields:<br />
LEFT ( ' ABC' ) = 'ABC '<br />
CENTER ( 'XYZ  ' ) = ' XYZ '<br />
RIGHT ( 'SPQR ' ) = ' SPQR'<br />
9.13 Changing <strong>the</strong> Case of Strings<br />
UPPER ( 'abc' ) = ABC<br />
LOWER ( 'ABC' ) = abc<br />
CAPS ( 'ann smith' ) = Ann Smith<br />
UPPER, LOWER and CAPS are <strong>the</strong> three functions which change <strong>the</strong> case of a value:<br />
SET Name = UPPER ( Name )<br />
LOWER converts a value <strong>to</strong> all lowercase characters. It is possible <strong>to</strong> nest character functions. This permits conversion<br />
of all of a name <strong>to</strong> lowercase except <strong>the</strong> first letter:<br />
SET Name = SUBSTRING ( Name, 1, 1 ) //<br />
LOWER ( SUBSTRING ( Name, 2 ) )<br />
(The SUBSTRING function is discussed subsequently.)<br />
CAPS capitalizes <strong>the</strong> initial letters of words in character variables:<br />
[ GEN Name:C = CAPS ( 'JOHN paul JoNeS' ) ]<br />
More exactly, CAPS puts all initial letters in upper case and all o<strong>the</strong>r letters in lower case. A blank is <strong>the</strong> assumed<br />
delimiter between <strong>to</strong>kens (words). The output appears like this:<br />
John Paul Jones<br />
Optionally, a second argument giving a replacement delimiter for <strong>the</strong> blank or an additional delimiter may be specified.<br />
This instruction:<br />
[ SET V(1) = CAPS ( 'abc,def,ghi', ',' ) ]<br />
produces:
Abc,Def,Ghi<br />
The comma is specified as <strong>the</strong> <strong>to</strong>ken delimiter. It is enclosed in single or double quotes.<br />
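The rule that CAPS applies (first letter of each token in upper case, all other letters in lower case) can be sketched in Python. The function below is a hypothetical model, not P-STAT code; it assumes the second argument lists every delimiter character to be used:<br />

```python
def caps(value, delimiters=" "):
    """Upper-case the first letter of each token and lower-case the rest."""
    out = []
    start_of_token = True
    for ch in value:
        if ch in delimiters:
            out.append(ch)          # delimiters pass through unchanged
            start_of_token = True
        elif start_of_token:
            out.append(ch.upper())
            start_of_token = False
        else:
            out.append(ch.lower())
    return "".join(out)


print(caps("JOHN paul JoNeS"))
print(caps("abc,def,ghi", ","))
```
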
9.14 Length and Size of Strings<br />
LENGTH ( '  abc   ' ) = 5<br />
SIZE ( '  abc   ' ) = 8<br />
The functions LENGTH and SIZE yield information about <strong>the</strong> actual length and <strong>the</strong> defined size of character values.<br />
LENGTH gives <strong>the</strong> location of <strong>the</strong> right-most non-blank character:<br />
[ GENERATE Count = LENGTH ( Name ) ]<br />
The SIZE function yields a numeric value giving <strong>the</strong> defined size of a character value:<br />
[ GEN Width = SIZE ( Name ) ]<br />
The variable Width is generated and set equal <strong>to</strong> <strong>the</strong> longest possible length of <strong>the</strong> variable Name. This length is<br />
<strong>the</strong> defined size of Name or <strong>the</strong> size resulting after various character function procedures or operations.<br />
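The distinction between the two functions is easy to model in Python, where a fixed-width value is simply a string that includes its trailing blanks. The names length and size below are hypothetical stand-ins, not P-STAT code:<br />

```python
def length(value):
    """Position of the right-most non-blank character (0 when the value is all blank)."""
    return len(value.rstrip(" "))


def size(value):
    """Full width of the value, trailing blanks included."""
    return len(value)


field = "  abc   "   # two leading blanks, three trailing blanks
print(length(field), size(field))
```

Leading blanks count toward the length because the result is a position, not a count of non-blank characters.<br />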
9.15 Locating Strings Within Variables<br />
POSITION ( 'ABC', 'B' ) = 2<br />
POSITION ( 'ABC', 'X' ) = 0<br />
XPOSITION ( 'ABab', 'ab' ) = 3<br />
VERIFY ( 'ABCDE', 'AEIOU' ) = 2<br />
The POSITION, XPOSITION and VERIFY functions yield a numeric value which is <strong>the</strong> location of a string within<br />
a character value. The simpler usage of POSITION has an expression and a character string as arguments:<br />
[ GEN Blank.Location = POSITION ( Name, ' ' ) ]<br />
The numeric variable Blank.Location is generated and set equal <strong>to</strong> <strong>the</strong> location of <strong>the</strong> first occurrence of a blank<br />
in <strong>the</strong> variable Name. The second argument may be a character variable whose value is <strong>the</strong> string <strong>to</strong> be located:<br />
[ GEN Locale = POSITION ( Address, Zip ) ]<br />
The variable Locale is <strong>the</strong> location of <strong>the</strong> value of Zip (<strong>the</strong> zip code string) within <strong>the</strong> variable Address. If <strong>the</strong><br />
string is not located, <strong>the</strong> result is zero. Values match regardless of whe<strong>the</strong>r <strong>the</strong>y are uppercase or lowercase.<br />
The more complex usage of POSITION permits searches for multiple strings. The left-most position of any<br />
successfully located string is given as <strong>the</strong> function result. The arguments for POSITION are <strong>the</strong> expression and<br />
<strong>the</strong> character strings <strong>to</strong> be located:<br />
[ GEN XX = POSITION ( Name, 'Jr.', 'Sr.', 'Esq.' ) ]<br />
The variable XX is <strong>the</strong> location of <strong>the</strong> left-most occurrence of any of <strong>the</strong> specified strings.<br />
An optional argument for length may be provided. It should be right-most in <strong>the</strong> argument list:<br />
[ SET Extra = POSITION ( Phone, ' ()-/.', 1 ) ]<br />
The contents of <strong>the</strong> character strings, whose positions are being sought, are divided in<strong>to</strong> strings of <strong>the</strong> specified<br />
length. Thus, portions of strings are treated as separate arguments. In <strong>the</strong> preceding example, <strong>the</strong> variable Extra<br />
is set equal <strong>to</strong> <strong>the</strong> position of <strong>the</strong> first occurrence of any of <strong>the</strong> characters in <strong>the</strong> search string. The “1” specifies<br />
that each single character in <strong>the</strong> search string is itself a search string. The length argument must be an integer<br />
between 1 and 50,000. The number of characters in <strong>the</strong> search string must be evenly divisible by <strong>the</strong> length.<br />
XPOSITION is just like POSITION, except that <strong>the</strong> case (upper, lower or mixed) of <strong>the</strong> character string whose<br />
position is being sought is respected:<br />
[ GEN Fatal = XPOSITION ( Symp<strong>to</strong>m, 'D' ) ]
The variable Fatal is generated and set equal to the position of upper-case “D”; lower-case “d” is ignored.<br />
XPOSITION may be used with the same types of arguments as POSITION.<br />
The VERIFY function returns the location of the first character in the initial argument that is not in any of<br />
<strong>the</strong> remaining arguments:<br />
[ GEN BAD = VERIFY ( 'ABCDE', 'EA', 'B' ) ]<br />
BAD is set to 3, since the third character of the first argument, “C”, is not in any of the remaining arguments. Thus, the presence of only<br />
specified characters may be verified. Multiple character string arguments are permitted, although each character<br />
is considered as a separate string.<br />
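The search rules described in this section can be summarized in a Python sketch. It is a model under stated assumptions, not P-STAT code: the function names are hypothetical, the optional length is passed as a keyword rather than as the right-most argument, and verify is assumed to return 0 when every character is permitted (the text does not say):<br />

```python
def _search_strings(needles, length):
    """Split each search string into pieces of the given length, if one is supplied."""
    if length is None:
        return list(needles)
    pieces = []
    for needle in needles:
        if len(needle) % length != 0:
            raise ValueError("search string must be evenly divisible by the length")
        pieces.extend(needle[i:i + length] for i in range(0, len(needle), length))
    return pieces


def xposition(value, *needles, length=None):
    """Case-respecting: 1-based position of the left-most match, or 0 if none."""
    hits = [value.find(piece) for piece in _search_strings(needles, length)]
    hits = [hit for hit in hits if hit >= 0]
    return min(hits) + 1 if hits else 0


def position(value, *needles, length=None):
    """Case-independent version of xposition."""
    lowered = [needle.lower() for needle in needles]
    return xposition(value.lower(), *lowered, length=length)


def verify(value, *allowed):
    """1-based position of the first character not found in any allowed string."""
    permitted = set("".join(allowed))
    for index, ch in enumerate(value, start=1):
        if ch not in permitted:
            return index
    return 0


print(position("John Smith Jr.", "Jr.", "Sr.", "Esq."))
print(position("(609) 466-9200", " ()-/.", length=1))
print(verify("ABCDE", "EA", "B"))
```
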
9.16 Extracting Substrings and Words<br />
SUBSTRING ( 'ABCDE', 3, 2 ) = 'CD'<br />
TOKEN ( 'Ann Smith' ) = 'Ann'<br />
The SUBSTRING and TOKEN functions access portions of character strings. SUBSTRING accesses a string<br />
starting at a specified location and of a given length. TOKEN accesses “words” within a character string — that<br />
is, strings delimited by blanks or ano<strong>the</strong>r specified character.<br />
SUBSTRING requires an expression, a start location and a length as arguments:<br />
[ GEN Initial:C1 = SUBSTRING ( First.Name, 1, 1 ) ]<br />
The variable Initial is generated as a character variable with a defined size of 1. It is set equal <strong>to</strong> <strong>the</strong> substring of<br />
First.Name, beginning at <strong>the</strong> character in position 1 and having a length of 1. The third argument giving <strong>the</strong> length<br />
is optional. When it is omitted, <strong>the</strong> assumption is that <strong>the</strong> rest of <strong>the</strong> string is needed. If <strong>the</strong> third argument is<br />
omitted when <strong>the</strong> second argument is 1, <strong>the</strong> entire input string is <strong>the</strong> substring.<br />
An expression may be used as ei<strong>the</strong>r <strong>the</strong> location or length argument in SUBSTRING:<br />
[ GEN #Len = LENGTH ( Phone.No ) ]<br />
[ GEN Code:C2 = SUBSTRING ( Phone.No, #Len-1, 2 ) ]<br />
The variable Code is set equal <strong>to</strong> <strong>the</strong> two right-most digits in <strong>the</strong> telephone number. (The scratch variable #Len<br />
is <strong>the</strong> length of <strong>the</strong> phone number. #Len-1 identifies <strong>the</strong> start location as <strong>the</strong> next-<strong>to</strong>-last digit, and 2 is <strong>the</strong> length<br />
of <strong>the</strong> substring.)<br />
The TOKEN function accesses words or strings of characters within a character variable. The strings are typically<br />
separated by blanks:<br />
[ GEN First.Name:C = TOKEN ( Name ) ]<br />
The variable First.Name is <strong>the</strong> first word in <strong>the</strong> variable Name.<br />
Optional arguments for <strong>the</strong> TOKEN function access specific words and specify <strong>the</strong> delimiter between words.<br />
These instructions,<br />
[ GEN First.Name:C = TOKEN ( Name, 1 ) ;<br />
GEN Middle.Name:C = TOKEN ( Name, 2 ) ;<br />
GEN Last.Name:C = TOKEN ( Name, 3 ) ;<br />
IF Last.Name MISSING,<br />
SET Last.Name = Middle.Name,<br />
SET Middle.Name = ' ' ]<br />
access <strong>the</strong> second and third <strong>to</strong>kens in Name, as well as <strong>the</strong> first. When <strong>the</strong> second argument is omitted, <strong>the</strong> first<br />
<strong>to</strong>ken is accessed. Note that accessing a <strong>to</strong>ken that is not present (<strong>the</strong> third <strong>to</strong>ken in a name that has only two <strong>to</strong>kens)<br />
yields a missing value.<br />
A delimiter o<strong>the</strong>r than <strong>the</strong> blank may be specified:<br />
[ GEN Year.of.Birth:C = TOKEN ( Birthdate, 3, '/' ) ]
In this example, <strong>the</strong> slash is specified as <strong>the</strong> <strong>to</strong>ken delimiter. Assuming Birthdate has values such as 11/19/68, <strong>the</strong><br />
value of <strong>the</strong> third <strong>to</strong>ken, 68, will be used as <strong>the</strong> value of Year.of.Birth.<br />
TOKEN accesses <strong>to</strong>kens counting from <strong>the</strong> left. TOKEN is a synonym for LTOKEN. RTOKEN accesses<br />
<strong>to</strong>kens counting from <strong>the</strong> right. In all o<strong>the</strong>r aspects, it is <strong>the</strong> same as LTOKEN.<br />
The NTOKEN function yields a count of <strong>to</strong>kens within a character string. The result is a numeric value, not<br />
a character string. This instruction:<br />
[ GEN #Number = NTOKEN ( Address ) ]<br />
generates a scratch variable #Number that equals <strong>the</strong> number of words in <strong>the</strong> variable Address. The delimiter is<br />
assumed <strong>to</strong> be a blank unless a second argument specifies one or more alternate delimiters:<br />
[ GEN Number.Read = NTOKEN ( Magazines, ', ' ) ]<br />
A comma and a blank are specified as <strong>the</strong> <strong>to</strong>ken delimiters. The variable Number.Read is <strong>the</strong> number of strings<br />
separated by commas and/or blanks in <strong>the</strong> variable Magazines.<br />
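The extraction functions in this section use 1-based positions, which differs from most programming languages. The Python sketch below models them; it is not P-STAT code, the function names are hypothetical, None stands in for a missing value, and runs of adjacent delimiters are assumed to count as a single separator:<br />

```python
import re


def substring(value, start, length=None):
    """1-based substring; with no length, take the rest of the string."""
    begin = start - 1
    return value[begin:] if length is None else value[begin:begin + length]


def _tokens(value, delimiters=" "):
    """Split on any of the delimiter characters, dropping empty pieces."""
    pattern = "[" + re.escape(delimiters) + "]+"
    return [piece for piece in re.split(pattern, value) if piece]


def token(value, n=1, delimiters=" "):
    """n-th token counting from the left, or None when it does not exist."""
    words = _tokens(value, delimiters)
    return words[n - 1] if 1 <= n <= len(words) else None


def rtoken(value, n=1, delimiters=" "):
    """n-th token counting from the right, or None when it does not exist."""
    words = _tokens(value, delimiters)
    return words[-n] if 1 <= n <= len(words) else None


def ntoken(value, delimiters=" "):
    """Number of tokens in the value."""
    return len(_tokens(value, delimiters))


phone = "466-9200"
print(substring(phone, len(phone) - 1, 2))   # the two right-most characters
print(token("11/19/68", 3, "/"))
print(ntoken("Time, Life, Wired", ", "))
```
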
9.17 Blanking Out and Changing Strings<br />
BLANK ( 'abcde', 'bc' ) = 'a  de'<br />
BLANK ( 'abcde', 2, 2 ) = 'a  de'<br />
CHANGE ( 'abcde', 'bc', '999' ) = 'a999de'<br />
CHANGE ( 'abcde', 2, 3, '2222' ) = 'a2222e'<br />
The BLANK, XBLANK, CHANGE and XCHANGE functions alter character variables by replacing portions<br />
with ei<strong>the</strong>r blanks or specified new strings. This provides <strong>the</strong> ability <strong>to</strong> delete or replace substrings. BLANK and<br />
CHANGE ignore <strong>the</strong> case of character strings; XBLANK and XCHANGE respect <strong>the</strong> eXact case of character<br />
strings.<br />
The BLANK function has two usage modes. The simpler usage specifies an expression, a starting location<br />
for <strong>the</strong> blank string, and <strong>the</strong> length of <strong>the</strong> string. This instruction:<br />
[ GEN Birthday:C4 = BLANK ( Date.of.Birth, 5, 2 ) ]<br />
replaces <strong>the</strong> character string beginning with <strong>the</strong> fifth character and of length 2 with two blanks. The variable Birthday<br />
becomes '1215 ' instead of '121545'.<br />
The third argument, giving <strong>the</strong> length of <strong>the</strong> string being blanked out, may be omitted. The characters from<br />
<strong>the</strong> start location through <strong>the</strong> end will be replaced by blanks:<br />
LIST Vocab.Test<br />
[ KEEP Vocab.Words Definitions ;<br />
SET Vocab.Words = BLANK ( Vocab.Words, 2 ) ] $<br />
This listing will have only <strong>the</strong> initial letter of each vocabulary word and <strong>the</strong> definitions.<br />
The alternate usage of <strong>the</strong> BLANK function specifies a particular character string <strong>to</strong> be replaced by blanks,<br />
ra<strong>the</strong>r than a location and length. XBLANK may be used with this type of argument. The first occurrence of <strong>the</strong><br />
string is blanked out:<br />
BLANK ( 'abcde', 'CD' ) produces 'ab  e'<br />
XBLANK ( 'abcde', 'CD' ) produces 'abcde'<br />
Multiple occurrences of a character string may be blanked out. An optional third argument <strong>to</strong> <strong>the</strong> BLANK<br />
function specifies <strong>the</strong> maximum number of occurrences of <strong>the</strong> string <strong>to</strong> be replaced:<br />
[ SET Comments = BLANK ( Comments, 'damn', 10 ) ]<br />
Up <strong>to</strong> ten occurrences of <strong>the</strong> word “damn” in <strong>the</strong> variable Comments will be replaced with an equivalent number<br />
of blanks. The size of <strong>the</strong> resultant variable always remains <strong>the</strong> same when <strong>the</strong> BLANK function is used.
The CHANGE and XCHANGE functions have two usage modes comparable <strong>to</strong> those of <strong>the</strong> BLANK function.<br />
The most common usage of CHANGE specifies an expression, an old string and a new string. The first<br />
occurrence of <strong>the</strong> old string is replaced with <strong>the</strong> new string:<br />
[ SET State = CHANGE ( State, 'TX', 'Texas' ) ]<br />
XCHANGE is <strong>the</strong> same as CHANGE, but <strong>the</strong> case of <strong>the</strong> old string must be exactly as specified or it is not replaced<br />
by <strong>the</strong> new string.<br />
The old and new strings may be specified with expressions, which are variable names, literals and o<strong>the</strong>r<br />
functions:<br />
[ IF New.Area.Code GOOD, SET Phone.No =<br />
CHANGE ( Phone.No, Area.Code, New.Area.Code ) ]<br />
The area code in <strong>the</strong> phone number is changed <strong>to</strong> <strong>the</strong> new area code, unless <strong>the</strong> new one is missing. The value of<br />
Area.Code is <strong>the</strong> old string and a good (non-missing) value of New.Area.Code is <strong>the</strong> new string. If <strong>the</strong> value of<br />
Area.Code is not found in Phone.No, no change is made.<br />
Multiple changes may be specified. An optional fourth argument gives <strong>the</strong> maximum number of changes:<br />
[ SET Title = CHANGE ( Title, ' ', '.', 999 ) ]<br />
All blanks are changed <strong>to</strong> periods, including leading or trailing blanks.<br />
If a new character string is not specified, <strong>the</strong> old substring is removed, making <strong>the</strong> length of <strong>the</strong> result smaller.<br />
This instruction removes occurrences of <strong>the</strong> word “damn”:<br />
[ SET Comments = CHANGE ( Comments, 'damn', 10 ) ]<br />
The alternate usage of <strong>the</strong> CHANGE function specifies an expression, a starting location of a string, <strong>the</strong> length<br />
of <strong>the</strong> string, and a new string. The new string may or may not be <strong>the</strong> same length as <strong>the</strong> old one:<br />
[ IF SUBSTRING ( Telephone, 4, 3 ) = '897',<br />
SET Telephone = CHANGE ( Telephone, 4, 3, '807' ) ]<br />
The CHANGE and XCHANGE functions may make a value longer or shorter. This does not matter until <strong>the</strong><br />
final resulting value is moved across <strong>the</strong> equal sign. At that time, truncation or blank padding will occur as needed.<br />
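Both usage modes of BLANK and CHANGE can be modeled in Python. The sketch below is an illustration, not P-STAT code: the function names are hypothetical, the two modes are split into separate functions (blank and blank_at, change and change_at) because Python cannot dispatch on argument type as conveniently, and the respect_case flag stands in for the XBLANK and XCHANGE variants:<br />

```python
import re


def _replace(value, old, make_new, count, respect_case):
    """Replace up to `count` occurrences of the literal string `old`."""
    flags = 0 if respect_case else re.IGNORECASE
    return re.sub(re.escape(old), make_new, value, count=count, flags=flags)


def blank(value, old, count=1, respect_case=False):
    """Replace occurrences of `old` with the same number of blanks."""
    return _replace(value, old, lambda m: " " * len(m.group(0)), count, respect_case)


def change(value, old, new="", count=1, respect_case=False):
    """Replace occurrences of `old` with `new`; an empty `new` removes them."""
    return _replace(value, old, lambda m: new, count, respect_case)


def blank_at(value, start, length=None):
    """Blank out `length` characters from 1-based `start` (to the end if omitted)."""
    begin = start - 1
    end = len(value) if length is None else begin + length
    return value[:begin] + " " * (min(end, len(value)) - begin) + value[end:]


def change_at(value, start, length, new):
    """Replace `length` characters from 1-based `start` with `new`."""
    begin = start - 1
    return value[:begin] + new + value[begin + length:]


print(repr(blank("abcde", "CD")))
print(repr(change("abcde", "bc", "999")))
print(repr(change_at("abcde", 2, 3, "2222")))
```

Only the change functions can alter the width of the result; the blank functions always return a value as wide as their input.<br />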
9.18 Squeezing Out Specified Characters<br />
COMPRESS ( '12/25/95', '/' ) = '122595'<br />
COMPRESS ( '..AB...CD..', 1, '.' ) = '.AB.CD.'<br />
The COMPRESS function squeezes out ei<strong>the</strong>r blanks or specified characters. This instruction:<br />
[ SET SS.Number = COMPRESS ( SS.Number ) ]<br />
squeezes out all leading, trailing and embedded blanks contained in <strong>the</strong> variable SS.Number. Only non-blank<br />
characters remain.<br />
An expanded usage mode of <strong>the</strong> COMPRESS function permits specification of <strong>the</strong> number of delimiters that<br />
may remain between <strong>to</strong>kens (strings or words), and <strong>the</strong> delimiter character or characters that separate <strong>to</strong>kens. This<br />
instruction will leave only a single blank between words wherever one or more blanks are found:<br />
[ SET Sentence = COMPRESS ( Sentence, 1 ) ]<br />
This next instruction will generate a numeric variable from a character one, after all <strong>the</strong> specified characters are<br />
squeezed out:<br />
[ GEN <strong>Inc</strong>ome = NUMBER ( COMPRESS ( <strong>Inc</strong>ome, ',$' ) ) ]<br />
The <strong>to</strong>ken delimiters are specified as <strong>the</strong> comma and <strong>the</strong> currency sign. Since <strong>the</strong> second argument, <strong>the</strong> number<br />
of delimiters that may remain, is missing, zero is assumed. All specified delimiters are removed, leaving only
numbers (it is hoped). The NUMBER function (discussed subsequently) converts <strong>the</strong> resultant string in<strong>to</strong> a numeric<br />
value.<br />
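The squeezing rule can be sketched in Python as "shrink every run of delimiter characters to at most a given number". This is a model, not P-STAT code; the name compress and its keyword arguments are hypothetical, and a run that mixes several delimiter characters is assumed to be treated as one run:<br />

```python
import re


def compress(value, keep=0, delimiters=" "):
    """Reduce each run of delimiter characters to at most `keep` characters."""
    pattern = "[" + re.escape(delimiters) + "]+"
    return re.sub(pattern, lambda m: m.group(0)[:keep], value)


print(repr(compress("123 45 6789")))
print(repr(compress("..AB...CD..", 1, ".")))
print(float(compress("$12,500", delimiters=",$")))
```

With keep left at zero the delimiters disappear entirely, which is what makes the result safe to hand to a numeric conversion.<br />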
9.19 Trimming Strings<br />
TRIM ( ' abcd ' ) = ' abcd'<br />
LTRIM ( ' abcd ' ) = 'abcd '<br />
LRTRIM ( ' abcd ' ) = 'abcd'<br />
TRIM ( 'abc ***', '*' ) = 'abc '<br />
TRIM ( 'abc ***', 2, '*' ) = 'abc *'<br />
The TRIM functions remove ei<strong>the</strong>r blanks or one or more specified characters from one or both ends of a<br />
string. TRIM is a synonym for RTRIM — it trims blanks or characters from <strong>the</strong> right end of a string. LTRIM<br />
trims from <strong>the</strong> left end and LRTRIM trims from both ends.<br />
The TRIM functions have a character expression and a single character or a string of characters as arguments<br />
and an optional number that limits <strong>the</strong> number of characters <strong>to</strong> be trimmed. The string of trim characters is optional<br />
and, when it is not present, blank trim characters are assumed. In this example, blank characters are trimmed<br />
from <strong>the</strong> right end of <strong>the</strong> variable First.Name before concatenating it with a blank and variable Last.Name:<br />
[ SET Name = TRIM ( First.Name ) // ' ' // Last.Name ]<br />
Multiple trim characters may be specified:<br />
[ GEN Text:C = TRIM ( TRIM (Var1), '.,-' ) ]<br />
Any of <strong>the</strong> three specified punctuation marks that occur on <strong>the</strong> right end of values of Var1 will be trimmed off,<br />
yielding values of <strong>the</strong> new variable Text. Values of “hippo-”, “closing,” and “also...” will become “hippo”, “closing”<br />
and “also”. The resultant value may have a shorter length than it did prior <strong>to</strong> trimming. Notice that a simple<br />
TRIM is used <strong>to</strong> remove excess blanks first, so that <strong>the</strong> punctuation is right-most and <strong>the</strong>refore able <strong>to</strong> be trimmed.<br />
Note that:<br />
TRIM ( VAR1, '., -' )<br />
which adds a blank <strong>to</strong> <strong>the</strong> trim characters, is slightly different; it not only trims initial blanks on <strong>the</strong> right, it also<br />
trims blanks after o<strong>the</strong>r trim characters have been found. I.e., 'ab. - , ' would become 'ab' instead of<br />
the 'ab. - ' which results when the simple TRIM of blanks is done before the TRIM of the punctuation<br />
characters.<br />
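The difference between nested trims and a single trim with a blank among the trim characters is easiest to see by running both. The Python sketch below models the TRIM family; it is not P-STAT code, the function names are hypothetical, the limit is placed after the trim characters (the reverse of the PPL argument order), and applying the limit to each end separately in lrtrim is an assumption:<br />

```python
def rtrim(value, chars=" ", limit=None):
    """Remove trailing characters found in `chars`, at most `limit` of them."""
    end = len(value)
    removed = 0
    while end > 0 and value[end - 1] in chars and (limit is None or removed < limit):
        end -= 1
        removed += 1
    return value[:end]


def ltrim(value, chars=" ", limit=None):
    """Remove leading characters found in `chars`, at most `limit` of them."""
    return rtrim(value[::-1], chars, limit)[::-1]


def lrtrim(value, chars=" ", limit=None):
    """Trim both ends (limit applied to each end separately; an assumption)."""
    return ltrim(rtrim(value, chars, limit), chars, limit)


text = "ab. - , "
print(repr(rtrim(rtrim(text), ".,-")))   # blanks first, then punctuation
print(repr(rtrim(text, "., -")))         # blanks and punctuation together
```
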
9.20 Padding Strings<br />
PAD ( 'abcd', 6 ) = 'abcd  '<br />
LPAD ( 'abcd', 7, '-' ) = '---abcd'<br />
PAD ( 'abcd', 3 ) = 'abcd'<br />
LRPAD ( 'abcd', 9, '-' ) = '--abcd---'<br />
The PAD functions add blanks or a specified character <strong>to</strong> <strong>the</strong> right (PAD or RPAD), <strong>to</strong> <strong>the</strong> left (LPAD), or <strong>to</strong> both<br />
ends of a string (LRPAD).<br />
The PAD functions have a character expression, a minimum length, and a fill character as arguments. Only<br />
<strong>the</strong> character expression is required. When <strong>the</strong> third argument, <strong>the</strong> fill character, is omitted in any of <strong>the</strong> PAD<br />
functions, a blank character is assumed.<br />
[ GEN ABC:C = PAD ( TRIM ( V(5) ), 10, '-' ) ]<br />
The trimmed form of variable 5 is padded with dashes <strong>to</strong> a width of 10. If <strong>the</strong> trimmed form is already 10 or more,<br />
no dashes are added. Then, when <strong>the</strong> result is moved across <strong>the</strong> equal sign in<strong>to</strong> variable ABC, it will be fur<strong>the</strong>r<br />
padded with blanks if its length is less than 16.<br />
When <strong>the</strong> second argument, <strong>the</strong> length, is omitted, a length of 1 is assumed:
[ SET Heading = PAD ( TRIM (Heading), '.' ) ]<br />
Blanks are trimmed from <strong>the</strong> end of variable Heading. If necessary, <strong>the</strong> resultant value is padded with a dot <strong>to</strong><br />
bring it up <strong>to</strong> a length of 1. Therefore, only values of Heading that are completely blank, and thus become null<br />
strings when <strong>the</strong>y are trimmed, are padded. This usage of PAD may be useful in locating blank values or in avoiding<br />
production of null strings, which may interfere with o<strong>the</strong>r procedures.<br />
PAD is often used with TRIM. Consider:<br />
[ GEN aa:c16 = 'cow';<br />
GEN bb:c16 = LRPAD ( aa, 16, '-' ) ]<br />
This LRPAD has no effect, because variable aa, being C16, literally contains 'cow             '. Variable<br />
bb will contain <strong>the</strong> same thing because <strong>the</strong> input <strong>to</strong> LRPAD already has 16 characters. However:<br />
[ GEN aa:c16 = 'cow';<br />
GEN bb:c16 = LRPAD ( LRTRIM ( aa ), 16, '-' ) ]<br />
gives just 'cow' to the LRPAD function, so it will set bb to '------cow-------'.<br />
In this vein, if LRPAD ( LRTRIM ( aa ), 40, '-' ) were used, LRPAD would cheerfully produce 18 dashes,<br />
'cow', and 19 dashes. Then, because bb is character 16, it is truncated <strong>to</strong> <strong>the</strong> first 16 of those 40 characters, i.e., a<br />
series of 16 dashes.<br />
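The interaction between padding and the fixed width of the receiving variable can be traced with a small Python sketch. It is not P-STAT code: lrtrim, lrpad and store are hypothetical helpers, and the rule that an odd fill character goes on the right is inferred from the two results quoted above.

```python
def lrtrim(value: str) -> str:
    """Remove blanks from both ends."""
    return value.strip(" ")


def lrpad(value: str, length: int, fill: str = " ") -> str:
    """Pad on both sides; an odd fill character goes on the right."""
    extra = max(length - len(value), 0)
    left = extra // 2
    return fill * left + value + fill * (extra - left)


def store(value: str, width: int) -> str:
    """Model moving a result into a fixed-width character variable."""
    return value[:width].ljust(width)


aa = store("cow", 16)                                 # 'cow' plus 13 blanks
print(repr(store(lrpad(aa, 16, "-"), 16)))            # no room left to pad
print(repr(store(lrpad(lrtrim(aa), 16, "-"), 16)))    # centered in dashes
print(repr(store(lrpad(lrtrim(aa), 40, "-"), 16)))    # cut to 16 characters
```

Keeping the storage step separate from the padding step makes it easier to see that the truncation happens at assignment, not inside the function.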
9.21 Converting Numbers <strong>to</strong> Characters and Vice Versa<br />
NUMBER ( '12' // '3' ) = 123<br />
CHARACTER ( 123 ) = '123'<br />
CHAREX ( 122596, '00XX00' ) = '25'<br />
The NUMBER function converts character strings containing digits into numbers. The CHARACTER and<br />
CHAREX functions convert numeric values into character representations. However, CHARACTER converts an entire<br />
numeric variable, whereas CHAREX extracts and converts only specified digits.<br />
The NUMBER function converts a character value containing digits in<strong>to</strong> numeric form. NUMBER requires<br />
a character expression as its argument:<br />
[ SET Year = NUMBER ( SUBSTRING ( Date, 7, 2 ) ) ]<br />
This instruction takes <strong>the</strong> seventh and eighth characters from <strong>the</strong> character variable Date and converts <strong>the</strong>m in<strong>to</strong><br />
numeric form. For example, when Date has <strong>the</strong> value “09/15/29”, Year has <strong>the</strong> value 29.<br />
If <strong>the</strong> character value of <strong>the</strong> argument is all blank, <strong>the</strong> number is set <strong>to</strong> missing type 1. If <strong>the</strong> result of <strong>the</strong><br />
character expression is not numeric, <strong>the</strong> number is set <strong>to</strong> missing type 2. If <strong>the</strong> input character value is missing,<br />
<strong>the</strong> number is set <strong>to</strong> missing type 3.<br />
There are three forms of the NUMBER function. They differ only when an invalid (missing 2) result occurs.<br />
NUMBER does not print any warnings when invalid values are found. NUMBER.W prints a diagnostic warning.<br />
NUMBER.E produces an error message, which ends <strong>the</strong> command at that moment.<br />
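The three outcomes and the three forms can be summarized in a Python sketch. This is an approximation, not P-STAT code: the function name number, the mode argument and the marker strings are hypothetical, and Python's float() accepts some inputs (such as exponents) that P-STAT may treat differently.

```python
import warnings
from typing import Optional, Union

MISSING1, MISSING2, MISSING3 = "missing 1", "missing 2", "missing 3"


def number(text: Optional[str], mode: str = "") -> Union[float, str]:
    """Convert text to a number, or to one of three missing markers.

    mode "" is silent, "W" warns and "E" raises on an invalid value.
    """
    if text is None:                 # missing input
        return MISSING3
    if text.strip() == "":           # all blank
        return MISSING1
    try:
        return float(text)
    except ValueError:               # not numeric
        if mode == "E":
            raise ValueError(f"invalid numeric value: {text!r}")
        if mode == "W":
            warnings.warn(f"invalid numeric value: {text!r}")
        return MISSING2


print(number("11/19/68".replace("/", ""), "W"))   # slashes squeezed out first
print(number("   "), number("11-19-68"), number(None))
```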
A numeric variable for a date, suitable as input <strong>to</strong> <strong>the</strong> DAYS function, may be generated. Assuming Date is<br />
a character string of <strong>the</strong> form 11/19/68, this instruction:<br />
[ GENERATE Date2 = NUMBER.W ( COMPRESS ( Date, '/' ) ) ]<br />
squeezes out all slashes in <strong>the</strong> values of Date and converts <strong>the</strong>m <strong>to</strong> numbers that can be input <strong>to</strong> <strong>the</strong> DAYS function<br />
<strong>to</strong> compute differences between dates. A warning message is issued if Date contains any invalid characters, such<br />
as “-” or “ ” (blank).<br />
The CHARACTER function converts numbers in<strong>to</strong> character strings. It requires a numeric expression as its<br />
argument. This instruction:
[ GEN ID:C11 =<br />
CHARACTER ( Class ) // CHARACTER ( SS.Num ) ]<br />
generates a character variable named ID whose value is <strong>the</strong> concatenation of <strong>the</strong> character representations of <strong>the</strong><br />
numeric variables Class and SS.Num.<br />
CHARACTER accepts an optional second argument giving the number of decimal places<br />
to preserve in the result. This is particularly useful if you have income values carried to<br />
four decimal places and you wish to show only two:<br />
LIST Salary [ GEN Income:C = CHARACTER ( Salary, 2 ) ] $<br />
CHARACTER may also be followed by a third argument, which indicates the maximum number of places to preserve.<br />
In the following example, all values have a minimum of two decimal places, and those with sufficient digits<br />
in the decimal portion have a maximum of three places:<br />
LIST Salary [ GEN Income:C = CHARACTER ( Salary, 2, 3 ) ] $<br />
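The minimum and maximum place counts can be pictured with the Python sketch below. It is only an illustration: the function name character is hypothetical, and the text does not say whether P-STAT rounds or truncates, so rounding half up is an assumption here.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


def character(value: str, min_places: int, max_places: Optional[int] = None) -> str:
    """Format a number with between min_places and max_places decimals."""
    if max_places is None:
        max_places = min_places
    quantum = Decimal(1).scaleb(-max_places)             # e.g. 0.001 for 3 places
    rounded = Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)
    text = format(rounded, "f")
    # Drop trailing zeros beyond the minimum number of places.
    while "." in text and text.endswith("0") and len(text.split(".")[1]) > min_places:
        text = text[:-1]
    return text.rstrip(".")


for salary in ["50000", "1234.5", "1234.5678"]:
    print(salary, character(salary, 2), character(salary, 2, 3))
```

The value is passed as a string so that Decimal sees the digits exactly as written, which avoids binary floating-point surprises in the example.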
The CHAREX function extracts specific digits from a numeric value and yields a character representation of<br />
those digits. CHAREX operates only on <strong>the</strong> integer portion of <strong>the</strong> number — any fractional portion and sign are<br />
ignored. The two required arguments are a numeric expression and a character string selection mask enclosed in<br />
quotes:<br />
[ GEN Month:C2 = CHAREX ( Date, 'XX00' ) ]<br />
The selection mask is composed of X and 0 (zero) characters and may be up <strong>to</strong> ten characters in length. An<br />
X retains a digit and a 0 drops a digit. The selection mask is aligned with <strong>the</strong> right-most digit of <strong>the</strong> numeric value.<br />
Thus, <strong>the</strong> selection mask “X0X” applied <strong>to</strong> <strong>the</strong> numeric value 840921 yields <strong>the</strong> character representation “91”.<br />
The selection mask “XX00X” applied to the numeric value 156 yields “006” because leading zeros pad the numeric value until it is<br />
the length of the mask. The numeric function NUMEX is similar to CHAREX, but it yields a numeric result.<br />
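The mask rules can be checked against the examples in the text with a short Python sketch. The function name charex is a hypothetical stand-in, not the P-STAT implementation.

```python
def charex(value: float, mask: str) -> str:
    """Keep the digits that sit under each X of a right-aligned mask."""
    mask = mask.upper()
    if not mask or len(mask) > 10 or set(mask) - {"X", "0"}:
        raise ValueError("mask must be 1 to 10 characters of X and 0")
    digits = str(abs(int(value)))                     # integer part, sign ignored
    digits = digits.zfill(len(mask))[-len(mask):]     # zero-pad, then right-align
    return "".join(d for d, m in zip(digits, mask) if m == "X")


print(charex(840921, "X0X"))      # digits 9 and 1 are kept
print(charex(156, "XX00X"))       # padded to 00156 before masking
print(charex(122596, "00XX00"))   # the two middle digits
```

A NUMEX-style numeric result would then be int(charex(value, mask)) under the same assumptions.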
9.22 Character/Integer Translation<br />
CVAL ( 65 ) = 'A'<br />
IVAL ( 'A' ) = 65<br />
The PPL functions CVAL and IVAL, short for character value and integer value, translate an integer to a character<br />
and vice versa. This permits non-printing characters to be inserted into text strings output by the PUT instruction,<br />
the LIST command or the TITLES command. Also, all kinds of character values can be compared precisely by<br />
referring to their integer codes.<br />
The CVAL function requires an integer between 0 and 255 as its argument. It returns <strong>the</strong> character equivalent<br />
of that integer. The IVAL function requires a character value of any size as its argument. It returns <strong>the</strong> integer<br />
equivalent of <strong>the</strong> first character.<br />
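For readers who know Python, CVAL and IVAL behave much like chr() and ord(). The sketch below uses hypothetical lower-case names and simply enforces the 0 to 255 range stated above.

```python
def cval(code: int) -> str:
    """Character equivalent of an integer code between 0 and 255."""
    if not 0 <= code <= 255:
        raise ValueError("CVAL expects an integer between 0 and 255")
    return chr(code)


def ival(text: str) -> int:
    """Integer equivalent of the first character of a value of any size."""
    return ord(text[0])


print(repr(cval(65)), ival("A"), ival("Apple"), repr(cval(27)))
```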
Figure 9.2 illustrates using the CVAL function to output non-printing codes with text to a printer. The example<br />
codes are appropriate for a personal computer and a Lexmark printer. Despite the fact that the codes are<br />
specific for these machines, the example illustrates the general procedure of embedding codes in text strings. The<br />
character equivalent of 27:<br />
CVAL (27)<br />
is an escape code. Many printers require that an escape code, a non-printing signal, precede alphanumeric characters<br />
<strong>to</strong> specify various printer parameters. An escape code followed by <strong>the</strong> character “G”:<br />
(CVAL(27)) 'G' 'Bold On' (CVAL(27)) 'H' 'Bold Off'<br />
turns on <strong>the</strong> double-strike print mode; escape H turns it off. All text printed after escape G is double-struck until<br />
escape H is processed. O<strong>the</strong>r printer instructions require an escape code followed by <strong>the</strong> character equivalent of<br />
an integer. This is a form feed:
(CVAL(27)) (CVAL(12))<br />
__________________________________________________________________________<br />
Figure 9.2 The CVAL Function for Bells and Whistles<br />
Enter a command:<br />
>> PPL (PUT<br />
@PAGE (CVAL(27)) 'G' 'Bold On' (CVAL(27)) 'H' 'Bold Off'<br />
@SKIP (CVAL(27)) 'M' 'Elite On' (CVAL(27)) 'P' 'Elite Off'<br />
@SKIP (CVAL(27)) (CVAL(12)) 'Form Feed'<br />
@SKIP (CVAL(27)) 'R' (CVAL(7)) 'International Character Set On'<br />
@SKIP (CVAL(221)) 'Hable Ud. Espa' (CVAL(252)) 'ol?'<br />
@SKIP (CVAL(27)) 'R' (CVAL(7)) 'International Character Set Off'<br />
@SKIP (CVAL(27)) (CVAL(14)) 'Enlarged Print On'<br />
@SKIP (CVAL(27)) '@' 'Initialize Printer'<br />
@SKIP (CVAL(27)) (CVAL(7)) 'Ring Bell' ),<br />
PR LPT1 $<br />
__________________________________________________________________________<br />
Still o<strong>the</strong>r instructions require an escape code, a character, and <strong>the</strong> character equivalent of an integer as an<br />
option. This selects <strong>the</strong> Spanish character set on <strong>the</strong> printer:<br />
(CVAL(27)) 'R' (CVAL(7)) 'International Character Set On'<br />
(CVAL(221)) 'Hable Ud. Espa' (CVAL(252)) 'ol?'<br />
(CVAL(27)) 'R' (CVAL(7)) 'International Character Set Off'<br />
The upside-down question mark and <strong>the</strong> Spanish “n” print in <strong>the</strong> subsequent text, and <strong>the</strong>n <strong>the</strong> default character<br />
set is res<strong>to</strong>red. This re-initializes <strong>the</strong> printer <strong>to</strong> <strong>the</strong> normal defaults,<br />
(CVAL(27)) '@'<br />
so that subsequent users are not unduly surprised.<br />
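The way integer codes and text are strung together can be seen at the byte level in the Python sketch below. The helper name control is hypothetical, and the sketch only builds the byte strings; it does not send anything to a printer.

```python
def control(*parts) -> bytes:
    """Join integer codes and text pieces into one byte string."""
    out = b""
    for part in parts:
        if isinstance(part, int):
            out += bytes([part])              # like CVAL(part)
        else:
            out += part.encode("latin-1")     # quoted text
    return out


bold_demo = control(27, "G", "Bold On", 27, "H", "Bold Off")
form_feed = control(27, 12)
reset = control(27, "@")
print(bold_demo, form_feed, reset)
```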
Notice that <strong>the</strong> <strong>PPL</strong> command is used <strong>to</strong> process <strong>PPL</strong> (P-<strong>STAT</strong> programming language) instructions. An input<br />
file is not required and an output file is not produced. The <strong>PPL</strong> command exists solely <strong>to</strong> process <strong>PPL</strong><br />
instructions, such as <strong>the</strong> PUT instruction used in this example. PUT places text strings on <strong>the</strong> output device, which<br />
is <strong>the</strong> printer in this example:<br />
PR LPT1 $<br />
(“LPT1” is <strong>the</strong> default name for <strong>the</strong> printer on many personal computers.) The TEXT.WRITER or PROCESS<br />
commands could be used if <strong>the</strong> output text strings were <strong>to</strong> incorporate values from a P-<strong>STAT</strong> system file. Both<br />
TEXT.WRITER and PROCESS require an input file. Any P-<strong>STAT</strong> command could be used — <strong>the</strong> input filename<br />
is followed directly by <strong>PPL</strong> clauses containing <strong>PPL</strong> instructions. The instructions containing “@” control<br />
text placement. See <strong>the</strong> chapter on <strong>the</strong> TEXT.WRITER command for specifics.
__________________________________________________________________________<br />
Figure 9.3 Nesting Functions<br />
File People:<br />
Name Birthdate<br />
Susan 07/08/56<br />
Marc 01/26/52<br />
David 03/31/59<br />
>> SORT People<br />
[ GEN #Day =<br />
DAYS ( NUMBER ( COMPRESS ( Birthdate, '/' ) ), 'MMDDYY' ) ;<br />
GEN #Today = DAYS ( .NDATE., 'YYYYMMDD' ) ;<br />
GEN Age = INT ( ( 1 + #Today - #Day ) / 365.25 ) ],<br />
BY Age,<br />
OUT People.By.Age $<br />
>> LIST $<br />
Name Birthdate Age<br />
David 03/31/59 53<br />
Susan 07/08/56 55<br />
Marc 01/26/52 60<br />
__________________________________________________________________________<br />
9.23 Complex Character Expressions<br />
The arguments for character functions are expressions, which must be enclosed in paren<strong>the</strong>ses and separated by<br />
commas. The simplest expression is a variable name or position. Complex expressions are nested functions, numeric<br />
constants, quoted character constants (literals or strings), or combinations of <strong>the</strong>se. Combining character<br />
opera<strong>to</strong>rs and functions in a series of instructions and procedures permits complex manipulation of character<br />
variables.<br />
Figure 9.3 illustrates how numeric and character functions can be used <strong>to</strong>ge<strong>the</strong>r <strong>to</strong> create a numeric variable,<br />
Age, given a character variable Birthdate. COMPRESS is used <strong>to</strong> squeeze out <strong>the</strong> slashes from Birthdate. The<br />
result from COMPRESS is <strong>the</strong> input <strong>to</strong> <strong>the</strong> NUMBER function. The result from <strong>the</strong> NUMBER function is <strong>the</strong> first<br />
argument for <strong>the</strong> DAYS function.<br />
Scratch variables are used for <strong>the</strong> intermediate computations. They are not necessary. The entire series of<br />
nested functions can be placed in a single <strong>PPL</strong> phrase:<br />
[ GEN Age = INT ( ( 1 + DAYS ( .NDATE., 'YYYYMMDD' ) -<br />
DAYS ( NUMBER ( COMPRESS ( Birthdate, '/' )), 'MMDDYY' ))<br />
/ 365.25 ) ],<br />
When functions are nested this way, <strong>the</strong> possibility for an error in logic is greater than it is when <strong>the</strong> process is<br />
broken up in<strong>to</strong> several smaller steps.
The example in Figure 9.3 was run on June 22, 2012. It should be noted that <strong>the</strong> DAYS function for .NDATE.<br />
uses 'YYYYMMDD'. This is needed because .NDATE. produces a 4-digit year.<br />
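The arithmetic behind Figure 9.3 can be reproduced with Python's datetime module. This is a sketch under stated assumptions, not P-STAT code: the helper names are hypothetical, two-digit years are taken to mean 1900 through 1999, and the run date is fixed at June 22, 2012 to match the figure.

```python
from datetime import date


def days_mmddyy(number: int) -> int:
    """Day count for a numeric MMDDYY date, assuming years 1900-1999."""
    text = f"{number:06d}"                      # 70856 becomes '070856'
    month, day, year = int(text[:2]), int(text[2:4]), 1900 + int(text[4:])
    return date(year, month, day).toordinal()


def age(birthdate: str, today: date) -> int:
    """Whole years between a 'MM/DD/YY' birthdate and today."""
    born = days_mmddyy(int(birthdate.replace("/", "")))   # COMPRESS, then NUMBER
    return int((1 + today.toordinal() - born) / 365.25)


run_date = date(2012, 6, 22)
people = [("Susan", "07/08/56"), ("Marc", "01/26/52"), ("David", "03/31/59")]
for name, birthdate in sorted(people, key=lambda p: age(p[1], run_date)):
    print(name, birthdate, age(birthdate, run_date))
```

Note that converting the compressed string to a number drops any leading zero, which is why the sketch pads back to six digits before splitting.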
9.24 Using <strong>the</strong> Name of a Variable as a Character Value<br />
The VARNAME function provides <strong>the</strong> name of a variable:<br />
[ GEN Last.Missing:C = .M. ;<br />
DO #L USING Test.1 TO Test.8 ;<br />
IF V(#L) MISSING,<br />
SET Last.Missing = VARNAME ( #L ) ;<br />
ENDDO ]<br />
The character variable Last.Missing is generated and set equal <strong>to</strong> missing. The values of Test.1 through Test.8 are<br />
tested — each one that is missing causes <strong>the</strong> recoding of Last.Missing <strong>to</strong> <strong>the</strong> name of <strong>the</strong> variable with <strong>the</strong> missing<br />
value. Last.Missing would have values of “Test.4”, “Test.7”, “-” (missing), and so on.<br />
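The loop logic reads naturally in Python as well. In this sketch, which is an analogy and not P-STAT code, a case is a dictionary, None stands for a missing value, and the function name last_missing is hypothetical.

```python
from typing import Dict, Optional


def last_missing(case: Dict[str, Optional[float]]) -> Optional[str]:
    """Name of the last of Test.1 to Test.8 that is missing, else None."""
    result = None                                   # starts out missing
    for name in (f"Test.{i}" for i in range(1, 9)):
        if case.get(name) is None:                  # IF V(#L) MISSING
            result = name                           # SET ... = VARNAME ( #L )
    return result


case = {f"Test.{i}": float(i) for i in range(1, 9)}
case["Test.4"] = None
case["Test.7"] = None
print(last_missing(case))
```

Because each missing value overwrites the previous result, only the name of the last missing variable survives, just as in the PPL loop.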
Figure 9.4 illustrates a more complex usage of <strong>the</strong> VARNAME function. Here, a more compact and informative<br />
listing of <strong>the</strong> file Patients is desired. Five new variables named d.Heart, d.Liver, and so on, are created,<br />
each one set <strong>to</strong> <strong>the</strong> name of <strong>the</strong> corresponding variable in <strong>the</strong> DO loop:<br />
DO #J USING Heart TO Back;<br />
GEN ?( 'd.' & ):C = VARNAME (#J) ;<br />
ENDDO ;<br />
At <strong>the</strong> end of this step, <strong>the</strong> file has twelve variables, Id, Name, Heart through Back, and d.Heart through d.Back.<br />
The values of <strong>the</strong> first case are:<br />
1001 Jones 0 1 0 1 0 Heart Liver Kidney Brain Back<br />
The next step tests <strong>the</strong> five 0/1 variables Heart through Back, and, if any are equal <strong>to</strong> zero, sets <strong>the</strong> corresponding<br />
d.variable <strong>to</strong> missing:<br />
DO #J USING Heart TO Back ;<br />
IF V(#J) EQ 0, SET V( #J+5 ) = .M1. ;<br />
ENDDO ]<br />
At <strong>the</strong> end of <strong>the</strong> second step, <strong>the</strong> first case contains:<br />
1001 Jones 0 1 0 1 0 - Liver - Brain -<br />
Variables Heart through Back are no longer needed, so <strong>the</strong>y are dropped from <strong>the</strong> file.<br />
SPLIT is <strong>the</strong>n used <strong>to</strong> break each patient case in<strong>to</strong> five cases, one for each of <strong>the</strong> disease situations. After <strong>the</strong><br />
SPLIT, <strong>the</strong> cases pertaining <strong>to</strong> patient Jones are:<br />
Id Name Disease<br />
1001 Jones -<br />
1001 Jones Liver<br />
1001 Jones -<br />
1001 Jones Brain<br />
1001 Jones -
__________________________________________________________________________<br />
Figure 9.4 Using VARNAME, SPLIT and COLLECT<br />
File Patients:<br />
Id Name Heart Liver Kidney Brain Back<br />
1001 Jones 0 1 0 1 0<br />
1002 Brown 1 0 0 0 0<br />
1003 Davis 0 1 1 0 1<br />
1004 Mason 0 1 1 0 0<br />
1009 Smith 1 0 0 1 0<br />
LIST Patients<br />
[ DO #J USING Heart TO Back;<br />
GEN ?( 'd.' & ):C = VARNAME (#J);<br />
ENDDO;<br />
DO #J USING *;<br />
IF V(#J) EQ 0, SET V( #J+5 ) = .M1. ;<br />
ENDDO ]<br />
[ DROP Heart TO Back;<br />
SPLIT INTO 5, CARRY ( Id Name ), CREATE Disease d.?;<br />
IF Disease MISSING, DELETE ]<br />
[ COLLECT 3, BY Id, CARRY Name ] $<br />
Disease Disease Disease<br />
ID Name .1 .2 .3<br />
1001 Jones Liver Brain -<br />
1002 Brown Heart - -<br />
1003 Davis Liver Kidney Back<br />
1004 Mason Liver Kidney -<br />
1009 Smith Heart Brain -<br />
__________________________________________________________________________<br />
Next any case that has a missing value for Disease is deleted, leaving only two cases for Jones and no more than<br />
three cases for any patient (<strong>the</strong> maximum number of diseases observed for any patient). The final step:<br />
[ COLLECT 3, BY Id, CARRY Name ]<br />
collects <strong>the</strong> maximum of three cases for each patient back in<strong>to</strong> a single case. Since Jones had only two medical<br />
problems, he has a missing value for Disease.3.<br />
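The whole sequence of steps in Figure 9.4 can be followed in a compact Python sketch. It is an analogy only: the function name collect_diseases is hypothetical, None stands for a missing value, and the list operations merely imitate what SPLIT, DELETE and COLLECT are described as doing.

```python
patients = [
    (1001, "Jones", 0, 1, 0, 1, 0),
    (1002, "Brown", 1, 0, 0, 0, 0),
    (1003, "Davis", 0, 1, 1, 0, 1),
    (1004, "Mason", 0, 1, 1, 0, 0),
    (1009, "Smith", 1, 0, 0, 1, 0),
]
diseases = ["Heart", "Liver", "Kidney", "Brain", "Back"]


def collect_diseases(rows, names, width=3):
    """Return (Id, Name, Disease.1 ... Disease.width) for each patient."""
    collected = []
    for ident, name, *flags in rows:
        # Variable name where the flag is 1, missing where it is 0.
        labels = [label if flag else None for label, flag in zip(names, flags)]
        # One entry per disease, with the missing entries deleted.
        present = [label for label in labels if label is not None]
        # Gather up to `width` entries back into a single case.
        present = present[:width] + [None] * (width - len(present))
        collected.append((ident, name, *present))
    return collected


for row in collect_diseases(patients, diseases):
    print(row)
```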
9.25 The MATCHES and XMATCHES Opera<strong>to</strong>rs<br />
The opera<strong>to</strong>r MATCHES tests if a pattern matches <strong>the</strong> value of a character variable. The pattern is composed of<br />
characters and symbols, such as <strong>the</strong> wildcard character “*”. The pattern is supplied within single or double quotes:<br />
[ IF Company MATCHES 'Con* Ed*', RETAIN ]
This instruction selects cases from <strong>the</strong> file shown in Figure 9.5 with <strong>the</strong> following values of Company:<br />
Con Ed coned Consolidated Education, <strong>Inc</strong>.<br />
Consulted, <strong>Inc</strong>. Connie Edward Conway Medics<br />
__________________________________________________________________________<br />
Figure 9.5 File of Character Data for MATCHES and XMATCHES<br />
Company<br />
Con Ed<br />
coned<br />
Super Coned<br />
Corn Fed<br />
Ed Con<br />
Consolidated Education, <strong>Inc</strong>.<br />
Consulted, <strong>Inc</strong>.<br />
   Con Ed (this value begins with blanks)<br />
Connie Edward<br />
Conway Medics<br />
*CON ED*<br />
Connie E. Dean, Assoc.<br />
__________________________________________________________________________<br />
It does not select:<br />
Super Coned Corn Fed Ed Con<br />
Con Ed (the value that begins with blanks) *CON ED* Connie E. Dean, Assoc.<br />
This simple, common usage of MATCHES, with <strong>the</strong> wildcard character “*” in <strong>the</strong> pattern, illustrates several<br />
basic rules of matching:<br />
1. The pattern is anchored — that is, a match of <strong>the</strong> first character in <strong>the</strong> pattern is sought in <strong>the</strong> first<br />
character of <strong>the</strong> character value. (This means that lead or left-most blanks count.)<br />
2. The wildcard “*” matches zero or more occurrences of any character, including blanks.<br />
3. Spaces or blanks inside <strong>the</strong> pattern are ignored, unless <strong>the</strong>y are escaped — enclosed in < >.<br />
4. Case is not considered, unless XMATCHES is used.<br />
The pattern may be unanchored by using <strong>the</strong> wildcard “*” as <strong>the</strong> first character in <strong>the</strong> pattern:<br />
[ IF Company MATCHES '*Con* Ed*', RETAIN ]<br />
This instruction selects:<br />
Con Ed coned Consolidated Education, <strong>Inc</strong>.<br />
Consulted, <strong>Inc</strong>. Connie Edward Conway Medics<br />
Super Coned Con Ed *CON ED*<br />
Trailing or right-most blanks are ignored. This instruction, without the “*” as the final character in the pattern:<br />
[ IF Company MATCHES '*Con* Ed', RETAIN ]<br />
selects:<br />
Con Ed coned Super Coned Con Ed<br />
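The four basic rules can be tested by translating a pattern into a Python regular expression. The sketch below handles only the “*” wildcard and is an approximation, not the P-STAT matcher: the function names are hypothetical, and the second Con Ed value is assumed to start with two blanks.

```python
import re

COMPANIES = [
    "Con Ed", "coned", "Super Coned", "Corn Fed", "Ed Con",
    "Consolidated Education, Inc.", "Consulted, Inc.", "  Con Ed",
    "Connie Edward", "Conway Medics", "*CON ED*", "Connie E. Dean, Assoc.",
]


def star_pattern_to_regex(pattern: str) -> str:
    """Translate a pattern that uses only literals, blanks and '*'."""
    parts = []
    for ch in pattern:
        if ch == " ":                 # rule 3: blanks in the pattern are ignored
            continue
        parts.append(".*" if ch == "*" else re.escape(ch))   # rule 2
    return "".join(parts)


def matches(value: str, pattern: str) -> bool:
    regex = star_pattern_to_regex(pattern)
    # fullmatch anchors the pattern (rule 1); IGNORECASE covers rule 4;
    # rstrip drops the trailing blanks, which are ignored.
    flags = re.IGNORECASE | re.DOTALL
    return re.fullmatch(regex, value.rstrip(" "), flags) is not None


def select(pattern: str):
    return [c for c in COMPANIES if matches(c, pattern)]


for pattern in ("Con* Ed*", "*Con* Ed*", "*Con* Ed"):
    print(pattern, "->", select(pattern))
```

An XMATCHES-style comparison would drop the IGNORECASE flag under the same assumptions.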
A complete set of pattern symbols (meta-characters) and syntax exists for MATCHES and XMATCHES,<br />
making possible any arbitrary selections. Additional PPL may be used with these operators to provide consequences<br />
after selections, recode data values and further refine the selection criteria. The next section discusses the<br />
MATCHES meta-characters and syntax, and Figure 9.6 summarizes them.<br />
9.26 MATCHES: Meta-Characters and Syntax<br />
The asterisk “*”, which matches zero or more occurrences of any character, is the most useful and general wildcard<br />
character. Additional wildcard characters further limit the pattern to be matched. The at-sign “@” matches<br />
zero or more blanks. It is useful if there may be leading blanks that should be ignored. For example, this instruction:<br />
[ IF Company MATCHES '@Con* Ed', RETAIN ]<br />
selects:<br />
Con Ed coned<br />
Con Ed<br />
The question mark “?” matches any single character. This instruction:<br />
[ IF Company MATCHES '?Con* Ed?', RETAIN ]<br />
selects:<br />
*CON ED*<br />
Other wildcards match specific single characters. The crosshatch or number sign “#” matches any single digit, the<br />
dollar sign “$” matches any single letter, and the underscore “_” matches a single blank. This instruction:<br />
[ IF Company MATCHES 'Con_Ed', RETAIN ]<br />
selects this case:<br />
Con Ed<br />
The underscore matches <strong>the</strong> single blank in <strong>the</strong> center. Strings with lead blanks are not selected because <strong>the</strong> pattern<br />
is anchored on <strong>the</strong> left. Trailing blanks are ignored.<br />
Character strings that contain meta-characters may be matched by escaping <strong>the</strong> meta-characters. Escaping<br />
removes the special meaning of a meta-character. The backslash “\” and the angle signs “< >” are escape characters.<br />
Any character directly after the backslash or enclosed between the angle signs is treated as a literal character:<br />
[ IF Company MATCHES '\* * <*>', RETAIN ]<br />
This instruction selects:<br />
*CON ED*<br />
The first and third asterisks in <strong>the</strong> pattern are literal characters. They match only asterisks. The middle asterisk<br />
is a meta-character that matches zero or more of any characters. Thus, a string of two or more characters that begins<br />
and ends with an asterisk is selected.<br />
The escape characters are also used to match characters that may not print. The characters are referenced by<br />
their decimal or octal (base 8) integer equivalents. The backslash is used for octal numbers and the angle signs are used<br />
for decimal numbers. This instruction:<br />
[ IF Company MATCHES '* <009> *', RETAIN ]<br />
selects character strings that contain a tab character. (009 is the decimal equivalent of the tab character in<br />
the ASCII character codes.)<br />
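The single-character wildcards and the escapes can be approximated by a slightly larger translator. The Python sketch below is an illustration, not the P-STAT matcher: the names are hypothetical, octal escapes are left out, and treating a multi-digit number inside angle signs as a decimal character code is an assumption.

```python
import re

WILDCARDS = {"*": ".*", "@": " *", "?": ".", "#": "[0-9]", "$": "[A-Za-z]", "_": " "}


def wildcard_to_regex(pattern: str) -> str:
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):        # \x is a literal x
            out.append(re.escape(pattern[i + 1]))
            i += 2
        elif ch == "<" and ">" in pattern[i + 1:]:     # <x> literal or <ddd> code
            end = pattern.index(">", i + 1)
            body = pattern[i + 1:end]
            literal = chr(int(body)) if body.isdigit() and len(body) > 1 else body
            out.append(re.escape(literal))
            i = end + 1
        elif ch == " ":                                # unescaped blanks are ignored
            i += 1
        else:
            out.append(WILDCARDS.get(ch, re.escape(ch)))
            i += 1
    return "".join(out)


def matches(value: str, pattern: str) -> bool:
    flags = re.IGNORECASE | re.DOTALL
    return re.fullmatch(wildcard_to_regex(pattern), value.rstrip(" "), flags) is not None


values = ["Con Ed", "coned", "Super Coned", "  Con Ed", "*CON ED*", "Tab\there"]
for pattern in ("@Con* Ed", "?Con* Ed?", "Con_Ed", "\\* * <*>", "* <009> *"):
    print(pattern, "->", [v for v in values if matches(v, pattern)])
```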
Paren<strong>the</strong>ses and square brackets are enclosures that specify, respectively, a literal string of characters and a<br />
single character <strong>to</strong> match. A literal string of characters may be specified with or without paren<strong>the</strong>ses. These two<br />
instructions are equivalent:
__________________________________________________________________________<br />
Figure 9.6 MATCHES and XMATCHES: Meta-Characters<br />
In General:<br />
< > [ ] * @ _ ? # $ 0 1 +<br />
Within [ ]:<br />
^ - _ # $ ]<br />
Within ( ):<br />
| _ )<br />
Escape Characters:<br />
\ < ><br />
Pattern Syntax:<br />
@ zero or more blanks<br />
* zero or more of any character<br />
? a single character<br />
# a single digit<br />
$ a single letter<br />
_ a single blank<br />
<*> a literal character (an asterisk)<br />
\# a literal character (a crosshatch)<br />
<009> a decimal number<br />
\009 an octal number<br />
Enclosures:<br />
( abc ) a literal string of characters<br />
abc same as ( abc )<br />
( abc | xyz ) abc or xyz<br />
[ abc ] a single character: a or b or c<br />
[ a-z ] a single letter in <strong>the</strong> range a through z<br />
[ $ ] same as [ a-z ]<br />
[ 0-9 ] a single number in <strong>the</strong> range 0 through 9<br />
[ # ] same as [ 0-9 ]<br />
[ _ ] a single blank<br />
[ ^$ ] a single character that is NOT a letter<br />
Repetitions after Enclosures:<br />
1 1 a single match (<strong>the</strong> default)<br />
0 1 zero or one matches<br />
0 + zero or more matches<br />
1 + one or more matches<br />
0 same as 0 1<br />
+ same as 1 +<br />
__________________________________________________________________________
[ IF Company MATCHES '(Con) * (Ed) *', DELETE ]<br />
[ IF Company MATCHES ' Con * Ed *', DELETE ]<br />
Notice that <strong>the</strong> blanks in <strong>the</strong> pattern contribute <strong>to</strong> its readability. Blanks are ignored unless <strong>the</strong>y are escaped, and<br />
<strong>the</strong>y may be omitted if desired.<br />
Paren<strong>the</strong>ses are typically used when one character string or ano<strong>the</strong>r is <strong>to</strong> be matched:<br />
[ IF Company MATCHES '(Con | Corn) * Ed', RETAIN ]<br />
This instruction selects:<br />
Con Ed coned Corn Fed<br />
The vertical bar character “|” means “or” and <strong>the</strong> paren<strong>the</strong>ses limit <strong>the</strong> character strings that are <strong>to</strong> be “or-ed”. Note<br />
that merely <strong>the</strong> juxtaposition of character strings means “and” — that is, this pattern:<br />
[ IF Company MATCHES 'Con Corn * Ed', RETAIN ]<br />
matches only character values that have “ConCorn” followed by zero or more of any character followed by “Ed”.<br />
If a blank is sought between “Con” and “Corn”, <strong>the</strong> pattern should be specified with an underscore:<br />
[ IF Company MATCHES 'Con_Corn * Ed', RETAIN ]<br />
Square bracket enclosures specify a single character <strong>to</strong> match. Typically, that character may be one of several<br />
in <strong>the</strong> enclosure that is repeated a specified number of times. For example, this instruction:<br />
[ IF Company MATCHES 'Con [ _ n s ] * Ed *', RETAIN ]<br />
selects <strong>the</strong>se values:<br />
Con Ed Consolidated Education, <strong>Inc</strong>.<br />
Consulted, <strong>Inc</strong>. Connie Edward<br />
The string “Con” is followed by a blank, an “n” or an “s” and that is followed by zero or more of any characters,<br />
<strong>the</strong> string “Ed” and zero or more of any characters.<br />
A repetition specification may follow either type of enclosure. Possible repetitions are: 1 1, 0 1, 0 + and 1 +.<br />
The repetition 1 1 is the default that is assumed when nothing follows an enclosure: one match. The repetition<br />
0 1 means zero or one matches, 0 + means zero or more matches and 1 + means one or more matches. This<br />
instruction:<br />
[ IF Company MATCHES 'Co (n)1 + *', RETAIN ]<br />
selects:<br />
Con Ed coned Consolidated Education, Inc.<br />
Consulted, Inc. Connie Edward Conway Medics<br />
Connie E. Dean, Assoc.<br />
This:<br />
[ IF Company MATCHES 'Co (n)0 1 *', RETAIN ]<br />
selects:<br />
Con Ed coned Consolidated Education, Inc.<br />
Consulted, Inc. Connie Edward Conway Medics<br />
Corn Fed Connie E. Dean, Assoc.<br />
Notice that <strong>the</strong> repetition 1 + matches one or more occurrences of “n” and that 0 1 matches zero or one occurrence.<br />
Thus, Corn Fed is included in <strong>the</strong> second group of matches (it has zero occurrences of “n” after <strong>the</strong> “Co”). The
strings with two occurrences of “n” are also included because of <strong>the</strong> wildcard “*” that makes any character valid<br />
after <strong>the</strong> zero or one “n”.<br />
It is also possible to specify a range from which a single character is valid as a match and characters that are<br />
not valid as matches. The square brackets are used. This instruction:<br />
[ IF Company MATCHES '[ ^$ ^# ] *', RETAIN ]<br />
selects:<br />
*CON ED*<br />
Con Ed<br />
The meta-character “^” means not. Thus, <strong>the</strong> enclosure above specifies a single character that is not a letter and<br />
not a number as <strong>the</strong> first character. (The “$” means any letter and <strong>the</strong> “#” means any number.) This instruction<br />
means <strong>the</strong> same thing, but uses ranges in <strong>the</strong> pattern instead of <strong>the</strong> meta-characters “$” and “#”:<br />
[ IF Company MATCHES '[ ^a-z ^A-Z ^0-9 ] *', RETAIN ]<br />
The hyphen “-” is used in ranges. When <strong>the</strong> “^” is omitted, <strong>the</strong>n any character in <strong>the</strong> specified range is a valid<br />
match. Figure 9.6 summarizes <strong>the</strong> meta-characters.<br />
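Readers familiar with regular expressions may find it helpful to see the enclosure and repetition patterns from this section next to rough Python equivalents. The regular expressions below were translated by hand and are approximations, not output from P-STAT; the second Con Ed value is assumed to start with two blanks.

```python
import re

COMPANIES = [
    "Con Ed", "coned", "Super Coned", "Corn Fed", "Ed Con",
    "Consolidated Education, Inc.", "Consulted, Inc.", "  Con Ed",
    "Connie Edward", "Conway Medics", "*CON ED*", "Connie E. Dean, Assoc.",
]

# MATCHES pattern -> hand-written regular expression (an approximation)
EQUIVALENTS = {
    "(Con | Corn) * Ed": r"(?:Con|Corn).*Ed",      # one string or another
    "Con [ _ n s ] * Ed *": r"Con[ ns].*Ed.*",     # one character from a set
    "Co (n)1 + *": r"Co(?:n)+.*",                  # one or more matches
    "Co (n)0 1 *": r"Co(?:n)?.*",                  # zero or one match
    "[ ^$ ^# ] *": r"[^A-Za-z0-9].*",              # not a letter, not a digit
}


def select(ppl_pattern: str):
    regex = re.compile(EQUIVALENTS[ppl_pattern], re.IGNORECASE)
    # fullmatch anchors the pattern; trailing blanks are ignored.
    return [c for c in COMPANIES if regex.fullmatch(c.rstrip(" "))]


for ppl_pattern in EQUIVALENTS:
    print(ppl_pattern, "->", select(ppl_pattern))
```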
9.27 CLAG: A Lag Using a Character Argument<br />
CLAG is a function that performs a lag on a character argument, which can be an expression.<br />
GEN PREVIOUS.TITLE:c30 = CLAG( JOB.TITLE, 12 )<br />
This would take <strong>the</strong> JOB.TITLE value from 12 cases ago and copy it in<strong>to</strong> PREVIOUS.TITLE of <strong>the</strong> current case.<br />
The second argument, <strong>the</strong> lag depth, must be an integer constant from 1 <strong>to</strong> 500.<br />
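A lag of this kind can be modeled in Python with a fixed-length queue. The sketch is an analogy only: the generator name clag is hypothetical, and returning None (missing) for the first cases, before enough history exists, is an assumption the text does not confirm.

```python
from collections import deque
from typing import Iterable, Iterator, Optional


def clag(values: Iterable[str], depth: int) -> Iterator[Optional[str]]:
    """Yield, for each case, the value from `depth` cases earlier."""
    if not 1 <= depth <= 500:
        raise ValueError("lag depth must be an integer from 1 to 500")
    window: deque = deque(maxlen=depth)
    for value in values:
        # Until `depth` cases have been seen there is nothing to return.
        yield window[0] if len(window) == depth else None
        window.append(value)


titles = ["Clerk", "Analyst", "Manager", "Director"]
print(list(clag(titles, 2)))
```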
9.28 Concatenation of Character Constants<br />
There is a special opera<strong>to</strong>r (&&) that permits dynamic concatenation of character constants in a command or in<br />
<strong>PPL</strong>. It is most useful in situations such as macros where <strong>the</strong>re is an 80 character limit on record size.<br />
MAKE Myfile, FILE<br />
<br />
&&<br />
;<br />
There is no particular limit on <strong>the</strong> number of pieces and <strong>the</strong>y can be enclosed in ei<strong>the</strong>r angle brackets or quotation<br />
marks.<br />
[ GEN text:c200 =<br />
<br />
&&<br />
"The sentence in <strong>the</strong> text contains a single >. "<br />
&&<br />
'A third piece is needed <strong>to</strong> complete <strong>the</strong> variable.' ]<br />
In command text and in <strong>PPL</strong> <strong>the</strong> && structure may be used anywhere that a character constant<br />
may be used.
<strong>PPL</strong><br />
SUMMARY<br />
Character variables are modified by functions and opera<strong>to</strong>rs. Some are specifically for character variables<br />
and o<strong>the</strong>rs may be used with ei<strong>the</strong>r character or numeric variables. Functions and opera<strong>to</strong>rs are<br />
grouped below according <strong>to</strong> <strong>the</strong>ir usages.<br />
<strong>PPL</strong> Functions: Character<br />
The arguments for character functions are expressions. The simplest expression is a variable name or<br />
position (vnp). Complex expressions are nested functions, numeric constants (nn), quoted character constants<br />
or strings ('cs'), or combinations of <strong>the</strong>se.<br />
Abbreviations following <strong>the</strong> functions indicate <strong>the</strong> type of argument that should result from <strong>the</strong> evaluation<br />
of <strong>the</strong> expression.<br />
BLANK (exp, loc, len)<br />
specifies a character value that is <strong>to</strong> have blank characters replace existing characters. The second argument<br />
in <strong>the</strong> BLANK function gives <strong>the</strong> start location. The third argument, which is optional, gives <strong>the</strong><br />
length of <strong>the</strong> area <strong>to</strong> be made blank. This example:<br />
[ GEN New.Tel:C8 = BLANK ( Tel, 5, 4 ) ]<br />
will replace characters five through eight of a telephone number with blanks. When <strong>the</strong> third argument<br />
is omitted, <strong>the</strong> expression is filled with blanks from <strong>the</strong> start location through <strong>the</strong> end.<br />
This example will blank out all but <strong>the</strong> first letter of Last.Name:<br />
LIST Diet.Clients<br />
[ KEEP Last.Name Weight Pounds.Lost ;<br />
SET Last.Name = BLANK ( Last.Name, 2 ) ] $<br />
An alternate usage mode exists for <strong>the</strong> BLANK function. Its general format is:<br />
BLANK (exp, old, nn)<br />
The first argument specifies a character value that may contain a specified substring; if so, it is <strong>to</strong> be<br />
blanked. The second argument yields <strong>the</strong> character string that may be present in <strong>the</strong> first expression.<br />
When it is present, it is replaced by blank characters. The optional third argument is <strong>the</strong> number of times<br />
<strong>to</strong> find and replace <strong>the</strong> character string; one change is assumed. This <strong>PPL</strong> phrase:<br />
[ SET Comments = BLANK ( Comments, 'damn', 10 ) ]<br />
replaces up <strong>to</strong> ten occurrences of <strong>the</strong> word “damn” in <strong>the</strong> variable Comments with an equivalent number<br />
of blanks.<br />
XBLANK (exp, old, nn)<br />
blanks out specified characters — functions just like BLANK's alternate mode of operation, but respects<br />
<strong>the</strong> case (upper, lower or mixed) of <strong>the</strong> “old” string:<br />
[ SET Symp<strong>to</strong>m = XBLANK ( Symp<strong>to</strong>m, 'D', 9 ) ]<br />
Upper-case “D” is blanked out in values of Symp<strong>to</strong>m; lower-case “d” is ignored.<br />
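The two BLANK modes and XBLANK can be modeled in Python as follows. This is a hedged sketch: `blank_at` and `blank_text` are hypothetical names, and the case-insensitive matching of the substring mode is inferred from the contrast with XBLANK.<br />

```python
import re


def blank_at(value, loc, length=None):
    """Positional mode: blank `length` characters from 1-based `loc`.

    When `length` is omitted, blanking runs through the end of the value.
    """
    start = loc - 1
    end = len(value) if length is None else min(start + length, len(value))
    return value[:start] + " " * max(end - start, 0) + value[end:]


def blank_text(value, old, count=1, respect_case=False):
    """Substring mode: overwrite up to `count` occurrences of `old` with blanks.

    respect_case=True models XBLANK; the default models BLANK.
    """
    flags = 0 if respect_case else re.IGNORECASE
    return re.sub(re.escape(old), " " * len(old), value,
                  count=count, flags=flags)


tel = blank_at("555-1234", 5, 4)        # characters five through eight
initial = blank_at("Smith", 2)          # all but the first letter
symptom = blank_text("Dd", "D", 9, respect_case=True)
```

In every case the result keeps the original length, because characters are overwritten and not removed.<br />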
loc=location len=length lim=delimiter exp=expression nn=number cs=char string
CAPS (exp)<br />
capitalizes <strong>the</strong> first letter of each <strong>to</strong>ken (word) in a character value:<br />
[ GEN Name:C = CAPS ( 'JOHN paul JoNeS' ) ]<br />
Letters o<strong>the</strong>r than <strong>the</strong> first are changed <strong>to</strong> lower case. The default is that <strong>to</strong>kens are separated by blanks.<br />
The output appears like this:<br />
John Paul Jones<br />
In a more complex usage, an additional or replacement <strong>to</strong>ken delimiter is supplied in quotes as a second<br />
argument:<br />
[ GEN Name:C = CAPS ( 'ann hayden-jones', '- ' ) ]<br />
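A rough Python model of CAPS follows. It is a sketch under one assumption: the optional second argument is treated as adding delimiter characters to the blank, since the text allows either an additional or a replacement delimiter. The name `caps` is hypothetical.<br />

```python
def caps(value, extra_delims=""):
    """Capitalize the first letter of each token and lower-case the rest."""
    # Assumption: the blank always separates tokens; the second argument
    # contributes further delimiter characters.
    delims = set(" " + extra_delims)
    out = []
    start_of_token = True
    for ch in value:
        if ch in delims:
            out.append(ch)
            start_of_token = True
        else:
            out.append(ch.upper() if start_of_token else ch.lower())
            start_of_token = False
    return "".join(out)


plain = caps("JOHN paul JoNeS")
hyphenated = caps("ann hayden-jones", "- ")
```

Treating the hyphen as a delimiter is what capitalizes the second half of a double-barrelled surname.<br />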
CENTER (exp)<br />
centers a character value in its field:<br />
[ SET Surname = CENTER ( Surname ) ]<br />
CHANGE (exp, old, new, nn)<br />
specifies a character value possibly containing an “old” character string that is <strong>to</strong> be changed <strong>to</strong> a “new”<br />
character string. The first argument <strong>to</strong> <strong>the</strong> CHANGE function is a character expression. The second argument<br />
is <strong>the</strong> old string and <strong>the</strong> third is <strong>the</strong> new string. An optional fourth argument is <strong>the</strong> maximum<br />
number of changes <strong>to</strong> make per value; one change is assumed.<br />
In this example:<br />
[ SET College = CHANGE ( College, 'University', 'Univ', 3 ) ]<br />
<strong>the</strong> character variable College will have old values of “University” changed <strong>to</strong> new values of “Univ”. A<br />
maximum of three such changes per value of College is specified. The resultant values of College will<br />
be shorter wherever this change is made, and <strong>the</strong> listing may be more attractive with <strong>the</strong> abbreviation.<br />
CHANGE, without a new argument, removes <strong>the</strong> old string:<br />
[ SET College = CHANGE ( College, 'ersity', 3 ) ]<br />
The old string “ersity” is changed <strong>to</strong> a null string. This (probably) achieves <strong>the</strong> same result as <strong>the</strong> previous<br />
example.<br />
An alternate usage of <strong>the</strong> CHANGE function has <strong>the</strong> format:<br />
CHANGE ( exp, loc, len, new )<br />
The first argument is a character expression, <strong>the</strong> second is <strong>the</strong> start location of <strong>the</strong> old string, <strong>the</strong> third is<br />
<strong>the</strong> length of <strong>the</strong> old string, and <strong>the</strong> fourth is <strong>the</strong> new character string:<br />
[ SET Date = CHANGE ( Date, 7, 2, '85' ) ]<br />
Values of Date in <strong>the</strong> form 11/08/84 are changed <strong>to</strong> 11/08/85.<br />
XCHANGE (exp, old, new, nn)<br />
changes character strings — functions just like CHANGE, but respects <strong>the</strong> case of <strong>the</strong> “old” string:<br />
[ SET Sal = XCHANGE ( Sal, 'm', 'Ms.', 1 ) ]<br />
Only lower-case “m” would be changed.<br />
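The CHANGE variants can be sketched in Python as below. The function names are hypothetical, and the model assumes that plain CHANGE ignores case while XCHANGE respects it, as the descriptions above suggest.<br />

```python
import re


def change(value, old, new="", count=1, respect_case=False):
    """Replace up to `count` occurrences of `old` with `new`.

    Omitting `new` removes the old string; respect_case=True models XCHANGE.
    """
    flags = 0 if respect_case else re.IGNORECASE
    # A function replacement keeps any backslashes in `new` literal.
    return re.sub(re.escape(old), lambda m: new, value,
                  count=count, flags=flags)


def change_at(value, loc, length, new):
    """Positional mode: replace `length` characters from 1-based `loc`."""
    start = loc - 1
    return value[:start] + new + value[start + length:]


college = change("Rutgers University", "University", "Univ", 3)
shorter = change("Rutgers University", "ersity")
date = change_at("11/08/84", 7, 2, "85")
```

Unlike BLANK, these operations can change the length of the value whenever the new string differs in length from the old one.<br />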
CHARACTER (exp)<br />
converts a number in<strong>to</strong> its character equivalent:<br />
[ GEN Code:C3 = CHARACTER ( Area.Code ) ]<br />
It is possible <strong>to</strong> use CHARACTER followed by a second argument indicating <strong>the</strong> number of decimal<br />
places <strong>to</strong> preserve in <strong>the</strong> expression. This might be useful if you have income values with decimal places<br />
carried up <strong>to</strong> four places and you wish <strong>to</strong> specify only two decimal places. The second argument indicates<br />
<strong>the</strong> number of places <strong>to</strong> carry in <strong>the</strong> expression:<br />
LIST Salary [ GEN <strong>Inc</strong>ome:C = CHARACTER ( Salary, 2 ) ] $<br />
CHARACTER may also be followed by a third argument that indicates <strong>the</strong> maximum number of places<br />
<strong>to</strong> print:<br />
LIST Salary [ GEN <strong>Inc</strong>ome:C = CHARACTER ( Salary, 2, 3 ) ] $<br />
MAKE.CHARACTER, described in <strong>the</strong> manual “P-<strong>STAT</strong>: File Management”, can be used <strong>to</strong> change a<br />
numeric variable in<strong>to</strong> a character variable or <strong>to</strong> resize a character variable.<br />
CHAREX (exp, 'XX00')<br />
extracts specific digits from a numeric value and yields a character representation of those digits.<br />
CHAREX operates only on <strong>the</strong> integer portion of <strong>the</strong> number; any fractional portion and sign are ignored.<br />
The two required arguments are a numeric expression and a character string mask enclosed in quotes:<br />
[ GEN Month:C2 = CHAREX ( Date, 'XX00' ) ]<br />
The selection mask is composed of X and 0 (zero) characters and may be up <strong>to</strong> twelve characters in<br />
length. An X retains a digit and a 0 drops a digit. The selection mask is aligned with <strong>the</strong> right-most digit<br />
of <strong>the</strong> numeric value. The numeric function NUMEX does much <strong>the</strong> same thing but yields a numeric<br />
result without <strong>the</strong> lead zeros.<br />
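The mask alignment can be modeled in Python. This sketch makes two assumptions that the text does not state outright: numbers with fewer digits than the mask are padded with leading zeros, and digits to the left of the mask are dropped. The name `charex` is hypothetical.<br />

```python
def charex(value, mask):
    """Keep the digits of `value` that line up with an X in `mask`."""
    if not 1 <= len(mask) <= 12 or set(mask) - set("X0"):
        raise ValueError("mask must be 1 to 12 characters of X and 0")
    digits = str(int(abs(value)))  # integer portion only, sign ignored
    # Right-align the digits with the mask (assumed zero fill on the left).
    digits = digits.zfill(len(mask))[-len(mask):]
    return "".join(d for d, m in zip(digits, mask) if m == "X")


month = charex(1108, "XX00")       # a value in month-day form
padded = charex(-523.9, "XX00")
```

The second call shows the sign and fraction being ignored and, under the zero-fill assumption, a leading zero appearing in the result.<br />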
CLAG (exp, nn )<br />
CLAG is a function that performs a lag on a character argument, which can be an expression.<br />
COMPRESS (exp)<br />
squeezes all blanks out of a character value:<br />
[ SET Text = COMPRESS ( Text ) ]<br />
Leading, trailing and embedded blanks are removed.<br />
An expanded mode of usage has optional arguments:<br />
COMPRESS ( exp, nn, lim )<br />
The second argument is <strong>the</strong> number of delimiter characters that should remain between <strong>to</strong>kens (“words”).<br />
The third argument is an alternate delimiter character or characters o<strong>the</strong>r than <strong>the</strong> blank — a <strong>to</strong>ken separa<strong>to</strong>r.<br />
This example:<br />
[ SET Text = COMPRESS ( Text, 1 ) ]<br />
squeezes out all blanks but one from between words. This:<br />
[ GEN Amount = NUMBER ( COMPRESS ( Money, ', $' ) ) ]<br />
generates a numeric variable, Amount, equal <strong>to</strong> <strong>the</strong> character variable Money after all commas, blanks<br />
and currency signs are compressed out.<br />
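A Python model of COMPRESS might look like this. It is a sketch: `compress` is a hypothetical name, and when delimiters are retained the model assumes the first delimiter character is the one left between tokens.<br />

```python
import re


def compress(value, keep=0, delims=" "):
    """Remove delimiter characters, leaving `keep` of them between tokens."""
    tokens = [t for t in re.split("[" + re.escape(delims) + "]+", value) if t]
    # Assumption: the retained separator is the first delimiter character.
    return (delims[0] * keep).join(tokens)


squeezed = compress("  New   York  ")
single = compress("  New   York  ", 1)
amount = float(compress("$1,234.50 ", 0, ", $"))
```

The last line mirrors the NUMBER ( COMPRESS ( ... ) ) idiom: once commas, blanks and currency signs are gone, the text can be converted to a number.<br />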
CVAL (nn)<br />
gives <strong>the</strong> character equivalent of <strong>the</strong> specified decimal number. This is used when unusual characters<br />
that cannot be entered or printed on <strong>the</strong> terminal screen are desired. Often <strong>the</strong>se characters can be produced<br />
on a printer. An example of this is:<br />
[ PUT (CVAL(27)) 'R' (CVAL(7))<br />
(CVAL(221)) "Hable Ud. Espa" (CVAL(252)) "ol?" ]<br />
where <strong>the</strong> CVAL of <strong>the</strong> number 27 followed by “R” and <strong>the</strong> CVAL of 7 specifies <strong>the</strong> Spanish international<br />
character set. The CVAL of <strong>the</strong> numbers 221 and 252 yields <strong>the</strong> upside-down question mark and<br />
<strong>the</strong> Spanish “n”, respectively.<br />
IVAL ('c')<br />
gives <strong>the</strong> integer equivalent of <strong>the</strong> first character of a character value. This is <strong>the</strong> opposite of CVAL, described<br />
above. Since <strong>the</strong> integer equivalents of uppercase and lowercase characters are different, IVAL<br />
can be used in tests of equality of character values that respect case. (See <strong>the</strong> XEQ opera<strong>to</strong>r also.)<br />
LEFT (exp)<br />
left-justifies a character value in its field:<br />
[ SET Street.Address = LEFT ( Street.Address ) ]<br />
LENGTH (exp)<br />
yields a numeric value length, which is <strong>the</strong> location of <strong>the</strong> right-most non-blank character:<br />
[ GEN HS.Length = LENGTH ( High.School ) ]<br />
(Leading and embedded blanks are included in <strong>the</strong> count.)<br />
LOWER (exp)<br />
converts a character value <strong>to</strong> lowercase characters:<br />
[ SET Region = LOWER ( Region ) ]<br />
NUMBER (exp)<br />
converts a character value in<strong>to</strong> a number:<br />
[ GEN Year = NUMBER ( SUBSTRING ( Date, 7, 2 ) ) ]<br />
If <strong>the</strong> character value is all blank, <strong>the</strong> result is set <strong>to</strong> missing type 1. If <strong>the</strong> character value contains characters<br />
o<strong>the</strong>r than numbers, <strong>the</strong> result is set <strong>to</strong> missing type 2. If <strong>the</strong> character value is missing, <strong>the</strong> result<br />
is set <strong>to</strong> missing type 3.<br />
NUMBER may be suffixed with ei<strong>the</strong>r “.W” or “.E”. NUMBER.W issues a warning and NUMBER.E<br />
s<strong>to</strong>ps <strong>the</strong> command with an error message when <strong>the</strong> character value contains characters o<strong>the</strong>r than<br />
numbers.<br />
If you wish <strong>to</strong> change <strong>the</strong> type of a character variable <strong>to</strong> numeric, you may use <strong>the</strong> MAKE.NUMERIC<br />
command, which is described in <strong>the</strong> manual “P-<strong>STAT</strong>: File Management”.<br />
PAD (exp, len, fill)<br />
specifies a character value which is <strong>to</strong> be “padded” on <strong>the</strong> right side. The first argument <strong>to</strong> PAD is <strong>the</strong><br />
character expression <strong>to</strong> be padded, <strong>the</strong> second is <strong>the</strong> minimum length, and <strong>the</strong> third is an optional fill character<br />
<strong>to</strong> be used for padding. If only one argument is supplied, a minimum length of 1 and a blank fill<br />
character are assumed. This example:<br />
[ GEN Zip:C10 = PAD ( Zipcode, 10, '-' ) ]<br />
pads <strong>the</strong> variable Zipcode with dashes on <strong>the</strong> right side. If Zipcode initially has a length of five, it is padded<br />
until its length is ten. If Zipcode initially has a length of ten or more characters, it is not padded.<br />
RPAD is a synonym for PAD.<br />
LPAD (exp, len, fill)<br />
specifies a character value which is <strong>to</strong> be “padded” with blanks or a supplied fill character on <strong>the</strong> left side.<br />
This example:<br />
[ SET Message = LPAD ( LRTRIM ( Message ), 16, '>' ) ]<br />
pads <strong>the</strong> trimmed variable Message with <strong>the</strong> “>” character on <strong>the</strong> left side. See also PAD and LRPAD.<br />
LRPAD (exp, len, fill)<br />
specifies a character value, a minimum length, and an optional fill character <strong>to</strong> be used for padding <strong>the</strong><br />
character value. Padding will occur evenly on both <strong>the</strong> left and right sides of <strong>the</strong> expression. The right<br />
side will be padded first. If <strong>the</strong> character expression is already equal <strong>to</strong> or greater than <strong>the</strong> specified<br />
length, no padding will take place. A length of 1 and a blank fill character are assumed when none are<br />
specified. See also LPAD and PAD. This is often used on an LRTRIM result.<br />
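The three padding functions can be compared with a small Python model. The names are hypothetical, and the split used for LRPAD (the right side receives the extra character when the shortfall is odd) is an interpretation of "the right side will be padded first".<br />

```python
def pad(value, length=1, fill=" "):
    """PAD / RPAD: pad on the right up to a minimum length."""
    return value.ljust(length, fill)


def lpad(value, length=1, fill=" "):
    """LPAD: pad on the left up to a minimum length."""
    return value.rjust(length, fill)


def lrpad(value, length=1, fill=" "):
    """LRPAD: pad both sides evenly, starting with the right side."""
    shortfall = max(length - len(value), 0)
    right = (shortfall + 1) // 2   # assumed to take the odd character
    left = shortfall // 2
    return fill * left + value + fill * right


zip_code = pad("08540", 10, "-")
message = lpad("Hi", 5, ">")
centered = lrpad("ab", 5, "*")
```

In all three functions a value that already meets the minimum length is returned unchanged.<br />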
POSITION (exp, 'cs')<br />
yields a numeric value which is <strong>the</strong> position of <strong>the</strong> character string within <strong>the</strong> character value:<br />
[ GEN Blank.Location = POSITION ( Name, ' ' ) ]<br />
Values match regardless of whe<strong>the</strong>r <strong>the</strong>y are uppercase or lowercase. If <strong>the</strong> second value is not found,<br />
<strong>the</strong> result is zero.<br />
A more complex usage of POSITION is also possible. The general form is:<br />
POSITION ( exp, 'cs', 'cs', 'cs',..., len )<br />
Additional optional arguments are multiple character strings whose positions in <strong>the</strong> character variable are<br />
sought. Only <strong>the</strong> left-most position of any successfully located string is given.<br />
An integer between 1 and 50,000 may be supplied as <strong>the</strong> right-most argument giving a length. It permits<br />
<strong>the</strong> contents of <strong>the</strong> character strings, whose positions are being sought, <strong>to</strong> be divided in<strong>to</strong> strings of <strong>the</strong><br />
specified length. Each portion of <strong>the</strong> divided string is treated as a separate argument and its position is<br />
located. The character values must be evenly divisible by <strong>the</strong> length.<br />
Examples of usages include:<br />
POSITION ( City.State, ',' , '.' )<br />
POSITION ( 'ABCDEF', 'AC', 'BC', 'DE', 'DF' )<br />
POSITION ( City.State, 'NYNJ', 2 )<br />
POSITION ( 'ABCDEF', 'AEIOU', 1 )<br />
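The multi-string and length forms of POSITION can be modeled in Python as below. This is a sketch with a hypothetical `position` function; the sample value "Trenton, NJ" is invented for illustration.<br />

```python
def position(value, *needles, length=None, respect_case=False):
    """Left-most 1-based position of any search string, or 0 if none is found.

    respect_case=True models XPOSITION.
    """
    if length is not None:
        pieces = []
        for needle in needles:
            if len(needle) % length:
                raise ValueError("strings must be evenly divisible by length")
            # Each fixed-length piece becomes a separate search string.
            pieces += [needle[i:i + length]
                       for i in range(0, len(needle), length)]
        needles = pieces
    if not respect_case:
        value = value.upper()
        needles = [n.upper() for n in needles]
    hits = [value.find(n) + 1 for n in needles if n in value]
    return min(hits) if hits else 0


several = position("ABCDEF", "AC", "BC", "DE", "DF")
vowels = position("ABCDEF", "AEIOU", length=1)
state = position("Trenton, NJ", "NYNJ", length=2)
```

In the first call only "BC" and "DE" occur, and the left-most of the two positions is reported. In the third, "NYNJ" is split into "NY" and "NJ" before searching.<br />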
XPOSITION (exp, 'cs')<br />
gives <strong>the</strong> position of <strong>the</strong> specified character string in <strong>the</strong> value — works just like POSITION, but respects<br />
<strong>the</strong> case of <strong>the</strong> character string in searching.<br />
RIGHT (exp)<br />
right-justifies a character value in its field:<br />
[ SET Zip = RIGHT ( Zip ) ]<br />
SIZE (exp)<br />
yields a numeric value giving <strong>the</strong> size of <strong>the</strong> specified character value. The size includes any blanks, embedded<br />
or o<strong>the</strong>rwise. It is typically ei<strong>the</strong>r <strong>the</strong> defined size or <strong>the</strong> size resulting after various character<br />
function procedures or operations.<br />
SUBSTRING (exp, loc, len)<br />
yields a character value which is <strong>the</strong> string beginning in <strong>the</strong> location specified by <strong>the</strong> second argument<br />
and of <strong>the</strong> length specified by <strong>the</strong> third argument. If <strong>the</strong> optional starting location is not given, it is assumed<br />
<strong>to</strong> be 1. If <strong>the</strong> optional length is not given, it is assumed <strong>to</strong> be <strong>the</strong> remainder of <strong>the</strong> string. For<br />
example:<br />
[ GEN Initial:C1 = SUBSTRING ( LEFT ( Name ), 1, 1 ) ]<br />
yields <strong>the</strong> substring of Name beginning at <strong>the</strong> first character and one character long.<br />
TOKEN (exp, nn, lim)<br />
accesses a portion of a longer character string:<br />
[ GEN First.Name:C = TOKEN ( Name ) ]<br />
The first <strong>to</strong>ken starts with <strong>the</strong> first non-delimiter on <strong>the</strong> left and continues until a subsequent delimiter is<br />
found. TOKEN accesses <strong>the</strong> first <strong>to</strong>ken unless <strong>the</strong> optional second argument specifies a <strong>to</strong>ken in ano<strong>the</strong>r<br />
position. The assumed delimiter is a blank unless <strong>the</strong> optional third argument specifies ano<strong>the</strong>r delimiter<br />
or delimiters. LTOKEN is a synonym for TOKEN.<br />
RTOKEN (exp, nn, lim)<br />
accesses <strong>to</strong>kens counting from <strong>the</strong> right:<br />
[ GEN Last.Name:C = RTOKEN ( Name ) ;<br />
IF RTOKEN ( Name ) CONTAINS '.',<br />
SET Last.Name = RTOKEN ( Name, 2 ) ]<br />
NTOKEN (exp, lim)<br />
yields a numeric value that is a count of <strong>the</strong> <strong>to</strong>kens in <strong>the</strong> first character value. If <strong>the</strong> optional second<br />
argument is not provided, <strong>the</strong> delimiter between <strong>to</strong>kens is assumed <strong>to</strong> be <strong>the</strong> blank:<br />
[ GEN Middle.Name:C;<br />
GEN #Number = NTOKEN ( Name ) ;<br />
IF #Number GT 2,<br />
SET Middle.Name = TOKEN ( Name, 2 ) ]<br />
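The token functions can be modeled together in Python. This is a sketch with hypothetical names, and returning an empty string when the requested token does not exist is an assumption.<br />

```python
import re


def _tokens(value, delims=" "):
    """Split on runs of delimiter characters, ignoring empty pieces."""
    return [t for t in re.split("[" + re.escape(delims) + "]+", value) if t]


def token(value, n=1, delims=" "):
    """TOKEN / LTOKEN: the n-th token counting from the left."""
    toks = _tokens(value, delims)
    return toks[n - 1] if 1 <= n <= len(toks) else ""  # assumed: empty if absent


def rtoken(value, n=1, delims=" "):
    """RTOKEN: the n-th token counting from the right."""
    toks = _tokens(value, delims)
    return toks[-n] if 1 <= n <= len(toks) else ""


def ntoken(value, delims=" "):
    """NTOKEN: the number of tokens."""
    return len(_tokens(value, delims))


name = "Mary Ann Jones"
first, last, count = token(name), rtoken(name), ntoken(name)
middle = token(name, 2) if count > 2 else ""
```

The last line follows the same logic as the NTOKEN example: a middle name is taken only when the name has more than two tokens.<br />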
TRIM (exp, nn, 'cs')<br />
specifies a character value that may have characters trimmed from <strong>the</strong> right side, and <strong>the</strong> characters <strong>to</strong><br />
trim. An optional number limits <strong>the</strong> number of characters <strong>to</strong> be trimmed. The resultant character value<br />
will have a shorter size if <strong>the</strong> specified characters exist on <strong>the</strong> right and trimming occurs. If a trim character<br />
is not specified, <strong>the</strong> blank character is assumed:<br />
[ SET Name = TRIM ( Name ) ]<br />
In this example, <strong>the</strong> variable Name will be set equal <strong>to</strong> Name with all blank characters trimmed off from<br />
<strong>the</strong> right side. Multiple trim characters may be specified:<br />
[ GEN Text:C = TRIM ( Var1, '.,-' ) ]<br />
Any of <strong>the</strong> specified characters occurring on <strong>the</strong> right end of <strong>the</strong> variable Var1 will be removed. RTRIM<br />
is a synonym for TRIM. See also LTRIM and LRTRIM.<br />
LTRIM (exp, nn, 'cs')<br />
specifies a character value that may have characters trimmed from <strong>the</strong> left side, and <strong>the</strong> characters <strong>to</strong> trim.<br />
All matching characters will be trimmed unless <strong>the</strong> optional second argument specifies a limit. See also<br />
RTRIM and LRTRIM.<br />
LRTRIM (exp, nn, 'cs')<br />
specifies a character value that may have characters trimmed from both <strong>the</strong> left and right sides. The characters<br />
<strong>to</strong> be trimmed are an optional argument. If no characters are specified, blank characters will be<br />
trimmed. When trimming takes place, <strong>the</strong> resultant variable size may be shorter than it was initially.<br />
LRPAD is often done on an LRTRIM result. The second argument is optional. If it is used, it limits <strong>the</strong><br />
number of characters that are trimmed.<br />
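The trimming functions can be sketched in Python as follows. The names are hypothetical, and applying the LRTRIM limit to each side separately is an assumption; the text says only that the limit restricts the number of characters trimmed.<br />

```python
def trim(value, limit=None, chars=" "):
    """TRIM / RTRIM: remove trailing characters found in `chars`."""
    end = len(value)
    removed = 0
    while (end > 0 and value[end - 1] in chars
           and (limit is None or removed < limit)):
        end -= 1
        removed += 1
    return value[:end]


def ltrim(value, limit=None, chars=" "):
    """LTRIM: the same operation applied to the left side."""
    return trim(value[::-1], limit, chars)[::-1]


def lrtrim(value, limit=None, chars=" "):
    """LRTRIM: trim both sides (limit assumed to apply per side)."""
    return ltrim(trim(value, limit, chars), limit, chars)


name = trim("Smith   ")
text = trim("etc.,-", chars=".,-")
partial = trim("abc   ", 2)
```

Any mix of the listed trim characters is removed from the end, and trimming stops at the first character that is not in the list.<br />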
UPPER (exp)<br />
converts a character value <strong>to</strong> uppercase characters:<br />
[ SET State = UPPER ( State ) ]<br />
VARNAME (exp)<br />
yields a character value that is <strong>the</strong> name of <strong>the</strong> variable in <strong>the</strong> expression:<br />
[ GEN Primary.Disease:C = .M. ;<br />
DO #J USING Heart TO Skin ;<br />
IF V(#J) EQ 1 AND Primary.Disease EQ .M. ,<br />
SET Primary.Disease = VARNAME ( #J );<br />
ENDDO ]<br />
The new character variable, Primary.Disease, will have values of missing, unless any of <strong>the</strong> variables<br />
Heart through Skin has a value of 1. Then Primary.Disease will have <strong>the</strong> name of <strong>the</strong> first of those variables<br />
as its value.<br />
VERIFY (exp, 'cs', 'cs')<br />
yields a numeric value which is <strong>the</strong> location of <strong>the</strong> first character in <strong>the</strong> initial argument which is NOT<br />
found in any of <strong>the</strong> remaining arguments. Thus, <strong>the</strong> presence of only specified characters may be<br />
verified:<br />
[ GEN Error =<br />
VERIFY ( Char.<strong>Inc</strong>ome, '0123456789', ' $.,' ) ]<br />
Arguments 2 and 3 could have been combined in<strong>to</strong> a single argument.<br />
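VERIFY can be modeled in a few lines of Python. This is a sketch: the name `verify` is hypothetical, and returning zero when every character is permitted is an assumption, since the text does not say what is returned in that case.<br />

```python
def verify(value, *allowed):
    """1-based location of the first character not found in `allowed`."""
    permitted = set("".join(allowed))
    for index, ch in enumerate(value, start=1):
        if ch not in permitted:
            return index
    return 0  # assumption: zero means every character was verified


clean = verify("$1,200.50", "0123456789", " $.,")
typo = verify("12O0", "0123456789")   # letter O in place of a zero
```

Because the permitted sets are simply pooled, combining arguments 2 and 3 into one string gives the same result.<br />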
<strong>PPL</strong> Functions: Character and Numeric<br />
The following functions operate on ei<strong>the</strong>r character or numeric variable lists. However, numeric and<br />
character variables may not be combined in one list. The list may reference variables by name or position.<br />
The functions operate on character variables in <strong>the</strong> same manner that <strong>the</strong>y operate on numeric<br />
variables.<br />
COUNT.GOOD (vnp, vnp)<br />
gives <strong>the</strong> number of non-missing values of <strong>the</strong> variables specified in <strong>the</strong> list. Only variable names or positions<br />
may be included in <strong>the</strong> list.<br />
FIRST.GOOD (vnp, vnp)<br />
gives <strong>the</strong> first good (non-missing) value of <strong>the</strong> variables specified in <strong>the</strong> list. Only variable names or positions<br />
may be included in <strong>the</strong> list.<br />
LAST.GOOD (vnp, vnp)<br />
gives <strong>the</strong> last good (non-missing) value of <strong>the</strong> variables specified in <strong>the</strong> list. Only variable names or positions<br />
may be included in <strong>the</strong> list.<br />
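These three functions can be pictured with a Python model in which `None` stands for a missing value. This is a sketch: the names are hypothetical, and returning `None` when every value is missing is an assumption.<br />

```python
def count_good(*values):
    """Number of non-missing values in the list."""
    return sum(v is not None for v in values)


def first_good(*values):
    """First non-missing value, or None if all are missing (assumed)."""
    return next((v for v in values if v is not None), None)


def last_good(*values):
    """Last non-missing value, or None if all are missing (assumed)."""
    return next((v for v in reversed(values) if v is not None), None)


phones = (None, "555-1234", None, "555-9876")
summary = (count_good(*phones), first_good(*phones), last_good(*phones))
```

The same model works for a list of numbers, but a single list should hold one type only, as noted above.<br />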
<strong>PPL</strong> Opera<strong>to</strong>rs: Character<br />
Concatenation opera<strong>to</strong>rs are used <strong>to</strong> combine character values or expressions.<br />
MODIFY List<br />
[ GEN Mail.Name:C36;<br />
IF Sex EQ 1, SET Mail.Name =<br />
'Mr. ' // First.Name /// Last.Name ;<br />
IF Sex EQ 2, SET Mail.Name =<br />
'Ms. ' // First.Name /// Last.Name ], OUT Mail $<br />
// concatenation<br />
connects <strong>the</strong> character strings before and after <strong>the</strong> double slashes:<br />
[ GEN Name:C32 = First.Name // ' ' // Last.Name ]<br />
The double slash opera<strong>to</strong>r abuts <strong>the</strong> character strings end-<strong>to</strong>-end. Blank portions of each field are<br />
included:<br />
Jennifer Smith<br />
/// trim concatenation<br />
connects <strong>the</strong> two character strings after trimming leading and trailing blanks:<br />
[ GEN Name:C32 = First.Name /// Last.Name ]<br />
The triple slash opera<strong>to</strong>r abuts <strong>the</strong> trimmed character strings end-<strong>to</strong>-end and <strong>the</strong>n inserts a blank between<br />
<strong>the</strong> strings:<br />
Jennifer Smith<br />
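The difference between the two operators is easiest to see with fixed-width fields. The Python model below is a sketch: the 12-character field width and the function names are assumptions made for illustration.<br />

```python
def concat(*parts):
    """Model of //: abut the values end-to-end, blanks included."""
    return "".join(parts)


def trim_concat(left, right):
    """Model of ///: trim both values, then join with a single blank."""
    return left.strip() + " " + right.strip()


first_name = "Jennifer".ljust(12)   # hypothetical 12-character fields
last_name = "Smith".ljust(12)

abutted = concat(first_name, " ", last_name)
trimmed = trim_concat(first_name, last_name)
```

Here `abutted` carries the trailing blanks of each field, so several blanks separate the two names, while `trimmed` has exactly one.<br />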
&& dynamic concatenation of character constants<br />
character constants can be dynamically concatenated in <strong>the</strong> command language and <strong>PPL</strong> by using <strong>the</strong> &&<br />
opera<strong>to</strong>r.<br />
[ GEN Cvar:c130 = && "bbb" && 'ccc' ]<br />
<strong>PPL</strong> Opera<strong>to</strong>rs: Logical<br />
In general, <strong>the</strong> following logical opera<strong>to</strong>rs evaluate two expressions. The expressions may be variables,<br />
values and functions, except for:<br />
• AMONG and NOTAMONG, whose arguments are lists of values and variables,<br />
• GOOD and MISSING, which do not have arguments, and<br />
• MATCHES, which has a character string argument.<br />
All of <strong>the</strong>se logical opera<strong>to</strong>rs are appropriate for character data. AMONG, NOTAMONG, GOOD and<br />
MISSING are also appropriate for numeric data. Numeric and character expressions may not be mixed<br />
in one argument list. Character constants must be enclosed in quotes.<br />
AMONG (list of values and variables)<br />
is true, false or missing depending on whe<strong>the</strong>r a value is one of <strong>the</strong> specified values:<br />
[ IF State AMONG ( 'NJ', 'N.J.', 'New Jersey' ), RETAIN ]<br />
or in <strong>the</strong> specified range:<br />
[ IF Name AMONG ('A' TO 'FZZ' ), RETAIN ]<br />
<strong>Inc</strong>lusion in <strong>the</strong> range is based on <strong>the</strong> sort order of <strong>the</strong> character strings, which may differ among<br />
computers.<br />
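List and range membership can be modeled in Python as below. This is a sketch: a `(low, high)` tuple stands in for `low TO high`, the names are hypothetical, missing values are not modeled, and Python's code-point ordering is only one possible sort order.<br />

```python
def among(value, *choices, respect_case=False):
    """True if `value` equals a listed value or falls in a listed range.

    respect_case=True models XAMONG.
    """
    fold = (lambda s: s) if respect_case else str.upper
    v = fold(value)
    for choice in choices:
        if isinstance(choice, tuple):      # (low, high) models low TO high
            low, high = choice
            if fold(low) <= v <= fold(high):
                return True
        elif v == fold(choice):
            return True
    return False


in_list = among("nj", "NJ", "N.J.", "New Jersey")
in_range = among("Baker", ("A", "FZZ"))
out_of_range = among("Garcia", ("A", "FZZ"))
```

NOTAMONG is the negation of this test for non-missing values.<br />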
XAMONG (list of values and variables)<br />
specifies case-respecting comparisons — like AMONG in all o<strong>the</strong>r aspects.<br />
CONTAINS 'cs' or exp<br />
is true, false or missing, depending on whe<strong>the</strong>r <strong>the</strong> character value argument is present:<br />
[ IF Address CONTAINS '08540' , RETAIN ]<br />
[ IF Address CONTAINS TRIM( Zip ), RETAIN ]<br />
In <strong>the</strong> first example, cases with values of Address containing “08540” are retained; in <strong>the</strong> second, cases<br />
in which <strong>the</strong> Zip characters are also present in Address are retained.<br />
XCONTAINS 'cs' or exp<br />
specifies case-respecting evaluations — like CONTAINS in all o<strong>the</strong>r aspects.<br />
XEQ exactly EQ<br />
tests whe<strong>the</strong>r two character expressions are exactly equal in both specific characters and case.<br />
[ IF Initials XEQ 'JW', SET Name = 'Jim Wolf, Sr.' ]<br />
[ IF Initials XEQ 'jw', SET Name = 'Jim Wolf, Jr.' ]<br />
The opera<strong>to</strong>rs XNE, XLT, XLE, XGT and XGE are similar: case and characters must be identical in<br />
string comparisons.<br />
GOOD<br />
is true or false depending on whe<strong>the</strong>r <strong>the</strong> value is present (good) or missing. GOOD combines = and .G. :<br />
[ IF Address GOOD , RETAIN ] or<br />
[ IF Address = .G. , RETAIN ]<br />
MATCHES 'cs'<br />
is true, false or missing, depending on whe<strong>the</strong>r <strong>the</strong> character string argument matches <strong>the</strong> value of a character<br />
variable. The case of <strong>the</strong> characters is not significant. The character string argument may include<br />
meta-characters that define or limit matches:<br />
[ IF Food MATCHES '*beef*', RETAIN ]<br />
In this example, <strong>the</strong> meta-character “*” is a wildcard that matches zero or more occurrences of any character.<br />
Thus, any cases in which <strong>the</strong> variable Food contains <strong>the</strong> string “beef” are continued. Values of<br />
Food such as <strong>the</strong> following are considered matches:<br />
Beef Roast Beef Beefsteak<br />
Some of <strong>the</strong> meta-characters that may be used in <strong>the</strong> character string argument are:<br />
* zero or more of any character<br />
@ zero or more blanks<br />
? a single character<br />
# a single digit<br />
_ a single blank<br />
$ a single letter<br />
\# a literal character (<strong>the</strong> #)<br />
\* a literal character (<strong>the</strong> *)<br />
(abc) a literal string<br />
(ab|bc) ab or bc<br />
abc same as (abc)<br />
[abc] a or b or c<br />
[a-z] a single letter in this range<br />
[$] same as [a-z]<br />
[0-9] a single number in this range<br />
[#] same as [0-9]<br />
[ _ ] a single blank<br />
[^$] a single character that is not a letter<br />
[#]11 a single match<br />
[#]01 zero or one matches<br />
[#]0+ zero or more matches<br />
[#]1+ one or more matches<br />
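A subset of these meta-characters can be modeled by translating a pattern into a Python regular expression. This is a sketch, not the P-STAT matcher: it covers only the six single-character metas and the backslash escape, it ignores brackets, parentheses and repetition counts, and it assumes a pattern must match the entire value (inferred from the need for "*" on both sides of '*beef*').<br />

```python
import re

# Single-character metas from the table and their assumed regex equivalents.
_META = {"*": ".*", "?": ".", "#": r"\d", "$": "[A-Za-z]", "_": " ", "@": " *"}


def to_regex(pattern):
    """Translate a MATCHES-style pattern (subset) into a regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped literal
            i += 2
            continue
        out.append(_META.get(ch, re.escape(ch)))
        i += 1
    return "".join(out)


def matches(value, pattern, respect_case=False):
    """respect_case=True models XMATCHES; the default models MATCHES."""
    flags = 0 if respect_case else re.IGNORECASE
    return re.fullmatch(to_regex(pattern), value, flags) is not None


foods = ["Beef", "Roast Beef", "Beefsteak", "Pork"]
kept = [f for f in foods if matches(f, "*beef*")]
```

Under these assumptions the first three foods are kept, mirroring the example values listed earlier for '*beef*'.<br />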
XMATCHES 'cs'<br />
specifies case-respecting matches; like MATCHES in all o<strong>the</strong>r aspects.<br />
MISSING<br />
is true or false depending on whe<strong>the</strong>r <strong>the</strong> value is present (good) or missing. MISSING combines = and<br />
.M. :<br />
[ IF Address MISSING , DELETE ] or<br />
[ IF Address EQ .M. , DELETE ]<br />
NOTAMONG (list of values and variables)<br />
is true, false or missing depending on whe<strong>the</strong>r a value is not among <strong>the</strong> specified values:<br />
[ IF State NOTAMONG<br />
( 'NJ', 'N.J.', 'New Jersey' ), DELETE ]<br />
or not in <strong>the</strong> specified range:<br />
[ IF Name NOTAMONG ( 'A' TO 'FZZ' ), DELETE ]<br />
XNOTAMONG (list of values and variables)<br />
specifies case-respecting comparisons — like NOTAMONG in all o<strong>the</strong>r aspects.<br />
10<br />
<strong>PPL</strong>: Date and Time<br />
Commands and Functions<br />
The first section of this chapter describes <strong>the</strong> default format of date values, and describes <strong>the</strong> extensive set of date<br />
and time functions such as ADD.DAYS. The second section describes eight commands that may be used <strong>to</strong><br />
change <strong>the</strong> default ordering and appearance of new date values. The third section describes <strong>the</strong> six date-related<br />
logical opera<strong>to</strong>rs in <strong>PPL</strong> that compare dates. The final section contains complete details on <strong>the</strong> FORMAT.DATE<br />
function which is used <strong>to</strong> provide templates that describe exactly how a date should appear in <strong>the</strong> prin<strong>to</strong>ut.<br />
10.1 DATE AND TIME FUNCTIONS<br />
A date value is not a special datatype; it is simply a P-<strong>STAT</strong> character value that contains a 4-digit year from 1753<br />
<strong>to</strong> 2999, a month value and a day from 1 <strong>to</strong> 31. It may have time in hh:mm:ss or hh:mm form. The seconds may<br />
have up <strong>to</strong> 3 places, like 12:13:14.567 . It may also have <strong>the</strong> day of <strong>the</strong> week.<br />
Most date functions read an input date, do something <strong>to</strong> it, and write a date result, formatted in <strong>the</strong> same way<br />
as <strong>the</strong> input value. By formatting, we mean <strong>the</strong> ordering of <strong>the</strong> fields within <strong>the</strong> date value, such as ‘jan 1 1992’<br />
or ‘1992 January 1’ or such.<br />
A function like CURRENT.DATE has no input <strong>to</strong> serve as a format for <strong>the</strong> output, so it uses <strong>the</strong> default format.<br />
A P-<strong>STAT</strong> run begins with <strong>the</strong> default format looking like<br />
'Tues Jan 1, 2002 18:52:04' .<br />
Note <strong>the</strong> size: <strong>the</strong>se formats can use 30 or more characters.<br />
This default appearance can be changed by the DATE.ORDER command, and by several other commands<br />
that control things like a month name appearing as Jan or jan or January, etc. FORMAT.DATE, described in the<br />
final section, is a general and powerful function for specifying the exact appearance to be used when including<br />
dates in the printed output.<br />
10.2 Functions Which Create or Use Dates<br />
1. DAY.MONTH.YEAR creates date from integer or character argument.<br />
2. DAY.YEAR.MONTH creates date from integer or character argument.<br />
3. MONTH.DAY.YEAR creates date from integer or character argument.<br />
4. MONTH.YEAR.DAY creates date from integer or character argument.<br />
5. YEAR.DAY.MONTH creates date from integer or character argument.<br />
6. YEAR.MONTH.DAY creates date from integer or character argument.<br />
7. MAKE.DATE creates a date from numeric input.<br />
8. CURRENT.DATE provides <strong>to</strong>day’s date and time.<br />
9. REFORMAT.DATE changes <strong>the</strong> format of a date value.<br />
10. <strong>STAT</strong>US.DATE shows if a date is valid, if it has time, etc.<br />
11. DAYS returns days since 1/1/1753 for a date.<br />
12. SECONDS returns seconds since 1/1/1753 for a date.<br />
13. SECONDS.MIDNIGHT returns seconds since midnight for a date.<br />
14. UNDO.DAYS reverses <strong>the</strong> DAYS function.<br />
15. UNDO.SECONDS reverses <strong>the</strong> SECONDS function.<br />
16. FISCAL.YEAR returns <strong>the</strong> fiscal year of a date.<br />
17. FISCAL.QUARTER returns <strong>the</strong> fiscal quarter of a date.<br />
18. QUARTER returns <strong>the</strong> calendar quarter of a date.<br />
19. DAY.WITHIN.WEEK returns 1 to 7, the day within a week.<br />
20. DAY.WITHIN.YEAR returns 1 to 366, the day within a year.<br />
21. WEEK.WITHIN.YEAR returns 0 to 53, the week within a year.<br />
22. ADD.YEARS add some years to a date.<br />
23. ADD.MONTHS add some months to a date.<br />
24. ADD.DAYS add some days to a date.<br />
25. ADD.HOURS add some hours to a date.<br />
26. ADD.MINUTES add some minutes to a date.<br />
27. ADD.SECONDS add some seconds to a date.<br />
28. SUBTRACT.YEARS subtract some years from a date.<br />
29. SUBTRACT.MONTHS subtract some months from a date.<br />
30. SUBTRACT.DAYS subtract some days from a date.<br />
31. SUBTRACT.HOURS subtract some hours from a date.<br />
32. SUBTRACT.MINUTES subtract some minutes from a date.<br />
33. SUBTRACT.SECONDS subtract some seconds from a date.<br />
34. EXTRACT.YEARS return numeric years from a date.<br />
35. EXTRACT.MONTHS return numeric months from a date.<br />
36. EXTRACT.DAYS return numeric days from a date.<br />
37. EXTRACT.HOURS return numeric hours from a date.<br />
38. EXTRACT.MINUTES return numeric minutes from a date.<br />
39. EXTRACT.SECONDS return numeric seconds from a date.<br />
40. EXTRACT.CC return 2-digit numeric century from a date. (19 from 1983)<br />
41. EXTRACT.YY return 2-digit numeric year within century. (83 from 1983)<br />
42. EXTRACT.DATE return a copy of the input, dropping time.<br />
43. EXTRACT.TIME return a copy of the input, dropping date.<br />
44. EXTRACT.WEEKDAY return the weekday name.<br />
45. CHANGE.YEARS change the years field in a date.<br />
46. CHANGE.MONTHS change the months field in a date.<br />
47. CHANGE.DAYS change the days field in a date.<br />
48. CHANGE.HOURS change the hours field in a date.<br />
49. CHANGE.MINUTES change the minutes field in a date.<br />
50. CHANGE.SECONDS change the seconds field in a date.<br />
51. DIF.YEARS difference between 2 dates in years.<br />
52. DIF.MONTHS difference between 2 dates in months.<br />
53. DIF.DAYS difference between 2 dates in days.<br />
54. DIF.HOURS difference between 2 dates in hours.<br />
55. DIF.MINUTES difference between 2 dates in minutes.<br />
56. DIF.SECONDS difference between 2 dates in seconds.<br />
10.3 Six Simple Date Functions<br />
The 6 simple date functions make a character date value from ei<strong>the</strong>r numeric input like 12252005 or<br />
20051225, or from character input like ’12/25/2005’. The order of <strong>the</strong> three segments should be consistent<br />
with <strong>the</strong> function name. In o<strong>the</strong>r words, if 12252005 is meant <strong>to</strong> be month 12, day 25, and year<br />
2005, <strong>the</strong> MONTH.DAY.YEAR function should be chosen.<br />
For example:<br />
MONTH.DAY.YEAR ( 12252005 ) = ’Sun Dec 25, 2005’<br />
MONTH.DAY.YEAR ( ’12--25--2005’ ) = ’Sun Dec 25, 2005’<br />
A numeric argument can be an integer of 3 <strong>to</strong> 8 digits. If 3, 4 or 5 digits, lead zeros are assumed <strong>to</strong><br />
bring it up <strong>to</strong> 6 <strong>to</strong>tal digits, and <strong>the</strong> yy form of year is assumed. In that case, <strong>the</strong> default is <strong>to</strong> assume<br />
20yy. Thus, year.month.day( 225) produces Friday Feb 25, 2000.<br />
If 7 digits, one lead zero is assumed, and <strong>the</strong> yyyy form of year is assumed.<br />
A character argument should contain ei<strong>the</strong>r one or three integers, like ’12252005’ or ’12/25/2005’.<br />
If just one integer, it is treated like <strong>the</strong> numeric input.<br />
Non-digits are treated as separa<strong>to</strong>rs. Thus, ’***12abc25///2005 ’ will produce 12, 25 and 2005.<br />
A second argument may be supplied to specify the century to be used for 2-digit years. Centuries from<br />
1700 through 2900 are allowed, shown by values of 17 through 29, or by 1700, 1800, 1900, etc.<br />
PUT ( DAY.MONTH.YEAR ( 10042006 )) $ or<br />
PUT ( DAY.MONTH.YEAR ( "10/4/2006" )) $<br />
produces “Mon April 10, 2006”<br />
PUT ( DAY.YEAR.MONTH ( 10200604 )) $<br />
also produces Mon April 10, 2006.<br />
PUT ( MONTH.YEAR.DAY ( 4200610 )) $<br />
Here the lead zero of month 04 has been dropped, and the function can still figure out what you mean. But note<br />
that in the first 2 numeric examples above the 0 in 04 for the month is necessary, because it is not the leading digit.<br />
The following 3 examples also produce the same result.<br />
PUT ( MONTH.DAY.YEAR ( 4102006 )) $<br />
PUT ( YEAR.DAY.MONTH ( 20061004 )) $<br />
PUT ( YEAR.MONTH.DAY ( 20060410 )) $<br />
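The parsing rules of this section can be sketched outside P-STAT. The Python below is not PPL; it is a minimal emulation<br />
under stated assumptions. The helper name simple_date and its order string (such as 'mdy') are hypothetical stand-ins<br />
for the six function names, and a short year in a three-part character argument is assumed to be a yy form.<br />

```python
import re
from datetime import date

def simple_date(order, arg, century=20):
    """order is 'dmy', 'dym', 'mdy', 'myd', 'ydm' or 'ymd' (hypothetical stand-in
    for the six function names); century applies only to 2-digit years."""
    if isinstance(arg, str):
        parts = re.findall(r"\d+", arg)        # non-digits act as separators
        if len(parts) == 3:
            fields = dict(zip(order, parts))
            year = int(fields["y"])
            if len(fields["y"]) <= 2:           # assumption: short year means yy form
                year += century * 100
            return date(year, int(fields["m"]), int(fields["d"]))
        if len(parts) != 1:
            raise ValueError("expected one or three integers")
        digits = parts[0]                       # one integer: treat like numeric input
    else:
        digits = str(arg)
    # 3 to 5 digits are padded to 6 (yy form); 7 digits are padded to 8 (yyyy form)
    width = 6 if len(digits) <= 6 else 8
    digits = digits.zfill(width)
    year_len = 2 if width == 6 else 4
    fields, pos = {}, 0
    for key in order:
        n = year_len if key == "y" else 2
        fields[key] = int(digits[pos:pos + n])
        pos += n
    if year_len == 2:
        fields["y"] += century * 100
    return date(fields["y"], fields["m"], fields["d"])

print(simple_date("mdy", 12252005))   # 2005-12-25
print(simple_date("ymd", 225))        # 2000-02-25
print(simple_date("myd", 4200610))    # 2006-04-10
```

The sketch returns a Python date rather than a formatted character value; formatting is governed by the default format described earlier.<br />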
10.4 Date and Time Function Details<br />
1. MAKE.DATE (year, month, day ) >>> date<br />
MAKE.DATE (year, month, day, hour, minute, second) >>> date<br />
MAKE.DATE (year, month, day, hms, 'mask' ) >>> date<br />
MAKE.DATE (ymd, 'mask' ) >>> date<br />
MAKE.DATE (ymd, 'mask', hour, minute, second) >>> date<br />
MAKE.DATE (ymd, 'mask', hms, 'mask' ) >>> date<br />
This makes a character date value from numeric values. The function must have 2 or 3 date arguments,<br />
and may also have 2 or 3 time arguments. These examples show <strong>the</strong> input as constants, but<br />
<strong>the</strong>y can be variables or expressions of any complexity.<br />
The result is a character date, formatted in <strong>the</strong> current default format. P-<strong>STAT</strong> starts a run with <strong>the</strong><br />
default format set <strong>to</strong> <strong>the</strong> following template:<br />
'Tues Jan 1, 2002 18:52:04' .<br />
The date can be provided using separate arguments for year, month and day. Alternatively, those three<br />
values can be compressed in<strong>to</strong> one integer, like 19971225, followed by a mask or template in quotes<br />
<strong>to</strong> show how <strong>to</strong> parse <strong>the</strong> compressed integer.<br />
Time is provided similarly. However, seconds may have a fractional part of up <strong>to</strong> three places; in that<br />
case <strong>the</strong> three argument form must be used. Some examples:<br />
MAKE.DATE ( 1997, 12, 25 ) = 'Thurs Dec 25, 1997'<br />
MAKE.DATE ( 1997, 12, 25,<br />
23, 59, 59 ) = 'Thurs Dec 25, 1997 23:59:59'<br />
A mask like ‘yyyymmdd’ is used when year, month and day are combined in<strong>to</strong> one integer, like<br />
19971225.<br />
A mask like ‘hhmmss’ is used when hour, minute and second are combined in<strong>to</strong> one integer, like<br />
235959.<br />
MAKE.DATE ( 12251997, 'mmddyyyy',<br />
235959, 'hhmmss' )<br />
= 'Thurs Dec 25, 1997 23:59:59'<br />
If <strong>the</strong> year mask in a template has yy (ra<strong>the</strong>r than yyyy), <strong>the</strong> yy may be preceded by a 2-digit century<br />
from 17 <strong>to</strong> 29; if not, 20 is assumed. For example:<br />
MAKE.DATE ( 122595, 'mmdd19yy' )<br />
MAKE.DATE ( 122595, 'mmddyy' )<br />
The first would produce 'Mon Dec 25, 1995'.<br />
The second would produce 'Sun Dec 25, 2095'.<br />
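How a date mask splits a compressed integer can be illustrated with a short Python sketch (not PPL). The function<br />
name make_date is hypothetical and only the date part is handled; digits inside the mask, such as the 19 in<br />
'mmdd19yy', are taken as the century, and 20 is assumed when a yy mask has none.<br />

```python
from datetime import date

def make_date(value, mask):
    """Split a compressed integer according to a mask such as 'mmddyyyy'."""
    letters = [c for c in mask if c in "ymd"]
    digits = str(value).zfill(len(letters))   # restore any dropped lead zeros
    fields = {"y": "", "m": "", "d": ""}
    century = ""
    pos = 0
    for c in mask:
        if c in fields:
            fields[c] += digits[pos]          # letter: consume one input digit
            pos += 1
        else:
            century += c                      # literal century digits in the mask
    year = int(fields["y"])
    if len(fields["y"]) == 2:
        year += int(century or "20") * 100    # default century is 20
    return date(year, int(fields["m"]), int(fields["d"]))

print(make_date(12251997, "mmddyyyy"))  # 1997-12-25
print(make_date(122595, "mmdd19yy"))    # 1995-12-25
print(make_date(122595, "mmddyy"))      # 2095-12-25
```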
2. CURRENT.DATE () >>> date<br />
Make a character date value from <strong>the</strong> current date and time. The current default format is used <strong>to</strong> construct<br />
<strong>the</strong> result.<br />
CURRENT.DATE () = ‘Sun June 23, 2002 12:26:49’<br />
The empty paren<strong>the</strong>ses are needed <strong>to</strong> persuade <strong>the</strong> <strong>PPL</strong> processor that this is indeed a function.<br />
3. REFORMAT.DATE ( date1 ) >>> date<br />
REFORMAT.DATE ( date1, date2 ) >>> date<br />
Take a date input and reformat it. If a second argument is supplied, it will be used as a formatting template.<br />
If not, <strong>the</strong> current default format is used.<br />
REFORMAT.DATE ( 'June 23, 2002' ) = 'Sun June 23, 2002'<br />
Several points about <strong>the</strong> above example:<br />
(1) There was no second argument, <strong>the</strong>refore <strong>the</strong> current default format would be used which, unless<br />
changed earlier in <strong>the</strong> run, would produce <strong>the</strong> above result.<br />
(2) The default has day-of-week, <strong>the</strong>refore this result gets day-of-week also.<br />
(3) The default has time, but this input does not have time, so none is put in<strong>to</strong> <strong>the</strong> result.<br />
REFORMAT.DATE ( 'JUNE 23, 2002', '1997, dec, 25' ) =<br />
'2002, June, 23'<br />
Note that ‘dec’ in the template above determined only the ordering of elements; it did not change the<br />
naming style. The MONTH.LENGTH and similar commands can be used to change how names are<br />
written.<br />
4. <strong>STAT</strong>US.DATE ( date ) >>> integer<br />
Take a date input and produce a numeric result, from -3 <strong>to</strong> 2, which indicates <strong>the</strong> usability of <strong>the</strong> input.<br />
<strong>STAT</strong>US.DATE ( 'jan 1, 1997 12:13:14' ) = 2<br />
<strong>STAT</strong>US.DATE ( 'jan 1, 1997 ' ) = 1<br />
<strong>STAT</strong>US.DATE ( 'jan 1 ' ) = 0<br />
<strong>STAT</strong>US.DATE ( .M1. ) = -1<br />
<strong>STAT</strong>US.DATE ( .M2. ) = -2<br />
<strong>STAT</strong>US.DATE ( .M3. ) = -3<br />
A result of 2 means <strong>the</strong> input is a valid date value which also contains time.<br />
A result of 1 means <strong>the</strong> input is a valid date value but does not contain time.<br />
A result of 0 means <strong>the</strong> input is not missing, but is none<strong>the</strong>less not a valid date value (missing year).<br />
A result of -1, -2 or -3 means the input is missing: -1 for .M1., -2 for .M2. and -3 for .M3.<br />
5. DAYS ( date ) >>> integer<br />
The DAYS function takes an input date value and produces <strong>the</strong> number of days in that value since Jan<br />
1, 1753. Time, if <strong>the</strong>re, is ignored. The result of <strong>the</strong> DAYS function can be used <strong>to</strong> sort on <strong>the</strong> date,<br />
with no concern about time within date.<br />
DAYS ( 'jan 1, 1753' ) = 1<br />
DAYS ( 'jan 1, 2002 12:13:14' ) = 90,946<br />
There is an older form of DAYS function that has two arguments, a 6 or 8 digit year-month-day integer<br />
like 19981225, and a mask like ‘yyyymmdd’. This form, which returned days since jan 1,1900, is<br />
being de-documented but will be supported for some years.<br />
The new form is recognized by its having just 1 argument.<br />
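The day counts shown for DAYS can be cross-checked outside P-STAT. This Python sketch is not PPL; the name days is<br />
hypothetical, and it assumes Python's Gregorian calendar agrees with P-STAT over the supported range of years.<br />

```python
from datetime import date

EPOCH = date(1753, 1, 1)  # counted as day 1, matching DAYS ( 'jan 1, 1753' ) = 1

def days(d):
    """Emulate the one-argument DAYS function; any time of day is ignored."""
    return (d - EPOCH).days + 1

print(days(date(1753, 1, 1)))  # 1
print(days(date(2002, 1, 1)))  # 90946
```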
6. SECONDS ( date ) >>> number<br />
The SECONDS function takes an input date value and produces <strong>the</strong> number of seconds since 00:00:00<br />
on Jan 1, 1753. If <strong>the</strong> input lacks a time field, <strong>the</strong> result is missing. The result of <strong>the</strong> SECONDS function<br />
can be used <strong>to</strong> sort on time within date.<br />
SECONDS ( 'jan 1, 2002 12:13:14' ) = 7,857,691,994<br />
7. SECONDS.MIDNIGHT ( date ) >>> number<br />
The SECONDS.MIDNIGHT function takes <strong>the</strong> time element of an input date value and produces <strong>the</strong><br />
number of seconds since midnight. The date element is ignored. If time is not <strong>the</strong>re, <strong>the</strong> result is missing.<br />
The result of <strong>the</strong> SECONDS.MIDNIGHT function can be used <strong>to</strong> sort on <strong>the</strong> time, with no<br />
concern about <strong>the</strong> date.<br />
SECONDS.MIDNIGHT ( 'jan 1, 2002 00:00:00' ) = 0<br />
SECONDS.MIDNIGHT ( 'jan 1, 2002 12:13:14' ) = 43,994<br />
SECONDS.MIDNIGHT ( 'jan 1, 2002 23:59:59.12' ) = 86,399.12<br />
8. UNDO.DAYS ( number ) >>> date<br />
The UNDO.DAYS function takes <strong>the</strong> result of <strong>the</strong> DAYS function and re-creates <strong>the</strong> date. Note, only<br />
<strong>the</strong> date is recovered, since time information is not carried in <strong>the</strong> DAYS result.<br />
UNDO.DAYS ( 90946 ) = 'Tues Jan 1, 2002'<br />
UNDO.DAYS ( DAYS('jan 1 2002 12:13:14') ) = 'Tues Jan 1, 2002'<br />
9. UNDO.SECONDS ( number ) >>> date<br />
The UNDO.SECONDS function takes <strong>the</strong> result of <strong>the</strong> SECONDS function and re-creates <strong>the</strong> date<br />
and time.<br />
UNDO.SECONDS ( 7857691994 ) = 'Tues Jan 1, 2002 12:13:14'<br />
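The SECONDS, SECONDS.MIDNIGHT and UNDO.SECONDS results above can be reproduced with a short Python sketch (not<br />
PPL). The function names are hypothetical, and unlike P-STAT the sketch always has a time field, so the missing-result<br />
case for inputs without time is not modelled.<br />

```python
from datetime import datetime, timedelta

EPOCH = datetime(1753, 1, 1)  # 00:00:00 on Jan 1, 1753

def seconds(dt):
    """Seconds elapsed since the start of Jan 1, 1753."""
    return (dt - EPOCH).total_seconds()

def seconds_midnight(dt):
    """Seconds since midnight; the date part is ignored."""
    return dt.hour * 3600 + dt.minute * 60 + dt.second + dt.microsecond / 1e6

def undo_seconds(n):
    """Reverse of seconds(): rebuild the date and time."""
    return EPOCH + timedelta(seconds=n)

stamp = datetime(2002, 1, 1, 12, 13, 14)
print(seconds(stamp))            # 7857691994.0
print(seconds_midnight(stamp))   # 43994.0
print(undo_seconds(7857691994))  # 2002-01-01 12:13:14
```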
10. FISCAL.YEAR ( date, integer ) >>> integer<br />
The FISCAL.YEAR function requires a second argument: <strong>the</strong> ending month of <strong>the</strong> fiscal year. This<br />
can be 1 through 12, but is usually 6, 9 or 12:<br />
6 for fiscal years ending on June 30.<br />
9 for <strong>the</strong> Sept 30 fiscal year end (U.S.Government).<br />
12 for a Dec 31 ending of a calendar year.<br />
FISCAL.YEAR ( 'sept 15,2001', 9 ) = 2001<br />
FISCAL.YEAR ( 'oct 15,2001', 9 ) = 2002<br />
FISCAL.YEAR ( 'dec 31,2001', 6 ) = 2002<br />
FISCAL.YEAR ( 'dec 31,2001', 12 ) = 2001<br />
11. FISCAL.QUARTER ( date, integer ) >>> integer<br />
The FISCAL.QUARTER function requires a second argument: <strong>the</strong> ending month of <strong>the</strong> fiscal year.<br />
This can be 1 through 12, but is usually 6, 9 or 12:<br />
6 for fiscal years ending on June 30.<br />
9 for <strong>the</strong> Sept 30 fiscal year end (U.S.Government).<br />
12 for a Dec 31 ending of a calendar year.<br />
FISCAL.QUARTER ( 'jan 10,2001', 6 ) = 3<br />
FISCAL.QUARTER ( 'jan 10,2001', 9 ) = 2<br />
FISCAL.QUARTER ( 'jan 10,2001', 12 ) = 1<br />
12. QUARTER ( date ) >>> integer<br />
The QUARTER function returns <strong>the</strong> calendar year quarter; it is <strong>the</strong> same as FISCAL.QUARTER with<br />
a second argument of twelve.<br />
QUARTER ( 'jan 10,2001' ) = 1<br />
QUARTER ( 'dec 10,2001' ) = 4<br />
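The fiscal examples above are consistent with a simple rule, sketched here in Python (not PPL). The function names are<br />
hypothetical, and the rule itself is inferred from the examples: a fiscal year is assumed to be named for the calendar<br />
year in which it ends.<br />

```python
def fiscal_year(year, month, end_month):
    """Fiscal year named for the calendar year in which it ends (assumed rule)."""
    return year if month <= end_month else year + 1

def fiscal_quarter(month, end_month):
    """Quarter 1 starts in the month after end_month."""
    return (month - end_month - 1) % 12 // 3 + 1

def quarter(month):
    """Calendar quarter: the same as a fiscal quarter with end month 12."""
    return fiscal_quarter(month, 12)

print(fiscal_year(2001, 10, 9))  # 2002
print(fiscal_quarter(1, 6))      # 3
print(quarter(12))               # 4
```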
13. DAY.WITHIN.WEEK ( date ) >>> integer<br />
The DAY.WITHIN.WEEK function returns an integer from 1 <strong>to</strong> 7. The default is for <strong>the</strong> week <strong>to</strong> begin<br />
on Monday, so that a Monday returns 1, Tuesday 2, and Sunday 7. If a weekday name in quotes<br />
is given as a second argument, that day will be treated as day 1 in <strong>the</strong> function.<br />
Note, Dec 25,2002 is a Wednesday.<br />
DAY.WITHIN.WEEK ( 'dec 25, 2002' ) = 3<br />
DAY.WITHIN.WEEK ( 'dec 25, 2002', 'Sunday' ) = 4<br />
DAY.WITHIN.WEEK ( 'dec 25, 2002', 'sat' ) = 5<br />
14. DAY.WITHIN.YEAR ( date ) >>> integer<br />
The DAY.WITHIN.YEAR function returns an integer from 1 <strong>to</strong> 366. January 1 is always 1, and December<br />
31 will return 365 in non-leap years, and 366 in leap years.<br />
DAY.WITHIN.YEAR ( 'jan 11, 2001' ) = 11<br />
DAY.WITHIN.YEAR ( 'feb 11, 2001' ) = 42<br />
DAY.WITHIN.YEAR ( 'dec 25, 2001' ) = 359<br />
DAY.WITHIN.YEAR ( 'dec 25, 2004' ) = 360<br />
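The day numbering of DAY.WITHIN.WEEK and DAY.WITHIN.YEAR can be checked with this Python sketch (not PPL). The<br />
function names are hypothetical, and weekday names are matched on their first three letters as an assumption.<br />

```python
from datetime import date

WEEKDAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]

def day_within_week(d, first_day="monday"):
    """Return 1 to 7, counting from first_day as day 1."""
    start = WEEKDAYS.index(first_day[:3].lower())
    return (d.weekday() - start) % 7 + 1

def day_within_year(d):
    """Return 1 to 366; January 1 is always 1."""
    return d.timetuple().tm_yday

xmas = date(2002, 12, 25)                # a Wednesday
print(day_within_week(xmas))             # 3
print(day_within_week(xmas, "Sunday"))   # 4
print(day_within_week(xmas, "sat"))      # 5
print(day_within_year(date(2004, 12, 25)))  # 360
```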
15. WEEK.WITHIN.YEAR ( date, integer ) >>> integer<br />
This function returns <strong>the</strong> week number within <strong>the</strong> year for <strong>the</strong> supplied date. The range can be 0 <strong>to</strong> 53,<br />
depending on <strong>the</strong> date and on <strong>the</strong> calculation method.<br />
There are two methods for determining what constitutes week one of a given year.<br />
The first method is simple: the first week goes from Jan 1 through Jan 7. This can be called an ABSOLUTE<br />
week.<br />
The second method makes use of a calendar week, which is defined by ISO 8601 as going from Monday<br />
through Sunday.<br />
The first week is the first CALENDAR week that contains a sufficient number of days within the current<br />
year. Sufficient can be set to 1 through 7; the ISO standard is 4. This function assumes a Mon-Sun<br />
calendar week; a different calendar week can be given in the function. The arguments are:<br />
1. A character date value, variable or expression like ’Jan 4,2004’.<br />
2. An integer constant from 0 <strong>to</strong> 7. This selects <strong>the</strong> method <strong>to</strong> be used <strong>to</strong> define <strong>the</strong> first week of <strong>the</strong><br />
year.<br />
0: This uses <strong>the</strong> absolute week. The first week is Jan 1 through Jan 7. The result can be from<br />
1 <strong>to</strong> 53. The third argument, if provided, is ignored.<br />
1-7: These use <strong>the</strong> calendar week. The 1 <strong>to</strong> 7 specify <strong>the</strong> minimum number of days needed <strong>to</strong><br />
constitute an acceptable first week.<br />
For example, suppose the calendar week is Mon-Sun and Jan 4 is a Sunday. Is an initial 4-day<br />
week sufficient to be used as week 1? If this argument is 1 to 4, yes. If insufficient, the partial<br />
week becomes week 0, and the next calendar week is week 1.<br />
The ISO 8601 standard is 4, which therefore accepts the first calendar week that has a majority<br />
of its days in the current year.<br />
Using 7 would cause the first full calendar week to be week 1.<br />
3. An optional character constant which contains <strong>the</strong> starting day of <strong>the</strong> calendar week <strong>to</strong> be used<br />
instead of <strong>the</strong> default Monday <strong>to</strong> Sunday week. This can be a full name like ’Tuesday’, or an<br />
abbreviation like ’Wed’.<br />
***********************************<br />
* examples using ABSOLUTE weeks *<br />
***********************************<br />
Week.within.year ( ’jan 3 2004’, 0 ) = 1<br />
Week.within.year ( ’jan 5 2004’, 0 ) = 1<br />
Week.within.year ( ’jan 8 2004’, 0 ) = 2<br />
Week.within.year ( ’dec 31 2004’, 0 ) = 53<br />
************************************<br />
* examples using <strong>the</strong> default *<br />
* Monday <strong>to</strong> Sunday calendar week *<br />
************************************<br />
mon tue wed thu fri sat sun<br />
- - - 1 2 3 4<br />
5 6 7 8 9 10 11<br />
12 13 14 15 16 17 18<br />
Week.within.year ( ’jan 5 2004’, 1 ) = 2<br />
Week.within.year ( ’jan 3 2004’, 4 ) = 1<br />
Week.within.year ( ’jan 5 2004’, 4 ) = 2<br />
Week.within.year ( ’jan 5 2004’, 7 ) = 1<br />
**************************************<br />
* examples using an alternative *<br />
* Sunday <strong>to</strong> Saturday calendar week *<br />
**************************************<br />
sun mon tue wed thu fri sat<br />
- - - - 1 2 3<br />
4 5 6 7 8 9 10<br />
11 12 13 14 15 16 17<br />
Week.within.year ( ’jan 3 2004’, 1, ’sun’ ) = 1<br />
Week.within.year ( ’jan 3 2004’, 4, ’sun’ ) = 0<br />
Week.within.year ( ’jan 5 2004’, 4, ’sun’ ) = 1<br />
Week.within.year ( ’jan 1 2004’, 7, ’sun’ ) = 0<br />
Week.within.year ( ’jan 2 2004’, 7, ’sun’ ) = 0<br />
Week.within.year ( ’jan 3 2004’, 7, ’sun’ ) = 0<br />
Week.within.year ( ’jan 4 2004’, 7, ’sun’ ) = 1<br />
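The two week-numbering methods can be sketched in Python (not PPL). This is a minimal emulation that reproduces the<br />
examples above; the function name is hypothetical, and the handling of the last days of December is an assumption,<br />
since those days are simply numbered onward within the same year.<br />

```python
from datetime import date

WEEKDAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]

def week_within_year(d, min_days, week_start="mon"):
    """min_days 0 selects absolute weeks; 1 to 7 selects calendar weeks."""
    doy = d.timetuple().tm_yday
    if min_days == 0:
        return (doy - 1) // 7 + 1             # Jan 1 through Jan 7 is week 1
    start = WEEKDAYS.index(week_start[:3].lower())
    jan1 = date(d.year, 1, 1)
    # number of days of this year in the calendar week that contains Jan 1
    first_len = 7 - (jan1.weekday() - start) % 7
    first_counts = first_len >= min_days      # long enough to be week 1?
    if doy <= first_len:
        return 1 if first_counts else 0
    later = (doy - first_len - 1) // 7 + 1    # full calendar weeks after the first
    return later + (1 if first_counts else 0)

print(week_within_year(date(2004, 1, 5), 4))         # 2
print(week_within_year(date(2004, 1, 5), 7))         # 1
print(week_within_year(date(2004, 1, 3), 4, "sun"))  # 0
```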
16. ADD.YEARS ( date, 1 <strong>to</strong> 6 numbers ) >>> date<br />
17. ADD.MONTHS ( date, 1 <strong>to</strong> 5 numbers ) >>> date<br />
18. ADD.DAYS ( date, 1 <strong>to</strong> 4 numbers ) >>> date<br />
19. ADD.HOURS ( date, 1 <strong>to</strong> 3 numbers ) >>> date<br />
20. ADD.MINUTES ( date, 1 <strong>to</strong> 2 numbers ) >>> date<br />
21. ADD.SECONDS ( date, 1 number ) >>> date<br />
22. SUBTRACT.YEARS ( date, 1 <strong>to</strong> 6 numbers ) >>> date<br />
23. SUBTRACT.MONTHS ( date, 1 <strong>to</strong> 5 numbers ) >>> date<br />
24. SUBTRACT.DAYS ( date, 1 <strong>to</strong> 4 numbers ) >>> date<br />
25. SUBTRACT.HOURS ( date, 1 <strong>to</strong> 3 numbers ) >>> date<br />
26. SUBTRACT.MINUTES ( date, 1 <strong>to</strong> 2 numbers ) >>> date<br />
27. SUBTRACT.SECONDS ( date, 1 number ) >>> date<br />
Each of <strong>the</strong>se has a date value as its first argument, followed by one or more date/time amounts <strong>to</strong> be<br />
added or subtracted.<br />
The ADD.YEARS function, for example, treats <strong>the</strong> required second argument as <strong>the</strong> number of years<br />
<strong>to</strong> be added; months, days, hours, minutes and seconds can also be supplied. For example:<br />
ADD.YEARS ( 'jan 1, 1991', 3 ) = 'Jan 1, 1994'<br />
ADD.YEARS ( 'jan 1, 1991', 3, 1 ) = 'Feb 1, 1994'<br />
ADD.YEARS ( 'jan 1, 1991', 3, 1,10 ) = 'Feb 11, 1994'<br />
Since <strong>the</strong> function in <strong>the</strong> above 3 examples was ADD.YEARS, <strong>the</strong> initial element (i.e., argument two)<br />
is years. If yet ano<strong>the</strong>r argument follows, it is treated as months, and so on.<br />
ADD.YEARS ( 'jan 1, 1991 10:10:10', 1,2,3,4,5,6)<br />
= 'March 4, 1992 14:15:16'<br />
The above adds 1 year, 2 months, 3 days, 4 hours, 5 minutes and 6 seconds <strong>to</strong> jan 1,1991 at 10:10:10.<br />
A subtract of <strong>the</strong> same amount could be done:<br />
SUBTRACT.YEARS ( 'march 4, 1992 14:15:16', 1,2,3,4,5,6)<br />
= 'Jan 1, 1991 10:10:10'<br />
Some additional examples using scratch variable ##d, which is set <strong>to</strong> ‘jan 1 1991 10:10:10’ for <strong>the</strong><br />
function input:<br />
ADD.DAYS ( ##d, 100 ) = 'April 11 1991 10:10:10'<br />
ADD.DAYS ( ##d, 1000 ) = 'Sept 27 1993 10:10:10'<br />
ADD.DAYS ( ##d, 1000,0,0,1 ) = 'Sept 27 1993 10:10:11'<br />
ADD.DAYS ( ##d, 1000,0,0,1.5) = 'Sept 27 1993 10:10:11.5'<br />
ADD.MINUTES ( ##d, 1000 ) = 'Jan 2 1991 02:50:10'<br />
ADD.MINUTES ( ##d, -20 ) = missing, invalid argument<br />
ADD.MINUTES ( ##d, 1,2,3 ) = error, <strong>to</strong>o many arguments<br />
These functions process <strong>the</strong> years field first, <strong>the</strong>n <strong>the</strong> months field (which could fur<strong>the</strong>r change <strong>the</strong><br />
years field), and so on.<br />
*********************************<br />
* limitations in using *<br />
* ADD.YEARS SUBTRACT.YEARS *<br />
* ADD.MONTHS SUBTRACT.MONTHS *<br />
*********************************<br />
These 4 functions are of limited usefulness because <strong>the</strong>y can quite easily produce an invalid date,<br />
which causes <strong>the</strong> function <strong>to</strong> issue a missing result.<br />
For example, adding one year <strong>to</strong> feb 29,1992 would produce feb 29 in 1993, which is invalid because<br />
1993 was not a leap year. Similarly, adding 1 month <strong>to</strong> aug 31,2001 produces sept 31,2001, invalid<br />
because september hath but 30 days.<br />
These functions first check <strong>the</strong> date validity after processing year and month; it is checked again after<br />
any additional elements have been processed.<br />
The o<strong>the</strong>r 8 functions in this group (ADD.DAYS and such) all produce sensible, reversible results.<br />
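The processing order and the invalid-date limitation can be illustrated with a Python sketch (not PPL). The function<br />
name add_years is hypothetical, None stands in for a P-STAT missing result, and only the first validity check (after<br />
year and month) is modelled.<br />

```python
from datetime import datetime, timedelta

def add_years(dt, years, months=0, days=0, hours=0, minutes=0, seconds=0):
    """Apply years, then months, then the remaining amounts."""
    if min(years, months, days, hours, minutes, seconds) < 0:
        return None                           # negative amounts are invalid
    total = dt.month - 1 + months             # months may carry into the year
    year = dt.year + years + total // 12
    month = total % 12 + 1
    try:
        shifted = dt.replace(year=year, month=month)
    except ValueError:
        return None                           # e.g. Feb 29, 1993 or Sept 31
    return shifted + timedelta(days=days, hours=hours,
                               minutes=minutes, seconds=seconds)

d = datetime(1991, 1, 1, 10, 10, 10)
print(add_years(d, 1, 2, 3, 4, 5, 6))            # 1992-03-04 14:15:16
print(add_years(datetime(1992, 2, 29), 1))       # None
print(add_years(datetime(2001, 8, 31), 0, 1))    # None
```

Adding days, hours, minutes or seconds never hits this problem, because those amounts are plain elapsed time.<br />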
28. EXTRACT.YEARS ( date ) >>> integer, 1753 <strong>to</strong> 2999<br />
29. EXTRACT.MONTHS ( date ) >>> integer, 1 <strong>to</strong> 12<br />
30. EXTRACT.DAYS ( date ) >>> integer, 1 <strong>to</strong> 31<br />
31. EXTRACT.HOURS ( date ) >>> integer, 0 <strong>to</strong> 23<br />
32. EXTRACT.MINUTES ( date ) >>> integer, 0 <strong>to</strong> 59<br />
33. EXTRACT.SECONDS ( date ) >>> integer, 0 <strong>to</strong> 59<br />
34. EXTRACT.CC ( date ) >>> integer, 17 <strong>to</strong> 29<br />
35. EXTRACT.YY ( date ) >>> integer, 0 <strong>to</strong> 99<br />
36. EXTRACT.DATE ( date ) >>> character date value<br />
37. EXTRACT.TIME ( date ) >>> character time value<br />
38. EXTRACT.WEEKDAY ( date ) >>> character weekday name<br />
EXTRACT.YEARS ( 'jan 5 1991 10:11:12' ) = 1991<br />
EXTRACT.MONTHS ( 'jan 5 1991 10:11:12' ) = 1<br />
EXTRACT.DAYS ( 'jan 5 1991 10:11:12' ) = 5<br />
EXTRACT.HOURS ( 'jan 5 1991 10:11:12' ) = 10<br />
EXTRACT.MINUTES( 'jan 5 1991 10:11:12' ) = 11<br />
EXTRACT.SECONDS( 'jan 5 1991 10:11:12' ) = 12<br />
EXTRACT.CC ( 'jan 5 1991 10:11:12' ) = 19 (century)<br />
EXTRACT.YY ( 'jan 5 1991 10:11:12' ) = 91<br />
EXTRACT.DATE ( 'jan 5 1991 10:11:12' ) = 'jan 5 1991'<br />
EXTRACT.TIME ( 'jan 5 1991 10:11:12' ) = '10:11:12'<br />
EXTRACT.WEEKDAY( 'jan 5 1991 10:11:12' ) = 'Sat'<br />
The result of extract.date will contain <strong>the</strong> day of week only if <strong>the</strong> input argument has day-of-week.<br />
39. CHANGE.YEARS ( date, 1 <strong>to</strong> 6 numbers ) >>> date<br />
40. CHANGE.MONTHS ( date, 1 to 5 numbers ) >>> date<br />
41. CHANGE.DAYS ( date, 1 <strong>to</strong> 4 numbers ) >>> date<br />
42. CHANGE.HOURS ( date, 1 <strong>to</strong> 3 numbers ) >>> date<br />
43. CHANGE.MINUTES ( date, 1 <strong>to</strong> 2 numbers ) >>> date<br />
44. CHANGE.SECONDS ( date, 1 number ) >>> date<br />
These six functions are used <strong>to</strong> change specific elements within a date-time value without affecting<br />
<strong>the</strong> o<strong>the</strong>r elements of <strong>the</strong> value.<br />
Each of <strong>the</strong>se has a date value as its first argument, and <strong>the</strong>n one or more date or time elements as<br />
subsequent arguments.<br />
In <strong>the</strong> CHANGE.MONTHS function, for example, <strong>the</strong> argument after <strong>the</strong> input date must be an integer<br />
from 1 <strong>to</strong> 12. This provides <strong>the</strong> changed month element <strong>to</strong> be placed in<strong>to</strong> <strong>the</strong> function result. A<br />
third argument, if given, would <strong>the</strong>n be treated as a days element, and so forth.<br />
In <strong>the</strong>se examples, we assume that character scratch variable ##d has been set <strong>to</strong>:<br />
'jan 1 1991 10:10:10'.<br />
CHANGE.YEARS ( ##d, 1992 ) = 'Jan 1 1992 10:10:10'<br />
CHANGE.MONTHS ( ##d, 2 ) = 'Feb 1 1991 10:10:10'<br />
CHANGE.DAYS ( ##d, 8 ) = 'Jan 8 1991 10:10:10'<br />
CHANGE.HOURS ( ##d, 11 ) = 'Jan 1 1991 11:10:10'<br />
CHANGE.MINUTES( ##d, 12 ) = 'Jan 1 1991 10:12:10'<br />
CHANGE.SECONDS( ##d, 13 ) = 'Jan 1 1991 10:10:13'<br />
As with functions like ADD.YEARS, additional arguments can be supplied <strong>to</strong> change several fields<br />
at once.<br />
CHANGE.YEARS ( ##d, 1992, 2, 8, 11, 12, 13 ) =<br />
'Feb 8 1992 11:12:13'<br />
CHANGE.HOURS (##d, 11, 12, 13) = 'Jan 1 1991 11:12:13'.<br />
CHANGE.DAYS (##d, 8, 11 ) = 'Jan 8 1991 11:10:10'.<br />
The above change.hours example has three values after the input date. Since the function is change.hours,<br />
argument 2 (11) is treated as an HOURS change,<br />
argument 3 (12) is treated as a MINUTES change, and<br />
argument 4 (13) is treated as a SECONDS change.<br />
The above change.days has two arguments after <strong>the</strong> input date; <strong>the</strong>se are treated as days (because of<br />
<strong>the</strong> function name) and hours (<strong>the</strong> next time element after days).<br />
CHANGE.DAYS ( 'Jan 1 1991', 3, 21, 22, 23 ) =<br />
'Jan 3 1991 21:22:23'.<br />
If <strong>the</strong> function has values for hours, minutes and seconds, <strong>the</strong>y are placed in <strong>the</strong> result even when <strong>the</strong><br />
input did not have any time fields.<br />
45. DIF.YEARS ( date, date ) >>> number<br />
46. DIF.MONTHS ( date, date ) >>> number<br />
47. DIF.DAYS ( date, date ) >>> number<br />
48. DIF.HOURS ( date, date ) >>> number<br />
49. DIF.MINUTES ( date, date ) >>> number<br />
50. DIF.SECONDS ( date, date ) >>> number<br />
The first two arguments are <strong>the</strong> date values being compared. It does not matter which is <strong>the</strong> first argument,<br />
i.e.,<br />
DIF.DAYS( date1, date2 ) = DIF.DAYS( date2, date1 ).<br />
An optional third argument can be used <strong>to</strong> limit <strong>the</strong> calculation; using 2, for example, causes only <strong>the</strong><br />
first two elements, years and months, <strong>to</strong> be looked at.<br />
DIF.YEARS ( 'jan 1,1992', 'feb 3,1993' ) = 1.090411<br />
DIF.YEARS ( 'jan 1,1992', 'feb 3,1993', 1 ) = 1.<br />
DIF.MONTHS( 'jan 1,1992', 'feb 3,1993' ) = 13.071429<br />
DIF.MONTHS( 'jan 1,1992', 'feb 3,1993', 2 ) = 13.<br />
DIF.DAYS ( 'jan 1,1992', 'feb 3,1993' ) = 399.<br />
DIF.DAYS ( 'jan 1,1992 12:00:00',<br />
'feb 3,1993 15:00:00' ) = 399.125<br />
DIF.HOURS ( 'jan 1,1992 12:00:00',<br />
'feb 3,1993 15:00:00' ) = 9,579.<br />
DIF.MINUTES('jan 1,1992 12:00:00',<br />
'feb 3,1993 15:00:00' ) = 574,740.<br />
DIF.SECONDS('jan 1,1992 12:00:00',<br />
'feb 3,1993 15:00:00' ) = 34,484,400.<br />
DIF.SECONDS('jan 1,1992 12:00:00.2',<br />
'feb 3,1993 15:00:00' ) = 34,484,399.8<br />
***********************************************<br />
* NOTE: DIF.YEARS and DIF.MONTHS are both *<br />
* counting time elements of varying lengths *<br />
***********************************************<br />
Since years can have differing lengths ( 365 or 366 days), and months are even worse ( 28 or 29 or 30<br />
or 31 days), <strong>the</strong> dif.years and dif.months functions produce results which reflect <strong>the</strong> somewhat arbitrary<br />
choices on how <strong>to</strong> compute <strong>the</strong>m.<br />
DIF.YEARS ( 'feb 4,1992', 'mar 7,1993' ) = 1.0849315<br />
DIF.MONTHS( 'feb 4,1992', 'mar 7,1993' ) = 13.0967742<br />
In <strong>the</strong> above dif.years example, <strong>the</strong>re is one full year from feb 4,1992 <strong>to</strong> feb 4,1993, and <strong>the</strong>n 31 more<br />
days <strong>to</strong> march 7,1993. The fractional year is <strong>the</strong>n 31/365, which is 0.0849315. The 365 is <strong>the</strong> distance<br />
from feb 4,1993 <strong>to</strong> <strong>the</strong> end of <strong>the</strong> next full year, feb 4,1994.<br />
If <strong>the</strong> earlier date is a feb 29, one day is subtracted from both dates <strong>to</strong> simplify <strong>the</strong> calculations.<br />
In <strong>the</strong> above dif.months example, <strong>the</strong>re are 13 full months from feb 4,1992 <strong>to</strong> march 4,1993. The fractional<br />
part is 3/31, or 0.0967742. The 3 is <strong>the</strong> distance from march 4 <strong>to</strong> march 7, and <strong>the</strong> 31 is <strong>the</strong><br />
distance from march 4 <strong>to</strong> april 4, <strong>the</strong> end of <strong>the</strong> next full month.
If <strong>the</strong> day of <strong>the</strong> month of <strong>the</strong> earlier date is more than 28, from one <strong>to</strong> three days are subtracted from<br />
both dates <strong>to</strong> simplify <strong>the</strong> calculations.<br />
These two functions could well be coded in a different manner that gives slightly different results in<br />
<strong>the</strong> fractional part. The coding and results of DIF.DAYS, DIF.HOURS, DIF.MINUTES and<br />
DIF.SECONDS, on <strong>the</strong> o<strong>the</strong>r hand, are straightforward.<br />
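The year and month rules described above can be sketched in Python (not PPL). This is one possible reading of the text, not P-STAT's actual code: whole units are counted by anniversaries of the earlier date, and the fraction is the leftover days divided by the length of the next full unit. The names dif_years, dif_months and _add_months are hypothetical, and time fields are ignored.

```python
from datetime import date, timedelta


def _add_months(d: date, n: int) -> date:
    """Move d forward n months; assumes d.day <= 28 so the day always exists."""
    total = d.year * 12 + (d.month - 1) + n
    return date(total // 12, total % 12 + 1, d.day)


def dif_years(a: date, b: date) -> float:
    start, end = sorted((a, b))
    # A Feb 29 start has no anniversary in most years: shift both dates back a day.
    if (start.month, start.day) == (2, 29):
        start, end = start - timedelta(days=1), end - timedelta(days=1)
    full = end.year - start.year
    if start.replace(year=start.year + full) > end:
        full -= 1
    anchor = start.replace(year=start.year + full)    # last anniversary reached
    nxt = start.replace(year=start.year + full + 1)   # end of the next full year
    return full + (end - anchor).days / (nxt - anchor).days


def dif_months(a: date, b: date) -> float:
    start, end = sorted((a, b))
    # Days 29 to 31 do not exist in every month: shift both dates back to day 28.
    if start.day > 28:
        shift = timedelta(days=start.day - 28)
        start, end = start - shift, end - shift
    full = (end.year - start.year) * 12 + (end.month - start.month)
    if _add_months(start, full) > end:
        full -= 1
    anchor = _add_months(start, full)                 # last monthly anniversary
    nxt = _add_months(start, full + 1)                # end of the next full month
    return full + (end - anchor).days / (nxt - anchor).days


print(round(dif_years(date(1992, 2, 4), date(1993, 3, 7)), 7))   # 1.0849315
print(round(dif_months(date(1992, 2, 4), date(1993, 3, 7)), 7))  # 13.0967742
```

Under these assumptions the sketch reproduces the two worked examples: 1 + 31/365 for years and 13 + 3/31 for months.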
*******************************************<br />
* using <strong>the</strong> third argument: *<br />
* doing DIF.YEARS, DIF.DAYS, etc. *<br />
* on just <strong>the</strong> initial parts of <strong>the</strong> date *<br />
*******************************************<br />
A third argument may be supplied: it is <strong>the</strong> extent of <strong>the</strong> year-month-day-hour-minute-second fields<br />
that should be used in computing <strong>the</strong> difference. The fields beyond that level will be ignored.<br />
DIF.YEARS ( 'feb 4,1992', 'mar 7,1993', 1 ) = 1.<br />
DIF.YEARS ( 'feb 4,1992', 'mar 7,1993', 2 ) = 1.0833333<br />
DIF.YEARS ( 'feb 4,1992', 'mar 7,1993', 3 ) = 1.0849315<br />
The limit of 1 in DIF.YEARS nullifies all but <strong>the</strong> year field in <strong>the</strong> two arguments. Therefore, <strong>the</strong> difference<br />
between 1992 and 1993 is simply 1.<br />
The limit of 2 in DIF.YEARS nullifies all but <strong>the</strong> year and month fields in <strong>the</strong> two arguments. Therefore,<br />
<strong>the</strong> difference is calculated between feb 1992 and mar 1993. The result is 1 1/12 years.<br />
The limit of 3 in DIF.YEARS nullifies all but <strong>the</strong> year, month and day fields in <strong>the</strong> two arguments. As<br />
it happens, <strong>the</strong> arguments did not have time fields so using <strong>the</strong> limit had no effect.<br />
The result is 1 plus 31/365, the same value that is obtained without a limit.<br />
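The effect of limits 1 and 2 on DIF.YEARS can be sketched in Python (not PPL). The function name dif_years_limit is hypothetical, and the treatment of a month as exactly 1/12 of a year is an assumption inferred from the "1 1/12 years" result; it is not taken from P-STAT's code.

```python
from datetime import date


def dif_years_limit(a: date, b: date, limit: int) -> float:
    """Year difference using only the first `limit` date fields (1 or 2)."""
    start, end = sorted((a, b))
    if limit == 1:
        # Only the year fields are compared.
        return float(end.year - start.year)
    if limit == 2:
        # Year and month fields only; each month counts as 1/12 of a year.
        months = (end.year - start.year) * 12 + (end.month - start.month)
        return months / 12
    raise ValueError("this sketch only covers limits 1 and 2")


print(dif_years_limit(date(1992, 2, 4), date(1993, 3, 7), 1))            # 1.0
print(round(dif_years_limit(date(1992, 2, 4), date(1993, 3, 7), 2), 7))  # 1.0833333
```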
*********************************************<br />
* doing DIF.YEARS, DIF.MONTHS or DIF.DAYS *<br />
* while ignoring <strong>the</strong> time fields *<br />
*********************************************<br />
Suppose you are using DIF.YEARS, DIF.MONTHS or DIF.DAYS and have no interest in <strong>the</strong> time<br />
fields of <strong>the</strong> arguments. If <strong>the</strong> arguments lack time fields anyhow, <strong>the</strong>re is obviously no problem, but<br />
suppose some do and some don’t?<br />
The default for a function like DIF.DAYS is <strong>to</strong> use all available fields, so if one argument has a time<br />
field and <strong>the</strong> o<strong>the</strong>r does not, <strong>the</strong> result will be set <strong>to</strong> missing. You could use<br />
DIF.DAYS ( extract.date( arg1 ), extract.date( arg2) ).<br />
However, using a limit of 3 does <strong>the</strong> same thing.<br />
DIF.DAYS ( arg1, arg2, 3).<br />
10.5 DATE AND TIME COMMANDS<br />
A P-<strong>STAT</strong> run begins with <strong>the</strong> date language set <strong>to</strong> English. Therefore, date values being read are expected <strong>to</strong> have<br />
English month and weekday names, and date values being created will be given English names. Also, names being<br />
written will be capitalized and abbreviated, like Jan or Tues.<br />
If a function takes an input date value and creates a resulting date value, <strong>the</strong> result will be ordered in <strong>the</strong> same way<br />
as <strong>the</strong> input. In o<strong>the</strong>r words, if <strong>the</strong> input starts with <strong>the</strong> monthname, so will <strong>the</strong> output.<br />
If <strong>the</strong>re is no input <strong>to</strong> be used as a format, a default ordering that looks like ‘Wed Aug 7, 2002 10:15:22’ is used.<br />
The following eight commands may be used <strong>to</strong> change <strong>the</strong> default language, ordering and name style of dates.
DATE.LANGUAGE    changes the language of month and weekday names. English and German are supported.<br />
DATE.ORDER       changes the default format ordering.<br />
MONTH.CASE       changes the case of month names. Uppercase, lowercase and capitalized are supported.<br />
WEEKDAY.CASE     same for weekday names.<br />
MONTH.LENGTH     changes the length of month names. Full length or abbreviated are supported.<br />
WEEKDAY.LENGTH   same for weekday names.<br />
MONTH.NAMES      provides month name abbreviations to be used.<br />
WEEKDAY.NAMES    provides weekday name abbreviations to be used.<br />
10.6 The DATE.LANGUAGE Command<br />
P-<strong>STAT</strong> carries full and abbreviated month and weekday names in both English and German. The default language<br />
is English.<br />
DATE.LANGUAGE GERMAN $<br />
would switch <strong>the</strong> active language <strong>to</strong> German.<br />
A function like ADD.DAYS ( ‘Oct 10, 1992’, 1 ) will compare OCT <strong>to</strong> <strong>the</strong> full month names of <strong>the</strong> currently<br />
active language, and accept a full match or <strong>the</strong> best partial match. It will use <strong>the</strong> abbreviated names or, if requested,<br />
<strong>the</strong> full names of <strong>the</strong> current language <strong>to</strong> construct a result date.<br />
The default English month name abbreviations are:<br />
jan feb march april may june july aug sept oct nov dec.<br />
The default English weekday name abbreviations are:<br />
mon tues wed thurs fri sat sun.<br />
The default German month name abbreviations are:<br />
jan feb marz apr mai juni juli aug sept okt nov dez.<br />
The default German weekday name abbreviations are:<br />
mo di mi do fr sa so.<br />
When reading a date in German, both SAMSTAG and SONNABEND are recognized as Saturday, but what<br />
of abbreviations like SO, SON or SONN? They are all accepted as SONNTAG (Sunday).<br />
10.7 The DATE.ORDER Command<br />
DATE.ORDER '2 june 2002 (sun) 12:12:12' $<br />
The DATE.ORDER command changes <strong>the</strong> default ordering for a date <strong>to</strong> <strong>the</strong> order shown in <strong>the</strong> command.<br />
Blanks, dashes, commas, slashes and paren<strong>the</strong>ses may be freely used <strong>to</strong> create a particular date appearance.<br />
When one of <strong>the</strong> date functions writes a date value, <strong>the</strong> components of <strong>the</strong> value will be written in a certain<br />
ORDER. The order determines things like where <strong>the</strong> year should be, if <strong>the</strong> weekday name should be included,<br />
and if time should be included.<br />
Also, <strong>the</strong> value will be written in a certain style. Style consists of language (English or German), case (MAY<br />
or may or May), and length (Jan or January). For example,<br />
’Wed Aug 7, 2002 10:05:26’
is an ordering that consists of weekday, month, day, comma, year and time, with blanks as shown. The names are<br />
abbreviated and capitalized (first letter uppercase, <strong>the</strong> rest lowercase). This is, in fact, <strong>the</strong> default date format.<br />
The default date order is used only when <strong>the</strong>re is nothing else <strong>to</strong> use. If a date function has an input date, like<br />
ADD.DAYS, <strong>the</strong> result will have <strong>the</strong> same ordering as <strong>the</strong> input. However, <strong>the</strong> naming style of <strong>the</strong> input can be<br />
ambiguous: is May an abbreviation or a full month name? Therefore, <strong>the</strong> default style (case, length and language)<br />
is used for names. If a date function does not have an input date, like CURRENT.DATE(), default ordering and<br />
style are used.<br />
1. The default ordering is: 'Tues Jan 1, 2002 12:34:56'.<br />
2. The default style is: English, abbreviated, capitalized.<br />
Therefore, using<br />
PUT (CURRENT.DATE())$<br />
would produce something like:<br />
Sun June 2, 2002 14:01:19.<br />
The time field can be omitted from dates, as can <strong>the</strong> weekday name.<br />
DATE.ORDER 'june 2 (mon) 1999' $<br />
Here, since time is not included, functions that do not have a character date input <strong>to</strong> use as an output template will<br />
write a date output that does not include time.<br />
10.8 Changing <strong>the</strong> Case and Length of names<br />
The default for both month names and weekday names is capitalized and abbreviated (i.e., Jan or Tues). There<br />
are 2 commands which affect the case of names as they are written. MONTH.CASE affects month names,<br />
WEEKDAY.CASE affects weekday names.<br />
MONTH.CASE upper $<br />
WEEKDAY.CASE capitalized $<br />
UPPER causes names to be entirely in upper case. LOWER causes names to be entirely in lower case.<br />
CAPITALIZED causes names to have the initial letter in upper case, and the rest in lower case.<br />
The following commands affect <strong>the</strong> length of names as <strong>the</strong>y are written. FULL causes names <strong>to</strong> be written<br />
in <strong>the</strong>ir entirety: January. ABBREVIATED causes names <strong>to</strong> be written in a short form: Jan.<br />
MONTH.LENGTH FULL $<br />
WEEKDAY.LENGTH ABBREVIATED $<br />
10.9 Month and Weekday Names<br />
There are 2 commands which can be used to alter the default abbreviations: MONTH.NAMES and<br />
WEEKDAY.NAMES. These commands override the default abbreviations. The new names must, however, themselves be<br />
abbreviations of the current full names. MONTH.NAMES requires 12 arguments and WEEKDAY.NAMES requires<br />
7 arguments.<br />
MONTH.NAMES jan feb mar apr may jun jul aug sep oct nov dec $<br />
WEEKDAY.NAMES mon tue wed thu fri sat sun $<br />
The names can each be quoted or unquoted, or <strong>the</strong> entire set of names can be in one quoted string.<br />
WEEKDAY.NAMES mon 'tue' wed thu fri 'sat' sun $<br />
WEEKDAY.NAMES 'mon tue wed thu fri sat sun' $
__________________________________________________________________________<br />
Figure 10.1 DATE Logical Opera<strong>to</strong>rs<br />
Test Values<br />
date1 = ’jan 12,1991 12:01:00’<br />
date2 = ’may 23,1991 12:08:00’<br />
date3 = ’may 23,1991 12:08:00’<br />
date4 = ’may 23,1991 22:08:00’<br />
date5 = ’may 23,1991 ’<br />
Tests using logical opera<strong>to</strong>rs<br />
The tests The Result<br />
[ if date1 DATE.GT date2, false ]<br />
[ if date1 AFTER date2, false, same as DATE.GT ]<br />
[ if date3 DATE.GE date2, true ]<br />
[ if date1 DATE.EQ date2, false ]<br />
[ if date1 DATE.LE date2, true ]<br />
[ if date1 DATE.LT date2, true ]<br />
[ if date1 BEFORE date2, true, same as DATE.LT ]<br />
[ if date1 DATE.EQ date5, false ]<br />
[ if date4 DATE.EQ date5, missing ]<br />
[ if extract.date(date4) DATE.EQ date5, true ]<br />
__________________________________________________________________________<br />
10.10 DATE LOGICAL OPERATORS<br />
There are 6 logical opera<strong>to</strong>rs that can be used <strong>to</strong> compare date values. They are:<br />
1. DATE.GT (AFTER can also be used)<br />
2. DATE.GE<br />
3. DATE.EQ<br />
4. DATE.NE<br />
5. DATE.LE<br />
6. DATE.LT (BEFORE can also be used)<br />
Each examines two date values, which can be expressions. A date value MUST have a date field (year-month-day)<br />
and MAY have a time field (hour-minute-second). These date and time fields are treated in the date compares<br />
as if they were two separate BY variables in a sort.<br />
If <strong>the</strong> two date values differ at <strong>the</strong> year-month-day level, <strong>the</strong>re is no need <strong>to</strong> look at time, so it doesn’t matter<br />
if one value has a time field and <strong>the</strong> o<strong>the</strong>r does not. However, if <strong>the</strong> two year-month-day fields are <strong>the</strong> same, what<br />
happens if one value has a time field and <strong>the</strong> o<strong>the</strong>r does not?<br />
1. If nei<strong>the</strong>r has time, <strong>the</strong> result is equal.
2. If both have time, <strong>the</strong> times are compared, yielding a result.<br />
3. If one has time and the other doesn’t, the result is missing.<br />
The final three examples in Figure 10.1 deal with TIME issues.<br />
[ if date1 DATE.EQ date5, false ]<br />
[ if date4 DATE.EQ date5, missing ]<br />
[ if extract.date(date4) DATE.EQ date5, true ]<br />
When we compare date1 with date5, <strong>the</strong> year-month-day values differ, so we can get a FALSE result even though<br />
one has time and <strong>the</strong> o<strong>the</strong>r does not. Date4 and date5, however, do not differ on year-month-day. If nei<strong>the</strong>r had<br />
time, <strong>the</strong> result would be equal, but since one has time and <strong>the</strong> o<strong>the</strong>r does not, <strong>the</strong> result is missing.<br />
Using EXTRACT.DATE gets rid of <strong>the</strong> time field in date4, so <strong>the</strong> compare with timeless date5 produces a<br />
non-missing result.<br />
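The three-way outcome (true, false, missing) of the date compares can be modelled in Python (not PPL). In this sketch a date value is a hypothetical (date, time-or-None) pair, None stands for a missing result, and the names date_eq, date_lt and extract_date are illustrative stand-ins for DATE.EQ, DATE.LT and EXTRACT.DATE.

```python
from datetime import date, time
from typing import Optional, Tuple

# A date value: a required date part and an optional time part.
DateValue = Tuple[date, Optional[time]]


def date_eq(a: DateValue, b: DateValue) -> Optional[bool]:
    (da, ta), (db, tb) = a, b
    if da != db:
        return False            # dates differ, time is never examined
    if ta is None and tb is None:
        return True             # neither has time: equal
    if ta is None or tb is None:
        return None             # only one has time: missing
    return ta == tb             # both have time: compare times


def date_lt(a: DateValue, b: DateValue) -> Optional[bool]:
    (da, ta), (db, tb) = a, b
    if da != db:
        return da < db
    if ta is None and tb is None:
        return False
    if ta is None or tb is None:
        return None
    return ta < tb


def extract_date(v: DateValue) -> DateValue:
    """Drop the time field, keeping only year-month-day."""
    return (v[0], None)


date1 = (date(1991, 1, 12), time(12, 1, 0))
date4 = (date(1991, 5, 23), time(22, 8, 0))
date5 = (date(1991, 5, 23), None)

print(date_eq(date1, date5))                # False
print(date_eq(date4, date5))                # None (missing)
print(date_eq(extract_date(date4), date5))  # True
```

The three printed results correspond to the final three tests in Figure 10.1.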
10.11 FORMAT.DATE<br />
FORMAT.DATE is a date/time function that provides considerable flexibility in formatting a date-time value.<br />
It has two arguments: <strong>the</strong> character value <strong>to</strong> be formatted, and <strong>the</strong> format <strong>to</strong> be used for it. A P-<strong>STAT</strong> date/<br />
time value is an ordinary variable, often sized character*40, that holds date time information. Creating date-time<br />
variables was covered in considerable detail in <strong>the</strong> early parts of this chapter. This section describes how <strong>to</strong> print<br />
this information in <strong>the</strong> formats that you prefer.<br />
PUT ( Current.date ( ) )$ results in something like<br />
Mon Oct 24, 2011 11:21:36<br />
The current.date function has no arguments. It produces the current date and time in the default form.<br />
FORMAT.DATE expects the initial argument to hold a value in a similar format. A format consists of format specifiers<br />
(like dd for days) and separator characters (like :). The format determines which date/time elements appear and<br />
which separator characters are placed between them.<br />
The format will often be provided by a character constant within <strong>the</strong> function. It can, however, be placed in<br />
a permanent character scratch variable, as in FORMAT.DATE ( ddd, ##someformat ) . Blanks are significant.<br />
Given aug 28, 2011,<br />
'yyyymmdd' produces 20110828 .<br />
'yyyy mm dd' produces 2011 08 28<br />
The caret (^) will not be placed in <strong>the</strong> result, and can <strong>the</strong>refore be used <strong>to</strong> make a format more readable.<br />
'yyyy^mm^dd' does <strong>the</strong> same thing as 'yyyymmdd'.<br />
Any o<strong>the</strong>r character is copied as is, such as <strong>the</strong> : in hh:mm:ss or <strong>the</strong> / in mm/dd/yyyy.<br />
yyyy          year, in 4-digit form, like 2011.<br />
yy            year, in 2-digit form, like 11.<br />
month         month, full name, like september.<br />
mon           month, abbreviated name, like sept.<br />
n.month       month, 1 to 12, i.e., numeric month.<br />
mm            month, 1 to 12 if usage is clear, like yy/mm/dd. Same as n.month.<br />
dd            day, 1 to 31.<br />
hh            hour, 0 to 23.<br />
n.minute      minute, 0 to 59, i.e., numeric minute.<br />
mm            minute, 0 to 59 if usage is clear, like hh:mm:ss. Same as n.minute.<br />
ss            second, 0 to 59, can have up to 3 decimal places, like 34.178.<br />
ord           ordinal, the day within year, 1 to 366.<br />
jjj           (for julian) does the same. Ordinal has become the accepted name.<br />
day.of.week   weekday, full name, like monday.<br />
dow           weekday, abbreviation, like mon.<br />
am            puts hours in 1-12 form, and then uses am, pm, noon, midnight to clarify. These are placed where the 'am' was found.<br />
a.m.          same thing, but uses a.m. and p.m.<br />
date          causes mm/dd/yyyy to be used.<br />
time          causes hh:mm:ss to be used.<br />
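How specifiers, separators and the caret interact can be illustrated with a small Python sketch (not PPL). It covers only the unambiguous numeric specifiers and uses n.month and n.minute in place of the context-dependent mm. The function name format_date and the internal table are hypothetical; P-STAT's real FORMAT.DATE supports far more.

```python
import re
from datetime import datetime

# Numeric specifiers only; each maps a date-time to a zero-padded string.
_TOKENS = {
    "yyyy": lambda d: f"{d.year:04d}",
    "yy": lambda d: f"{d.year % 100:02d}",
    "n.month": lambda d: f"{d.month:02d}",
    "dd": lambda d: f"{d.day:02d}",
    "hh": lambda d: f"{d.hour:02d}",
    "n.minute": lambda d: f"{d.minute:02d}",
    "ss": lambda d: f"{d.second:02d}",
}

# Longest specifiers first so that 'yyyy' is matched before 'yy'.
_PATTERN = re.compile(
    "|".join(re.escape(t) for t in sorted(_TOKENS, key=len, reverse=True)) + r"|\^"
)


def format_date(d: datetime, fmt: str) -> str:
    def replace(match: "re.Match[str]") -> str:
        token = match.group(0)
        # The caret is a readability aid and produces no output.
        return "" if token == "^" else _TOKENS[token](d)

    # Characters that are not specifiers (blanks, '/', ':', '-') are copied as is.
    return _PATTERN.sub(replace, fmt)


d = datetime(2011, 8, 28, 9, 5, 7)
print(format_date(d, "yyyy^n.month^dd"))   # 20110828
print(format_date(d, "yyyy n.month dd"))   # 2011 08 28
print(format_date(d, "hh:n.minute:ss"))    # 09:05:07
```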
The default is <strong>to</strong> show hours in 0 <strong>to</strong> 23 form, which is sometimes called military time. The format specifier 'am'<br />
causes hours <strong>to</strong> appear in 1 <strong>to</strong> 12 form, along with one of am, pm, noon, and midnight. The am (or pm, etc) is<br />
placed where <strong>the</strong> specifier was. Using a specifier of a.m. causes a.m. (or p.m.) <strong>to</strong> be used instead.<br />
Examples of converting 24-hour <strong>to</strong> 12-hour mode.<br />
00:00:00 becomes 12:00:00 midnight.<br />
00:00:01 becomes 12:00:01 am.<br />
01:00:00 becomes 01:00:00 am.<br />
12:00:00 becomes 12:00:00 noon.<br />
12:00:01 becomes 12:00:01 pm.<br />
13:00:00 becomes 01:00:00 pm.<br />
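The conversion rules in these six examples can be written as a short Python sketch (not PPL). The function name to_12_hour is hypothetical; the rule that only exactly 00:00:00 and 12:00:00 are labelled midnight and noon is read directly from the examples.

```python
def to_12_hour(hh: int, mm: int, ss: int) -> str:
    """Convert a 24-hour time to 12-hour form with am, pm, noon or midnight."""
    if (hh, mm, ss) == (0, 0, 0):
        suffix = "midnight"
    elif (hh, mm, ss) == (12, 0, 0):
        suffix = "noon"
    elif hh < 12:
        suffix = "am"
    else:
        suffix = "pm"
    hour12 = hh % 12 or 12  # hours 0 and 12 both display as 12
    return f"{hour12:02d}:{mm:02d}:{ss:02d} {suffix}"


for t in [(0, 0, 0), (0, 0, 1), (1, 0, 0), (12, 0, 0), (12, 0, 1), (13, 0, 0)]:
    print(to_12_hour(*t))
```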
The case used for names like Monday in <strong>the</strong> result is controlled by <strong>the</strong> case of <strong>the</strong> format word that was used.<br />
Using day.of.week will get 'monday'. Using Day.of.week will get 'Monday'. Using DAY.OF.WEEK will get<br />
'MONDAY'. This is done for full and abbreviated month names, full and abbreviated weekday names, and for a.m.<br />
and am. Lead zeros are printed, except for days when month is a name. Consider Jan 2, 1995 5:06:07.<br />
'date time' gets 01/02/1995 05:06:07 .<br />
However<br />
'Month dd, yyyy' gets January 2, 1995.<br />
__________________________________________________________________________<br />
Figure 10.2 FORMAT.DATE<br />
MAKE work1, VARS year month day hour min sec;<br />
1995 3 1 10 13 15<br />
2004 2 9 21 22 23 $<br />
LIST work1 [ GENERATE dt1:c40 TO MAKE.DATE<br />
(year, month, day, hour, min, sec ) ]<br />
[ GENERATE dt2:c40 TO FORMAT.DATE<br />
( dt1, 'yyyy-mm-dd time a.m. dow' ) ]<br />
[ KEEP dt1 dt2 ] $<br />
dt1 dt2<br />
Wed March 1, 1995 10:13:15 1995-03-01 10:13:15 a.m. wed<br />
Mon Feb 9, 2004 21:22:23 2004-02-09 09:22:23 p.m. mon<br />
__________________________________________________________________________
The first step in using P-STAT’s date routines is to store the date in date variable format.<br />
The second step is to provide one or more date templates to use when the date is printed. Here are four different<br />
date templates and the resulting character string stored in variable “this.date” for ##FMT1 and ##FMT3.<br />
__________________________________________________________________________<br />
Figure 10.3 FORMAT.DATE Example<br />
GEN ##DAT1:c = DAY.MONTH.YEAR ( 13042012 )<br />
##FMT1:c = 'Month-dd-yyyy' April-13-2012<br />
##FMT2:c = 'mon dd yy' april 13 12<br />
##FMT3:c = 'dd/n.month/yy' 13/04/12<br />
##FMT4:c = 'Dow Mon dd yyyy' Fri April 13 2012<br />
GEN ##this.date:c = FORMAT.DATE ( ##dat1, ##FMT1 ) $<br />
PUT ##this.date $<br />
April-13-2012<br />
GEN ##this.date:c = FORMAT.DATE ( ##dat1, ##FMT3 ) $<br />
PUT ##this.date $<br />
13/04/12<br />
__________________________________________________________________________
10.20 <strong>PPL</strong>:Date and Time Commands and Functions<br />
DATE AND TIME FUNCTIONS<br />
DAY.MONTH.YEAR nn or “cs”<br />
converts an integer or character argument in day.month.year order to a character date.<br />
DAY.YEAR.MONTH nn or “cs”<br />
converts an integer or character argument in day.year.month order <strong>to</strong> a character date.<br />
MONTH.DAY.YEAR nn or “cs”<br />
converts an integer or character argument in month.day.year order <strong>to</strong> a character date.<br />
MONTH.YEAR.DAY nn or “cs”<br />
converts an integer or character argument in month.year.day order <strong>to</strong> a character date.<br />
YEAR.DAY.MONTH nn or “cs”<br />
converts an integer or character argument in year.day.month order <strong>to</strong> a character date.<br />
YEAR.MONTH.DAY nn or “cs”<br />
converts an integer or character argument in year.month.day order to a character date.<br />
MAKE.DATE<br />
creates a date from numeric input.<br />
SUMMARY<br />
MAKE.DATE (year, month, day ) >>> date<br />
MAKE.DATE (year, month, day, hour, minute, second) >>> date<br />
MAKE.DATE (year, month, day, hms, ‘mask’ ) >>> date<br />
MAKE.DATE (ymd, ‘mask’ ) >>> date<br />
MAKE.DATE (ymd, ‘mask’, hour, minute, second) >>> date<br />
MAKE.DATE (ymd, ‘mask’, hms, ‘mask’ ) >>> date<br />
CURRENT.DATE<br />
provides <strong>to</strong>day’s date and time.<br />
REFORMAT.DATE ( ddd, ddd )<br />
changes <strong>the</strong> format of a date value. If <strong>the</strong> second argument is supplied it is used as a formatting template.<br />
REFORMAT.DATE ( date1, date2 ) >>> date<br />
REFORMAT.DATE ( ‘June 23, 2002’ ) >>> date<br />
<strong>STAT</strong>US.DATE ( ddd )<br />
shows if a date is valid, if it has time, etc. Produces a number from 2 to -3 which indicates the usability<br />
of the date. 2 indicates both date and time. 1 indicates date only. 0 indicates invalid date value. -1, -2,<br />
and -3 indicate missing values.<br />
DAYS ( ddd )<br />
returns days since 1/1/1753 for a date.<br />
nn=number nopt=optional number ddd=date variable copt=optional char constant
SECONDS ( ddd )<br />
returns seconds since 1/1/1753 for a date.<br />
SECONDS.MIDNIGHT ( ddd )<br />
returns seconds since midnight for a date.<br />
UNDO.DAYS ( nn )<br />
reverses <strong>the</strong> DAYS function.<br />
UNDO.SECONDS ( nn )<br />
reverses <strong>the</strong> SECONDS function.<br />
FISCAL.YEAR ( ddd, nn )<br />
returns <strong>the</strong> fiscal year of a date.<br />
FISCAL.QUARTER ( ddd, nn )<br />
returns <strong>the</strong> fiscal quarter of a date.<br />
QUARTER ( ddd )<br />
returns <strong>the</strong> calendar quarter of a date.<br />
DAY.WITHIN.WEEK ( ddd, 'name' )<br />
returns an integer from 1 <strong>to</strong> 7. Name is an optional weekday name such as ‘Sunday’.<br />
DAY.WITHIN.YEAR ( ddd )<br />
returns 1 <strong>to</strong> 366, <strong>the</strong> day within a year.<br />
WEEK.WITHIN.YEAR ( ddd, nn, occ )<br />
ADD.YEARS ( ddd, nn, nopt, nopt, nopt, nopt, nopt )<br />
add some years <strong>to</strong> a date.<br />
ADD.MONTHS ( ddd, nn, nopt, nopt, nopt, nopt )<br />
add some months <strong>to</strong> a date.<br />
ADD.DAYS ( ddd, nn, nopt, nopt, nopt )<br />
add some days <strong>to</strong> a date.<br />
ADD.HOURS ( ddd, nn, nopt, nopt )<br />
add some hours <strong>to</strong> a date.<br />
ADD.MINUTES ( ddd, nn, nopt )<br />
add some minutes <strong>to</strong> a date.<br />
ADD.SECONDS ( ddd, nn )<br />
add some seconds <strong>to</strong> a date.<br />
SUBTRACT.YEARS ( ddd, nn, nopt, nopt, nopt, nopt, nopt )<br />
subtract some years from a date.<br />
SUBTRACT.MONTHS ( ddd, nn, nopt, nopt, nopt, nopt )<br />
subtract some months from a date.<br />
SUBTRACT.DAYS ( ddd, nn, nopt, nopt, nopt )<br />
subtract some days from a date.<br />
SUBTRACT.HOURS ( ddd, nn, nopt, nopt )<br />
subtract some hours from a date.<br />
SUBTRACT.MINUTES ( ddd, nn, nopt )<br />
subtract some minutes from a date.<br />
SUBTRACT.SECONDS ( ddd, nn )<br />
subtract some seconds from a date.<br />
EXTRACT.YEARS ( ddd )<br />
return numeric years from a date.<br />
EXTRACT.MONTHS ( ddd )<br />
return numeric months from a date.<br />
EXTRACT.DAYS ( ddd )<br />
return numeric days from a date.<br />
EXTRACT.HOURS ( ddd )<br />
return numeric hours from a date.<br />
EXTRACT.MINUTES ( ddd )<br />
return numeric minutes from a date.<br />
EXTRACT.SECONDS ( ddd )<br />
return numeric seconds from a date.<br />
EXTRACT.CC ( ddd )<br />
return 2-digit numeric century from a date.<br />
EXTRACT.YY ( ddd )<br />
return 2-digit numeric year from a date.<br />
EXTRACT.DATE ( ddd )<br />
make a copy of <strong>the</strong> input, dropping time.<br />
EXTRACT.TIME ( ddd )<br />
make a copy of <strong>the</strong> input, dropping date.<br />
EXTRACT.WEEKDAY ( ddd )<br />
return <strong>the</strong> character weekday name.<br />
CHANGE.YEARS ( ddd, nn, nopt, nopt, nopt, nopt, nopt )<br />
change <strong>the</strong> years field in a date.
CHANGE.MONTHS ( ddd, nn, nopt, nopt, nopt, nopt )<br />
change <strong>the</strong> months field in a date.<br />
CHANGE.DAYS ( ddd, nn, nopt, nopt, nopt )<br />
change <strong>the</strong> days field in a date.<br />
CHANGE.HOURS ( ddd, nn, nopt, nopt )<br />
change <strong>the</strong> hours field in a date.<br />
CHANGE.MINUTES ( ddd, nn, nopt )<br />
change <strong>the</strong> minutes field in a date.<br />
CHANGE.SECONDS ( ddd, nn )<br />
change <strong>the</strong> seconds field in a date.<br />
DIF.YEARS ( ddd, ddd, nopt )<br />
difference between 2 dates in years. The optional numeric argument can be used to limit the elements of<br />
the date that are to be looked at; thus a 2 means use just years and months.<br />
DIF.MONTHS ( ddd, ddd, nopt )<br />
difference between 2 dates in months. The optional numeric argument can be used to limit the elements<br />
of the date that are to be looked at.<br />
DIF.DAYS ( ddd, ddd, nopt )<br />
difference between 2 dates in days. The optional numeric argument can be used to limit the elements of<br />
the date that are to be looked at.<br />
DIF.HOURS ( ddd, ddd, nopt )<br />
difference between 2 dates in hours. The optional numeric argument can be used to limit the elements<br />
of the date that are to be looked at.<br />
DIF.MINUTES ( ddd, ddd, nopt )<br />
difference between 2 dates in minutes. The optional numeric argument can be used to limit the elements<br />
of the date that are to be looked at.<br />
DIF.SECONDS ( ddd, ddd )<br />
difference between 2 dates in seconds.<br />
DATE FORMATTING COMMANDS<br />
FORMAT.DATE ( ddd, date.format ) $<br />
The first argument is a P-STAT date variable. The second argument is a character variable that contains<br />
the desired format. Almost any arrangement of numeric or character day/month values, dates and years<br />
can be specified. The following are 3 simple examples.<br />
‘Month-dd-yyyy’ ‘mon dd yy’ ‘dd/n.month/yy’<br />
DATE.LANGUAGE<br />
DATE.LANGUAGE GERMAN $<br />
DATE.LANGUAGE ENGLISH $<br />
Select <strong>the</strong> language for <strong>the</strong> dates. GERMAN and ENGLISH are supported.<br />
DATE.ORDER<br />
DATE.ORDER 'Jan 1, 2002 12:34:56' $<br />
DATE.ORDER changes <strong>the</strong> default format ordering. The supplied date must be a legal date. The default<br />
order is:<br />
'Tues Jan 1, 2002 12:34:56'<br />
The default style is: English, abbreviated, capitalized.<br />
MONTH.CASE<br />
MONTH.CASE UPPER $<br />
Changes <strong>the</strong> case of month names. UPPER, LOWER and CAPITALIZED are supported.<br />
WEEKDAY.CASE<br />
WEEKDAY.CASE LOWER $<br />
Changes <strong>the</strong> case of weekday names. UPPER, LOWER and CAPITALIZED are supported.<br />
MONTH.LENGTH<br />
MONTH.LENGTH FULL $<br />
Changes <strong>the</strong> length of month names. FULL and ABBREVIATED are supported.<br />
WEEKDAY.LENGTH<br />
WEEKDAY.LENGTH ABBREVIATED $<br />
Changes <strong>the</strong> length of weekday names. FULL and ABBREVIATED are supported.<br />
MONTH.NAMES<br />
MONTH.NAMES jan feb mar apr may june july aug sept oct nov dec $<br />
Changes the default month names. These 12 names must be abbreviations of the current full month<br />
names.<br />
WEEKDAY.NAMES<br />
WEEKDAY.NAMES mo tu we th fr sa su $<br />
Changes the default weekday names. These 7 names must be abbreviations of the current full weekday<br />
names.<br />
DATE LOGICAL OPERATORS<br />
Each date logical opera<strong>to</strong>r examines two date values, which can be expressions. A date value MUST have a date<br />
field (year-month-day) and MAY have a time field (hour-minute-second). These date and time fields are treated<br />
in <strong>the</strong> date compares as if <strong>the</strong>y were two separate BY variables in a sort. The 6 opera<strong>to</strong>rs are:<br />
1. DATE.EQ<br />
2. DATE.NE<br />
3. DATE.LE<br />
4. DATE.LT (BEFORE can also be used)<br />
5. DATE.GT (AFTER can also be used)<br />
6. DATE.GE<br />
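The "two separate BY variables" rule can be illustrated outside PPL. The following Python sketch is only an analogy, not P-STAT code: it models a date value as a (date, time) pair and compares the pairs field by field, date first. The names `date_key` and `date_compare` are hypothetical, and treating a missing time field as the earliest possible time is an assumption that the text above does not confirm.

```python
from datetime import date, time
import operator

def date_key(d, t=None):
    """Model a date value as a (date, time) pair.

    Assumption: a missing time field is treated as the earliest time.
    """
    return (d, t if t is not None else time.min)

# The six operators named in the list above, mapped to ordinary comparisons.
DATE_OPS = {
    "DATE.EQ": operator.eq,
    "DATE.NE": operator.ne,
    "DATE.LE": operator.le,
    "DATE.LT": operator.lt,  # BEFORE
    "DATE.GT": operator.gt,  # AFTER
    "DATE.GE": operator.ge,
}

def date_compare(op, left, right):
    """Compare two keys as a sort on two BY variables would: date, then time."""
    return DATE_OPS[op](left, right)

a = date_key(date(2002, 1, 1), time(12, 34, 56))
b = date_key(date(2002, 1, 1), time(13, 0, 0))
c = date_key(date(2002, 1, 2))

print(date_compare("DATE.LT", a, b))  # same date, so the time field decides
print(date_compare("DATE.GT", c, b))  # later date, so the time field is not consulted
```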
11<br />
TEXTWRITER:<br />
Report Writing<br />
The TEXTWRITER command produces text or reports that summarize <strong>the</strong> data in a P-<strong>STAT</strong> system file. The<br />
text is formatted much <strong>the</strong> same way as text produced by a word processing software package, with justification,<br />
paragraphs and pagination. In addition, <strong>the</strong> reports can include character strings, values from <strong>the</strong> file, and evaluations<br />
of complex expressions containing functions and opera<strong>to</strong>rs.<br />
TEXTWRITER uses <strong>the</strong> P-<strong>STAT</strong> <strong>Programming</strong> <strong>Language</strong> (<strong>PPL</strong>) instructions PUT and PUTL <strong>to</strong> specify <strong>the</strong><br />
strings, values and expressions. Additional <strong>PPL</strong> may be included <strong>to</strong> test values and output appropriate strings.<br />
Thus, if Sex equals “M”, <strong>the</strong> string “Mr.” is written, but if Sex equals “F”, “Ms.” is output. Control words format<br />
the text and position it in specific columns and lines. (The previous eight chapters explain all aspects of PPL. The<br />
first two PPL chapters cover the basics which, with this chapter, provide sufficient information for using<br />
TEXTWRITER.)<br />
11.1 OVERVIEW<br />
TEXTWRITER requires an input file and instructions specifying <strong>the</strong> contents of a report. The input file, whose<br />
data values are typically included or summarized in <strong>the</strong> report, is named directly after <strong>the</strong> command:<br />
TEXTWRITER Accounts<br />
Here the input file is named “Accounts”. (No comma follows the filename.) The report instructions are PPL<br />
clauses enclosed in brackets:<br />
TEXTWRITER Accounts<br />
[ IF FIRST ( .FILE. ), PUT <<A REPORT:>> @SKIP ;<br />
PUT @JUST <<At >> Company <<, telephone: >> Tel.No<br />
<<, the person to contact is >> First.Name Last.Name <<.>> ],<br />
WIDTH 56 $<br />
The bulk of a TEXTWRITER command is <strong>the</strong> set of <strong>PPL</strong> instructions that follow <strong>the</strong> input filename. Many<br />
of <strong>the</strong>se are PUT statements. Identifiers specific <strong>to</strong> TEXTWRITER may follow <strong>the</strong> <strong>PPL</strong> in <strong>the</strong> usual manner.<br />
The format of a PUT is:<br />
1. PUT (or PUTL)<br />
2. one or more values, character strings and control words (like @20 or @NEXT)<br />
3. <strong>the</strong> PUT phrase end character which, depending on <strong>the</strong> context, is a comma, a semicolon or a right<br />
bracket.<br />
In the PUT instructions, character strings are enclosed in quotes or between the directional signs “<<” and “>>”. Variable names are not in quotes or directional signs. Expressions are enclosed in parentheses. Control<br />
words (beginning with “@”) specify placement and format options.<br />
The report produced by this TEXTWRITER command might look like:
A REPORT:<br />
At Smith and Bro<strong>the</strong>rs, <strong>Inc</strong>., telephone: (312) 457-8700,<br />
<strong>the</strong> person <strong>to</strong> contact is Jim Glidden.<br />
A similar sentence is <strong>the</strong>n output for each case in <strong>the</strong> input file.<br />
11.2 Justification<br />
When justification is specified, <strong>the</strong> text in <strong>the</strong> report is aligned at <strong>the</strong> right edge as well as <strong>the</strong> left edge. Extra<br />
blanks are inserted after certain punctuation and between words <strong>to</strong> achieve justification. Up <strong>to</strong> a maximum of five<br />
blanks may come between words, although a smaller number may be specified. Typically, only two blanks appear<br />
between some of <strong>the</strong> words. The concluding line of a paragraph, as well as any single line, is not justified.<br />
To avoid excess blank spaces in <strong>the</strong> report, trailing blanks are trimmed off character values and character expressions.<br />
Thus, <strong>the</strong> variable University occupies only three columns when its value is “MIT”, but nine columns<br />
when its value is “Prince<strong>to</strong>n”. Similarly, <strong>the</strong> values of numeric variables occupy only as many columns as necessary<br />
for a specific value, not <strong>the</strong> number of columns needed for <strong>the</strong> largest value. A blank space is au<strong>to</strong>matically<br />
inserted between successive values of variables and expressions.<br />
A large print buffer accumulates text. This permits <strong>the</strong> formatting and justification of large blocks of text.<br />
Strings that belong <strong>to</strong>ge<strong>the</strong>r, such as a word and its apostrophe, are kept <strong>to</strong>ge<strong>the</strong>r even though <strong>the</strong>y may be specified<br />
in separate instructions. Each text string follows the previous one, until control words such as @PARA (new<br />
paragraph) or @NEXT (next line) cause <strong>the</strong> start of a new line. Then <strong>the</strong> text in <strong>the</strong> buffer is flushed (emptied<br />
out) and printed, and accumulation of text for subsequent lines begins.<br />
11.3 The “No-Break” Character<br />
The “not” sign, which is generally a caret in <strong>the</strong> ASCII character set and a bar-like character in <strong>the</strong> EBCDIC set,<br />
is <strong>the</strong> no-break character. It keeps two character strings <strong>to</strong>ge<strong>the</strong>r on <strong>the</strong> same line and translates <strong>to</strong> a single blank<br />
space when printing takes place. Thus, “Mr.^Lee” prints as “Mr. Lee” and does not break or widen between <strong>the</strong><br />
two words.<br />
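The effect of the no-break character can be mimicked in a few lines of Python. This is a simplified sketch, not TEXTWRITER's actual algorithm: it fills lines greedily, treats “^” as glue between words, and translates it to a blank only when the line is emitted. The function name `wrap_with_no_break` is hypothetical.

```python
NO_BREAK = "^"  # the "not" sign as it appears in ASCII

def wrap_with_no_break(text, width):
    """Fill lines greedily; words joined by '^' are never split apart."""
    lines, current = [], ""
    # '^' is not whitespace, so "Mr.^Lee" stays a single token here.
    for token in text.split():
        candidate = token if not current else current + " " + token
        if len(candidate) <= width or not current:
            current = candidate
        else:
            lines.append(current)
            current = token
    if current:
        lines.append(current)
    # Translate the no-break character to a blank only at print time.
    return [line.replace(NO_BREAK, " ") for line in lines]

for line in wrap_with_no_break("Please forward this to Mr.^Lee today", 26):
    print(line)
```

With a width of 26, “Mr.” alone would still fit on the first line, but because it is glued to “Lee” the pair moves to the second line together.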
11.4 <strong>PPL</strong> INSTRUCTIONS PUT AND PUTL<br />
PUT and PUTL, two <strong>PPL</strong> instructions, specify character strings, values of variables and scratch variables, and expressions<br />
<strong>to</strong> position in <strong>the</strong> text. These instructions produce <strong>the</strong> actual report. PUT places only <strong>the</strong> values of<br />
variables in <strong>the</strong> text, whereas PUTL places <strong>the</strong> names of <strong>the</strong> variables as well as <strong>the</strong> values in <strong>the</strong> text.<br />
PUT can be used in <strong>the</strong> same way that, for example, SET is used, ei<strong>the</strong>r starting a new <strong>PPL</strong> instruction or as<br />
a consequent of an IF. The PUT is followed by character strings, variable names and expressions. Many PUT<br />
items (strings and variable names) may follow one PUT. Control words, such as @NEXT, may be used as needed:<br />
[ PUT .DATE. @SKIP;<br />
Here a character string, a system variable, and a control word follow one PUT instruction. The @SKIP, described<br />
later, causes <strong>the</strong> current line and <strong>the</strong>n a blank line <strong>to</strong> be written.<br />
In addition, any o<strong>the</strong>r <strong>PPL</strong> instructions, functions and opera<strong>to</strong>rs may be included in <strong>the</strong> <strong>PPL</strong> clauses. This is<br />
typically <strong>the</strong> case when <strong>the</strong> choice of which character string <strong>to</strong> place in <strong>the</strong> report depends on testing and evaluating<br />
values. For example, this instruction tests <strong>the</strong> value of <strong>the</strong> scratch variable “#Recent”:<br />
IF #Recent EQ <<no>>, INCREASE #Count, PUT Hospital<br />
<< has not received a sales call since >> Date.Last.Call ;<br />
The scratch variable “#Count” is increased and an appropriate character string is put in <strong>the</strong> output line of text when<br />
<strong>the</strong> result of <strong>the</strong> IF test is true.
11.5 Character Strings<br />
Any set of arbitrary characters, enclosed in single or double quotes or between the directional signs “<<” and “>>”, is a character string. The string may contain letters, numbers, punctuation and blanks, and it may be from<br />
1 to 50,000 characters long. The string should not contain the names of variables, scratch variables and expressions,<br />
because these items will be printed literally; substitution of their appropriate values will not take place.<br />
The instruction:<br />
PUT <<The client is: >> First Last <<.>> ;<br />
yields a line such as this in the report:<br />
The client is: Mary Roberts.<br />
This instruction:<br />
PUT <<The client is: First Last.>> ;<br />
yields:<br />
The client is: First Last.<br />
Character strings should contain only the exact text desired in the report. Some TEXTWRITER applications have<br />
hundreds of lines of PPL. Using <<string>> instead of 'string' or “string” helps you see the strings more easily.<br />
Also, omitting the closing >> of a <<string>> is more quickly<br />
flagged than omitting a string-terminating ’ or ”.<br />
11.6 Values of Variables<br />
The current values of variables, scratch variables, system variables and positions in <strong>the</strong> V vec<strong>to</strong>r (case vec<strong>to</strong>r) or<br />
P vec<strong>to</strong>r (permanent vec<strong>to</strong>r) may be placed in reports. None of <strong>the</strong>se values is enclosed in quotes or between directional<br />
signs. Complex expressions must be enclosed in paren<strong>the</strong>ses. This instruction includes a variable, a<br />
scratch variable and a system variable, as well as four quoted strings:<br />
[ PUT 'The balance for account number ' Acct.Number<br />
' is $' #Balance “ on ” .DATE. '.' ]<br />
This is <strong>the</strong> same instruction using directional signs instead of quotes:<br />
[ PUT <<The balance for account number >> Acct.Number<br />
<< is $>> #Balance << on >> .DATE. <<.>> ]<br />
Quotes or directional signs enclose only <strong>the</strong> character strings. Given a width of 50 and this data, <strong>the</strong> previous instruction<br />
yields:<br />
The balance for account number 1268004 is $752.35<br />
on Apr 22, 1986.<br />
Note that a sentence such as this is produced for each case in <strong>the</strong> input file. The value of <strong>the</strong> variable<br />
Acct.Number is likely <strong>to</strong> change as each case is processed. The value of <strong>the</strong> scratch variable #Balance will not<br />
change unless it is reset for each case. This instruction would reset #Balance:<br />
[ SET #Balance = Balance + Interest ]<br />
The SET should precede <strong>the</strong> PUT instruction that places <strong>the</strong> value of #Balance in <strong>the</strong> report. The value of <strong>the</strong><br />
system variable .DATE. will not change unless <strong>the</strong> TEXTWRITER command is run again on ano<strong>the</strong>r day.<br />
A blank is au<strong>to</strong>matically inserted after a variable or expression value if <strong>the</strong> next value is ano<strong>the</strong>r variable or<br />
expression, and <strong>the</strong> final character of <strong>the</strong> current value is not a blank or a “not” sign.
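The spacing rules of this section (strings are placed literally, values are trimmed, and a blank is added only between two successive values) can be sketched in Python. This is an illustrative model rather than the TEXTWRITER implementation; the function `assemble` and the `("string", ...)` / `("value", ...)` item tags are hypothetical.

```python
def assemble(items):
    """Join PUT items into one piece of text.

    items is a list of (kind, text) pairs, where kind is "string" for a
    literal character string or "value" for a variable or expression value.
    """
    out = []
    for i, (kind, text) in enumerate(items):
        if kind == "value":
            text = str(text).rstrip()  # trailing blanks are trimmed off values
        out.append(text)
        next_kind = items[i + 1][0] if i + 1 < len(items) else None
        # A blank is inserted only between two successive values, and only
        # if the current value does not already end in a blank or "^".
        if kind == "value" and next_kind == "value" and text and text[-1] not in " ^":
            out.append(" ")
    return "".join(out)

balance = [
    ("string", "The balance for account number "),
    ("value", "1268004"),
    ("string", " is $"),
    ("value", "752.35"),
    ("string", " on "),
    ("value", "Apr 22, 1986"),
    ("string", "."),
]
print(assemble(balance))

client = [("string", "The client is: "), ("value", "Mary   "),
          ("value", "Roberts"), ("string", ".")]
print(assemble(client))
```

Note that the blanks around “is $” and “on” come from the strings themselves, which is why the quoted strings in the example above carry their own leading and trailing blanks.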
11.7 Expressions and Functions<br />
Complex expressions containing functions and opera<strong>to</strong>rs, as well as variables and values, may be included in PUT<br />
instructions. Expressions must be enclosed in paren<strong>the</strong>ses and many nested levels of paren<strong>the</strong>ses may be used.<br />
The expressions are evaluated and <strong>the</strong> result is placed in <strong>the</strong> output line.<br />
The ability <strong>to</strong> use expressions makes it possible <strong>to</strong> use <strong>the</strong> full power of <strong>PPL</strong> in report writing. Complex numeric<br />
items and trigonometry functions may be computed, character strings may be padded and concatenated, and<br />
values in cases may be tested and recoded, all within <strong>the</strong> <strong>PPL</strong> instructions that comprise <strong>the</strong> bulk of<br />
TEXTWRITER.<br />
A sampling of expressions that may be included in PUT instructions:<br />
(CAPS (Name) ) (12 ** 3)<br />
(Salary + Commission) (LOG10 (Value1 / Value2) )<br />
(MEAN (Test.?) ) (CHAREX (Date, 'XX00') )<br />
(V(4) - P(Area + 2) ) (SUBSTRING (LEFT (Name), 1, 1 ) )<br />
__________________________________________________________________________<br />
Figure 11.1 Producing a Report: The Input Files<br />
File Hospital.lab<br />
Hospital (1) Mercy Hospital (2) Children's (3) Eye and Ear<br />
(4) Crans<strong>to</strong>n Memorial (5) Willis (6) St Agnes /<br />
File Sales<br />
Hospital  Date.Last.Call  Date.Last.Order  Amt.Last.Order  Sales.No  Salesman<br />
4         86-03-17        86-04-15         318.00          2         Will Moore<br />
2         85-09-20        85-06-25         112.60          4         Ted Ryan<br />
3         86-01-12        -                -               4         Ted Ryan<br />
6         86-05-15        85-06-11         430.99          4         Ted Ryan<br />
5         86-02-07        86-02-10         775.25          6         Liz Brown<br />
1         86-04-12        86-06-01         450.67          6         Liz Brown<br />
__________________________________________________________________________<br />
11.8 A Sample Report<br />
Figures 11.1, 11.2 and 11.3 illustrate producing a report using PUT instructions, quoted strings, values and expressions.<br />
Figure 11.1 shows the input files. The file Hospital.lab contains value labels for the Hospital variable<br />
in file Sales. A preliminary SORT by Sales.No and Date.Last.Call has grouped together each salesperson’s customers<br />
and ordered them by the date of their last sales call. (The report has paragraphs for each salesperson, with<br />
the sentences describing the status of each account.)<br />
Figure 11.2 shows <strong>the</strong> TEXTWRITER command and <strong>the</strong> PUT instructions. (The numbers at <strong>the</strong> left are not<br />
part of <strong>the</strong> commands, but merely correspond <strong>to</strong> <strong>the</strong> subsequent explanation.) Notice <strong>the</strong> general format of <strong>the</strong><br />
entire command — TEXTWRITER is followed by <strong>the</strong> input filename, and <strong>the</strong> filename is followed directly by<br />
<strong>PPL</strong> clauses of PUT (and o<strong>the</strong>r) instructions. Quoted strings, variables and expressions are included in <strong>the</strong> PUTs.<br />
Control words (beginning with "@") specify text placement. Notice also that <strong>the</strong> command identifiers (LABELS,<br />
STREAM, JUSTIFY and WIDTH) follow <strong>the</strong> <strong>PPL</strong> and are <strong>the</strong>mselves preceded by commas. (STREAM mode<br />
groups information from several cus<strong>to</strong>mers in<strong>to</strong> a single paragraph.)
__________________________________________________________________________<br />
Figure 11.2 Producing a Report: The TEXTWRITER Command<br />
TEXTWRITER Sales<br />
1. [ IF FIRST (.FILE.), PUT @PAGE <<Hospital Supply Sales Report: >><br />
.DATE. @SKIP ]<br />
2. [ IF FIRST (Sales.No), GEN #Count = 0, PUT @PARA ]<br />
3. [ GEN #Recent:C = 'no';<br />
IF Date.Last.Call GT '86-05-00', SET #Recent = 'yes' ]<br />
4. [ IF #Recent EQ <<no>> THEN;<br />
INCREASE #Count, PUT Hospital<br />
<< has not received a sales call since >> Date.Last.Call ;<br />
5. IF Date.Last.Order GOOD, PUT<br />
<< and has not placed an order since >> Date.Last.Order<br />
<<. That order totalled $>> @PLACES=0 Amt.Last.Order <<. >> ;<br />
6. IF Date.Last.Order MISSING,<br />
PUT << and has not placed a subsequent order. >> ;<br />
ENDIF ]<br />
7. [ IF #Count GT 0 AND LAST (Sales.No),<br />
PUT Salesman << is their salesperson. >> ] ,<br />
LABELS 'Hospital.lab', STREAM, JUSTIFY, WIDTH 61 $<br />
__________________________________________________________________________<br />
The general procedure of <strong>the</strong> instructions in Figure 11.2 is:<br />
1. Supply a heading for <strong>the</strong> report. This is done only once, when <strong>the</strong> first case or cus<strong>to</strong>mer in <strong>the</strong> file<br />
is processed.<br />
2. Generate a scratch variable #Count <strong>to</strong> keep track of <strong>the</strong> number of cus<strong>to</strong>mers a salesperson has. A<br />
salesperson is in <strong>the</strong> report only if he has some cus<strong>to</strong>mers without recent sales calls.<br />
3. Generate a scratch variable #Recent and reset it if a cus<strong>to</strong>mer has had a recent sales call. Only cus<strong>to</strong>mers<br />
without recent calls are <strong>to</strong> be in <strong>the</strong> report. Note: The two digit year will be a problem after<br />
1999.<br />
4. Specify <strong>the</strong> text strings <strong>to</strong> go in <strong>the</strong> report for cus<strong>to</strong>mers without recent calls. Also, put <strong>the</strong> date of<br />
<strong>the</strong>ir call in <strong>the</strong> report.<br />
5. If <strong>the</strong> cus<strong>to</strong>mer has placed an order as a result of a prior sales call, put an appropriate text string and<br />
<strong>the</strong> date of that last order in <strong>the</strong> report. Also, put <strong>the</strong> amount of that order in <strong>the</strong> report.<br />
6. If <strong>the</strong>re has not been an order, put a text string saying so in <strong>the</strong> report.<br />
7. When <strong>the</strong> last of a salesperson’s cus<strong>to</strong>mers is processed and at least one has not had a recent sales<br />
call, place <strong>the</strong> salesperson’s name in <strong>the</strong> report.<br />
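The seven steps above can be traced with a short Python simulation over the data of Figure 11.1. This is only a sketch of the control flow (grouping by salesperson, skipping recent calls, counting, and closing each paragraph); it is not PPL, it ignores justification and line width, and the names `LABELS`, `SALES` and `build_paragraphs` are hypothetical.

```python
from itertools import groupby

# Value labels from file Hospital.lab.
LABELS = {1: "Mercy Hospital", 2: "Children's", 3: "Eye and Ear",
          4: "Cranston Memorial", 5: "Willis", 6: "St Agnes"}

# File Sales, already sorted by Sales.No and Date.Last.Call:
# (Hospital, Date.Last.Call, Date.Last.Order, Amt.Last.Order, Sales.No, Salesman)
SALES = [
    (4, "86-03-17", "86-04-15", 318.00, 2, "Will Moore"),
    (2, "85-09-20", "85-06-25", 112.60, 4, "Ted Ryan"),
    (3, "86-01-12", None, None, 4, "Ted Ryan"),
    (6, "86-05-15", "85-06-11", 430.99, 4, "Ted Ryan"),
    (5, "86-02-07", "86-02-10", 775.25, 6, "Liz Brown"),
    (1, "86-04-12", "86-06-01", 450.67, 6, "Liz Brown"),
]

def build_paragraphs(rows, cutoff="86-05-00"):
    """Return one paragraph of text per salesperson with overdue customers."""
    paragraphs = []
    for _, group in groupby(rows, key=lambda r: r[4]):  # step 2: new salesperson
        sentences, count, salesman = [], 0, None
        for hosp, call, order, amt, _, salesman in group:
            if call > cutoff:  # step 3: recent call, leave this customer out
                continue
            count += 1  # step 4: count and describe the customer
            text = f"{LABELS[hosp]} has not received a sales call since {call}"
            if order is not None:  # step 5: there was an earlier order
                text += f" and has not placed an order since {order}."
                text += f" That order totalled ${amt:.0f}."
            else:  # step 6: no order on file
                text += " and has not placed a subsequent order."
            sentences.append(text)
        if count > 0:  # step 7: close the paragraph with the salesperson
            sentences.append(f"{salesman} is their salesperson.")
            paragraphs.append(" ".join(sentences))
    return paragraphs

for paragraph in build_paragraphs(SALES):
    print(paragraph)
    print()
```

As in the PPL, the string comparison against the cutoff relies on two-digit years and therefore only behaves correctly within a single century.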
The report produced is shown in Figure 11.3. Wherever variable Hospital is referenced, the text in the labels<br />
file is used instead of <strong>the</strong> numeric value. A report of this type often conveys information more easily than a table
or listing of numbers. It is also obvious how <strong>to</strong> read it. On <strong>the</strong> o<strong>the</strong>r hand, a report summarizing many cases could<br />
be lengthy and repetitive.<br />
__________________________________________________________________________<br />
Figure 11.3 Producing a Report: The Report<br />
Hospital Supply Sales Report: Apr 16, 1986<br />
Crans<strong>to</strong>n Memorial has not received a sales call since<br />
86-03-17 and has not placed an order since 86-04-15. That<br />
order <strong>to</strong>talled $318. Will Moore is <strong>the</strong>ir salesperson.<br />
Children’s has not received a sales call since 85-09-20<br />
and has not placed an order since 85-06-25. That order<br />
<strong>to</strong>talled $113. Eye and Ear has not received a sales call<br />
since 86-01-12 and has not placed a subsequent order. Ted<br />
Ryan is <strong>the</strong>ir salesperson.<br />
Willis has not received a sales call since 86-02-07 and<br />
has not placed an order since 86-02-10. That order <strong>to</strong>talled<br />
$775. Mercy Hospital has not received a sales call since<br />
86-04-12 and has not placed an order since 86-06-01. That<br />
order <strong>to</strong>talled $451. Liz Brown is <strong>the</strong>ir salesperson.<br />
__________________________________________________________________________<br />
Report writing shines when <strong>the</strong> output report is actually many reports, each summarizing a single case (or<br />
related group of cases) and possibly going <strong>to</strong> different recipients. Figures 11.4, 11.5 and 11.6 illustrate a more<br />
complex report. Test results for each case are summarized and a separate report is produced about each individual.<br />
To make <strong>the</strong> report more readable, <strong>the</strong> text is changed slightly for sentences after <strong>the</strong> first one.<br />
11.9 Comments in <strong>PPL</strong> Clauses<br />
There may be many <strong>PPL</strong> clauses in a single TEXTWRITER command. Comments interspersed among <strong>the</strong> clauses<br />
document what is being done. They begin with /* and end with */:<br />
[ /* Comment: Generate a scratch variable counter.*/;<br />
GEN #Counter = 0 ]<br />
Any text may come between <strong>the</strong> beginning and end of <strong>the</strong> comment, and <strong>the</strong> comment may extend across records<br />
(lines). Comments may be part of <strong>the</strong> <strong>PPL</strong> clauses in any command, as well as in TEXTWRITER. The lengthy<br />
TEXTWRITER command in Figure 11.10 includes numerous comments <strong>to</strong> document <strong>the</strong> <strong>PPL</strong> instructions.<br />
11.10 OPTIONAL IDENTIFIERS<br />
TEXTWRITER has optional identifiers that control its operation and some format features. The CASE and<br />
STREAM identifiers specify <strong>the</strong> mode of operation of <strong>the</strong> TEXTWRITER command. The JUSTIFY, BLANKS,<br />
PUTL.CHAR and SPREAD identifiers control <strong>the</strong> way text is output in <strong>the</strong> report. MARGIN, LEADBLANK and<br />
WIDTH alter <strong>the</strong> format of <strong>the</strong> report. LABELS and OUT refer <strong>to</strong> optional files. The LABELS file is an input<br />
file of value labels. The OUT file is a P-STAT system file. PostScript identifiers are discussed later.<br />
11.11 CASE and STREAM: The Modes of Operation<br />
The CASE mode is assumed by <strong>the</strong> TEXTWRITER command, and thus <strong>the</strong> CASE identifier does not need <strong>to</strong> be<br />
explicitly included in <strong>the</strong> command. In CASE mode, <strong>the</strong> text starts on a new line at <strong>the</strong> start of each case. All
accumulated text is flushed and printed, and <strong>the</strong>n accumulation of text for <strong>the</strong> next case begins. STREAM mode<br />
is specified by using <strong>the</strong> STREAM identifier. In STREAM mode, text prints continuously.<br />
Often it is unnecessary <strong>to</strong> specify a mode when a control word such as @PAGE causes a page change as each<br />
new case is processed. This is <strong>the</strong> situation in both Figure 11.5 and Figure 11.10, when @PAGE is <strong>the</strong> initial control<br />
word after <strong>the</strong> first PUT instruction. @PAGE flushes and prints all accumulated text before moving <strong>to</strong> a new<br />
page. However, CASE mode also resets all control words <strong>to</strong> <strong>the</strong>ir initial default values as processing of each new<br />
case starts. STREAM does not reset <strong>the</strong> indent or <strong>the</strong> line width. This is discussed fur<strong>the</strong>r in <strong>the</strong> summary portion<br />
of <strong>the</strong> section “CONTROL WORDS.”<br />
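The difference between the two modes can be pictured with a small Python sketch. It models only the line-breaking behavior (a flush at the start of each case versus continuous text) and uses the standard `textwrap` module as a stand-in for TEXTWRITER's own line filling; the resetting of control words in CASE mode is not modeled. The function name `render` is hypothetical.

```python
import textwrap

def render(case_texts, width, stream=False):
    """Return output lines for the text produced by each case.

    CASE mode (stream=False): each case starts on a new line.
    STREAM mode (stream=True): text from all cases runs on continuously.
    """
    if stream:
        return textwrap.wrap(" ".join(case_texts), width)
    lines = []
    for text in case_texts:
        lines.extend(textwrap.wrap(text, width))  # flush at the end of each case
    return lines

cases = ["Willis has not received a call.",
         "Mercy Hospital has not received a call."]

print(render(cases, 40))               # CASE mode: one line per case here
print(render(cases, 40, stream=True))  # STREAM mode: the second case continues line 1
```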
11.12 JUSTIFY, BLANKS, PUTL.CHAR and SPREAD<br />
The JUSTIFY identifier specifies that <strong>the</strong> text in <strong>the</strong> report is <strong>to</strong> have <strong>the</strong> right as well as <strong>the</strong> left edge aligned (justified).<br />
Justification is achieved by <strong>the</strong> addition of extra blank spaces after certain punctuation and between words.<br />
Two blank spaces are inserted after periods, exclamation points and question marks, when <strong>the</strong>y end sentences.<br />
Blanks are not inserted after ellipses (...) and o<strong>the</strong>r punctuation, or if <strong>the</strong> user has already included two blanks after<br />
periods in <strong>the</strong> text strings. If necessary, an additional blank is inserted between one or more words. After each<br />
space has an extra blank, additional blanks are inserted if justification has not yet been achieved.<br />
When JUSTIFY is not specified, only <strong>the</strong> left edge of <strong>the</strong> text is aligned. A line is filled with text until no<br />
room remains for <strong>the</strong> next word, and that word is placed in <strong>the</strong> next line. The right edge of <strong>the</strong> text has a slightly<br />
ragged appearance due <strong>to</strong> <strong>the</strong> differing amounts of blank spaces remaining at <strong>the</strong> end of each line.<br />
The BLANKS identifier may be used when justification is in effect <strong>to</strong> reset <strong>the</strong> maximum number of blanks<br />
that may be added <strong>to</strong> <strong>the</strong> space between words. The argument for BLANKS is an integer whose smallest value<br />
may be 1. When BLANKS is not used, a maximum of four blanks is assumed. Thus, <strong>to</strong> achieve justification,<br />
TEXTWRITER may add up <strong>to</strong> four additional blanks <strong>to</strong> <strong>the</strong> existing single blank space between words. Typically,<br />
it is necessary <strong>to</strong> add only one extra blank <strong>to</strong> <strong>the</strong> spaces between some of <strong>the</strong> words <strong>to</strong> justify a line. However, if<br />
a line contains many long words or if <strong>the</strong> next word is very long, up <strong>to</strong> four additional blanks may need <strong>to</strong> be inserted<br />
between words.<br />
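The way extra blanks are distributed, and the role of the BLANKS limit, can be sketched in Python. This is a simplified model under stated assumptions: blanks are added one gap at a time from left to right (the text does not say which gaps are widened first), the two-blanks-after-a-sentence rule is omitted, and a line that cannot be justified within the limit is simply left short. The function name `justify_line` and its `max_added` parameter are hypothetical.

```python
def justify_line(words, width, max_added=4):
    """Pad the gaps between words until the line is `width` columns wide.

    max_added plays the role of BLANKS: the most blanks that may be added
    to the single blank already separating two words.
    """
    if not words:
        return ""
    gaps = [1] * (len(words) - 1)
    needed = width - (sum(len(w) for w in words) + len(gaps))
    i = 0
    # Give every gap one extra blank before any gap gets a second one.
    while needed > 0 and gaps and min(gaps) < 1 + max_added:
        if gaps[i] < 1 + max_added:
            gaps[i] += 1
            needed -= 1
        i = (i + 1) % len(gaps)
    pieces = [words[0]]
    for gap, word in zip(gaps, words[1:]):
        pieces.append(" " * gap + word)
    return "".join(pieces)

line = justify_line("Willis has not received a sales call".split(), 40)
print(repr(line), len(line))
```

With the default limit, no gap ever exceeds five blanks in total, which matches the maximum of five blanks between words mentioned in the section on justification.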
When PUTL is used to format a variable name and its value, the name and value are separated by the three<br />
characters “ = ”. This can be changed by providing an alternate set of 1-3 characters. Given the variable name<br />
Age and the value 15, the control sequence:<br />
PUTL Age prints Age = 15<br />
The following examples of PUTL.CHARS produce:<br />
PUTL.CHARS ' ' Age 15<br />
PUTL.CHARS '---' Age---15<br />
PUTL.CHARS ' / ' Age / 15<br />
SPREAD and NO SPREAD are additional controls over <strong>the</strong> insertion of blanks. SPREAD is assumed. If NO<br />
SPREAD is used <strong>the</strong> values of adjacent variables will be concatenated without <strong>the</strong> usual intervening blank.<br />
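The effect of the separator characters and of SPREAD can be mirrored in Python. This is a sketch of the output shapes shown above, not of how TEXTWRITER is implemented; the functions `putl` and `join_values` are hypothetical names.

```python
def putl(name, value, chars=" = "):
    """Format a variable name and value the way PUTL prints them."""
    if not 1 <= len(chars) <= 3:
        raise ValueError("the separator must be 1 to 3 characters")
    return f"{name}{chars}{value}"

def join_values(values, spread=True):
    """Join adjacent values with a blank (SPREAD) or with none (NO SPREAD)."""
    return (" " if spread else "").join(str(v) for v in values)

print(putl("Age", 15))                # default separator
print(putl("Age", 15, "---"))
print(putl("Age", 15, " / "))
print(join_values(["Mary", "Roberts"]))
print(join_values(["Mary", "Roberts"], spread=False))
```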
11.13 MARGIN, LEADBLANK and WIDTH<br />
The MARGIN identifier specifies <strong>the</strong> number of columns that text is <strong>to</strong> be indented from <strong>the</strong> left. MARGIN 0 is<br />
assumed when MARGIN is not used and <strong>the</strong> text is not indented. The control word “@INDENT”, discussed in<br />
<strong>the</strong> subsequent section, specifies an additional indent that is measured from <strong>the</strong> current margin setting. It is used<br />
within a PUT instruction.<br />
Usually TEXTWRITER output is printed with a blank at <strong>the</strong> beginning of each line. This is useful when <strong>the</strong><br />
output is sent <strong>to</strong> a printer which uses <strong>the</strong> first character in <strong>the</strong> line for carriage control instructions. It is not needed<br />
and probably not wanted when <strong>the</strong> output is saved in a diskfile for use in ano<strong>the</strong>r program or document. The identifier<br />
NO LEADBLANK can be used to remove that blank. LEADBLANK, the assumed setting, can also be used.
The WIDTH identifier specifies <strong>the</strong> number of columns <strong>to</strong> be used for a report; that is, <strong>the</strong> width of an output<br />
line in columns. (Note that <strong>the</strong> width is measured from <strong>the</strong> first column, not from <strong>the</strong> end of <strong>the</strong> margin or indent.)<br />
When WIDTH is not used, <strong>the</strong> current output width defines <strong>the</strong> width of <strong>the</strong> report up <strong>to</strong> a maximum of 400.<br />
WIDTH, in the TEXTWRITER command, overrides the output width setting. It can be from 2 to 400. The<br />
WIDTH identifier may be overridden by <strong>the</strong> control word @WIDTH used within PUT clauses.<br />
Regardless of how <strong>the</strong> width of a report is set, one column of that width is reserved for carriage control characters<br />
(necessary <strong>to</strong> tell a printer when <strong>to</strong> page or skip lines). Thus, <strong>the</strong> actual report has a width one column less<br />
than <strong>the</strong> specified width. This is generally not of concern, but if it is, WIDTH 71, for example, should be specified<br />
for a report of actual width 70.<br />
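The width arithmetic described above is small enough to state as a Python function. This is a sketch of the rule only; the name `usable_width` and the default output width of 80 are hypothetical, since the current output width depends on the session.

```python
def usable_width(width=None, output_width=80):
    """Return the number of columns actually available for report text.

    width        -- the WIDTH identifier, or None when it is not used
    output_width -- the current output width (80 is only a placeholder)
    """
    # Without WIDTH, the output width applies, up to a maximum of 400.
    requested = width if width is not None else min(output_width, 400)
    if not 2 <= requested <= 400:
        raise ValueError("WIDTH must be from 2 to 400")
    # One column is always reserved for carriage control characters.
    return requested - 1

print(usable_width(71))         # WIDTH 71 gives a report 70 columns wide
print(usable_width(None, 132))  # no WIDTH: the output width minus one
```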
11.14 Optional Files: LABELS and OUT<br />
The LABELS identifier is used to provide the names of one or more labels files. If a labels file contains value labels<br />
for a numeric variable, that text is used in place of the number. The TEXTWRITER command does NOT<br />
make use of the extended labels when variable names are used in a PUTL or VARNAME reference.<br />
The OUT identifier is used <strong>to</strong> produce an output file that contains any modifications that are made <strong>to</strong> <strong>the</strong> input<br />
file.<br />
11.15 CONTROL WORDS<br />
Control words, used in PUT instructions, control <strong>the</strong> formatting and placement of text. They begin with “@” and<br />
may be anywhere in <strong>the</strong> PUT clause, except before <strong>the</strong> PUT. The basic control words are:<br />
@nn @PARA @PAGE @TRIM @COMMAS @MISS<br />
@PLUS @NEXT @INDENT @JUST @PLACES @M @M1<br />
@MINUS @SKIP @WIDTH @BEFORE @EQUAL @M2 @M3<br />
@LABEL @SPREAD<br />
(“nn” represents a positive whole number.)<br />
Some control words require a numeric argument, such as <strong>the</strong> number of lines <strong>to</strong> skip. This number may follow<br />
directly after <strong>the</strong> control word or after an equal-sign. These are equivalent instructions:<br />
@SKIP3 @SKIP=3<br />
Ei<strong>the</strong>r one skips three lines. Although <strong>the</strong> argument directly following <strong>the</strong> control word is typically a number, it<br />
may be any expression that evaluates <strong>to</strong> a numeric value:<br />
@PLUS(#Count-1) @PLUS=(#Count-1)<br />
Ei<strong>the</strong>r of <strong>the</strong>se instructions moves <strong>the</strong> column pointer <strong>to</strong> <strong>the</strong> right <strong>the</strong> number of spaces specified by <strong>the</strong> value of<br />
#Count minus one.<br />
When a control word has a simple argument such as <strong>the</strong> number 3, it may be placed directly (no spaces) after<br />
<strong>the</strong> control word or it may be separated from <strong>the</strong> control word by an equal sign. Again <strong>the</strong>re are no spaces around<br />
<strong>the</strong> equal sign. When <strong>the</strong> argument is an expression it must be enclosed in paren<strong>the</strong>ses. The paren<strong>the</strong>ses must<br />
immediately follow <strong>the</strong> control word or <strong>the</strong> equal sign. However, <strong>the</strong> expression within <strong>the</strong> paren<strong>the</strong>ses may contain<br />
spaces for readability.<br />
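The argument rules above (a number directly after the word or after an equal sign, no spaces around the equal sign, and expressions in parentheses) can be checked with a small Python parser. This is a sketch of the syntax as described in this section, not the parser P-STAT uses; the names `CONTROL` and `parse_control_word` are hypothetical, and the sketch does not special-case the column form @nn or the words @M1, @M2 and @M3.

```python
import re

# @NAME, optionally followed by "=" and then digits or a parenthesized
# expression. No spaces are allowed outside the parentheses.
CONTROL = re.compile(r"^@([A-Z]+)(?:=?(\d+|\(.*\)))?$")

def parse_control_word(token):
    """Split a control word into its name and its (optional) argument text."""
    match = CONTROL.match(token)
    if match is None:
        raise ValueError(f"not a valid control word: {token!r}")
    return match.groups()

for token in ["@SKIP3", "@SKIP=3", "@PLUS(#Count-1)",
              "@PLUS=(#Count - 1)", "@PARA"]:
    print(token, "->", parse_control_word(token))
```

A token such as `@SKIP = 3` is rejected because of the spaces around the equal sign, while spaces inside the parentheses of `@PLUS=(#Count - 1)` are accepted.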
11.16 Control Words <strong>to</strong> Produce a Letter<br />
Figures 11.4, 11.5, and 11.6 illustrate the use of TEXTWRITER to produce a form letter. The letter is personalized<br />
by including information specific to each case in the input file. Control words described in the sections that<br />
follow position the heading, the salutation, the body and the closing portions of the letter. Each letter is on a separate<br />
page.<br />
TEXTWRITER: Report Writing 11.9<br />
The TEXTWRITER command and PPL clauses can select cases from a file, calculate information, write appropriate<br />
text and control the placement of that text to produce suitably personalized letters and reports. Various<br />
tasks that could be done include:<br />
1. Billing<br />
Calculate amount due, date due and discount for early payment, and write bill with correct name,<br />
address and aligned dollar amounts.<br />
2. Reminding<br />
Select patients with upcoming appointments and write letters reminding patients of appointment<br />
date, time, physician and procedure.<br />
3. Fund Raising<br />
Select past donors and write solicitations for funds, including in the letters the number of years of<br />
support, the maximum previously given, and the number of supporters in this individual’s class or<br />
organization.<br />
4. Claims Processing<br />
Select pending insurance claims and write letters giving the current status of the claim, including<br />
deductible amount, amount covered, amount payable, and remaining coverage.<br />
__________________________________________________________________________<br />
Figure 11.4 A Form Letter: The Input File<br />
File MailList<br />
Last First Sex Company Street<br />
Greene Sharon F Pierce & Co. P.O. Box 365<br />
Smyth William - Devon Industries 126 West 46th St.<br />
City State Zip Copier<br />
New York NY 10003 Kanon Premiere<br />
Brooklyn NY 11234 Shape 100<br />
__________________________________________________________________________<br />
11.17 Positioning Columns<br />
The control word character “@” may be followed directly by a number to specify an exact column location. This<br />
positions the first letter of the variable Name in the fifth column:<br />
[ PUT @5 Name ]<br />
The column location is measured from the start of the line. The column pointer moves to the specified column<br />
and subsequent text begins in that column.<br />
@PLUS and @MINUS move the column pointer right and left, respectively, by the specified number of columns<br />
from its current position. Thus, if the current column is number 20, @PLUS7 moves the pointer seven columns<br />
to the right, to column 27.<br />
Note that the pointer moves only within the current line. Thus, if text is already in the specified column, it is overwritten<br />
by the new text. Also, when using @PLUS and @MINUS, the pointer moves relative to its current<br />
location, which may depend upon the length of the last value printed. This instruction,<br />
[ PUT @10 First Last @25 Phone ]<br />
might produce:<br />
         Susan Wells    205-672-9122<br />
         Thomas Bretchei617-926-0106<br />
12345678901234567890123456789012345678901234567890<br />
The second phone number has overwritten the remainder of that name. (A scale numbering the columns is included<br />
just for illustration.)<br />
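The overwriting behavior can be mimicked with a short Python sketch. It is a simplified model under stated assumptions, not P-STAT code: the function render is hypothetical, an integer stands for an @nn control word, a single blank is placed between consecutive values, and the full last name "Bretcheimer" is invented for illustration because the manual shows only its first eight letters.<br />

```python
def render(items):
    """Model one output line. Integers act like @nn (jump to a 1-based column);
    strings are values written at the column pointer, overwriting what is there."""
    cells = []            # characters of the line, index 0 is column 1
    col = 1               # current column pointer (1-based)
    after_value = False   # True when the previous item was a value
    for item in items:
        if isinstance(item, int):
            col = item
            after_value = False
            continue
        if after_value:
            col += 1      # single blank between consecutive values
        for ch in item:
            while len(cells) < col:
                cells.append(" ")
            cells[col - 1] = ch   # overwrite whatever is already in this column
            col += 1
        after_value = True
    return "".join(cells)

print(render([10, "Susan", "Wells", 25, "205-672-9122"]))
print(render([10, "Thomas", "Bretcheimer", 25, "617-926-0106"]))
```

In the second call the last name runs past column 24, so the phone number written at column 25 replaces the end of the name.<br />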
11.18 Positioning Lines<br />
The control words @PARA, @NEXT, @SKIP, @PAGE, @INDENT and @WIDTH specify where a subsequent<br />
line prints. @PARA starts a new paragraph by moving the pointer to the next line and indenting three<br />
columns. Text starts in the fourth column. Alternate styles of paragraphing may be obtained using @SKIP (or<br />
@SKIP @5) instead of @PARA.<br />
@NEXT positions subsequent text in the next line. @SKIP skips the specified number of lines or, if no number<br />
directly follows, skips one line. @PAGE positions text at the top of the next page.<br />
The control word @INDENT specifies an additional number of columns to indent text from the current margin<br />
setting. (The value after @INDENT is added to the current margin setting and text is indented that many<br />
columns from the left.) The current margin is that specified by the MARGIN identifier in the TEXTWRITER<br />
command or the default margin of zero indent. @IN is an abbreviation for @INDENT. @NOINDENT resets the<br />
indentation to that specified by the MARGIN identifier or to 0 if MARGIN was not used.<br />
The @WIDTH control word sets the width of the report. The width is measured from column one, not from<br />
the current margin or indent setting. Thus, when the indent is increased, the line length is shortened. @WIDTH<br />
overrides any previous output width settings defined by the OUTPUT.WIDTH command or the identifier WIDTH<br />
in TEXTWRITER. The integer following @WIDTH may range from 2 to 400. @NOWIDTH turns off the current<br />
setting, and the width of the report reverts to that set by the WIDTH identifier or the OUTPUT.WIDTH<br />
command.<br />
Each of these control words flushes the text buffer and prints accumulated text before moving to the specified<br />
line. When these control words are not used, text prints continuously until the current line is full and then text<br />
continues on the next line.<br />
11.19 Positioning Words<br />
@TRIM is assumed by TEXTWRITER. Trailing blanks are automatically trimmed from character strings<br />
before they are positioned in the text. This avoids having many blanks following a short name. @NOTRIM may<br />
be specified to turn trimming off. Then, a character value occupies as many columns as its defined length, even<br />
though a particular value may be blank or only a few characters long. (Numbers do not have trailing blanks.)<br />
@JUST specifies that text be right as well as left justified. The lines of the report are aligned at both the left<br />
and right margins. When @JUST is not used and the JUSTIFY identifier is not included in the TEXTWRITER<br />
command, as many words as fit in a line are printed and then a new line is started. Thus, the right margin is jagged<br />
or unaligned. @NOJUST turns justification off, overriding the JUSTIFY identifier if it has been used.<br />
@BEFORE places the next value or string to be written immediately before the specified column. It affects<br />
only the text or variable following directly after it, and it does not reset the right edge of the report. This<br />
instruction,<br />
[ PUT @BEFORE20 City @24 Area.Code ]<br />
given this data, produces:<br />
          Princeton    609<br />
         Somerville    201<br />
            Trenton    609<br />
123456789012345678901234567890<br />
Notice that City is right aligned before column 20; column 20 itself is blank. Area.Code starts in column 24. (The<br />
scale is not part of the output.)<br />
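A minimal Python sketch of this alignment follows. The helpers put_before and put_at are hypothetical stand-ins for @BEFORE and @nn, written only to show the column arithmetic; they assume the text fits and do not model overwriting.<br />

```python
def put_before(line, text, column):
    """Append text so that its last character lands in column - 1 (1-based)."""
    start = column - 1 - len(text)      # 0-based index of the first character
    return line.ljust(start) + text

def put_at(line, text, column):
    """Append text so that its first character lands in the given column."""
    return line.ljust(column - 1) + text

for city, area in [("Princeton", "609"), ("Somerville", "201"), ("Trenton", "609")]:
    print(put_at(put_before("", city, 20), area, 24))
```

Each city ends in column 19, whatever its length, and each area code begins in column 24.<br />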
@COMMAS requests that subsequent numeric values print with commas every three digits (counting from<br />
the decimal point to the left). This makes reading large numbers, such as population figures or dollar amounts,<br />
easier. @NOCOMMAS turns this off.<br />
__________________________________________________________________________<br />
Figure 11.5 A Form Letter: The TEXTWRITER Command<br />
TEXTWRITER MailList<br />
[/* RETURN ADDRESS */;<br />
PUT @PAGE @SKIP4 @40 <br />
@NEXT @40 <br />
@NEXT @40 <br />
@SKIP @40 (LTRIM (.DATE.) ) @SKIP2 ;<br />
/* CUSTOMER ADDRESS */;<br />
IF Sex EQ 'M', T.PUT , F.PUT ;<br />
PUT First Last @NEXT Company @NEXT Street<br />
@NEXT City State > Zip ;<br />
/* SALUTATION */;<br />
PUT @SKIP ;<br />
IF Sex EQ 'M', T.PUT , F.PUT , M.GOTO Sir;<br />
PUT Last ; GOTO Continue ;<br />
Sir: PUT ;<br />
/* BODY OF LETTER */;<br />
Continue: ;<br />
PUT @PARA <br />
> Copier ><br />
> City ;<br />
PUT @PARA Copier <br />
;<br />
/* CLOSING */;<br />
PUT @SKIP2 @40 @SKIP4 @40 <br />
@NEXT @40 @SKIP2 @NEXT ],<br />
MARGIN 5, JUSTIFY, WIDTH 71 $<br />
__________________________________________________________________________
__________________________________________________________________________<br />
Figure 11.6 A Form Letter: One Letter<br />
GREAT Copier Supplies<br />
123 First Street<br />
New York, NY 11001<br />
May 7, 1986<br />
Ms. Sharon Greene<br />
Pierce & Co.<br />
P.O. Box 365<br />
New York, NY 10003<br />
Dear Ms. Greene:<br />
Thank you for calling GREAT Copier. We stock all supplies<br />
for the Kanon Premiere copier at discount prices, and we deliver<br />
them right to your business in New York.<br />
Enclosed is a price list for the Kanon Premiere. Should you<br />
have any questions, please give us a call.<br />
Sincerely yours,<br />
Sam Right<br />
Sales Manager<br />
SR:ms<br />
Enc: pl<br />
__________________________________________________________________________<br />
@PLACES specifies the number of decimal places (counting from the decimal point to the right) with which<br />
to print subsequent numeric values. Numbers with more than the specified number of places are rounded;<br />
numbers with fewer places are padded with zeros. This instruction:<br />
[ PUT @PLACES2 Income ]<br />
prints the variable Income with two decimal places. @PL is an abbreviation. @NOPLACES turns off the prior<br />
places specification. Numbers then print with their actual number of decimal places. (@NOPLACES is not the<br />
same as @PLACES0. Zero decimal places print when @PLACES0 is specified. The number of places actually<br />
in a numeric value prints when @NOPLACES, the initial default setting, is in effect.)<br />
The PLACES function is distinct from the @PLACES control word. Both may be used in PUT clauses, if<br />
desired:<br />
[ PUT 'Total income, to the nearest dollar, is: '<br />
@PLACES2 (PLACES (Income, 0)) ]<br />
In this example, the PLACES function rounds Income to zero decimal places and the @PLACES<br />
control word sets the number of printed places to two. Thus, Income is shown to the nearest dollar, but with two zeros<br />
included after the decimal point in the common dollars-and-cents pattern.<br />
@SPREAD is the assumed setting. @SPREAD causes a single blank to be placed between variables.<br />
@NOSPREAD can be used to change this so that the blank is omitted. For example:<br />
TEXTWRITER Tests [ PUT @NOSPREAD First Last ] $<br />
produces the following output:<br />
JamesWilmot<br />
SheilaHiggin<br />
Variables are usually trimmed of outside blanks before printing. The combination of @NOTRIM and<br />
@NOSPREAD causes TEXTWRITER to leave things alone and print the values exactly as they are stored in the file<br />
with no intervening blanks.<br />
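The four combinations of trimming and spreading can be sketched in Python as follows. The function put_values and the padded sample values are assumptions for illustration; the sketch trims only trailing blanks.<br />

```python
def put_values(values, trim=True, spread=True):
    """Join character values as described above: trim removes trailing blanks,
    spread places a single blank between consecutive values."""
    if trim:
        values = [v.rstrip() for v in values]
    separator = " " if spread else ""
    return separator.join(values)

stored = ["James   ", "Wilmot  "]     # values padded to a defined length
print(repr(put_values(stored)))                            # default settings
print(repr(put_values(stored, spread=False)))              # like @NOSPREAD
print(repr(put_values(stored, trim=False, spread=False)))  # like @NOTRIM @NOSPREAD
```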
11.20 Labeling Values<br />
The PUTL instruction and the control word @EQUAL align variable names as well as their values about their<br />
equal-signs. This is useful for embedding lists within reports or dumping values in cases with inconsistent data.<br />
PUTL requests that the variable name as well as its value be placed in the output line:<br />
[ IF FIRST (.FILE.),<br />
PUT 'The following accounts are overdue:' @SKIP1;<br />
IF DAYS (.NDATE., 'YYYYMMDD') - DAYS (Due, 'YYMMDD') GT 30,<br />
PUT Acct.Number Co.Name, PUTL @26 Billed Due ]<br />
Here is a possible report using these PUTL statements:<br />
The following accounts are overdue:<br />
1205 Jones & Sons, Inc. Billed = 860112 Due = 860210<br />
1231 Birchwood Lumber Billed = 860210 Due = 860305<br />
The @EQUAL control word aligns labeled variables by specifying the column location of the equal-sign.<br />
Each line of text has that variable label and value with the equal-sign in the same column. @EQUAL is used after<br />
PUTL:<br />
[ PUTL @EQUAL15 Systolic ]<br />
For two cases this produces:<br />
     Systolic = 96<br />
     Systolic = 82<br />
123456789012345678901234567890<br />
Multiple locations may be specified. This instruction:<br />
[ PUTL @EQUAL15:35 Systolic Diastolic ]<br />
or the equivalent:<br />
[ PUTL @EQUAL=15:35 Systolic Diastolic ]<br />
aligns the variable names and values about the equal-signs, which are positioned in columns 15 and 35. Either<br />
instruction produces:<br />
     Systolic = 96      Diastolic = 123<br />
     Systolic = 82      Diastolic = 114<br />
1234567890123456789012345678901234567890<br />
If the variable name and value do not fit in the line when the equal-sign is positioned in the specified column,<br />
they are placed in the next line of text. @NOEQUAL turns off alignment about the equal-sign.<br />
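The column arithmetic behind @EQUAL can be sketched in Python. The function putl_equal is hypothetical, and it assumes one blank on each side of the equal-sign with the value written immediately after; it does not model the move to the next line when a pair does not fit.<br />

```python
def putl_equal(pairs, equal_columns):
    """Build one line of 'name = value' pairs so that each equal-sign lands
    in the requested 1-based column."""
    line = ""
    for (name, value), column in zip(pairs, equal_columns):
        label = f"{name} ="
        start = column - len(label)   # 0-based index where the label begins
        line = line.ljust(start) + label + f" {value}"
    return line

print(putl_equal([("Systolic", 96), ("Diastolic", 123)], [15, 35]))
print(putl_equal([("Systolic", 82), ("Diastolic", 114)], [15, 35]))
print("1234567890123456789012345678901234567890")
```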
A very useful enhancement to the PUTL instruction is the .ALL. system variable. This causes all the variables<br />
in the file to be printed. Figure 11.7 shows the command and the printout that results.<br />
__________________________________________________________________________<br />
Figure 11.7 TEXTWRITER: Displaying all <strong>the</strong> Variables<br />
TEXTWRITER Tests [ PUTL @EQUAL=15:40:60 .ALL. @SKIP ] $<br />
SS.Number = 243-24-5007 Last = Wilmot First = James<br />
Vocab = 97 Riding = 90 Tea = 98<br />
Hockey = 78 Car = 71 Beer = 85<br />
Juggling = 64 Affairs = 97 Memo = 93<br />
SS.Number = 311-04-8831 Last = Higgin First = Sheila<br />
Vocab = 96 Riding = 89 Tea = 54<br />
Hockey = 86 Car = 70 Beer = 91<br />
Juggling = 82 Affairs = 96 Memo = 86<br />
__________________________________________________________________________<br />
You will almost always want to use @EQUAL when combining PUTL with the .ALL. system variable. If you do not specify<br />
where the equal-signs should be placed, the variables and values are printed one after another with 4 spaces between<br />
the value and the next variable name. The amount that is printed on each line is determined by the current<br />
output width setting, up to a maximum of 400 characters.<br />
PUTL prints a value with its variable name. PUT prints a value alone. If you are providing a series of variable<br />
names, some of which are to have labels and some of which are to print without labels, you may use the<br />
@LABEL and @NOLABEL control words. @LABEL in a PUT is the equivalent of a PUTL. @NOLABEL in<br />
a PUTL is the equivalent of a PUT. The command:<br />
TEXTWRITER Tests [ PUTL Last First @NOLABEL SS.Number ] $<br />
produces <strong>the</strong> following output.<br />
Last = Wilmot First = James 243-24-5007<br />
Last = Higgin First = Sheila 311-04-8831<br />
11.21 Specifying Missing Characters<br />
The @MISS control word specifies a character or a character string to print in place of the dash or dashes that<br />
usually print for missing values. @MISS, followed by a character or string in quotes, requests that character or<br />
string print for any of the three types of missing values:<br />
[ PUT @MISS<br />
[ PUT 'The customer response was '<br />
@MISS1"don’t know" @MISS2'no answer' @MISS3'refuse to answer'<br />
Response4 '.' ]<br />
@M1, @M2 and @M3 are abbreviations for the three types of missing control words.<br />
The missing specifications are in effect until they are reset or turned off with @NOMISS. The @NOMISS<br />
control word returns the missing character to the dash, the initial default character.<br />
@NOMISS1, @NOMISS2 and @NOMISS3 may be used to selectively reset specific missing control words.<br />
__________________________________________________________________________<br />
Figure 11.8 A Complex Report: The Input and Labels Files<br />
File Tests<br />
SS Number Last First Vocab Riding Tea Hockey Car<br />
243-24-5007 Wilmot James 97 90 98 78 71<br />
311-04-8831 Higgin Sheila 96 89 54 86 70<br />
Beer Juggling Affairs Memo<br />
85 64 97 93<br />
91 82 96 86<br />
File Tests.lab<br />
Score (1)superior (2)excellent<br />
(3)above average (4)average<br />
(5)marginal (6)poor /<br />
__________________________________________________________________________<br />
11.22 A Complex Report<br />
This section discusses the more complex report shown in Figures 11.8, 11.9 and 11.10, which illustrates more of the<br />
potential of PPL. Figure 11.8 shows a portion of an input file containing scores for various aptitude tests (some<br />
of which are perhaps a little silly). Figure 11.9 is the desired output and Figure 11.10 contains the command.<br />
In the first case, the scores on Vocab, Tea and Affairs, which are all above 95, are recoded as 1. Therefore,<br />
they are represented in the first sentence that is constructed. Riding and Memo, which are recoded as 2, are<br />
represented in the second sentence. Beer is recoded as 3 and is represented in the third sentence. For each case,<br />
the contents of each sentence and even the number of sentences are different.<br />
The general procedure of the instructions is:<br />
1. Generate all of the variables and scratch variables that will be used in the command. Since none of<br />
them are given initial values they are set to missing.<br />
2. Select the desired variables and do any necessary recoding. Specify a heading for each report (each<br />
case is a separate report on a new page).<br />
__________________________________________________________________________<br />
Figure 11.9 A Complex Report: The Report (Two Pages)<br />
James Wilmot (243-24-5007)<br />
Comments on INTELLECTUAL MEASURES<br />
Vocabulary skills, tea pouring and knowledge of current affairs<br />
are in the superior range by Company standards. Horseback riding and<br />
memo writing are excellent. Beer drinking is above average. Field<br />
hockey play is average. Car parking is marginal. Juggling is poor.<br />
..............<br />
Sheila Higgin (311-04-8831)<br />
Comments on INTELLECTUAL MEASURES<br />
Vocabulary skills and knowledge of current affairs are in the<br />
superior range by Company standards. Horseback riding and beer<br />
drinking are excellent. Field hockey play, juggling and memo writing<br />
are above average. Car parking is marginal. Tea pouring is poor.<br />
__________________________________________________________________________<br />
3. Initialize #Sentence to 0. Start the All.Scores DO loop through the 6 possible scores with the<br />
scratch variable #S.<br />
4. Count the number of tests that match the current value of #S. For example, for the first case when<br />
#S = 1, #NV.of.Score = 3; James Wilmot has three scores (97, 98 and 97) that are recoded as 1.<br />
5. Set #Used to zero. Start the All.Tests DO loop through the 9 test scores. When a value of a test<br />
equals #S, increase #Used and start writing.<br />
When processing the first case, 97 (the value of variable Vocab) is recoded as a 1. The KEEP reorders<br />
the file so that Vocab is the first variable in the file. The first time through the All.Scores loop<br />
#S=1. Within that loop, the first time through the All.Tests loop #J=1, the position of Vocab. Since<br />
the value of Vocab is a 1, which equals the value of #S, we have a match for our first sentence.<br />
6. Generate #Name to hold the desired name for each test. The first variable Vocab is called “vocabulary<br />
skills” in the report.<br />
7. If this is the first item in a sentence (#Used = 1), capitalize the first letter of the name.<br />
8. Depending on the number of items that are to go in the sentence (#NV.of.Score) and the number already<br />
in (#Used), put the name of the aptitude test and possibly a comma or the word “and”.<br />
9. Put “is” or “are” in the sentence, depending on how many items (#Used) have been used in the sentence<br />
already.<br />
10. Change the text slightly for sentences after the first one to make the report less repetitious. When<br />
the sentence for a given score has been written, the All.Tests loop is complete. When all the 6 scores<br />
have been evaluated, the All.Scores loop is complete.<br />
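The steps above can be traced with a compact Python sketch. It is an illustrative re-statement of the logic, not the PPL command itself: the names recode, join_names and report are hypothetical, the test phrases and sentence wording are taken from Figure 11.9, and the treatment of scores that fall exactly on a range boundary (65, 73, and so on) is an assumption, since the sample data never hits one.<br />

```python
NAMES = {
    "Vocab": "vocabulary skills", "Riding": "horseback riding",
    "Tea": "tea pouring", "Hockey": "field hockey play",
    "Car": "car parking", "Beer": "beer drinking",
    "Juggling": "juggling", "Affairs": "knowledge of current affairs",
    "Memo": "memo writing",
}
LABELS = {1: "superior", 2: "excellent", 3: "above average",
          4: "average", 5: "marginal", 6: "poor"}

def recode(value):
    """Map a raw test score (0-100) to a rating from 1 (best) to 6."""
    for upper, rating in ((65, 6), (73, 5), (80, 4), (88, 3), (95, 2)):
        if value <= upper:        # boundary handling is an assumption
            return rating
    return 1

def join_names(names):
    """Use commas and 'and' appropriately (step 8)."""
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + " and " + names[-1]

def report(tests):
    ratings = {test: recode(value) for test, value in tests.items()}
    sentences = []
    for rating in range(1, 7):                        # the All.Scores loop
        names = [NAMES[t] for t in tests if ratings[t] == rating]
        if not names:                                 # no tests had this score
            continue
        text = join_names(names)
        text = text[0].upper() + text[1:]             # capitalize first item
        verb = "is" if len(names) == 1 else "are"
        if not sentences:                             # first sentence only
            sentences.append(f"{text} {verb} in the {LABELS[rating]} "
                             "range by Company standards.")
        else:
            sentences.append(f"{text} {verb} {LABELS[rating]}.")
    return " ".join(sentences)

james = {"Vocab": 97, "Riding": 90, "Tea": 98, "Hockey": 78, "Car": 71,
         "Beer": 85, "Juggling": 64, "Affairs": 97, "Memo": 93}
print(report(james))
```

With the scores of the first case in Figure 11.8, the sketch yields the same six sentences as the first page of Figure 11.9, apart from line breaks and justification.<br />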
__________________________________________________________________________<br />
Figure 11.10 A Complex Report: The TEXTWRITER Command<br />
TEXTWRITER Tests<br />
[<br />
[<br />
[<br />
/* 1. Recoding, Beginning Each Case */;<br />
GEN Score;<br />
GEN #Sentence, GEN #NV.of.score, GEN #USED, GEN #Name:C32 ;<br />
KEEP Vocab TO Memo .OTHERS.;<br />
DO #J = 1, 9;<br />
SET V(#J) = RECODE ( V(#J),<br />
0 TO 65 = 6, 65 TO 73 = 5, 73 TO 80 = 4,<br />
80 TO 88 = 3, 88 TO 95 = 2, 95 TO 100 = 1 ) ;<br />
ENDDO;<br />
PUT @PAGE First Last > SS.Number @NEXT ;<br />
PUT @15 @PARA ]<br />
SET #Sentence = 0 ;<br />
/* #Sentence is the count of the number of sentences<br />
#S controls the loop through the 6 possible scores<br />
#NV.of.Score is number of variables with a given score */;<br />
DO All.Scores #S = 1, 6;<br />
SET #NV.of.Score = 0 ;<br />
DO #J = 1, 9;<br />
IF V(#J) EQ #S, INC #NV.of.Score ;<br />
ENDDO;<br />
/* No tests had this score, get the next one */;<br />
IF #NV.of.Score EQ 0, NEXTDO; ]<br />
/* #Used is number of items used in sentence<br />
#J is the position (1-9) of a test variable */;<br />
SET #Used = 0;<br />
DO All.Tests #J = 1, 9;<br />
IF V(#J) NE #S, NEXTDO;<br />
INC #Used;<br />
/* #Name is the name of each test item */;<br />
SET #Name = RECODE ( #J,<br />
1 = , 2 = ,
[<br />
3 = , 4 = ,<br />
5 = , 6 = ,<br />
7 = ,<br />
8 = ,<br />
9 = ) ;<br />
/* Capitalize the name of the first test item */;<br />
IF #Used EQ 1, SET #Name =<br />
CHANGE ( #Name, 1, 1, UPPER (SUBSTRING (#Name, 1, 1) ) );<br />
/* Use commas and “and” appropriately<br />
then get the next test */;<br />
IF #NV.of.Score EQ 1, PUT #Name, NEXTDO;<br />
IF #Used LE #NV.of.Score - 2, PUT #Name ;<br />
IF #Used EQ #NV.of.Score - 1, PUT #Name ;<br />
IF #Used EQ #NV.of.Score, PUT > #Name ;<br />
All.Tests: ENDDO ]<br />
/* Write sentence, set score to #S for labelling */;<br />
INC #Sentence ;<br />
/* Use “is” and “are” appropriately */;<br />
IF #Used EQ 1, PUT > ;<br />
IF #Used GT 1, PUT > ;<br />
/* Different text for the first sentence only */;<br />
SET Score = #S;<br />
IF #Sentence EQ 1, PUT Score<br />
> ;<br />
IF #Sentence NE 1, PUT Score ;<br />
All.Scores: ENDDO ],<br />
JUSTIFY, WIDTH 60, LABELS 'Tests.lab' $<br />
__________________________________________________________________________<br />
11.23 Control Word Summary<br />
Control words all begin with “@” and many of them are followed by a number giving a column location or other<br />
value. The number may follow directly after the control word (@SKIP2) or after an equal-sign (@SKIP=2). Although<br />
the argument directly following the control word is typically a number, it may be any expression that<br />
evaluates to a numeric value.<br />
The following control words remain in effect throughout the processing of a case by the TEXTWRITER<br />
command, unless they are specifically changed or turned off:<br />
@INDENT @EQUAL<br />
@WIDTH @MISS<br />
@JUST @COMMAS<br />
@TRIM @PLACES<br />
@LABEL @FONT1 - @FONT9<br />
Thus, specifying @JUST means that all the text specified in this and subsequent PUT clauses will be justified;<br />
specifying @COMMAS means that all numeric values will have commas in them. Prefacing the control word with<br />
“NO” (@NOJUST or @NOCOMMAS) turns off a prior setting. It resets the setting to that initially assumed by<br />
the TEXTWRITER command:<br />
[ PUT 'Invoice number ' Inv.No ', for $'<br />
@COMMAS Inv.Amt ', dated ' @NOCOMMAS Inv.Date<br />
' is past due.' ]<br />
Commas are inserted only in values of the numeric variable Inv.Amt and not in Inv.No or Inv.Date, which are<br />
character variables.<br />
The following control words do not remain in effect throughout the processing of a case. They apply only<br />
to the variable expression or character string that directly follows:<br />
@nn @PAGE<br />
@PLUS @NEXT<br />
@MINUS @PARA<br />
@BEFORE @SKIP<br />
These control words must be reissued to produce the desired results again. (The “nn” above represents a positive<br />
whole number.)<br />
The STREAM and CASE identifiers also affect the action of control words. Remember that in CASE mode,<br />
text is flushed and a new line is started when processing of a new case is begun. In STREAM mode, text is flushed<br />
only when processing of all cases is complete. CASE is assumed when neither identifier is used in the TEXTWRITER<br />
command; STREAM must be specified if it is desired.<br />
In STREAM mode, all control words that typically remain in effect are reset when processing of a new case<br />
begins, except @INDENT and @WIDTH, which flush and print text. In CASE mode, all control<br />
words, including @INDENT and @WIDTH, are reset to their initial default values.<br />
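The reset at the start of each case can be modeled with a short Python sketch. Everything here is hypothetical and simplified: the settings dictionary stands in for the persistent @COMMAS and @PLACES settings, write_case stands in for the PUT clauses run on one case, and the account data is invented.<br />

```python
DEFAULTS = {"commas": False, "places": None}   # initial default settings

def fmt(value, state):
    """Format a number according to the settings currently in effect."""
    spec = "," if state["commas"] else ""
    if state["places"] is not None:
        spec += f".{state['places']}f"
    return format(value, spec) if spec else str(value)

def write_case(account, balance, limit):
    state = dict(DEFAULTS)          # settings are reset when a new case begins
    first = f"Account {fmt(account, state)} owes $"
    state["commas"] = True          # like @COMMAS
    state["places"] = 2             # like @PLACES2
    first += fmt(balance, state)
    # A later clause in the same case still sees the settings made above.
    second = f"Limit ${fmt(limit, state)}"
    return [first, second]

for case in [(10234, 1523.5, 5000), (20871, 88, 12500)]:
    print(write_case(*case))
```

The account number of the second case prints without commas because the settings were reset, while the limit in each case inherits the commas and two places set earlier in that same case.<br />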
Thus, a PUT instruction such as this,<br />
[ PUT 'Student ' ID.No ' has an outstanding bill of $'<br />
@COMMAS @PLACES2 Balance '.' ]<br />
which specifies that the variable Balance print with commas and two decimal places, does not need to turn those<br />
settings off at the end of each case. The @COMMAS and @PLACES2 control words are reset to their default values<br />
at the start of each case, so after the first student is processed, the variable ID.No for the second student<br />
does not print with commas and two places. However, the settings do remain in effect throughout a case. Thus, if the prior PUT instruction<br />
were followed by another PUT instruction, any numeric values specified in that instruction would print with commas<br />
and two decimal places. @NOCOMMAS and @NOPLACES would have to precede those numeric variables<br />
to reset these control words.<br />
11.24 Comparing TEXTWRITER and Other Commands<br />
Any of <strong>the</strong> control words, except @INDENT, @JUST, @SPREAD, @WIDTH, and <strong>the</strong> PostScript controls such<br />
as @FONT1, may be used in PUT clauses following any command. They are not exclusive <strong>to</strong> <strong>the</strong> TEXTWRITER<br />
command. Thus, brief reports or “dump messages” may be produced as a system file is processed by any command.<br />
However, as with all P-<strong>STAT</strong> commands, <strong>the</strong> TEXTWRITER identifiers can be used only in <strong>the</strong><br />
TEXTWRITER command.<br />
There is a basic difference between TEXTWRITER and o<strong>the</strong>r commands in <strong>the</strong> way text is output. When<br />
TEXTWRITER is used, text prints continuously — a new line is not started unless <strong>the</strong> prior line is full or control
words specify a new line. Text is flushed and a new line is started only when processing of a new case is begun,<br />
unless STREAM mode is specified. Then text is flushed only when processing of all cases is complete.<br />
When ano<strong>the</strong>r command (not TEXTWRITER) is used, a new line is started and text is written whenever a<br />
PUT clause ends unless <strong>the</strong> final character in <strong>the</strong> clause is an “@” by itself. This causes <strong>the</strong> line <strong>to</strong> be held for<br />
<strong>the</strong> possible addition of more text. When processing of a case is finished, all text is written, regardless of whe<strong>the</strong>r<br />
<strong>the</strong> line was held or not.<br />
The control character “@” is used at the end of a PUT instruction to hold the column pointer in the same line:<br />
PROCESS Class102<br />
[ PUT ID ': Average is '<br />
(ROUND (MEAN.GOOD (Test?) ) ) '.' @ ;<br />
IF (COUNT.GOOD (Test?) ) NE 8, PUT ' Missing some tests.' ] $<br />
The “@” after <strong>the</strong> period keeps <strong>the</strong> column pointer in <strong>the</strong> same line. If <strong>the</strong>re are not eight good test scores, an<br />
additional text string is put in that line. This report is produced:<br />
1022: Average is 85.<br />
1248: Average is 88. Missing some tests.<br />
This command, without the “@”,<br />
PROCESS Class102<br />
[ PUT ID ': Average is '<br />
(ROUND (MEAN.GOOD (Test?) ) ) '.' ;<br />
IF (COUNT.GOOD (Test?) ) NE 8, PUT ' Missing some tests.' ] $<br />
produces:<br />
1022: Average is 85.<br />
1248: Average is 88.<br />
Missing some tests.<br />
The text about the missing tests appears on a new line. (It would not be necessary to use the control character “@”<br />
in the TEXTWRITER command, which assumes continuous printing of text.)<br />
11.25 OPTIONAL IDENTIFIERS: PostScript<br />
The TEXTWRITER command can use any font that is available on your PostScript printer to produce camera-ready<br />
printout. Figure 11.11 shows the output that results when PostScript controls are added to the complex report<br />
described in Figures 11.8, 11.9, and 11.10. The controls for <strong>the</strong> page and paragraph are changed <strong>to</strong> request<br />
fonts:<br />
PUT @PAGE @FONT2<br />
First Last > SS.Number @NEXT @FONT1 ;<br />
PUT @15 @PARA @FONT3; ]<br />
The following PostScript identifiers are added before <strong>the</strong> final “$”. LEFT.EDGE, which uses inches, replaces<br />
WIDTH 70, which uses number of characters, <strong>to</strong> determine <strong>the</strong> width of <strong>the</strong> prin<strong>to</strong>ut.<br />
POSTSCRIPT, PORTRAIT, FONT1 TIMES ROMAN BOLD 14,<br />
FONT2 TIMES ROMAN BOLD 12, FONT3 ARIAL 12,<br />
FONT4 TIMES ROMAN BOLDITALIC 12, LEFT.EDGE 2., PR 'Test.ps' $<br />
The identifier POSTSCRIPT causes <strong>the</strong> initial control codes that PostScript requires <strong>to</strong> be written <strong>to</strong> <strong>the</strong> output<br />
file. You should always use a PR identifier as PostScript output written <strong>to</strong> a non-PostScript device such as <strong>the</strong><br />
terminal prints <strong>the</strong> control words ra<strong>the</strong>r than implementing <strong>the</strong>m. Usually one of <strong>the</strong> two identifiers PORTRAIT
or LANDSCAPE is used. PORTRAIT is used when the printout is going to paper that is 8.5 inches wide and 11<br />
inches high. LANDSCAPE, <strong>the</strong> assumed orientation, is used for output that is 11 inches wide by 8.5 inches high.<br />
If you have paper that is a different size, <strong>the</strong> P-<strong>STAT</strong> POSTSCRIPT.SETUP command can be used <strong>to</strong> set <strong>the</strong> paper<br />
size. POSTSCRIPT.SETUP can also be used <strong>to</strong> set fonts and margins.<br />
The area that is to be used on the paper can be controlled by using the TOP.EDGE, BOTTOM.EDGE,<br />
LEFT.EDGE and RIGHT.EDGE identifiers. The arguments are given in inches. One-inch margins are assumed for<br />
any edge that is not supplied. In Figures 11.1 through 11.4, <strong>the</strong> identifiers used <strong>to</strong> make <strong>the</strong> printed output fit nicely<br />
are:<br />
TOP.EDGE .5, BOTTOM.EDGE 5.5, LEFT.EDGE 1.5, RIGHT.EDGE 1.5<br />
11.26 PostScript Page Changes<br />
The assumption is that each time you use <strong>the</strong> @PAGE control word, <strong>the</strong> PostScript page will be sent <strong>to</strong> <strong>the</strong> printer.<br />
This setting is controlled by <strong>the</strong> SHOWPAGE identifier. However, using PostScript is more like drawing on a<br />
slate than writing lines on a page. It is possible <strong>to</strong> move around <strong>the</strong> page, overwrite, draw lines or print text. In<br />
<strong>the</strong> P-<strong>STAT</strong> implementation any command that has PostScript support can be combined on a single page with any<br />
o<strong>the</strong>r such command.<br />
When NO SHOWPAGE is used, a page is not sent to the printer until a subsequent command uses the SHOWPAGE<br />
identifier. SHOWPAGE is assumed unless NO SHOWPAGE is used. An automatic page change occurs<br />
when a block of text extends beyond <strong>the</strong> defined bot<strong>to</strong>m of <strong>the</strong> page unless NO SHOWPAGE is in effect.<br />
__________________________________________________________________________<br />
Figure 11.11 PostScript Output<br />
James Wilmot (243-24-5007)<br />
Comments on INTELLECTUAL MEASURES<br />
Vocabulary skills, tea pouring and knowledge of current affairs are in<br />
<strong>the</strong> superior range by Company standards. Horseback riding and memo<br />
writing are excellent. Beer drinking is above average. Field hockey play is<br />
average. Car parking is marginal. Juggling is poor.<br />
Sheila Higgin (311-04-8831)<br />
Comments on INTELLECTUAL MEASURES<br />
Vocabulary skills and knowledge of current affairs are in <strong>the</strong> superior<br />
range by Company standards. Horseback riding and beer drinking are<br />
excellent. Field hockey play, juggling and memo writing are above<br />
average. Car parking is marginal. Tea pouring is poor.<br />
__________________________________________________________________________<br />
11.27 Setting <strong>the</strong> Fonts<br />
The identifiers that are used <strong>to</strong> set <strong>the</strong> fonts are: FONT and FONT1 through FONT9. When FONT is used by<br />
itself it sets all 9 of <strong>the</strong> available fonts <strong>to</strong> <strong>the</strong> supplied setting. If FONT is not supplied <strong>the</strong> assumed font is Times-<br />
Roman. The assumed pointsize depends on <strong>the</strong> combination of orientation (LANDSCAPE or PORTRAIT) and
<strong>the</strong> output width. In PORTRAIT orientation with an output width less than 80, <strong>the</strong> default pointsize is 10. In<br />
LANDSCAPE orientation with a width that is greater than 80, <strong>the</strong> pointsize is set <strong>to</strong> 8.<br />
Font names must be correctly spelled. Several of <strong>the</strong> more common font names are available as keywords so<br />
that you need not remember <strong>the</strong> exact form (upper and lower case and hyphenation). The available combinations<br />
are:<br />
TIMES, TIMES BOLD, TIMES ITALIC, TIMES BOLDITALIC<br />
HELVETICA, HELVETICA BOLD, HELVETICA OBLIQUE, HELVETICA BOLDOBLIQUE<br />
COURIER, COURIER BOLD, COURIER OBLIQUE, COURIER BOLDOBLIQUE<br />
These must be preceded by one of <strong>the</strong> FONT identifiers and optionally followed by <strong>the</strong> desired pointsize.<br />
FONT1 HELVETICA 10, FONT3 TIMES ITALIC, FONT4 COURIER,<br />
Helvetica and Times are proportional fonts. In a proportional font each letter takes up an appropriate amount of<br />
space so that an i is not as wide as a W. Courier is a monospace font and each letter takes up <strong>the</strong> same amount of<br />
room.<br />
You may use any font that is available on your laser printer. However, if it is not in <strong>the</strong> list of keywords, it<br />
must be enclosed in quotes. For example:<br />
FONT9 'ZapfChancery-MediumItalic' 10,<br />
Fonts can only be defined with the TEXTWRITER identifiers or in a previous POSTSCRIPT or POSTSCRIPT.SETUP<br />
command. They are selected in the textwriter output by using TEXTWRITER control words.<br />
11.28 TEXTWRITER Control Words: The Fonts<br />
The font control words are any of @FONT1 through @FONT9. The font change takes effect immediately and<br />
remains in effect until <strong>the</strong> next font control word is specified. If a number of fonts have been specified but no font<br />
control word is used, FONT1 is assumed.<br />
The output in Figures 11.1 to 11.4 uses four fonts defined by identifiers in the command:<br />
FONT1 TIMES BOLD 14, FONT2 TIMES BOLD 12,<br />
FONT3 ARIAL 12, FONT4 TIMES BOLDITALIC 12<br />
Within <strong>the</strong> TEXTWRITER <strong>PPL</strong>, <strong>the</strong> control words @FONT1, @FONT2, @FONT3 and @FONT4 are used <strong>to</strong><br />
specify which of <strong>the</strong> defined fonts <strong>to</strong> use for a particular piece of text.<br />
In Figure 11.11 the two paragraphs are in a regular Arial font while the two heading lines are in Times Bold 12<br />
and Times Bold 14. Justification is not requested for Figure 11.11. In Figure 11.12 the text in the paragraphs is<br />
right/left justified. This is done by adding “, JUSTIFY” to the end of the TEXTWRITER command.<br />
In Figure 11.13 <strong>the</strong> paragraphs are not justified but have font changes in <strong>the</strong> middle of <strong>the</strong> paragraph. A<br />
change was made in <strong>the</strong> TEXTWRITER command <strong>to</strong> isolate <strong>the</strong> variable “Score” so that it could be printed in a<br />
bold italic font:<br />
SET Score = #S;<br />
IF #Sentence EQ 1, PUT ;<br />
PUT @FONT4; PUT Score;<br />
IF #Sentence NE 1, PUT ;<br />
PUT @FONT3;<br />
IF #Sentence EQ 1, PUT >;<br />
__________________________________________________________________________
Figure 11.12 Justification in PostScript Text<br />
James Wilmot (243-24-5007)<br />
Comments on INTELLECTUAL MEASURES<br />
Vocabulary skills, tea pouring and knowledge of current affairs are in<br />
<strong>the</strong> superior range by Company standards. Horseback riding and memo<br />
writing are excellent. Beer drinking is above average. Field hockey play is<br />
average. Car parking is marginal. Juggling is poor.<br />
Sheila Higgin (311-04-8831)<br />
Comments on INTELLECTUAL MEASURES<br />
Vocabulary skills and knowledge of current affairs are in <strong>the</strong> superior<br />
range by Company standards. Horseback riding and beer drinking are<br />
excellent. Field hockey play, juggling and memo writing are above<br />
average. Car parking is marginal. Tea pouring is poor.<br />
__________________________________________________________________________<br />
__________________________________________________________________________<br />
Figure 11.13 Changing Fonts Text in a PostScript Paragraph<br />
James Wilmot (243-24-5007)<br />
Comments on INTELLECTUAL MEASURES<br />
Vocabulary skills, tea pouring and knowledge of current affairs are in<br />
<strong>the</strong> superior range by Company standards. Horseback riding and memo<br />
writing are excellent. Beer drinking is above average. Field hockey play<br />
is average. Car parking is marginal. Juggling is poor.<br />
Sheila Higgin (311-04-8831)<br />
Comments on INTELLECTUAL MEASURES<br />
Vocabulary skills and knowledge of current affairs are in <strong>the</strong> superior<br />
range by Company standards. Horseback riding and beer drinking<br />
are excellent. Field hockey play, juggling and memo writing are above<br />
average. Car parking is marginal. Tea pouring is poor.<br />
__________________________________________________________________________
In Figure 11.14, the text of the paragraphs is both justified and has font changes in the middle of the text. As<br />
you can see, the spacing is not as good as it is in Figure 11.12. This is because any font change, color change or<br />
underline causes a flush. The program does <strong>the</strong> justification by estimating what might come next. This usually<br />
results in somewhat more space between <strong>the</strong> words. Text with many font changes may result in somewhat less<br />
attractive results than text without intermediate font changes. Text with many long words will also tend <strong>to</strong> have<br />
less attractive results when justified.<br />
__________________________________________________________________________<br />
Figure 11.14 Font Changes in a Justified PostScript Paragraph<br />
James Wilmot (243-24-5007)<br />
Comments on INTELLECTUAL MEASURES<br />
Vocabulary skills, tea pouring and knowledge of current affairs are in<br />
<strong>the</strong> superior range by Company standards. Horseback riding and memo<br />
writing are excellent. Beer drinking is above average. Field hockey play<br />
is average. Car parking is marginal. Juggling is poor.<br />
Sheila Higgin (311-04-8831)<br />
Comments on INTELLECTUAL MEASURES<br />
Vocabulary skills and knowledge of current affairs are in <strong>the</strong> superior<br />
range by Company standards. Horseback riding and beer drinking<br />
are excellent. Field hockey play, juggling and memo writing are above<br />
average. Car parking is marginal. Tea pouring is poor.<br />
11.29 Control Words: Positioning <strong>the</strong> Text<br />
There are two types of text <strong>to</strong> consider:<br />
1. text that spans multiple lines like <strong>the</strong> paragraphs in <strong>the</strong> previous figures and<br />
2. tables or short pieces of text which need <strong>to</strong> be positioned at particular places.<br />
When PostScript is in effect, <strong>the</strong> control words such as @ and @BEFORE do not work well with proportional<br />
fonts. To account for <strong>the</strong> effects of proportional fonts and <strong>the</strong> fact that a PostScript page is not written from <strong>to</strong>p<br />
<strong>to</strong> bot<strong>to</strong>m but “drawn” on <strong>the</strong> page, <strong>the</strong>re are many TEXTWRITER control words specifically for use with<br />
PostScript.<br />
The following control words use inch measurements <strong>to</strong> specify where <strong>the</strong> string or number that follows is <strong>to</strong><br />
be placed and how it is <strong>to</strong> be placed in relation <strong>to</strong> that location.<br />
3. @CINCH=nn centers the text that follows at inch location nn. The text ends with the next control<br />
word of a type that causes a buffer flush such as @NEXT, @FLUSH, or another<br />
@CINCH type. Suppose var1 equals 111.222.<br />
@CINCH=3.1 'variable one = ' @PLACES2 var1 @SKIP2;<br />
produces “variable one = 111.22” and puts it so that its middle point is 3.1<br />
inches into the current line.<br />
Note: @PLACES is a textwriter control word that does not cause a flush of <strong>the</strong><br />
current text buffer. Since @SKIP flushes <strong>the</strong> text buffer, it ends <strong>the</strong> CINCH.<br />
4. @CINCH.U=nn centers and also underlines the string at inch nn.<br />
5. @RINCH=nn puts the textwriter text right justified at the specified location. This works well for<br />
numbers if the number of decimal places is controlled.<br />
6. @RINCH.U=nn right justifies and underlines the text.<br />
7. @LINCH=nn left justifies text at the specified location.<br />
8. @LINCH.U=nn left justifies and underlines the text.<br />
__________________________________________________________________________<br />
Figure 11.15 TEXTWRITER: Tabular Output with PostScript<br />
LIST numbers $<br />
Var1 Var2 Var3<br />
123.11 168.50 568.12<br />
12.239 45.67 33.20<br />
123.45 211.99 444.44<br />
TEXTWRITER numbers<br />
[ IF FIRST ( .FILE. ) THEN;<br />
PUT @LEADING=2 @Y1 @NEXT<br />
@CINCH.U=2.5 'column one'<br />
@RINCH.U=4.7 'column two'<br />
@LINCH.U=5.7 'column three' @SKIP=2;<br />
ENDIF;<br />
PUT @NOPLACES;<br />
IF var1 GE 35 THEN;<br />
PUT @PINCH=2.5 var1; ELSE; PUT @PINCH.U=2.5 var1; ENDIF;<br />
PUT @PLACES2;<br />
IF var2 GE 35 THEN;<br />
PUT @RINCH=4.5 var2; ELSE; PUT @RINCH.U=4.5 var2; ENDIF;<br />
IF var3 GE 35 THEN;<br />
PUT @LINCH=5.7 var3; ELSE; PUT @LINCH.U=5.7 var3; ENDIF;<br />
PUT @NEXT;<br />
IF LAST ( .FILE. ) THEN;<br />
PUT @NEXT @Y2 @LINEWIDTH=1.5 @DRAW.BOX @LINEWIDTH;<br />
PUT @X1=3.4 @DRAW.V @X1=5.2 @DRAW.V;<br />
ENDIF; ],<br />
POSTSCRIPT, PORTRAIT,<br />
LEFT.EDGE .5, RIGHT.EDGE .5, PR number.ps $<br />
__________________________________________________________________________<br />
9. @PINCH=nn centers the text around a specified lineup character, which is assumed to be a decimal<br />
point. This is good for writing a column of fractional numbers when the number<br />
of decimal places differs.<br />
10. @PINCH.U=nn like PINCH but it also underlines.<br />
11. @PINCH.CHAR='c' provides an alternate character such as '=' to be used in the pinch lineups. With no<br />
argument, it reverts to the default '.'.<br />
12. @FLUSH flushes <strong>the</strong> current textwriter buffer without also moving <strong>to</strong> <strong>the</strong> next line. The effect<br />
of flushing turns off temporary options like @CINCH or @UNDERLINE. It<br />
does not affect color settings.<br />
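The three placement rules above (center, right justify, left justify at an inch location) come down to simple arithmetic once the width of the string is known.<br />
The Python sketch below is only an illustration of that arithmetic, not P-STAT code; the function name, the mode letters and the 1.0 inch text width are assumptions made for the example.<br />

```python
def start_position(mode, anchor, text_width):
    """Left edge (in inches) of a string placed at `anchor`.

    mode is "C" (centered, as @CINCH), "R" (right justified, as @RINCH)
    or "L" (left justified, as @LINCH). text_width is assumed to be known;
    every name in this sketch is hypothetical.
    """
    if mode == "C":
        return anchor - text_width / 2   # middle of the text sits on the anchor
    if mode == "R":
        return anchor - text_width       # right end of the text sits on the anchor
    if mode == "L":
        return anchor                    # left end of the text sits on the anchor
    raise ValueError(f"unknown mode: {mode!r}")


# Anchors taken from the heading line of Figure 11.15; the width is made up.
for mode, anchor in (("C", 2.5), ("R", 4.7), ("L", 5.7)):
    print(mode, start_position(mode, anchor, text_width=1.0))
```

With a proportional font the width of each string differs, which is why the anchor is given in inches rather than in character columns.<br />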
__________________________________________________________________________<br />
Figure 11.16 PostScript: Tables with Proportional Fonts<br />
column one column two column three<br />
123.1100 168.50 568.12<br />
12.23900 45.67 33.20<br />
123.4500 211.99 444.44<br />
__________________________________________________________________________<br />
If you wish to draw lines or boxes it is useful to be able to specify the coordinates which define the drawing<br />
area. The following are control words that make it easy to move the current location on the page to resume printing<br />
at that location or for line and box drawing:<br />
1. @X1 stores a value that is the current left margin. If there is an argument, as in @X1=3.4, that<br />
inch location is stored in the X1 variable.<br />
2. @X2 s<strong>to</strong>res a value that is <strong>the</strong> current right margin. An argument such as @X2=8.3 s<strong>to</strong>res<br />
that inch value in <strong>the</strong> X2 variable.<br />
3. @Y1 s<strong>to</strong>res a value that is <strong>the</strong> current <strong>to</strong>p margin. If an argument is supplied, that value is<br />
s<strong>to</strong>red in <strong>the</strong> Y1 variable.<br />
4. @Y2 stores a value that is the current bottom margin. If an argument is supplied, that value<br />
is stored in the Y2 variable.<br />
5. @MOVETO sets <strong>the</strong> current location <strong>to</strong> <strong>the</strong> X1/Y1 position. The next text string will begin at that<br />
location.<br />
6. @DRAW.H draws a horizontal line from <strong>the</strong> X1/Y1 position <strong>to</strong> <strong>the</strong> X2/Y1 position.<br />
7. @DRAW.V draws a vertical line from <strong>the</strong> X1/Y1 position <strong>to</strong> <strong>the</strong> X1/Y2 position.<br />
8. @DRAW.U=nn underlines the current line from X1 to X2. nn is the amount below the current position<br />
where the underline should be drawn. The argument, nn, is in units of 1/72<br />
of an inch. If no argument is given, the assumed value is 3.<br />
9. @DRAW.BOX draws a box using X1/Y1 as the upper left coordinate and X2/Y2 as the lower right<br />
coordinate.<br />
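The way the drawing control words read the four stored values can be modeled in a few lines of Python.<br />
This is a sketch of the geometry as the list above describes it, not of how P-STAT implements it; the class name, method names and sample coordinates are assumptions.<br />

```python
from dataclasses import dataclass


@dataclass
class DrawState:
    """Hypothetical holder for the four stored inch values X1, Y1, X2, Y2."""
    x1: float
    y1: float
    x2: float
    y2: float

    def draw_h(self):
        # Horizontal line from the X1/Y1 position to the X2/Y1 position.
        return ((self.x1, self.y1), (self.x2, self.y1))

    def draw_v(self):
        # Vertical line from the X1/Y1 position to the X1/Y2 position.
        return ((self.x1, self.y1), (self.x1, self.y2))

    def draw_box(self):
        # Box described by its upper left and lower right corners.
        return ((self.x1, self.y1), (self.x2, self.y2))


state = DrawState(x1=0.5, y1=1.0, x2=8.0, y2=3.0)
print(state.draw_box())
state.x1 = 3.4            # comparable to @X1=3.4 followed by @DRAW.V
print(state.draw_v())
```

Because the vertical line depends only on X1, Y1 and Y2, changing X1 and drawing again yields a second rule at a new horizontal position.<br />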
Yet ano<strong>the</strong>r group of control words which determine location include:<br />
1. @DOWN=nn moves down that many lines. The actual distance depends on the point size of the font<br />
and the leading (the space between the lines). If nn is not specified, 1 is<br />
assumed.<br />
2. @UP=nn moves up that many lines. The actual distance depends on <strong>the</strong> point size and <strong>the</strong> leading.<br />
If nn is not specified, 1 is assumed.<br />
3. @TOP moves <strong>to</strong> <strong>the</strong> first line, just below <strong>the</strong> <strong>to</strong>p margin.<br />
4. @BOTTOM moves <strong>to</strong> <strong>the</strong> last line, just above <strong>the</strong> bot<strong>to</strong>m margin.<br />
5. @LEADING=nn specifies the space between lines. LEADING is usually set to 1/72 of an inch.<br />
LEADING=3 increases the space to 3/72 of an inch. A larger LEADING improves<br />
the readability of text when a large point size is used.<br />
6. @LINEWIDTH=nn specifies the width of the lines and boxes that are drawn. LINEWIDTH is usually<br />
set at .5. This measurement is in units of 1/72 of an inch. Increasing the<br />
LINEWIDTH causes bolder looking lines. @LINEWIDTH with no argument resets<br />
it to the original value of .5. LINEWIDTH=36 would provide a border 1/2<br />
inch wide.<br />
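Since LEADING and LINEWIDTH take arguments in units of 1/72 of an inch while the positioning control words use inches, a quick conversion can help when planning a layout.<br />
The Python sketch below only restates that arithmetic; the function and constant names are assumptions, not part of P-STAT.<br />

```python
POINTS_PER_INCH = 72  # the arguments are given in units of 1/72 of an inch


def points_to_inches(points):
    """Convert a LEADING or LINEWIDTH style argument to inches."""
    return points / POINTS_PER_INCH


print(points_to_inches(36))    # LINEWIDTH=36, the half-inch border from the text
print(points_to_inches(0.5))   # the usual LINEWIDTH of .5
print(points_to_inches(3))     # LEADING=3
```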
Figure 11.15 contains <strong>the</strong> TEXTWRITER command which creates <strong>the</strong> table in Figure 11.16. Each of <strong>the</strong><br />
three columns is formatted differently <strong>to</strong> show <strong>the</strong> effect of left, right and center justification. The headings are<br />
both properly justified and underlined. The LEADING is increased so that <strong>the</strong>re is more room between lines for<br />
<strong>the</strong> underlining.<br />
PUT @LEADING=2 @Y1 @NEXT<br />
@CINCH.U=2.5 'column one'<br />
@RINCH.U=4.7 'column two'<br />
@LINCH.U=5.7 'column three'<br />
The use of @Y1 s<strong>to</strong>res <strong>the</strong> location of <strong>the</strong> line before <strong>the</strong> headings. This value is needed later when we draw<br />
<strong>the</strong> box around <strong>the</strong> table. The first column is of particular interest because <strong>the</strong> numbers are placed so that <strong>the</strong> decimal<br />
point is located 2.5 inches from <strong>the</strong> left edge of <strong>the</strong> paper. This location may or may not be <strong>the</strong> actual center<br />
of <strong>the</strong> number.<br />
IF var1 GE 35 THEN;<br />
PUT @PINCH=2.5 var1; ELSE; PUT @PINCH.U=2.5 var1; ENDIF;<br />
This IF statement tests <strong>the</strong> value of Var1. If <strong>the</strong> value is relatively large, <strong>the</strong> @PINCH control word is used<br />
<strong>to</strong> locate <strong>the</strong> value so that <strong>the</strong> decimal point falls 2.5 inches from <strong>the</strong> left edge of <strong>the</strong> paper. If <strong>the</strong> value is small,<br />
PINCH.U places <strong>the</strong> value and underlines it. PINCH and PINCH.U are very useful when a column of numbers<br />
contains values which have different numbers of decimal places. PINCH and PINCH.U can be used <strong>to</strong> line strings<br />
up around a character other than the decimal point. For example:<br />
PUT @PINCH.CHAR='='<br />
causes <strong>the</strong> equal sign <strong>to</strong> be used in determining <strong>the</strong> location of <strong>the</strong> text that follows.<br />
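The idea of lining strings up around a chosen character can be illustrated in Python for the simpler monospace case, where positions are character columns rather than inches.<br />
This is a sketch of the concept only; the function name, the column value and the handling of strings without the lineup character are assumptions, and it does not reproduce how PINCH measures proportional fonts.<br />

```python
def align_on_char(items, lineup=".", column=8):
    """Pad each string on the left so its lineup character lands in `column`."""
    aligned = []
    for text in items:
        pos = text.find(lineup)
        if pos < 0:
            # Assumption: with no lineup character, anchor on the end of the string.
            pos = len(text)
        aligned.append(" " * max(column - pos, 0) + text)
    return aligned


# The Var1 values listed in Figure 11.15, aligned on the decimal point.
rows = align_on_char(["123.11", "12.239", "123.45"])
for row in rows:
    print(row)

# The same routine with '=' as the lineup character.
for row in align_on_char(["a=1", "bbb=22"], lineup="=", column=4):
    print(row)
```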
The second column in Figure 11.16 is right justified 4.5 inches from <strong>the</strong> left edge of <strong>the</strong> page. This works<br />
well when all <strong>the</strong> numbers have <strong>the</strong> same number of decimal places. The third column illustrates that left justification<br />
of a column of numbers is seldom satisfac<strong>to</strong>ry.<br />
11.30 Indenting Text<br />
The edge identifiers (TOP.EDGE, BOTTOM.EDGE, LEFT.EDGE and RIGHT.EDGE) set the initial margins for the page. The right and left margins can be adjusted by using<br />
@L.MARGIN and @R.MARGIN <strong>to</strong> supply an indent value. This value is an offset <strong>to</strong> <strong>the</strong> existing margins.<br />
@L.MARGIN=.5 @R.MARGIN=.5
This provides a half inch indent on each side of <strong>the</strong> page. Figure 11.17 illustrates <strong>the</strong> command and <strong>the</strong> resulting<br />
output. @L.MARGIN is used <strong>to</strong> create a hanging indent with text that explains both @L.MARGIN and<br />
@R.MARGIN. Note: a positive number as the argument moves the margin towards the center of the page. A<br />
negative number moves <strong>the</strong> margin <strong>to</strong>wards <strong>the</strong> edge of <strong>the</strong> page.<br />
__________________________________________________________________________<br />
Figure 11.17 Indenting <strong>the</strong> Text<br />
TEXTWRITER Work [ Case 1;<br />
PUT @L.MARGIN=1.5;<br />
PUT ;<br />
PUT @SKIP @L.MARGIN @L.MARGIN=1.5;<br />
PUT ; ],<br />
POSTSCRIPT, PORTRAIT, PR marg.ps,<br />
FONT1 TIMES 12, LEFT.EDGE 1.25, RIGHT.EDGE 1.25$<br />
__________________________________________________________________________<br />
11.31 Colors in PostScript Output<br />
The assumption is that PostScript output will be black on white. The black can be changed to any of red, orange,<br />
yellow, green, blue or violet. The control words are @RED, @ORANGE, @YELLOW, @GREEN, @BLUE,<br />
@VIOLET, @BLACK and @NOCOLOR. @NOCOLOR reverts <strong>to</strong> black. The change in color flushes any text<br />
that has preceded it but not yet been placed on <strong>the</strong> page.<br />
If you wish more control over <strong>the</strong> colors, you can use <strong>the</strong> POSTSCRIPT.SETUP command <strong>to</strong> assign colors<br />
<strong>to</strong> specific fonts using 3 numeric values for <strong>the</strong> amount of red, green and blue <strong>to</strong> be used. For example:<br />
POSTSCRIPT.SETUP, FONT4 HELVETICA, COLOR FONT4 .2 .6 .1<br />
Flushing makes a difference when justification is being done and <strong>the</strong> section of text is not yet complete. Justification<br />
is done by adding a tiny bit <strong>to</strong> <strong>the</strong> spaces between each word. To figure out how much <strong>to</strong> add, it is<br />
necessary <strong>to</strong> know <strong>the</strong> length of <strong>the</strong> text in <strong>the</strong> current font. The difference between <strong>the</strong> length of <strong>the</strong> text and <strong>the</strong><br />
available line width is divided by <strong>the</strong> number of words in <strong>the</strong> line. This is <strong>the</strong> amount used <strong>to</strong> pad <strong>the</strong> spaces.<br />
When a flush occurs in <strong>the</strong> middle, <strong>the</strong> <strong>to</strong>tal length of <strong>the</strong> text is not known so <strong>the</strong> amount must be estimated.<br />
It is this estimation which causes some of the justified lines to have more space between words than you would expect<br />
looking at <strong>the</strong> text. When <strong>the</strong> lines are not justified, <strong>the</strong> flushing does not affect <strong>the</strong> spacing within <strong>the</strong> paragraphs<br />
even when <strong>the</strong>re are font and color changes.<br />
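The padding rule described above can be written out as a short calculation.<br />
The Python sketch below follows the text literally (the slack is divided by the number of words in the line); it is an illustration under that reading, not P-STAT's code, and the function name and sample measurements are assumptions.<br />

```python
def justification_pad(text_length, line_width, n_words):
    """Extra space added to each inter-word space, per the rule in the text.

    text_length and line_width must be in the same unit (for example, inches).
    All names here are hypothetical; they do not come from P-STAT.
    """
    if n_words <= 0:
        raise ValueError("a line needs at least one word")
    slack = line_width - text_length
    # A line that already fills the available width gets no padding.
    return max(slack, 0.0) / n_words


# A 6-inch line holding 5.4 inches of text in 12 words.
pad = justification_pad(text_length=5.4, line_width=6.0, n_words=12)
print(round(pad, 4))
```

When a flush interrupts the line, the final text length is not yet available, so a value has to be estimated in its place; an estimate that is too short produces a pad that is too large.<br />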
NOTE: Because the PostScript commands do not have the actual font tables available, the spacing is based on<br />
estimates. The use of capitalized words in a justified line may cause overprinting. The flushing of <strong>the</strong> line that<br />
occurs when font changes are made can also contribute <strong>to</strong> imperfections in <strong>the</strong> spacing. The leading between lines
is also based on <strong>the</strong> pointsize of <strong>the</strong> fonts. Changing font sizes in <strong>the</strong> middle of paragraphs will also cause <strong>the</strong><br />
leading <strong>to</strong> change.<br />
11.32 Underlining Text<br />
It is easy <strong>to</strong> underline items in tables when using <strong>the</strong> @CINCH, @LINCH, @RINCH and @PINCH control<br />
words. Each of <strong>the</strong>se has an underline format which is <strong>the</strong> control word followed by “.U” <strong>to</strong> indicate underlining.<br />
To underline text in <strong>the</strong> middle of a sentence it is necessary <strong>to</strong> indicate where <strong>the</strong> underlining starts and where it<br />
ends. This is done with @UNDERLINE and @NOUNDERLINE.<br />
Figure 11.18 illustrates <strong>the</strong> output if @UNDERLINE and @NOUNDERLINE are added <strong>to</strong> <strong>the</strong> font changes<br />
for variable Score.<br />
PUT @FONT4 @UNDERLINE;<br />
PUT Score; IF Sentence NE 1, PUT ;<br />
PUT @FONT2 @NOUNDERLINE ;<br />
The code which figures out where <strong>to</strong> break up a line looks for blanks or <strong>the</strong> end of a chunk of text. In <strong>the</strong> example<br />
above, if the period that ends the sentence is separated from the word that precedes it by a font, color or<br />
underline control word, it may well end up by itself on <strong>the</strong> next line.<br />
There are three ways <strong>to</strong> emphasize important text.<br />
1. Change <strong>the</strong> font <strong>to</strong> an italic or bold typeface<br />
2. Change <strong>the</strong> color of <strong>the</strong> text.<br />
3. Underline <strong>the</strong> text.<br />
These methods are not exclusive, and you may, if you wish, use all three to produce, for example, text that is red,<br />
italic and underlined.<br />
PUT @FONT6 @RED @UNDERLINE <br />
@FONT1 @BLACK @NOUNDERLINE ;<br />
@UNDERLINE and <strong>the</strong> “.U” control words all underline from <strong>the</strong> start of <strong>the</strong> designated string <strong>to</strong> <strong>the</strong> end of<br />
that string. With @UNDERLINE that can be many lines down a page. Underline ends when @NOUNDERLINE<br />
or an @NEXT, @SKIP or @PARA control word is encountered. It is not affected by a color or font change.<br />
The @DRAW.U control word underlines from one specific location on a line <strong>to</strong> ano<strong>the</strong>r specific location on<br />
that line. The underline is drawn 3/72 of an inch below <strong>the</strong> current line unless an argument<br />
specifies a different distance. The start and end of <strong>the</strong> underline are determined by <strong>the</strong> values of @X1<br />
and @X2, which are initially set <strong>to</strong> <strong>the</strong> left and right edge values.<br />
@X1=3.5 @X2=6 @DRAW.U=5<br />
draws a line 2.5 inches long beginning 3.5 inches from <strong>the</strong> left edge of <strong>the</strong> paper and 5/72 of an inch below <strong>the</strong><br />
current line.
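The arithmetic in this example can be checked with a short sketch. The Python below is a model for illustration, not P-STAT code.<br />
The function name draw_u_geometry is hypothetical. It only restates the rule that the line runs from @X1 to @X2 and lies<br />
nn/72 of an inch below the current line.<br />

```python
from fractions import Fraction

POINTS_PER_INCH = 72  # @DRAW.U offsets are given in 72nds of an inch


def draw_u_geometry(x1, x2, offset=3):
    """Return (line length in inches, drop below the current line in inches).

    x1 and x2 are inch positions measured from the left edge of the paper;
    offset is the @DRAW.U argument (3 is the documented default).
    """
    length = Fraction(str(x2)) - Fraction(str(x1))
    drop = Fraction(offset, POINTS_PER_INCH)
    return length, drop


length, drop = draw_u_geometry(3.5, 6, offset=5)
print(float(length), drop)  # 2.5 5/72
```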
__________________________________________________________________________<br />
Figure 11.18 Underlining <strong>the</strong> Text<br />
James Wilmot (243-24-5007)<br />
Comments on INTELLECTUAL MEASURES<br />
Vocabulary skills, tea pouring and knowledge of current affairs are in<br />
<strong>the</strong> superior range by Company standards. Horseback riding and memo<br />
writing are excellent. Beer drinking is above average. Field hockey play<br />
is average. Car parking is marginal. Juggling is poor.<br />
Sheila Higgin (311-04-8831)<br />
Comments on INTELLECTUAL MEASURES<br />
Vocabulary skills and knowledge of current affairs are in <strong>the</strong> superior<br />
range by Company standards. Horseback riding and beer drinking<br />
are excellent. Field hockey play, juggling and memo writing are above<br />
average. Car parking is marginal. Tea pouring is poor.
SUMMARY<br />
TEXTWRITER<br />
TEXTWRITER Invoices<br />
[ IF FIRST (.FILE.),<br />
PUT @PAGE .DATE. @SKIP1 ;<br />
IF Date.Paid MISSING AND<br />
DAYS (.NDATE., 'YYYYMMDD') -<br />
DAYS (Date.Invoice, 'YYYYMMDD') GT 30,<br />
PUT @NEXT Inv.Number ><br />
Amount.Due Date.Invoice<br />
Company<br />
> Phone ],<br />
JUSTIFY, WIDTH 60 $<br />
The TEXTWRITER command produces textual reports about <strong>the</strong> data in a P-<strong>STAT</strong> system file. TEXTWRITER<br />
uses <strong>the</strong> PUT instruction <strong>to</strong> place text strings, <strong>the</strong> values of variables, scratch variables and<br />
system variables, and <strong>the</strong> evaluations of expressions in <strong>the</strong> report. Character strings are enclosed in<br />
quotes or between <strong>the</strong> directional signs “ > ”:<br />
[ IF FIRST (.FILE.),<br />
PUT @PAGE .DATE. @SKIP1 ;<br />
The PUTL instruction may also be used <strong>to</strong> put variable labels in <strong>the</strong> text. O<strong>the</strong>r <strong>PPL</strong> instructions, functions<br />
and opera<strong>to</strong>rs may be used for logical testing, recoding, calculations and o<strong>the</strong>r tasks. The placement<br />
of text is controlled using <strong>the</strong> "@" symbol and control words.<br />
Text is not right justified unless <strong>the</strong> identifier JUSTIFY or <strong>the</strong> control word @JUST specifies right justification.<br />
The WIDTH identifier or <strong>the</strong> control word @WIDTH specifies an output width o<strong>the</strong>r than <strong>the</strong><br />
current one.<br />
The previous TEXTWRITER command produces output in <strong>the</strong> following form:<br />
Outstanding Invoices as of Apr 16, 1995<br />
Invoice Number 1260 for $212.55, dated 950211, is past due.<br />
Please call Smith, Jakes & Row at 215-356-7000.<br />
Required:<br />
TEXTWRITER fn<br />
specifies <strong>the</strong> name of <strong>the</strong> required input file. The filename is followed directly by P-<strong>STAT</strong> <strong>Programming</strong><br />
<strong>Language</strong> (<strong>PPL</strong>) clauses. (No comma follows <strong>the</strong> filename.)<br />
fn=file name nn=number cs=character string arg=keyword argument
Optional Identifiers:<br />
BLANKS nn<br />
gives <strong>the</strong> maximum number of blanks that may come between any two words after line justification.<br />
TEXTWRITER inserts additional blanks after certain punctuation characters and between words, if necessary,<br />
<strong>to</strong> justify <strong>the</strong> line of text. The default setting of BLANKS is four. A smaller number may be<br />
specified, but justification may be affected.<br />
CASE<br />
specifies that text be flushed and printed at <strong>the</strong> conclusion of processing of each case, and that a new line<br />
be started at <strong>the</strong> start of processing of <strong>the</strong> next case. All control words are reset at <strong>the</strong> start of each case.<br />
This is <strong>the</strong> assumed mode.<br />
JUSTIFY<br />
requests that <strong>the</strong> text be right-justified as well as left-justified; that is, <strong>the</strong> lines of text align on <strong>the</strong> right<br />
edge as well as <strong>the</strong> left. When JUSTIFY is not specified, only <strong>the</strong> left edge of <strong>the</strong> report is aligned. The<br />
control words @JUST and @NOJUST may be used in <strong>the</strong> <strong>PPL</strong> clauses <strong>to</strong> override <strong>the</strong> current justification<br />
setting.<br />
LABELS fn<br />
provides <strong>the</strong> name of a labels file. If a value in <strong>the</strong> prin<strong>to</strong>ut belongs <strong>to</strong> a numeric variable which is represented<br />
in <strong>the</strong> labels file, <strong>the</strong> text for that value is used instead of <strong>the</strong> number. Extended variable labels<br />
in <strong>the</strong> labels file are ignored.<br />
LEADBLANK and NO LEADBLANK<br />
Each line of text is usually started with an initial blank. This is used as a carriage control character. If<br />
<strong>the</strong> output is not going <strong>to</strong> a printer you may use <strong>the</strong> NO LEADBLANK identifier <strong>to</strong> remove this extra<br />
blank. LEADBLANK is <strong>the</strong> assumed setting.<br />
MARGIN nn<br />
specifies <strong>the</strong> number of columns <strong>to</strong> indent text from <strong>the</strong> left margin. MARGIN 0 is assumed when <strong>the</strong><br />
MARGIN identifier is not used. Within <strong>PPL</strong> clauses, <strong>the</strong> control word @INDENT can be used for additional<br />
indentation beyond <strong>the</strong> MARGIN setting.<br />
OUT fn<br />
provides <strong>the</strong> name for a new P-<strong>STAT</strong> system file which contains <strong>the</strong> contents in <strong>the</strong> input file as <strong>the</strong>y are<br />
modified by <strong>the</strong> TEXTWRITER <strong>PPL</strong>.<br />
PUTL.CHARS 'cs'<br />
provides 1 <strong>to</strong> 3 characters which will replace <strong>the</strong> “ = ” (blank, equal-sign, blank) which usually separates<br />
<strong>the</strong> variable name from <strong>the</strong> value in a PUTL situation.<br />
STREAM<br />
specifies <strong>the</strong> continuous output of text; text is not flushed and printed upon <strong>the</strong> completion of each<br />
case. All control words except @INDENT and @WIDTH, which cause <strong>the</strong> flushing of text, are reset at<br />
<strong>the</strong> start of processing of each case. CASE is assumed when STREAM is not specified.<br />
SPREAD and NO SPREAD<br />
SPREAD is assumed. It causes a single blank <strong>to</strong> be placed between adjacent variables. NO SPREAD<br />
causes <strong>the</strong> variables <strong>to</strong> be written out directly one after <strong>the</strong> o<strong>the</strong>r with no intervening space.<br />
WIDTH nn<br />
gives <strong>the</strong> number of columns <strong>to</strong> be used for <strong>the</strong> report. When WIDTH is not used, <strong>the</strong> current output<br />
width defines <strong>the</strong> number of columns up <strong>to</strong> a maximum of 400. A specified WIDTH can be from 2 <strong>to</strong><br />
400. The control word @WIDTH does <strong>the</strong> same thing within PUT instructions.<br />
WIDTH is measured from <strong>the</strong> first column. Thus, if MARGIN 20 and WIDTH 80 are specified, <strong>the</strong>re<br />
are 60 columns available for text. These columns can be referred <strong>to</strong> using @1 through @60.<br />
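The column arithmetic above can be written as a one-line rule. The following Python sketch is a model for illustration only, not P-STAT code.<br />
The function name usable_columns is hypothetical, and the check that MARGIN must be smaller than WIDTH is an assumption<br />
added so that the result is always positive.<br />

```python
def usable_columns(width, margin=0):
    """Columns left for text when WIDTH is counted from the first column."""
    if not 2 <= width <= 400:
        raise ValueError("WIDTH must be between 2 and 400")
    if not 0 <= margin < width:
        # Assumption: a margin as wide as the line leaves no room for text.
        raise ValueError("MARGIN must be smaller than WIDTH")
    return width - margin


print(usable_columns(80, margin=20))  # 60, addressed as @1 through @60
```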
Optional Control Words:<br />
Control words are used in TEXTWRITER <strong>to</strong> control positioning of text. Any of <strong>the</strong>m, except @INDENT,<br />
@JUST and @WIDTH, may also be used in <strong>the</strong> <strong>PPL</strong> that may follow any command for data<br />
cleaning or producing brief reports. The control words all begin with “@” and many of <strong>the</strong>m are followed<br />
by a number giving a column location or o<strong>the</strong>r value. The number may follow directly after <strong>the</strong><br />
control word (@SKIP2) or may follow after an equal-sign (@SKIP=2). Although <strong>the</strong> argument directly<br />
following <strong>the</strong> control word is typically a number, it may be any expression (within paren<strong>the</strong>ses) that evaluates<br />
<strong>to</strong> a numeric value.<br />
The following control words remain in effect throughout <strong>the</strong> processing of a case unless <strong>the</strong>y are specifically<br />
changed or turned off:<br />
@INDENT @EQUAL<br />
@WIDTH @MISS<br />
@JUST @COMMAS<br />
@TRIM @PLACES<br />
Prefacing a control word with “NO” (@NOCOMMAS) turns off a prior setting.<br />
The following control words apply only <strong>to</strong> <strong>the</strong> variable expression or character string that directly<br />
follows:<br />
@nn @PAGE<br />
@PLUS @NEXT<br />
@MINUS @PARA<br />
@BEFORE @SKIP<br />
(“nn” represents a positive whole number.) These control words must be reissued <strong>to</strong> produce <strong>the</strong> desired<br />
results again.<br />
@nn<br />
specifies a column location, measured from <strong>the</strong> start of <strong>the</strong> line. The column pointer moves <strong>to</strong> this location<br />
and <strong>the</strong> next character is written in this column.<br />
[PUT @10 'The initial T in this line is in column 10.'] produces:<br />
         The initial T in this line is in column 10.<br />
123456789012345678901234567890<br />
(The additional line is a scale and not part of <strong>the</strong> output.)<br />
@BEFORE nn<br />
specifies a column location against which <strong>the</strong> next output element is right aligned. The text is written<br />
before <strong>the</strong> specified column. When no location is given, text is written before <strong>the</strong> current location of <strong>the</strong><br />
column pointer.<br />
(PUT @BEFORE30 'The period is in column 29.') produces:<br />
  The period is in column 29.<br />
123456789012345678901234567890<br />
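The two positioning rules, @nn and @BEFORE nn, can be modelled in a few lines. The Python below is an illustrative sketch, not P-STAT code.<br />
The function names put_at and put_before are hypothetical, and raising an error when the text does not fit is an assumption,<br />
since this section does not say what TEXTWRITER does in that case.<br />

```python
def put_at(column, text):
    """Model of @nn: the first character of text lands in `column` (1-based)."""
    return " " * (column - 1) + text


def put_before(column, text):
    """Model of @BEFORE nn: text is right-aligned so it ends in column - 1."""
    start = column - len(text)  # 1-based column of the first character
    if start < 1:
        # Assumption: treat text that cannot fit as an error.
        raise ValueError("text does not fit before the requested column")
    return " " * (start - 1) + text


print(put_at(10, "The initial T in this line is in column 10."))
print(put_before(30, "The period is in column 29."))
print("123456789012345678901234567890")  # scale line, not part of the output
```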
This instruction:<br />
[ PUT 'The amount due is: '<br />
@BEFORE (' $' // CHARACTER (Amount.Due) ) ;<br />
produces:<br />
The amount due is: $12.56<br />
@COMMAS<br />
requests that all numeric values be printed with commas inserted every three digits (counting from <strong>the</strong><br />
decimal point <strong>to</strong> <strong>the</strong> left). This makes large numbers easier <strong>to</strong> read. @NOCOMMAS turns off<br />
@COMMAS.<br />
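The grouping rule for @COMMAS is the familiar thousands separator. The Python sketch below illustrates the rule only and is not P-STAT code.<br />
The function name with_commas is hypothetical, and the model says nothing about how TEXTWRITER treats missing values.<br />

```python
def with_commas(value):
    """Insert a comma every three digits to the left of the decimal point."""
    return f"{value:,}"


print(with_commas(1234567))    # 1,234,567
print(with_commas(1234567.5))  # 1,234,567.5
print(with_commas(212.55))     # 212.55 (fewer than four integer digits)
```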
@EQUAL nn<br />
gives <strong>the</strong> column location of <strong>the</strong> equal-sign separating a variable name from its value. @EQUAL is used<br />
with <strong>the</strong> <strong>PPL</strong> instruction PUTL which puts a variable name (label) as well as its value in <strong>the</strong> line of text.<br />
Multiple locations may be specified:<br />
(PUTL @EQUAL10:30:50 Last First Age)<br />
The output line contains <strong>the</strong> three variables and <strong>the</strong>ir values, with <strong>the</strong> equal-signs in <strong>the</strong> specified<br />
columns:<br />
Last = Wilson First = Margaret Age = 23<br />
If <strong>the</strong> spacing is not adequate for <strong>the</strong> actual length of some of <strong>the</strong> character variables or <strong>the</strong> actual width<br />
of numeric variables, those long values print on <strong>the</strong> next line. @NOEQUAL turns off <strong>the</strong> @EQUAL<br />
specifications. An easy way <strong>to</strong> print all <strong>the</strong> variables in your file is <strong>to</strong> use <strong>the</strong> system variable .ALL. instead<br />
of <strong>the</strong> list of variable names.<br />
@INDENT nn<br />
specifies an additional number of columns <strong>to</strong> indent text from <strong>the</strong> current margin. The value after @INDENT<br />
is added <strong>to</strong> <strong>the</strong> current margin setting and text is indented that many columns from <strong>the</strong> left. This<br />
defines <strong>the</strong> new left margin of <strong>the</strong> report. (The current margin is that set by <strong>the</strong> identifier MARGIN in<br />
<strong>the</strong> TEXTWRITER command or, if MARGIN is not used, <strong>the</strong> default value 0.) @IN is an abbreviation<br />
for @INDENT. @NOINDENT resets <strong>the</strong> indentation <strong>to</strong> that specified by <strong>the</strong> MARGIN identifier or <strong>to</strong><br />
0 if MARGIN was not used.<br />
@JUST<br />
requests that <strong>the</strong> text be right justified as well as left justified; that is, <strong>the</strong> lines of text are aligned on<br />
<strong>the</strong> right edge as well as <strong>the</strong> left one. @NOJUST turns off right justification, overriding <strong>the</strong> JUSTIFY<br />
identifier in <strong>the</strong> TEXTWRITER command.<br />
@MISS 'cs'<br />
defines a character <strong>to</strong> print <strong>to</strong> indicate any of <strong>the</strong> three types of missing values. It is used when characters<br />
o<strong>the</strong>r than dashes are desired <strong>to</strong> indicate missing values. @MISS1, @MISS2 and @MISS3 specify different<br />
characters for <strong>the</strong> three individual types of missing values:<br />
[ PUT Student.ID Last.Name Course.No<br />
@MISS1 @MISS2<br />
@NEXT First.Sec<br />
@NEXT Second.Sec ]<br />
@M, @M1, @M2 and @M3 are abbreviations. @NOMISS (or @NOMISS2, etc.) resets <strong>the</strong> specified<br />
missing character back <strong>to</strong> dashes.<br />
@MINUS nn<br />
requests that <strong>the</strong> column pointer move left <strong>the</strong> specified number of columns. The current column location<br />
minus <strong>the</strong> numeric argument yields <strong>the</strong> column in which text will print. @PLUS moves <strong>the</strong> pointer <strong>to</strong><br />
<strong>the</strong> right.<br />
@NEXT<br />
moves <strong>the</strong> column pointer <strong>to</strong> <strong>the</strong> beginning of <strong>the</strong> next line. Subsequent text is written on this new line.<br />
When @NEXT is not used, text is written on <strong>the</strong> current line until it is full and <strong>the</strong>n text continues on <strong>the</strong><br />
next line.<br />
(Note that this is opposite <strong>to</strong> what occurs when PUT is used in <strong>PPL</strong> following commands o<strong>the</strong>r than TEXTWRITER.<br />
Then, a new line is started for each PUT clause unless an “@” by itself is used <strong>to</strong> hold <strong>the</strong><br />
column pointer in <strong>the</strong> current line.)<br />
@PARA<br />
requests that a new paragraph start. Subsequent text prints on <strong>the</strong> next line, beginning in <strong>the</strong> fourth<br />
column.<br />
@PAGE<br />
requests that subsequent text print on a new page.<br />
@PLACES nn<br />
gives <strong>the</strong> number of decimal places <strong>to</strong> use in printing numeric values. Numbers are rounded if <strong>the</strong>y have<br />
more than <strong>the</strong> specified number of decimal places, or zeros are added <strong>to</strong> pad <strong>the</strong> numbers if <strong>the</strong>y have<br />
fewer than <strong>the</strong> specified number of places. @PL is an abbreviation. @NOPLACES turns off <strong>the</strong> prior<br />
places specification. Numbers <strong>the</strong>n print with <strong>the</strong>ir actual number of decimal places. (@PLACES0 or<br />
@PLACES=0 should be used <strong>to</strong> specify no decimal places or decimal point in <strong>the</strong> output.)<br />
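The rounding and padding behaviour described for @PLACES matches ordinary fixed-point formatting. The Python below is an illustrative<br />
model, not P-STAT code. The function name with_places is hypothetical, and the way P-STAT rounds values that fall exactly<br />
halfway between two results is not stated here, so the examples avoid such ties.<br />

```python
def with_places(value, places):
    """Round or zero-pad a number to a fixed count of decimal places."""
    return f"{value:.{places}f}"


print(with_places(212.557, 2))  # 212.56 (rounded)
print(with_places(12.5, 2))     # 12.50  (padded with a zero)
print(with_places(1260.4, 0))   # 1260   (no decimal point, as with @PLACES0)
```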
@PLUS nn<br />
requests that <strong>the</strong> column pointer move right <strong>the</strong> specified number of columns. @MINUS moves <strong>the</strong><br />
pointer <strong>to</strong> <strong>the</strong> left.<br />
@SKIP nn<br />
specifies <strong>the</strong> number of lines <strong>to</strong> skip before printing text. When no number follows @SKIP, one line is<br />
skipped. @SK is an abbreviation.<br />
@SPREAD<br />
is assumed and causes a single blank <strong>to</strong> be inserted between variables. @NOSPREAD causes that blank<br />
<strong>to</strong> be omitted.<br />
@TRIM<br />
requests that leading and trailing blanks be trimmed from character values before <strong>the</strong>y are positioned in <strong>the</strong><br />
text. This is assumed by TEXTWRITER and need not be specified explicitly. @NOTRIM turns trimming<br />
off. Untrimmed, a character string will occupy as many columns as its defined length, even though<br />
it may be only one character long or entirely blank.<br />
@WIDTH nn<br />
defines <strong>the</strong> output line width of <strong>the</strong> report. It overrides any previous output width settings defined by <strong>the</strong><br />
command OUTPUT.WIDTH or <strong>the</strong> identifier WIDTH in TEXTWRITER. The argument for @WIDTH<br />
may range from 2 <strong>to</strong> 400.<br />
@NOWIDTH turns off <strong>the</strong> line width setting, which <strong>the</strong>n reverts <strong>to</strong> that defined by <strong>the</strong> WIDTH identifier<br />
or <strong>the</strong> OUTPUT.WIDTH command. @NOWIDTH resets <strong>the</strong> line width <strong>to</strong> <strong>the</strong> original output width.<br />
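The order in which the three width settings take effect can be summarised as a small lookup. The Python below is a sketch of that precedence,<br />
not P-STAT code. The function name effective_width and its parameter names are hypothetical. Passing None for a setting<br />
models "not specified", which is also the state after @NOWIDTH.<br />

```python
def effective_width(output_width, width_identifier=None, width_control=None):
    """Pick the line width: @WIDTH, else the WIDTH identifier, else OUTPUT.WIDTH."""
    for candidate in (width_control, width_identifier):
        if candidate is not None:
            if not 2 <= candidate <= 400:
                raise ValueError("width must be between 2 and 400")
            return candidate
    # With no explicit width, the current output width applies, capped at 400.
    return min(output_width, 400)


print(effective_width(132, width_identifier=60, width_control=40))  # 40
print(effective_width(132, width_identifier=60))                    # 60
print(effective_width(132))                                         # 132
```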
TEXTWRITER and POSTSCRIPT<br />
Requires additional identifiers following <strong>the</strong> TEXTWRITER text.<br />
JUSTIFY, POSTSCRIPT, PORTRAIT,<br />
LEFT.EDGE 2., RIGHT.EDGE 2.,<br />
FONT1 TIMES 12, FONT2 TIMES BOLD 12 $<br />
Also Required:<br />
POSTSCRIPT<br />
is required unless <strong>the</strong> command is included within a PostScript block. A PostScript block begins with a<br />
POSTSCRIPT command and ends with a POSTSCRIPT.CLOSE command.<br />
Optional Identifiers for PostScript Output:<br />
BOTTOM.EDGE nn<br />
sets <strong>the</strong> bot<strong>to</strong>m edge <strong>the</strong> specified number of inches from <strong>the</strong> bot<strong>to</strong>m of <strong>the</strong> page. If <strong>the</strong>re is no argument,<br />
<strong>the</strong> bot<strong>to</strong>m edge is reset <strong>to</strong> its beginning value. This is usually 1 inch for all edges unless changed in a<br />
POSTSCRIPT.SETUP command. The measurements can be fractional.<br />
FONT arg arg nn<br />
provides <strong>the</strong> name, type, and point size for <strong>the</strong> fonts <strong>to</strong> be used. A character string in quotes can replace<br />
<strong>the</strong> first two arguments <strong>to</strong> specify an alternate font not supported in <strong>the</strong> keywords. Available keyword<br />
combinations are:<br />
TIMES HELVETICA COURIER<br />
TIMES BOLD HELVETICA BOLD COURIER BOLD<br />
TIMES ITALIC HELVETICA OBLIQUE COURIER OBLIQUE<br />
TIMES BOLDITALIC HELVETICA BOLDOBLIQUE COURIER BOLDOBLIQUE<br />
FONT1-FONT9 arg arg nn<br />
provides up <strong>to</strong> 9 different font/type/size combinations for use in <strong>the</strong> command.<br />
LANDSCAPE<br />
specifies that <strong>the</strong> orientation of <strong>the</strong> page is 11 inches wide by 8 1/2 inches high.<br />
LEFT.EDGE nn<br />
specifies <strong>the</strong> starting location from <strong>the</strong> left edge in inches. The number may be fractional. If no number<br />
is supplied, <strong>the</strong> left edge is reset <strong>to</strong> <strong>the</strong> beginning value.<br />
PORTRAIT<br />
specifies that <strong>the</strong> orientation of <strong>the</strong> page is 8 1/2 inches wide by 11 inches high.<br />
RIGHT.EDGE nn<br />
specifies <strong>the</strong> number of inches that <strong>the</strong> output should be from <strong>the</strong> right hand edge of <strong>the</strong> paper. The number<br />
may be fractional. If it is used without an argument it is reset <strong>to</strong> <strong>the</strong> beginning value.<br />
SHOWPAGE and NO SHOWPAGE<br />
SHOWPAGE is assumed. NO SHOWPAGE is used when you wish <strong>to</strong> put more than one command on<br />
a single sheet of paper.<br />
TOP.EDGE nn<br />
specifies <strong>the</strong> starting location of <strong>the</strong> prin<strong>to</strong>ut from <strong>the</strong> <strong>to</strong>p of <strong>the</strong> paper in inches which may be fractional.<br />
If no number is supplied, it is reset <strong>to</strong> <strong>the</strong> beginning value.<br />
Optional Control Words:<br />
@FONT1-@FONT9<br />
causes an immediate change in <strong>the</strong> font that is used. That font remains in effect until <strong>the</strong> next @FONTn<br />
control word is processed.<br />
@CINCH.U=nn<br />
centers and also underlines <strong>the</strong> string at inch nn.<br />
@RINCH=nn<br />
puts <strong>the</strong> TEXTWRITER text right justified at <strong>the</strong> specified location. This works well for numbers if <strong>the</strong><br />
number of decimal places is controlled.<br />
@RINCH.U=nn<br />
right justifies and underlines <strong>the</strong> text.<br />
@LINCH=nn<br />
left justifies text at <strong>the</strong> specified location.<br />
@LINCH.U=nn<br />
left justifies and underlines <strong>the</strong> text.<br />
@PINCH=nn<br />
centers <strong>the</strong> text around a specified lineup character, which is assumed <strong>to</strong> be a decimal point. This is good<br />
for writing a column of fractional numbers when <strong>the</strong> number of decimal places differs.<br />
@PINCH.U=nn<br />
like @PINCH but it also underlines.<br />
@PINCH.CHAR='c'<br />
provides an alternate character such as '=' <strong>to</strong> be used in <strong>the</strong> pinch lineups. If no argument is given, it reverts <strong>to</strong><br />
<strong>the</strong> default '.'.<br />
@FLUSH<br />
flushes <strong>the</strong> current TEXTWRITER buffer without also moving <strong>to</strong> <strong>the</strong> next line. Flushing<br />
turns off temporary options like @CINCH or @UNDERLINE. It does not affect color settings.<br />
@X1<br />
s<strong>to</strong>res a value that is <strong>the</strong> current left margin. If <strong>the</strong>re is an argument, as in @X1=3.4, that inch location is<br />
s<strong>to</strong>red in <strong>the</strong> X1 variable.<br />
@X2<br />
s<strong>to</strong>res a value that is <strong>the</strong> current right margin. An argument such as @X2=8.3 s<strong>to</strong>res that inch value in<br />
<strong>the</strong> X2 variable.<br />
@Y1<br />
s<strong>to</strong>res a value that is <strong>the</strong> current <strong>to</strong>p margin. If an argument is supplied, that value is s<strong>to</strong>red in <strong>the</strong> Y1<br />
variable.<br />
@Y2<br />
s<strong>to</strong>res a value that is <strong>the</strong> current bot<strong>to</strong>m margin. If an argument is supplied, that value is s<strong>to</strong>red in<br />
<strong>the</strong> Y2 variable.<br />
@MOVETO<br />
sets <strong>the</strong> current location <strong>to</strong> <strong>the</strong> X1/Y1 position. The next text string will begin at that location.<br />
@DRAW.H<br />
draws a horizontal line from <strong>the</strong> X1/Y1 position <strong>to</strong> <strong>the</strong> X2/Y1 position.<br />
@DRAW.V<br />
draws a vertical line from <strong>the</strong> X1/Y1 position <strong>to</strong> <strong>the</strong> X1/Y2 position.<br />
@DRAW.U=nn<br />
underlines <strong>the</strong> current line from X1 <strong>to</strong> X2. nn is <strong>the</strong> amount below <strong>the</strong> current line in units of 72nds of<br />
an inch where <strong>the</strong> line should be drawn. If no argument is given, <strong>the</strong> assumed value is 3.<br />
@DOWN=nn<br />
moves down that many lines. The actual distance depends on <strong>the</strong> point size of <strong>the</strong> font and <strong>the</strong> leading<br />
(<strong>the</strong> space between <strong>the</strong> lines). If nn is not specified, 1 is assumed.<br />
@UP=nn<br />
moves up that many lines. The actual distance depends on <strong>the</strong> point size and <strong>the</strong> leading. If nn is not<br />
specified, 1 is assumed.<br />
@TOP<br />
moves <strong>to</strong> <strong>the</strong> first line, just below <strong>the</strong> <strong>to</strong>p margin.<br />
@BOTTOM<br />
moves <strong>to</strong> <strong>the</strong> last line, just above <strong>the</strong> bot<strong>to</strong>m margin.<br />
@LEADING=nn<br />
specifies <strong>the</strong> space between lines. LEADING is usually set <strong>to</strong> 1/72 of an inch. @LEADING=3 increases<br />
<strong>the</strong> space <strong>to</strong> 3/72 of an inch. A larger LEADING improves <strong>the</strong> readability of text when a large point size<br />
is used.<br />
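The effect of leading on line spacing can be estimated with simple arithmetic. The Python below is a sketch under a stated assumption:<br />
that the distance from one line to the next is the font's point size plus the leading, both in 72nds of an inch. This section says only that<br />
the distance depends on both values, so the exact formula is an assumption, and the function names are hypothetical.<br />

```python
POINTS_PER_INCH = 72


def line_advance_inches(point_size, leading=1):
    """Distance from one line to the next, assuming point size plus leading."""
    return (point_size + leading) / POINTS_PER_INCH


def down_distance_inches(lines, point_size, leading=1):
    """Vertical distance covered by @DOWN=lines under the same assumption."""
    return lines * line_advance_inches(point_size, leading)


print(line_advance_inches(12))             # 13/72 of an inch, about 0.18
print(line_advance_inches(24, leading=3))  # 27/72 of an inch, 0.375
```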
@LINEWIDTH=nn<br />
specifies <strong>the</strong> width of <strong>the</strong> lines and boxes that are drawn. LINEWIDTH is usually set at .5. <strong>Inc</strong>reasing<br />
<strong>the</strong> LINEWIDTH produces bolder lines. @LINEWIDTH with no argument resets it <strong>to</strong> <strong>the</strong> original<br />
value of .5.<br />
@UNDERLINE<br />
begins underlining and continues until a subsequent @SKIP, @NEXT or @PAGE control word ends <strong>the</strong><br />
current chunk of output. @NOUNDERLINE can also be used <strong>to</strong> end underlining.<br />
@NOUNDERLINE<br />
end of underlined text.<br />
The following control words can be used <strong>to</strong> control <strong>the</strong> color of subsequent printing. Color stays in effect until it<br />
is changed. @NOCOLOR is equivalent <strong>to</strong> @BLACK.<br />
@RED<br />
@ORANGE<br />
@YELLOW<br />
@GREEN<br />
@BLUE<br />
@VIOLET<br />
@BLACK<br />
@NOCOLOR<br />
Color can also be changed by using <strong>the</strong> POSTSCRIPT.SETUP command <strong>to</strong> define fonts with specific<br />
colors. When <strong>the</strong> font is changed, <strong>the</strong> specified color will be used.<br />
12<br />
P-<strong>STAT</strong> MACROS<br />
A macro is a named collection of text that can be inserted at any point in a P-<strong>STAT</strong> run. It may contain an entire<br />
command or a series of many commands. It may contain a fragment of a command, subcommand or data record.<br />
A macro can be changed dynamically at execution by passing keyword or positional arguments. This chapter<br />
covers:<br />
1. Macro format<br />
2. Activating a macro<br />
3. Types of macros<br />
4. Keyword arguments<br />
5. Positional arguments<br />
6. Using arguments<br />
7. Default values for arguments<br />
8. Instream macros<br />
9. Multi-command macros<br />
10. SUBFILES command<br />
11. DIALOG command<br />
12.1 MACRO FORMAT<br />
MACRO ABC $<br />
contents of <strong>the</strong> macro<br />
ENDMACRO $<br />
A macro has three elements. The MACRO command supplies <strong>the</strong> name of <strong>the</strong> macro and may also have argument<br />
information. It is, in effect, <strong>the</strong> macro header. The contents or body of <strong>the</strong> macro may have an indefinite number<br />
of records (including none). The ENDMACRO command completes <strong>the</strong> macro and must be <strong>the</strong> only thing on its<br />
record. The ENDMACRO command can have a statement label such as:<br />
EXIT: ENDMACRO $<br />
12.2 Types of Macros<br />
There are two types of macros, BLOCK macros and INSTREAM macros. A BLOCK macro contains one or more<br />
full-fledged commands which may have subcommands and data records. It is invoked by using <strong>the</strong> RUN<br />
command.<br />
An INSTREAM macro can contain whatever one wishes. Its contents are inserted in<strong>to</strong> a command or subcommand<br />
wherever !!macname or !!(macname) is found (where macname is <strong>the</strong> name of <strong>the</strong> macro).<br />
Both types of macro can have positional or keyword arguments, and can be defined with default values for<br />
those arguments. Alternatively, a macro can be defined without any arguments. The commands within a BLOCK<br />
macro can contain INSTREAM macro calls. INSTREAM macros can contain o<strong>the</strong>r instream macro calls.
The following macro contains only one line. Since it does not contain a set of complete commands, it could<br />
not be used as a BLOCK macro but could, for example, be inserted in<strong>to</strong> a SURVEY subcommand. It would be<br />
called by using !!vvv in <strong>the</strong> subcommand. The characters 'age by income ' would replace !!vvv as <strong>the</strong> subcommand<br />
is read.<br />
MACRO vvv $<br />
age by income<br />
ENDMACRO $<br />
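The text substitution performed for an instream macro call can be modelled as a simple search and replace. The Python below is an<br />
illustrative sketch, not P-STAT's implementation. The function name expand_instream is hypothetical, macro names are assumed<br />
to consist of letters, digits and underscores, and a call to an undefined macro raises KeyError here because this section does not<br />
describe that case.<br />

```python
import re

# Matches !!(macname) or !!macname; the allowed name characters are an assumption.
MACRO_CALL = re.compile(r"!!(?:\((\w+)\)|(\w+))")


def expand_instream(text, macros):
    """Replace each instream macro call in text with that macro's contents."""
    def substitute(match):
        name = match.group(1) or match.group(2)
        return macros[name]  # KeyError if the macro has not been activated

    return MACRO_CALL.sub(substitute, text)


macros = {"vvv": "age by income "}
print(expand_instream("!!vvv ;", macros))
print(expand_instream("!!(vvv);", macros))
```

Both call forms insert the same characters, so only the text surrounding the call differs between the two results.<br />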
This next macro contains a block of commands. It could not be used by an instream reference, since that<br />
would insert several commands within <strong>the</strong> command or subcommand that contained <strong>the</strong> !!rrr, which would cause<br />
syntax errors galore. The correct way <strong>to</strong> use it is by saying RUN rrr $. Its first three lines are comments.<br />
MACRO rrr $<br />
/* example 1 of macro rrr. */<br />
/* no use of arguments. */<br />
/* no use of instream macro calls. */<br />
CORRELATE data1 [ KEEP age income education ], OUT work1$<br />
LIST work1 $<br />
ENDMACRO $<br />
12.3 Storing and Activating Macros<br />
Macros can be stored as ordinary ASCII (or EBCDIC) files which can be edited by an external editor. Within<br />
P-STAT’s editor each macro appears as a single command even when it is a block macro containing many commands.<br />
The body of the macro is stored as data records for the macro command. A macro with no body will<br />
appear in the editor with a single data record, the ENDMACRO command.<br />
A macro must be activated before it can be used. Activating is done by processing the definition in the<br />
normal course of processing P-STAT commands. When a macro is activated, information about its arguments is<br />
acquired and the macro is placed, ready to be used, on a temporary file. The currently active macros can be seen<br />
by using the SHOW.MACROS $ command.<br />
If the macro is entered from the terminal it is active as soon as the ENDMACRO $ command is processed.<br />
Macros stored in an external ASCII file are activated by a TRANSFER command. Macros that are stored in<br />
P-STAT’s edit file format are activated automatically when the OLD.EDIT.FILE command is executed. In the<br />
editor, macros can be changed by editing the data records. The changed macro is activated by using the X (eXecute)<br />
edit instruction.<br />
After a block macro is active (i.e., its definition has been read by P-STAT), it can be executed by using the<br />
RUN command. The RUN command executes the entire series of commands defined in the macro. For example:<br />
RUN Sales $<br />
An instream macro is executed when the command that references it is used. For example, macro VVV defined<br />
above could be executed by:<br />
SURVEY Psfile;<br />
!!vvv ;<br />
$<br />
Figure 12.1 illustrates a command stream that activates three macros. Two are instream macros and one is a<br />
block macro, which uses the other two. The block macro executes a CORRELATE command and a LIST command.<br />
The CORRELATE command uses the first instream macro to provide the input file name and the second<br />
instream macro to select variables.<br />
The final step in Figure 12.1 is the RUN command which calls the first macro. The block macro then references<br />
the two instream macros. It does not matter in what order the macros are activated.
__________________________________________________________________________<br />
Figure 12.1 Activating Three Macros<br />
MACRO rrr $<br />
/* example 2 of macro rrr, using instream macros. */<br />
/* this macro correlates some variables */<br />
/* and then lists the result. */<br />
CORRELATE !!aaa [ KEEP !!bbb ], OUT work1 $<br />
LIST work1 $<br />
ENDMACRO $<br />
MACRO aaa$<br />
data1<br />
ENDMACRO $<br />
MACRO bbb$<br />
age income education<br />
ENDMACRO $<br />
RUN rrr $<br />
__________________________________________________________________________<br />
12.4 Comments Within a Macro<br />
Comments can be used freely in macros. They are particularly useful at the beginning of a macro to document<br />
what the macro does, when it was last changed, who maintains it, and so forth.<br />
Comments start with a /* and end with a */. For example:<br />
/* this macro correlates some variables<br />
and then lists the result */<br />
is a valid comment, as is<br />
/*---------*/<br />
/* comment */<br />
/*---------*/<br />
12.5 Macros With Arguments<br />
The macros shown so far have not had any arguments. The only way to generalize such a macro was by calling<br />
other macros. That is perfectly legal but often, especially with block macros, generalizing is better done by using<br />
arguments. There are two types of notation for defining macro arguments: keyword and positional.<br />
The parentheses after a macro name define its arguments, if any. These arguments are known as DUMMY<br />
ARGUMENTS. (In some languages they are known as formal parameters.) When a macro is CALLED, each<br />
occurrence of a dummy argument in the body of the macro is replaced by the associated ARGUMENT VALUE<br />
in the call. For example:<br />
MACRO rrr ( file, vars) $<br />
defines a macro named rrr with two keyword arguments: file and vars. Figure 12.2 illustrates a version of macro<br />
rrr that has the same effect as the macro in Figure 12.1 except that the names of the P-STAT system file and the<br />
variables to be used are passed to the macro in the RUN command rather than from the instream macros.<br />
Using arguments is a simpler way to allow the macro to be used with differing filenames and sets of variables.
__________________________________________________________________________<br />
Figure 12.2 Block Macro With Keyword Arguments<br />
MACRO rrr ( file, vars) $<br />
/* example 3 of macro rrr. */<br />
/* this macro correlates some variables */<br />
/* and then lists the result. */<br />
/* it uses KEYWORD arguments instead */<br />
/* of calls to other macros. */<br />
CORRELATE &file [ KEEP &vars ], OUT work1 $<br />
LIST work1 $<br />
ENDMACRO $<br />
RUN rrr ( data1, age income education ) $<br />
__________________________________________________________________________<br />
The ‘&’ is used to identify keywords to be replaced within the macro. Since ‘file’ was defined as the first<br />
keyword dummy argument, every use of &file within the macro is replaced by the first argument value. Similarly,<br />
&vars is replaced by the second argument value. &(file) and &(vars) can also be used; these specify the keyword<br />
more precisely. Argument values are separated by commas.<br />
( data1, age income education )<br />
Thus data1 is a single argument and since it is the first argument it is associated with the keyword “file”. The<br />
second argument “vars” receives the entire string “age income education”. The following is what is actually<br />
executed:<br />
CORRELATE data1 [ KEEP age income education ], OUT work1 $<br />
LIST work1 $<br />
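The substitution just described can be modelled outside P-STAT. The following Python sketch is only an illustration of the rule, not P-STAT's implementation: the function name substitute_keywords is hypothetical, matching is case-sensitive for simplicity, and longer keywords are tried first so that one keyword that begins another is not matched too early. Positional arguments can be modelled the same way by passing "1", "2", and so on as the names.

```python
import re

def substitute_keywords(body: str, dummy_args: list[str], values: list[str]) -> str:
    """Replace &name and &(name) in a macro body with the matching argument value."""
    if len(dummy_args) != len(values):
        raise ValueError("number of values must match number of dummy arguments")
    if not dummy_args:
        return body
    mapping = dict(zip(dummy_args, values))
    # Try longer names first so a name that is a prefix of another is not matched early.
    names = sorted(mapping, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in names)
    pattern = re.compile(r"&\((%s)\)|&(%s)" % (alternatives, alternatives))
    # A function is used as the replacement so the value is inserted literally.
    return pattern.sub(lambda m: mapping[m.group(1) or m.group(2)], body)

body = "CORRELATE &file [ KEEP &vars ], OUT work1 $"
expanded = substitute_keywords(body, ["file", "vars"], ["data1", "age income education"])
print(expanded)
```

Under these assumptions the printed line matches the expanded CORRELATE command shown above.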
Figure 12.3 illustrates the same macro using positional arguments. This is done by providing the number of<br />
arguments in the parentheses after the macro name.<br />
MACRO rrr ( 2 ) $<br />
When positional arguments are used, the body of the macro contains the position preceded by the “&”. Thus wherever<br />
&1 or &(1) is found within the macro the first argument value found in the call will be used.<br />
__________________________________________________________________________<br />
Figure 12.3 Block Macro With Positional Arguments<br />
MACRO rrr ( 2 ) $<br />
/* example 4 of macro rrr. */<br />
/* the same thing using POSITIONAL arguments */<br />
CORRELATE &1 [ KEEP &2 ], OUT work1 $<br />
LIST work1 $<br />
ENDMACRO $<br />
RUN rrr ( data1, age income education ) $<br />
__________________________________________________________________________
12.6 Using Arguments<br />
There are a few simple rules to follow when using arguments in a macro.<br />
1. A keyword for a dummy argument should start with a letter, contain letters, digits and decimal<br />
points, and have no more than 16 characters. For example, FILE and VARS in Figure 12.2.<br />
2. Each such keyword should be found at least once in the macro, preceded by an ampersand (&).<br />
3. Similarly, if macro ppp(4)$ were used, the macro should contain at least one usage each of &1, &2,<br />
&3 and &4.<br />
4. The keyword or integer can be within parentheses, like &(file) or &(2).<br />
5. There can be as many as 150 keywords or positional arguments. I.e., macro xxx(150)$ is possible.<br />
The order in which they are found in the body of the macro does not matter.<br />
6. Usages of the first &keyword are replaced by the first argument value, usages of the second &keyword<br />
by the second argument value, etc.<br />
7. Positional macros behave similarly. Usages of &1 are replaced by the first argument value, usages<br />
of &2 by the second argument value, etc.<br />
8. The number of dummy arguments given in the definition must be the same as the number of argument<br />
values supplied when the macro is called.<br />
There are similar rules for the actual arguments used when the macro is invoked:<br />
1. Argument values are separated by commas. Argument values of 11 and 22 for macro zzz would be<br />
conveyed by saying<br />
RUN ZZZ (11,22)$ or !!ZZZ(11,22) or !!(ZZZ)(11,22)<br />
2. Argument values can be quoted. This is necessary when the value contains a comma or right parenthesis<br />
or a form of quote. Thus, "john's house" is valid, as is 'xx"xx'. Either quote (") or apostrophe<br />
(') can be used unless that character is part of the value, in which case the other should be used to<br />
bound the value. Suppose you want to pass, literally, 'title text' to a macro. The argument should be<br />
"'title text'"<br />
3. A quoted value can be empty, as in !!abc( ""). This is called a NULL value but it is nonetheless a<br />
value. The associated &keyword in the macro would simply vanish. If !!abc(" ") were used, the<br />
one blank would replace the &keyword.<br />
4. If quoted, the value that is used is the contents of the quotes. If not quoted, it is the first nonblank<br />
through the last nonblank before the comma or right parenthesis. Consider these macro calls, both<br />
of which do exactly the same thing:<br />
!!ppl ( age income education )<br />
!!ppl ( 'age income education' )<br />
Both are evaluated as having one argument, since the defining quotes are stripped as the argument<br />
is moved into place. However, consider these macro calls:<br />
!!ppl ( IF age missing, DELETE )<br />
!!ppl ( 'IF age missing, DELETE' )<br />
The first will be evaluated as having two arguments, because of the comma. The second has but one<br />
argument. Put a value within quotes if it contains commas, etc.<br />
5. An argument value can, in one situation, have several actual values, as in<br />
!!abc ( 'age' 'income' 'education' )<br />
Each of those values must be quoted. These are used when a subcommand record contains the associated<br />
&keyword and nothing else. That record is discarded and, in its place, a subcommand record<br />
is written for each of the values. The above would generate three records: one containing age, one<br />
containing income, and one containing education.<br />
6. An argument can be omitted. !!zzz( , ) has two omitted arguments, which is allowed only when the<br />
macro was defined with default values to be used when a call omits a value. Defaults are described<br />
below.<br />
7. P(3) or such can be used as an argument value. If P(3) is set to 123.456, those seven characters constitute<br />
the resulting argument value. In other words, the internal double precision binary number<br />
currently in P(3) is formatted into ASCII characters, and those ASCII characters serve as the actual argument.<br />
The value should not be missing.<br />
8. #N or ##TOTAL or such can be used as an argument value. The scratch variable can be numeric or<br />
character. The actual argument is the formatted ASCII representation of a numeric scratch variable,<br />
or the current character value of a character scratch variable. The value should not be missing.<br />
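Rules 1 through 4 and rule 6 for argument values can be illustrated with a small Python model. This is a sketch under stated assumptions, not P-STAT's parser: the function names are hypothetical, the input is the text between the parentheses of a call, an omitted argument is represented by None, and rule 5 (several quoted values in one argument), P(n) values and scratch variables are not modelled.

```python
def parse_call_arguments(text: str) -> list:
    """Split the text between the parentheses of a macro call into argument values.

    Each entry is a string for a supplied value (possibly empty for a quoted
    null value) or None for an omitted argument.
    """
    values = []
    current = []      # characters of the unquoted value being collected
    quoted = None     # contents of a quoted value, once one has been seen
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in "\"'":
            end = text.index(ch, i + 1)    # matching closing quote of the same kind
            quoted = text[i + 1:end]
            i = end + 1
            continue
        if ch == ",":
            values.append(_finish(current, quoted))
            current, quoted = [], None
        else:
            current.append(ch)
        i += 1
    values.append(_finish(current, quoted))
    return values

def _finish(current, quoted):
    if quoted is not None:
        return quoted                        # quotes stripped, contents kept as-is
    stripped = "".join(current).strip()      # first nonblank through last nonblank
    return stripped if stripped else None    # None marks an omitted argument

print(parse_call_arguments(" IF age missing, DELETE "))
print(parse_call_arguments(" 'IF age missing, DELETE' "))
```

The first call yields two values because of the comma; the second yields one, which is the distinction rule 4 draws.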
__________________________________________________________________________<br />
Figure 12.4 Macro With Positional Arguments and Default Values<br />
MACRO sss ( 2 ) ( age, income )$<br />
BANNER &1, STUB &2;<br />
ENDMACRO $<br />
SURVEY data2;<br />
!!sss (,) which becomes: BANNER age, STUB income;<br />
$<br />
__________________________________________________________________________<br />
12.7 Default Values for Arguments<br />
When a macro is defined with arguments, it can contain default values to be used when a call omits one or more<br />
values. Defaults are placed in parentheses after the keyword or positional parentheses. Since they are to be used<br />
as argument values when necessary, their syntax is the same as that of the argument values in a macro call.<br />
MACRO abc ( fff, vvv ) ( work1, )$<br />
LIST &fff [ KEEP &vvv ]$<br />
ENDMACRO$<br />
Macro abc has two arguments. A default is supplied for the first argument. There is no supplied default for<br />
the second argument. A default value will be used when the call does not supply a value for the argument. For<br />
example:<br />
RUN abc( , age income )$<br />
has no initial argument because there are only blanks before the initial comma. Since there is no first value to<br />
replace &fff, a default for that argument must have been included in the definition and will be used now. The<br />
expansion is:<br />
LIST work1 [ KEEP age income ]$<br />
The defaults are totally ignored if the call has actual values for each argument. The existence of defaults does<br />
not change the need for a call to indicate the presence or absence of its argument or arguments. Given the macro<br />
above, these calls are errors:<br />
RUN abc ( age income )$<br />
That is just one value, and the macro has two arguments. Values are separated by commas.<br />
RUN abc ( age income, )$<br />
Now we have two values, the second being explicitly omitted, but the definition has no default for the second value.<br />
Figure 12.4 illustrates a macro with 2 positional arguments. Both have default values provided in the definition.<br />
Because both values are available the macro can be used with no values, one value or both values. A<br />
definition can also have omitted default values. The following macro has 5 positional arguments. Defaults are<br />
provided for &1 and &3 but not for &2, &4 and &5.<br />
MACRO xxx (5) ( aaa,, ccc,, )$<br />
Consider the following instream macro call.<br />
!!mmm ( abc, "", )<br />
It has three values. Value 2 is null, but it is still regarded as a value. Only argument 3 would invoke a default<br />
value.<br />
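The way defaults interact with omitted and null values can be summarized in a short Python model. It is a sketch of the rules in this section, not P-STAT code: resolve_arguments is a hypothetical name, and None stands both for an argument omitted in the call and for a default omitted in the definition.

```python
def resolve_arguments(call_values, defaults):
    """Fill omitted call values (None) from the macro's defaults.

    Both lists have one entry per dummy argument. None means 'omitted' in the
    call or 'no default' in the definition; "" is a null value, which still
    counts as a value.
    """
    if len(call_values) != len(defaults):
        raise ValueError(
            f"macro has {len(defaults)} arguments but the call supplies {len(call_values)}"
        )
    resolved = []
    for position, (value, default) in enumerate(zip(call_values, defaults), start=1):
        if value is not None:
            resolved.append(value)      # an actual value, even a null one, wins
        elif default is not None:
            resolved.append(default)    # omitted in the call, so use the default
        else:
            raise ValueError(f"argument {position} is omitted and has no default")
    return resolved

# MACRO abc ( fff, vvv ) ( work1, )$ called as RUN abc( , age income )$
print(resolve_arguments([None, "age income"], ["work1", None]))
```

With macro abc above, the call RUN abc( , age income )$ resolves to work1 and age income, while RUN abc ( age income, )$ raises an error in this model because the second argument has no default.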
___________________________________________________________________________<br />
Figure 12.5 Macros Can Call Macros<br />
MACRO aaa $<br />
1 2 3<br />
!!bbb<br />
7 8 9<br />
MACEND $<br />
MACRO bbb $<br />
11 12 13<br />
!!ccc<br />
MACEND $<br />
MACRO ccc $<br />
101 102 103<br />
MACEND$<br />
MAKE work1, NV 3;<br />
!!aaa<br />
$<br />
LIST work1$<br />
1 2 3 from aaa<br />
11 12 13 from bbb<br />
101 102 103 from ccc<br />
7 8 9 from aaa<br />
___________________________________________________________________________<br />
12.8 Nested Instream Macros<br />
Macros can call macros. Figure 12.5 illustrates the use of instream macros in which macro aaa is called within<br />
a MAKE command.<br />
MAKE work1, NV 3;<br />
!!aaa<br />
$<br />
Macro aaa contains 3 data records. The second record activates another instream macro, bbb.<br />
MACRO aaa$<br />
1 2 3<br />
!!bbb<br />
7 8 9<br />
ENDMACRO $<br />
Macro bbb contains 2 data records, one of which is a call to macro ccc.<br />
MACRO bbb$<br />
11 12 13<br />
!!ccc<br />
ENDMACRO $<br />
MACRO ccc$<br />
101 102 103<br />
ENDMACRO $<br />
Macro ccc contains a single data record. Since it does not have a call to another instream macro, the command<br />
is completed with records taken from the 3 instream macros in the order in which the records are processed.<br />
There is no rule that prohibits macros from recursion. For example macro ccc could call macro aaa. This will<br />
cause the MAKE command to continue until it runs out of disk space.<br />
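The order in which nested records arrive can be checked with a small Python model of Figure 12.5. This is an illustrative sketch, not P-STAT's mechanism: the names are hypothetical, only records that consist solely of a macro call are expanded, and the depth limit is an addition of the sketch (as noted above, P-STAT itself has no rule against recursion).

```python
import re

# Matches !!name or !!(name) when it is the whole record.
MACRO_CALL = re.compile(r"!!\((\w[\w.]*)\)|!!(\w[\w.]*)")

def expand_records(records, macros, depth=0, max_depth=50):
    """Expand records whose only content is an instream macro call, recursively."""
    if depth > max_depth:
        raise RecursionError("macro nesting too deep; possibly a recursive macro")
    output = []
    for record in records:
        match = MACRO_CALL.fullmatch(record.strip())
        if match:
            name = match.group(1) or match.group(2)
            output.extend(expand_records(macros[name], macros, depth + 1, max_depth))
        else:
            output.append(record)
    return output

macros = {
    "aaa": ["1 2 3", "!!bbb", "7 8 9"],
    "bbb": ["11 12 13", "!!ccc"],
    "ccc": ["101 102 103"],
}
print(expand_records(["!!aaa"], macros))
```

The result lists the records in the same order as the LIST output in Figure 12.5: the first record of aaa, then bbb, then ccc, then the last record of aaa.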
12.9 Instream Macros in a Command<br />
A command can have many instream macro calls. They can occur anywhere after the command name and<br />
before the ending dollar or semicolon. The command text is scanned from its beginning for macro calls after each<br />
macro insertion. Therefore, its macros can call other macros indefinitely.<br />
The characters of an instream macro record are inserted through the rightmost nonblank, possibly with an<br />
additional padding blank when the record has fewer than 80 characters. These insertions may extend a command<br />
by hundreds or even thousands of characters. That causes no problems as long as the command does not exceed<br />
its maximum size, which in the Whopper/2 version of P-STAT is 50,000 characters.<br />
12.10 Instream Macros in Subcommands<br />
A single subcommand can also have many instream macro calls. Subcommand processing, however, is done differently<br />
due to the limit of 80 characters in a single subcommand record. There are two forms of subcommand<br />
macro expansion. The first occurs when the macro call is NOT the only thing in the record. For example,<br />
BANNER !!aaa, STUB !!bbb;<br />
The expansions are done in an array that can hold 800 bytes, which is ten times the size of a subcommand<br />
record. As with commands, the array is re-scanned after each insertion so that nested macros are honored. When<br />
no more macros are found, the array is written in up-to-80 character chunks to the subcommand buffer for use by<br />
the command that is currently active. Up-to-80 means that each chunk ends at a reasonable point at or before<br />
80. Reasonable means breaking at a blank, comma, right parenthesis, etc.<br />
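The "up-to-80" chunking can be sketched in Python as follows. This is one plausible reading of the description, not P-STAT's algorithm: split_into_records is a hypothetical name, the break characters are limited to the three named above, the break character is kept at the end of its chunk, and a hard break at the limit is assumed when no break character is available.

```python
def split_into_records(text: str, width: int = 80, break_chars: str = " ,)") -> list[str]:
    """Break expanded subcommand text into records of at most `width` characters."""
    records = []
    while len(text) > width:
        # Find the last break character within the first `width` characters.
        cut = max(text.rfind(ch, 0, width) for ch in break_chars)
        if cut <= 0:
            cut = width - 1              # no reasonable point: break at the limit
        records.append(text[:cut + 1])
        text = text[cut + 1:]
    if text:
        records.append(text)
    return records

long_text = "BANNER " + " ".join(f"Q{i}" for i in range(1, 60))
for record in split_into_records(long_text):
    print(len(record), repr(record))
```

No characters are dropped in this model, so joining the records gives back the expanded text.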
Different rules prevail when the macro call is the only thing on the subcommand record. First, that record<br />
vanishes. Instead, a subcommand record is generated for each line of the macro. However, what about arguments<br />
on one of these lines?<br />
If the positional argument in the macro (&3 or such) is not the only thing on the line, the record is expanded<br />
by replacing the argument with its value. This continues until the line has no more arguments. Then it is broken<br />
into 80's as described above. If the positional argument in the macro (&3 or such) IS the only thing on the line, a<br />
subcommand record is generated for EACH of the argument's non-null values.<br />
In an instream macro, the default is to insert the characters through the rightmost nonblank AND THEN<br />
ADD ONE BLANK. Consider:
MACRO vars $<br />
age<br />
income<br />
ENDMACRO $<br />
An instream usage might be:<br />
LIST x[ KEEP !!(vars) ] $<br />
Is the !!(vars) replaced by 9 characters (ageincome), or by 11 characters (age income ), or by 160 characters<br />
(age + 77 blanks and income + 74 blanks)? In other words, do we PAD the records as they are inserted? If so, by<br />
how much?<br />
A run begins with the padding default set to one. However, there are several ways to change the default.<br />
MACRO.PAD n$ is a command that specifies the padding default for macros activated subsequently. N can be<br />
zero (which sets a no-pad status) or some larger integer, like 1 or 80.<br />
MACRO XXX (file), PAD n $ causes the pad default for that specific macro to be n (also 0 to 80) rather than<br />
the current MACRO.PAD setting.<br />
___________________________________________________________________________<br />
Figure 12.6 Instream Macros in Subcommand Records.<br />
MACRO sss $<br />
STUB Q1 TO Q43<br />
ENDMACRO $<br />
MACRO bbb $<br />
BANNER Age Income Education<br />
ENDMACRO $<br />
SURVEY work1, ECHO; produces<br />
!!sss, !!bbb; STUB Q1 TO Q43, BANNER Age Income Education;<br />
$<br />
SURVEY work1, ECHO; produces<br />
!!sss, STUB Q1 TO Q43,<br />
!!bbb; BANNER Age Income Education;<br />
$<br />
___________________________________________________________________________<br />
A specific record in an instream macro can contain a padding specification which takes precedence over any<br />
pad default. This is done using back-slashes. If a record in a macro ends with two back-slashes, like<br />
age \\<br />
the characters up to (but not including) the back-slashes will be inserted. The above record will cause the 4 characters<br />
‘age ’ to be inserted.<br />
The same 4-character insertion would happen with<br />
age \\ /*a comment*/<br />
Either of these would insert just 3 characters:<br />
age\\ /*a comment*/<br />
age\\<br />
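The padding rules can be modelled in a few lines of Python. This is a sketch under assumptions, not P-STAT's implementation: the names are hypothetical, and the cap that keeps a padded record within 80 characters is inferred from the "age + 77 blanks" example rather than stated as a rule in the text.

```python
import re

# Two trailing back-slashes, optionally followed by blanks and a /* comment */.
MARKER = re.compile(r"\\\\\s*(/\*.*\*/)?\s*$")

def inserted_text(record: str, pad: int = 1, width: int = 80) -> str:
    """Return the characters that one instream macro record contributes."""
    marker = MARKER.search(record)
    if marker:
        return record[:marker.start()]    # exactly the text before the back-slashes
    body = record.rstrip(" ")             # through the rightmost nonblank
    # Assumption: padding never extends a record beyond `width` characters.
    blanks = max(0, min(pad, width - len(body)))
    return body + " " * blanks

def expand(records, pad=1):
    """Concatenate the contributions of all records of an instream macro."""
    return "".join(inserted_text(record, pad) for record in records)

for pad in (0, 1, 80):
    print(pad, len(expand(["age", "income"], pad)))
```

For macro vars above, pad settings of 0, 1 and 80 give 9, 11 and 160 characters in this model, matching the three possibilities raised in the text.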
__________________________________________________________________________
Figure 12.7 Lots of Instream Macros<br />
MACRO input $<br />
ibm.data<br />
ENDMACRO $<br />
MACRO ppl $<br />
if age gt 20, retain;<br />
set region to recode ( region, 99=m )<br />
ENDMACRO $<br />
MACRO labfile $<br />
"ibm.labels"<br />
ENDMACRO $<br />
MACRO date $<br />
August 13, 2006<br />
ENDMACRO $<br />
MACRO layout $<br />
layout question totals labels body summary,<br />
places means 3,<br />
row column percents,<br />
ENDMACRO $<br />
MACRO define $<br />
define 'under $20,000' income 1 to 3,<br />
define 'over $20,000' income 4 to 6,<br />
ENDMACRO $<br />
MACRO stub.banner $<br />
stub age income, banner region;<br />
ENDMACRO $<br />
__________________________________________________________________________<br />
12.11 Using Lots of Instream Macros<br />
TRANSFER 'macro.file' $<br />
SURVEY !!input [ !!ppl ], LABELS !!labfile ;<br />
TITLE "this was run on !!date",<br />
!!layout<br />
!!define<br />
!!stub.banner<br />
$<br />
In the above example, a transfer is done first to a file containing the macro definitions that may be needed.<br />
The SURVEY command uses seven macros; the macro.file should contain those seven and may well contain<br />
more. Activating a macro which goes unused does not cause any problems; it simply uses a bit more space on a<br />
temporary scratch-file on disk. Figure 12.7 illustrates what the transfer file might contain.
12.12 MACRO COMMANDS<br />
Thus far we have seen three commands that are associated with macros.<br />
1. MACRO provides the macro name<br />
2. ENDMACRO defines the end of the macro<br />
3. RUN executes a block macro<br />
There are four more useful macro commands:<br />
4. MACRO.PAD<br />
5. SHOW.MACROS<br />
6. COUNT.MACROS<br />
7. FULL.MACRO.ARGS<br />
MACRO.PAD 0 $ changes the default padding for records of instream macros activated subsequently. The<br />
run begins with a default of 1. Values of 0 to 80 can be used.<br />
SHOW.MACROS $ can be used to display the currently activated macros. This prints the entire contents of<br />
the activated macros. SHOW.MACROS, NAMES $ can be used to list just the names of the currently activated<br />
macros. Adding FILE 'filename' causes the output to be written to that file.<br />
COUNT.MACROS $ simply reports how many macros have been read, how many had errors, and how many<br />
are usable. Since TRANSFER only reports each macro activation when verbosity is 4, using COUNT.MACROS<br />
after a transfer to a macro library gives a sense of what went on.<br />
When a macro has many arguments, some of which may not be present, constructing a call with the proper<br />
number of null arguments can be tricky. FULL.MACRO.ARGS is a command that can be used to specify whether<br />
trailing null arguments are required or optional.<br />
FULL.MACRO.ARGS OFF $<br />
turns off the requirement that all macro arguments must be fully represented. Thus in a macro that references 1<br />
to 12 months and has 12 arguments in its definition<br />
!!zzz ( Jan, Feb )<br />
can be used instead of:<br />
!!zzz ( Jan, Feb ,,,,,,,,,, )<br />
This setting works only for the trailing (rightmost) arguments. The command to require fully supplied arguments<br />
is:<br />
FULL.MACRO.ARGS $<br />
The macro call can still have null arguments or a comma for defaults but there must be something represented for<br />
every argument.<br />
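The effect of the two FULL.MACRO.ARGS settings on a call can be pictured with a short Python model. It is only a sketch of the behavior described here: complete_trailing_arguments is a hypothetical name, and None represents an omitted argument that would then be filled from the macro's defaults.

```python
def complete_trailing_arguments(call_values, n_arguments, full_args_required=True):
    """Pad a call with omitted (None) trailing arguments when that is permitted."""
    if len(call_values) > n_arguments:
        raise ValueError("the call supplies more values than the macro defines")
    if len(call_values) < n_arguments:
        if full_args_required:
            # Corresponds to FULL.MACRO.ARGS $: every argument must be represented.
            raise ValueError("every argument must be represented in the call")
        # Corresponds to FULL.MACRO.ARGS OFF $: missing trailing arguments are omitted ones.
        call_values = call_values + [None] * (n_arguments - len(call_values))
    return call_values

print(complete_trailing_arguments(["Jan", "Feb"], 12, full_args_required=False))
```

Only trailing arguments are filled in; an omitted argument in the middle of a call still needs its comma.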
The records in a macro definition should not exceed 80 characters. A macro must be activated before it can<br />
be used. This is largely automatic. If you TRANSFER to a file which holds all of your macro definitions, the<br />
result of the transfer is to activate all of the macros it found there.<br />
12.13 CORRECTING MACROS IN THE EDITOR<br />
A macro appears in the editor as a single MACRO command which has some number of 'data' records. Its data<br />
records are, in fact, the rest of the macro. To change it you should modify the text as needed and then EXECUTE<br />
the macro command. That de-activates the old version and activates the new version.
If a macro appears in the editor as a series of commands, ending with an ENDMACRO$ command, it can be<br />
changed in the usual way. Then, to activate the changed version, simply EXECUTE the macro command; the rest<br />
of the macro will automatically be included in the activation.<br />
12.14 BLOCK MACROS<br />
A block macro is a named collection of P-STAT commands and subcommands or data records. It is only necessary<br />
to use the RUN command with the name of the macro in order to execute the entire series of commands. This<br />
section covers:<br />
1. The special features of block macros.<br />
2. SUBFILES controls a loop through a series of commands. The loop is executed once for each subgroup<br />
found in the SUBFILES input file. SUBFILES can only be used within a block macro.<br />
3. DIALOG permits a conversation with the user. DIALOG is usually, but not necessarily, used within<br />
a macro.<br />
___________________________________________________________________________<br />
Figure 12.8 Defining a Block Macro<br />
MACRO Sales ( Month ) $<br />
/*<br />
TO USE: RUN Sales ( Month )$<br />
For Month substitute the 3 letter abbreviation for the current month.<br />
The Sales macro is to be run on the 5th of each month.<br />
Copies of the report should be sent immediately to all department<br />
heads and to Sam Knightbridge, Vice President of Sales.<br />
*/<br />
TITLE 'Sales by Region and Department for the Month of &Month, 2010' $<br />
SORT Sales&Month,<br />
BY Region Department,<br />
OUT &MonthSales $<br />
LIST &MonthSales,<br />
TOTALS Dollar.Amounts Sales,<br />
MEANS Dollar.Amounts,<br />
BY Region Department,<br />
TITLES $<br />
ENDMACRO $<br />
___________________________________________________________________________<br />
12.15 Executing a Block Macro<br />
After a block macro is active (i.e., its definition has been read by P-STAT), it can be executed by using the RUN<br />
command. The RUN command executes the entire series of commands defined in the macro:<br />
RUN ABC $
P-<strong>STAT</strong> MACROS 12.13<br />
RUN also passes character string arguments to the macro. The number of arguments depends on the macro<br />
definition. The arguments are enclosed in parentheses and can be either keyword or positional. This Sales macro<br />
has a single keyword dummy argument. When it is executed, the RUN command must provide the actual value to<br />
be used for that argument. Given:<br />
MACRO Sales ( State ) $<br />
LIST &State $<br />
ENDMACRO $<br />
The macro is executed by a RUN command such as:<br />
RUN Sales ( NJ) $<br />
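As a further illustration, the sketch below defines a macro with two positional arguments. The macro name ListTwo,<br />
the dummy arguments Fname and Var, and the file SalesJan with a variable Region are all hypothetical. The sketch uses<br />
only constructs shown elsewhere in this chapter and assumes that the values in the RUN command are matched to the<br />
dummy arguments in the order given:<br />

```
MACRO ListTwo ( Fname, Var ) $
LIST &Fname [ KEEP &Var ] $
ENDMACRO $
RUN ListTwo ( SalesJan, Region ) $
```

Under that assumption, the command that is executed after substitution would read LIST SalesJan [ KEEP Region ] $.<br />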
___________________________________________________________________________<br />
Figure 12.9 The RUN Command and Partial Output<br />
ECHO $<br />
RUN Sales ( Jan ) $<br />
TITLE 'Sales by Region and Department for <strong>the</strong> Month of Jan, 2010' $<br />
SORT SalesJan,<br />
BY Region Department,<br />
OUT JanSales $<br />
Sort on 4 cases completed.<br />
The largest change in position for any case was 2 positions.<br />
LIST JanSales,<br />
TOTALS Dollar.Amounts Sales,<br />
MEANS Dollar.Amounts,<br />
BY Region Department,<br />
COMMAS$<br />
Sales by Region and Department for <strong>the</strong> Month of Jan, 2010<br />
-- Region : East --<br />
-- Department: Clothing --<br />
Dollar<br />
Sales Amounts<br />
45,265 534,500<br />
25,430 435,005<br />
Department ------ ---------<br />
Total 70,695 969,505<br />
Department ------------<br />
Mean 484,752.50<br />
___________________________________________________________________________
12.16 Macro Substitution Using Strings<br />
Figure 12.8 contains a macro <strong>to</strong> do a monthly report. The macro “Sales” contains a series of commands <strong>to</strong> process<br />
sales records on a monthly basis. It contains a comment section which begins with /* and ends with */ .<br />
The input <strong>to</strong> <strong>the</strong> Sales macro is a P-<strong>STAT</strong> system file with a name such as SalesJan or SalesFeb. In <strong>the</strong> macro<br />
the name of the input P-STAT system file is Sales&Month. “&Month” is a string that will change depending on<br />
the report that is needed. When the macro is executed, the string “&Month” is replaced by the argument value<br />
provided in the RUN command for the dummy argument Month:<br />
RUN Sales ( Jan ) $<br />
The substitution is done wherever an ampersand (&) is immediately followed by “Month”, <strong>the</strong> dummy argument<br />
in <strong>the</strong> macro definition. Substitution occurs in commands, subcommands and even in data records. The use<br />
of <strong>the</strong> & before each use of <strong>the</strong> dummy argument ensures that <strong>the</strong> substitution is only done where it is intended.<br />
The form &(Month) can also be used.<br />
Figure 12.9 contains <strong>the</strong> RUN command for <strong>the</strong> Sales macro as well as partial output. Before <strong>the</strong> RUN command,<br />
<strong>the</strong>re is an ECHO command. The reason for using ECHO is <strong>to</strong> see <strong>the</strong> commands after text substitution has<br />
occurred. Note: The comment text is not echoed because it is discarded as it is read.<br />
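A short sketch of the parenthesized form follows. It assumes that &(Month) behaves exactly like &Month and that the<br />
parentheses simply mark where the dummy argument name ends. The macro name Sort.Sales is hypothetical; the files are<br />
those used in Figure 12.8:<br />

```
MACRO Sort.Sales ( Month ) $
SORT Sales&(Month),
BY Region Department,
OUT &(Month)Sales $
ENDMACRO $
ECHO $
RUN Sort.Sales ( Feb ) $
```

Under that assumption, the echoed command would be SORT SalesFeb, BY Region Department, OUT FebSales $.<br />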
12.17 Scope of Temporary Scratch Variables<br />
Temporary scratch variables, such as #N, usually are erased when <strong>the</strong> command in which <strong>the</strong>y are created ends;<br />
however, a temporary scratch variable generated in a macro exists for <strong>the</strong> life of that macro. It is, <strong>the</strong>refore, available<br />
for use by all commands in <strong>the</strong> macro. It is erased only when <strong>the</strong> macro exits.<br />
MACRO ListH ( Hnum ) $<br />
GEN #Hname:C $<br />
PROCESS All [<br />
IF Hid EQ &Hnum, SET #Hname = Hospital, QUITCOMMAND ] $<br />
LIST #Hname [ KEEP Name Age Diagnosis ] $<br />
ENDMACRO $<br />
RUN ListH ( 1 ) $<br />
The PROCESS command is used to search for the first case which has a value of 1 (the value supplied for Hnum) for variable Hid. The value<br />
of variable Hospital is stored in a character temporary scratch variable and the command terminates. When the<br />
LIST command is scanned by the P-STAT executive routines, the value stored in #Hname is substituted for the<br />
filename. If it is not a legal name for a P-STAT file, an error occurs.<br />
If a permanent scratch variable is defined for local (within-macro) use, it runs <strong>the</strong> risk of stepping on a permanent<br />
scratch variable of <strong>the</strong> same name used elsewhere in some o<strong>the</strong>r way. Having a temporary scratch variable<br />
be usable across commands within a macro avoids this risk.<br />
12.18 Scratch Variables and Nested Macros<br />
Suppose macro AAA begins with:<br />
SET P(1) = 22 $<br />
GEN ##A = 23 $<br />
GEN #B = 24 $<br />
RUN XXX $<br />
P(1) is now 22 and ##A is 23. Since <strong>the</strong>ir scope is global, macro XXX can use <strong>the</strong>m and get or change <strong>the</strong> values<br />
that macro AAA just set. Also <strong>the</strong> values do not vanish when macro AAA exits.<br />
What about temporary scratch variable #B?<br />
1. As mentioned before, it “belongs” to macro AAA. It exists for the commands within macro AAA. It<br />
vanishes when macro AAA exits. In other words, its scope is local to macro AAA.<br />
2. It also exists for commands in macros called by macro AAA if <strong>the</strong> called macro does not generate<br />
its own version of #B.<br />
If AAA calls XXX, and XXX uses #B without a GENERATE, it gets and can change <strong>the</strong> #B that belongs <strong>to</strong><br />
macro AAA. If macro XXX does a GENERATE of #B, it now has its own #B, unrelated <strong>to</strong> <strong>the</strong> #B in macro AAA.<br />
This feature can be useful when a macro does some things and calls ano<strong>the</strong>r macro <strong>to</strong> finish <strong>the</strong> task. An example<br />
of this is a DIALOG macro calling a do-<strong>the</strong>-work macro, described later.<br />
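The sharing rule can be sketched as follows. This is a minimal illustration built only from commands used in this<br />
chapter; the bodies of macros AAA and XXX are hypothetical:<br />

```
MACRO XXX $
SET #B = #B + 1 $
ENDMACRO $
MACRO AAA $
/* XXX does no GENERATE of #B, so it changes the #B that belongs to AAA */
GEN #B = 24 $
RUN XXX $
ENDMACRO $
RUN AAA $
```

According to the rules above, #B in macro AAA should be 25 once XXX returns. If XXX instead began with its own<br />
GEN #B = 0 $, it would be working on a separate #B and the value in AAA would remain 24.<br />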
12.19 Temporary Files in Macros<br />
Intermediate files produced in macros are often given names that begin with “MACFILE.” <strong>to</strong> indicate that <strong>the</strong>y<br />
are temporary files that are not needed after <strong>the</strong> macro completes. There can be one <strong>to</strong> eight characters after <strong>the</strong><br />
“MACFILE”. These files are referenced by <strong>the</strong>ir names in <strong>the</strong> macro, but <strong>the</strong>y are written on disk with names<br />
composed of <strong>the</strong> P-<strong>STAT</strong> prefix for temporary files, “W_”, and some random characters that are generated <strong>to</strong> produce<br />
a unique name. If you include a FILES $ command within a macro you will see a display like:<br />
---------------au<strong>to</strong>save files: D:\PSFILES--------------------------------<br />
| name current previous |<br />
| |<br />
|#macfil1.sor W_10eZX3.PS1 |<br />
| (# indicates a temporary WORK file) |<br />
-------------------------------------------------------------------------<br />
Temporary macro files are deleted when the macro finishes. (Other temporary files are deleted when the<br />
P-STAT session ends.) Figure 12.10 shows the Sales macro with the output from the SORT command as a temporary<br />
file. This is then input to the LIST command. When the ENDMACRO statement is processed, the temporary file<br />
is erased.<br />
___________________________________________________________________________<br />
Figure 12.10 Macros: Temporary File Names<br />
MACRO Sales ( Month ) $<br />
TITLE 'Sales by Region and Department for <strong>the</strong> Month of &Month, 2010' $<br />
SORT Sales&Month,<br />
BY Region Department,<br />
OUT MACFILE.sor $<br />
LIST MACFILE.sor,<br />
TOTALS Dollar.Amounts Sales,<br />
MEANS Dollar.Amounts,<br />
BY Region Department,<br />
TITLES $<br />
ENDMACRO $<br />
___________________________________________________________________________<br />
12.20 Subcommands in Macros<br />
Figure 12.11 is a variation on the Sales macro with a SURVEY command instead of the LIST command. SURVEY<br />
requires subcommand information. If the table is always the same and only the file varies, then the<br />
subcommand records can be included in the macro in their final form. If, however, the tables may change, then<br />
there must be provision for substitution of the subcommands.<br />
In this example, <strong>the</strong>re is provision for 2 subcommand records; one <strong>to</strong> provide <strong>the</strong> BANNER (column) information<br />
and one <strong>to</strong> provide <strong>the</strong> STUB (row) information. Each such record is limited <strong>to</strong> 80 characters. The RUN<br />
command for this variation would look like:<br />
RUN Sales( Jan,<br />
BANNER Region,<br />
STUB Department ) $<br />
Note that “BANNER Region” is a single argument replacing “bvar” in <strong>the</strong> macro definition.<br />
__________________________________________________________________________<br />
Figure 12.11 Macros: Supplying Subcommands<br />
MACRO Sales ( Month, bvar, svar )$<br />
TITLE 'Sales by Region and Department for <strong>the</strong> Month &Month, 2010' $<br />
SORT Sales&Month,<br />
BY Region Department,<br />
OUT MACFILE.sor $<br />
SURVEY MACFILE.sor, TITLES;<br />
PLACES PERCENTS 0,<br />
&bvar,<br />
&svar;<br />
$<br />
ENDMACRO $<br />
___________________________________________________________________________<br />
If <strong>the</strong> macro does not supply <strong>the</strong> subcommand punctuation as:<br />
SURVEY MACFILE.sor, TITLES;<br />
PLACES PERCENTS 0,<br />
&bvar &svar<br />
<strong>the</strong>n that punctuation must be in <strong>the</strong> arguments provided in <strong>the</strong> RUN command. Since <strong>the</strong> punctuation is meaningful<br />
<strong>to</strong> <strong>the</strong> RUN command, <strong>the</strong>se arguments must be enclosed in quotes.<br />
RUN Sales( Jan,<br />
'BANNER Region,',<br />
'STUB Department;' ) $<br />
In this type of situation, <strong>the</strong> block macro might well be designed <strong>to</strong> use instream macros for <strong>the</strong> subcommand<br />
definitions. Instream macros are covered in <strong>the</strong> previous chapter. Quotes around <strong>the</strong> arguments are stripped off<br />
and <strong>the</strong> contents of <strong>the</strong> quotes are substituted for <strong>the</strong> arguments in <strong>the</strong> macro. This means that you must use double<br />
quotes if you wish <strong>to</strong> pass a quoted string. For example:<br />
MACRO ttt ( t ) $<br />
TITLE &t $<br />
LIST sales.jan, TITLES $<br />
ENDMACRO $<br />
RUN ttt ( '".DATE."' ) $<br />
If <strong>the</strong> TITLE command itself contains <strong>the</strong> quotes:<br />
TITLE '&t' $<br />
The RUN command can be entered without quotes:<br />
RUN ttt ( .DATE. ) $
___________________________________________________________________________<br />
Figure 12.12 Macro with Conditional Execution<br />
MACRO bvar $<br />
/* BANNER aa bb cc, */<br />
ENDMACRO $<br />
MACRO svar $<br />
/* STUB v1 v2 v3 */<br />
ENDMACRO $<br />
MACRO Sales.Report ( Month ) $<br />
/*<br />
TO USE:<br />
1. GENERATE ##CONTROL:C = 'LIST', 'SURVEY', or 'BOTH'<br />
2. RUN Sales.Report ( abc ) $<br />
For abc substitute <strong>the</strong> 3 letter abbreviation for <strong>the</strong> current month.<br />
If you are requesting a SURVEY you must supply stub variables in MACRO<br />
svar and (optionally) banner variables in MACRO bvar.<br />
*/<br />
TITLE 'Sales by Region and Department for <strong>the</strong> Month &Month, 2010 ' $<br />
SORT Sales&Month, BY Region Department, OUT MACFILE.sor $<br />
IF ##CONTROL EQ 'LIST' BRANCH Step1 $<br />
IF ##CONTROL EQ 'SURVEY' BRANCH Step2 $<br />
IF ##CONTROL NE 'BOTH' THEN;<br />
PUT 'Macro Sales.Report: ##CONTROL must be set <strong>to</strong> LIST, SURVEY or BOTH' ;<br />
BRANCH Finish ;<br />
ENDIF $<br />
Step1: LIST MACFILE.sor, TOTALS Dollar.Amounts Sales,<br />
MEANS Dollar.Amounts,<br />
BY Region Department,<br />
TITLES $<br />
IF ##CONTROL NE 'BOTH' BRANCH Finish $<br />
Step2: SURVEY MACFILE.sor, TITLES;<br />
PLACES PERCENTS 0,<br />
!!bvar<br />
!!svar ;<br />
$<br />
Finish: ENDMACRO $<br />
__________________________________________________________________________
12.21 Conditional Execution of Commands<br />
Macro Sales.Report in Figure 12.12 is an enhanced version of macro Sales. When you execute this macro,<br />
you choose not only your file but also which commands you wish to execute. The choice in this example is either a<br />
LIST command, a SURVEY command or both the LIST and the SURVEY. The choice is made by setting a permanent<br />
scratch variable, ##CONTROL, before running the macro. The macro in Figure 12.12 tests the scratch variable and<br />
branches to the desired command.<br />
GENERATE ##CONTROL:C = 'LIST' $<br />
RUN Sales.Report ( Jan ) $<br />
produces a report with just <strong>the</strong> LIST command. The following commands:<br />
GENERATE ##CONTROL = 'BOTH' $<br />
MACRO svar $<br />
STUB Department Region;<br />
ENDMACRO $<br />
RUN Sales.Report ( Jan ) $<br />
produce a report with a LIST and <strong>the</strong>n a SURVEY containing two 1-way tables.<br />
It is <strong>the</strong> BRANCH <strong>PPL</strong> instruction which transfers control <strong>to</strong> <strong>the</strong> appropriate set of commands. BRANCH is<br />
followed by <strong>the</strong> label of <strong>the</strong> next command <strong>to</strong> be executed. That label must be at <strong>the</strong> beginning of a command line<br />
followed by a colon (:). BRANCH can be used in any command stream <strong>to</strong> bypass commands.<br />
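For example, a minimal sketch of a forward branch in an ordinary command stream. The scratch variable ##SKIP, the<br />
label After and the two files are hypothetical:<br />

```
GENERATE ##SKIP = 1 $
IF ##SKIP EQ 1 BRANCH After $
LIST SalesJan $
After: LIST SalesFeb $
```

Here the first LIST is bypassed whenever ##SKIP equals 1, and execution resumes at the command labeled After.<br />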
__________________________________________________________________________<br />
Figure 12.13 Macros: Reversing <strong>the</strong> Order of Execution<br />
MACRO Sales.Report ( Month ) $<br />
SORT Sales&Month, BY Region Department, OUT MACFILE.sor $<br />
IF ##CONTROL AMONG ( 'LIST' 'BOTH' ) BRANCH Step1 $<br />
IF ##CONTROL AMONG ( 'SURVEY' 'REVERSE' ) BRANCH Step2 $<br />
PUT 'Macro Sales: Invalid value for ##CONTROL' $<br />
BRANCH Finish $<br />
Step1: LIST MACFILE.sor, TOTALS Dollar.Amounts Sales,<br />
MEANS Dollar.Amounts,<br />
BY Region Department,<br />
TITLES $<br />
IF ##CONTROL AMONG ( 'LIST' 'REVERSE' ) BRANCH Finish $<br />
Step2: SURVEY MACFILE.sor, TITLES;<br />
PLACES PERCENTS 0,<br />
!!bvar<br />
!!svar ;<br />
$<br />
IF ( ##CONTROL EQ 'REVERSE' ) BRANCH Step1 $<br />
Finish: ENDMACRO $<br />
___________________________________________________________________________
In a macro, BRANCH can be used <strong>to</strong> ei<strong>the</strong>r bypass commands or <strong>to</strong> branch back and execute commands that<br />
occur earlier in <strong>the</strong> macro. Thus it is easy <strong>to</strong> change Sales.Report so that <strong>the</strong> order of <strong>the</strong> report, LIST and <strong>the</strong>n<br />
SURVEY or SURVEY and <strong>the</strong>n LIST, is also controlled. This requires only <strong>the</strong> ability <strong>to</strong> branch around <strong>the</strong> LIST<br />
and <strong>the</strong>n possibly <strong>to</strong> branch back. Figure 12.13 contains <strong>the</strong> changes needed <strong>to</strong> add this option.<br />
12.22 DIALOG<br />
DIALOG is a PPL function which can be used anywhere but is especially useful when you wish to make a macro<br />
easy for someone else to use. If the macro is designed correctly with DIALOG, it can be run interactively by a<br />
user who knows little more than the names of the macro and the files or variables he wishes to select. With DIALOG<br />
in place, the RUN command for this version of the Sales Report macro is simply:<br />
RUN sales.report $<br />
Using <strong>the</strong> macro in Figure 12.14, <strong>the</strong> following messages <strong>the</strong>n appear on <strong>the</strong> screen. User replies are in bold-faced<br />
type:<br />
-------------------------------------------------<br />
Enter <strong>the</strong> three letter abbreviation for <strong>the</strong> month<br />
feb<br />
Enter one of <strong>the</strong> numbers 1-4 for <strong>the</strong>se choices<br />
1: LIST command only<br />
2: LIST and SURVEY commands<br />
3: SURVEY command only<br />
4: SURVEY and LIST commands<br />
4<br />
Enter <strong>the</strong> names of your column (banner) variables<br />
region department<br />
Enter <strong>the</strong> names of your stub (row) variables<br />
item1 TO item10<br />
___________________________________________________________________________<br />
Figure 12.14 Macros: DIALOG Provides an Interactive Front End<br />
MACRO Sales.Report $<br />
GEN ##Reply, GEN #Mon:c3 $<br />
GEN #bvar:c78 =' ', GEN #svar:c78 =' ' $<br />
GEN #bvar2:c80=' ', GEN #svar2:c80=' ' $<br />
Prompt1: DIALOG #Mon<br />
'-------------------------------------------------'<br />
'Enter <strong>the</strong> three letter abbreviation for <strong>the</strong> month'<br />
HELP 'Expected abbreviations include'<br />
'jan feb mar apr may jun jul aug sep oct nov dec' $<br />
IF .RESPONSE. EQ 0 OR .RESPONSE. EQ -9 BRANCH Finish $<br />
IF .RESPONSE. NE 14 BRANCH Prompt1 $<br />
IF #Mon NOTAMONG ( 'jan' 'feb' 'mar' 'apr' 'may' 'jun'<br />
'jul' 'aug' 'sep' 'oct' 'nov' 'dec' )<br />
BRANCH Prompt1 $<br />
Prompt2: DIALOG ##Reply ' '<br />
'Enter one of <strong>the</strong> numbers 1-4 for <strong>the</strong>se choices'
'1: LIST command only'<br />
'2: LIST and SURVEY commands'<br />
'3: SURVEY command only'<br />
'4: SURVEY and LIST commands' $<br />
IF .RESPONSE. EQ 0 BRANCH Finish $<br />
IF .RESPONSE. NE 1 BRANCH Prompt2 $<br />
IF ##REPLY LT 1 .OR. ##REPLY GT 4 BRANCH Prompt2 $<br />
IF ##REPLY EQ 1 BRANCH Do.it $<br />
Prompt3: DIALOG #bvar<br />
'Enter <strong>the</strong> names of your column (banner) variables' $<br />
IF .RESPONSE. NOTAMONG ( -2 14 16 ) BRANCH Prompt3 $<br />
IF .RESPONSE. NE -2 SET #bvar2 = 'BAN' /// LRTRIM ( #bvar ) // ',' $<br />
Prompt4: DIALOG #svar<br />
'Enter <strong>the</strong> names of your stub (row) variables' $<br />
IF .RESPONSE. NOTAMONG ( -2 14 16 ) BRANCH Prompt4 $<br />
IF .RESPONSE. NE -2 SET #svar2 = 'STUB' /// LRTRIM ( #svar ) // ',' $<br />
Do.it: RUN Report.Step2 ( #mon, #bvar2, #svar2 ) $<br />
FINISH: ENDMACRO $<br />
MACRO Report.Step2 ( Month, bvar, svar ) $<br />
TITLE 'Sales by Region and Department for <strong>the</strong> Month &Month, 2010 ' $<br />
SORT Sales&Month, BY Region Department, OUT MACFILE.sor $<br />
IF ##REPLY EQ 1 OR ##REPLY EQ 2 BRANCH Step1 $<br />
IF ##REPLY EQ 3 OR ##REPLY EQ 4 BRANCH Step2 $<br />
Step1: LIST MACFILE.sor, TOTALS Dollar.Amounts Sales,<br />
MEANS Dollar.Amounts,<br />
BY Region Department,<br />
TITLES $<br />
IF ##REPLY EQ 1 OR ##REPLY EQ 4 BRANCH Finish $<br />
Step2: SURVEY MACFILE.sor, TITLES;<br />
PLACES PERCENTS 0,<br />
&bvar<br />
&svar ;<br />
$<br />
IF ##REPLY EQ 4 BRANCH Step1 $<br />
Finish: ENDMACRO $<br />
__________________________________________________________________________
In order <strong>to</strong> supply this friendly front end, <strong>the</strong> Sales.Report macro is rewritten as “Report.Step2” and a new<br />
Sales.Report macro is designed which prompts for <strong>the</strong> information it needs. It uses this information <strong>to</strong> build <strong>the</strong><br />
RUN command for Report.Step2. Figure 12.14 lists <strong>the</strong> new Sales.Report macro and <strong>the</strong> revised Report.Step2.<br />
Report.Step2 is very similar to the previous Sales.Report except that the character ##CONTROL variable is replaced<br />
by the numeric scratch variable ##REPLY. The SURVEY command is also slightly changed so<br />
that the user need only know the names of the variables that define the rows and columns rather than rewrite the<br />
supporting instream bvar and svar macros.<br />
There is a great deal of overhead in a DIALOG macro if you wish <strong>to</strong> provide for all <strong>the</strong> possible responses<br />
that a user may make. There should be provisions for QUIT. Help text and tests for appropriate replies should be<br />
provided whenever possible.<br />
12.23 Format of <strong>the</strong> DIALOG command<br />
The DIALOG command has a scratch variable and some number of lines of text enclosed in quotes. The<br />
scratch variable is required only if a reply is expected. Each line of text is displayed on a separate line on <strong>the</strong><br />
terminal. The lines of text can contain scratch variables. If so, <strong>the</strong>ir current values are displayed.<br />
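A minimal sketch of a message-only DIALOG follows. No scratch variable follows the DIALOG keyword, so no reply is<br />
stored; the current value of #Mon is displayed within the text. The value 'jan' and the wording of the message are<br />
hypothetical:<br />

```
GEN #Mon:c3 = 'jan' $
DIALOG 'The report will be produced for month #Mon' $
```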
Optional HELP text is also part of <strong>the</strong> DIALOG command. This is not displayed unless <strong>the</strong> user requests it<br />
by entering ei<strong>the</strong>r “H” or “HELP” in reply <strong>to</strong> <strong>the</strong> prompt. The keyword “HELP” separates <strong>the</strong> normal DIALOG<br />
text from <strong>the</strong> HELP text. In Figure 12.14, <strong>the</strong> first DIALOG command:<br />
Prompt1: DIALOG #Mon<br />
'-------------------------------------------------'<br />
'Enter <strong>the</strong> three letter abbreviation for <strong>the</strong> month'<br />
HELP 'Expected abbreviations include'<br />
'jan feb mar apr may jun jul aug sep oct nov dec' $<br />
contains a scratch variable, 2 lines of text, and <strong>the</strong> HELP key word followed by 2 lines of help text. Note: <strong>the</strong><br />
scratch variable must be created before <strong>the</strong> DIALOG command.<br />
There are two mechanisms for examining a user reply. The first is the reply itself, which is stored in the DIALOG<br />
scratch variable. The second is a numeric system variable, .RESPONSE., which contains a code indicating<br />
the type of the reply. .RESPONSE. is set each time DIALOG is executed. .RESPONSE. values are:<br />
negative: no response, or an invalid response:<br />
-2 = entirely blank<br />
-4 = H or HELP, but <strong>the</strong> dialog had no help text<br />
-6 = 'abc' for a numeric scratch variable, or such<br />
-8 = a scratch variable was not supplied<br />
-9 = in batch mode<br />
zero: <strong>the</strong> response was Q or QUIT<br />
positive: a valid response:<br />
1 = integer, like 1990<br />
2 = non-integer, like 3.1416<br />
11 = Y or YES<br />
12 = N or NO<br />
14 = character response other than yes/no/quit<br />
that is a legal P-STAT name or label<br />
16 = o<strong>the</strong>r character response<br />
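As an illustration of the yes/no codes, the sketch below asks a single question and acts on the reply. It is built<br />
from the constructs used in Figure 12.14; the macro name Ask.List, the dummy argument Fname, the scratch variable<br />
#Ans, the label Ask and the file SalesJan are hypothetical:<br />

```
MACRO Ask.List ( Fname ) $
GEN #Ans:c3 $
Ask: DIALOG #Ans
'Enter Y to list file &Fname or N to skip it'
HELP 'Y lists the file, N skips the listing, Q quits' $
IF .RESPONSE. EQ 0 OR .RESPONSE. EQ -9 BRANCH Finish $
IF .RESPONSE. EQ 12 BRANCH Finish $
IF .RESPONSE. NE 11 BRANCH Ask $
LIST &Fname $
Finish: ENDMACRO $
RUN Ask.List ( SalesJan ) $
```

A reply of Q (code 0), a batch run (code -9) or N (code 12) ends the macro. Any reply other than Y (code 11) repeats<br />
the prompt.<br />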
In Figure 12.14 <strong>the</strong> code which looks at <strong>the</strong> user reply first checks <strong>to</strong> see whe<strong>the</strong>r QUIT was entered and <strong>to</strong><br />
make sure that <strong>the</strong> macro is not being inappropriately used in a batch run.<br />
IF .RESPONSE. EQ 0 OR .RESPONSE. EQ -9 BRANCH Finish $
The next check is <strong>to</strong> make sure that <strong>the</strong> response is a single word:<br />
IF .RESPONSE. NE 14 BRANCH Prompt1 $<br />
The final check tests <strong>the</strong> character scratch variable #Mon <strong>to</strong> make sure that it is one of <strong>the</strong> 12 months.<br />
IF #Mon NOTAMONG ( 'jan' 'feb' 'mar' 'apr' 'may' 'jun'<br />
'jul' 'aug' 'sep' 'oct' 'nov' 'dec' )<br />
BRANCH Prompt1 $<br />
These checks are not as complete and informative as <strong>the</strong>y might be. In <strong>the</strong> example above <strong>the</strong> BRANCH<br />
might better have been preceded by:<br />
PUT '<strong>Inc</strong>orrect reply. Use H <strong>to</strong> get a list of <strong>the</strong> 3 character months',<br />
The following provides a more complete diagnostic of <strong>the</strong> problem when DIALOG is run in a batch job:<br />
IF .RESPONSE. EQ -9 PUT <br />
<br />
<br />
;<br />
GO TO FINISH;<br />
Note the use of ##REPLY, which is a permanent scratch variable, whereas the other scratch variables in the<br />
Sales.Report macro are generated with a single # sign as temporary scratch variables. If ##REPLY is generated<br />
as a temporary scratch variable, Report.Step2 cannot be run as a standalone macro without the front end dialog.<br />
The other information that it needs (the month and the stub and banner variables) is passed to it as arguments, and<br />
it does not matter whether the RUN command comes from the dialog macro or from a standalone RUN command.<br />
With ##REPLY as a permanent scratch variable, the macro can be run interactively with a dialog or in a batch<br />
command stream.<br />
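A standalone run of Report.Step2 from Figure 12.14 might look like the sketch below. It assumes that ##REPLY has not<br />
already been generated in the session. The value 2 requests the LIST followed by the SURVEY, and the banner and stub<br />
variables are hypothetical. The arguments are quoted because the banner argument must carry its own trailing comma;<br />
the macro supplies the final semicolon after &svar:<br />

```
GENERATE ##REPLY = 2 $
RUN Report.Step2 ( jan,
'BANNER Region,',
'STUB Department' ) $
```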
The o<strong>the</strong>r three prompt sections in Figure 12.14 are all similar <strong>to</strong> <strong>the</strong> first prompt section. In each case <strong>the</strong><br />
essentials are in place, but improvements could be made <strong>to</strong> <strong>the</strong> error handling.<br />
There is one tricky piece in preparing <strong>the</strong> character string arguments for <strong>the</strong> Report.Step2 macro. The problem<br />
occurs when passing a character string <strong>to</strong> a macro if that character string contains a comma. If a string is not<br />
enclosed in quotes when it is given <strong>to</strong> <strong>the</strong> RUN command <strong>the</strong> comma which is needed in <strong>the</strong> macro instead serves<br />
as a delimiter between <strong>the</strong> arguments of <strong>the</strong> RUN command. If it is enclosed in quotes, <strong>the</strong> quotes are stripped off<br />
by <strong>the</strong> MACRO command after <strong>the</strong> string is properly s<strong>to</strong>red.<br />
Because quotes are stripped off as <strong>the</strong> RUN command is processed, a string that requires quotes within <strong>the</strong><br />
macro must be enclosed in double quotes or angle brackets. For example:<br />
MACRO small ( t ) $<br />
TITLES &t $<br />
LIST myfile, TITLES $<br />
ENDMACRO $<br />
RUN small ( ) $<br />
12.24 Does <strong>the</strong> File Exist<br />
A user friendly macro can also check that the files referenced in the macro exist and, if they do not,<br />
provide a reasonable error message. The P-STAT command INQUIRE.EXTERNAL is used to test the existence<br />
of a given external file. It sets a system variable, .XINQUIRE., to 1 if the file is there and to 0 if it is not there. If<br />
the Sales.Report macro used a labels file named 'report.lab' we could check its existence:<br />
GEN #LABNAME = "'report.lab'" $<br />
INQUIRE.EXTERNAL #LABNAME $<br />
IF .XINQUIRE. EQ 1, BRANCH OK $<br />
DIALOG 'Labels file #LABNAME is needed.',<br />
OK: etc.
The existence of a P-STAT system file can also be tested. INQUIRE ABC $ sets .INQUIRE. to 1 if the file ABC exists,<br />
and to zero if it does not.<br />
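A minimal sketch of such a test inside a macro follows. The macro name Check.Sales is hypothetical, and the sketch<br />
assumes that INQUIRE accepts a file name built by macro substitution in the same way that SORT and LIST do:<br />

```
MACRO Check.Sales ( Month ) $
INQUIRE Sales&Month $
IF .INQUIRE. EQ 1 BRANCH OK $
PUT 'File Sales&Month was not found. No report was produced.' $
BRANCH Finish $
OK: LIST Sales&Month $
Finish: ENDMACRO $
RUN Check.Sales ( Jan ) $
```

If SalesJan exists, the LIST is executed. Otherwise the message is printed and the macro ends without attempting the<br />
LIST.<br />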
12.25 SUBFILES<br />
The SUBFILES command is a major feature which is only available within macros. SUBFILES provides a BY<br />
capability for all the commands within its scope. SUBFILES is similar to MACRO in that its domain begins<br />
and ends with a P-STAT command. For SUBFILES, the ending command is ENDSUBFILES $.<br />
___________________________________________________________________________<br />
Figure 12.15 Macro With SUBFILES<br />
MACRO Sales ( Month )$<br />
SUBFILES Sales&Month, BY Region $<br />
SORT SUBFILE, BY Department, OUT Work $<br />
TITLES 'Report by #Region for &Month 2010' $<br />
LIST Work, BY Department,<br />
NO.CASES, TOTALS,<br />
TITLES $<br />
ENDSUBFILES $<br />
ENDMACRO $<br />
___________________________________________________________________________<br />
Figure 12.15 contains <strong>the</strong> commands for a simple macro with a SUBFILES command. The macro prints a<br />
separate report for each value of <strong>the</strong> BY variable Region. The file does NOT have <strong>to</strong> be sorted on <strong>the</strong> BY variable.<br />
The TITLES command refers to #Region, a scratch variable that appears to be undefined. It is in fact<br />
defined behind the scenes by the SUBFILES command. For every BY variable a scratch variable is created which<br />
has the same name as the BY variable with the single # prefix. These scratch variables contain the current value<br />
of each BY variable as the SUBFILES iterations are done.<br />
The input <strong>to</strong> <strong>the</strong> SUBFILES command is a P-<strong>STAT</strong> system file. This will usually be <strong>the</strong> only time that file<br />
is referenced in <strong>the</strong> SUBFILES loop. The file name "SUBFILE" is used <strong>to</strong> refer <strong>to</strong> <strong>the</strong> current subgroup that is<br />
being processed regardless of <strong>the</strong> original input file name.<br />
The values of these scratch variables allow you to easily identify which of the many possible subgroups<br />
is currently being processed. These are temporary scratch variables, but since they are defined within a MACRO<br />
command, they exist as long as the macro is being processed. This means that the values are available throughout<br />
the subfile process. These scratch variables can be used in TITLES and in PUT statements.<br />
The SUBFILES command needs to know the name of the input file and the names of the BY variables. There<br />
can be up to 15 different BY variables, and there can be a mixture of numeric and character variables. Thus it is<br />
possible to have hundreds of different groups defined by all possible combinations of the BY variables. For<br />
each such group a pass is made through all the commands within the current SUBFILES loop.<br />
12.26 SUBFILES Optional Identifiers<br />
The SUBFILES command has several optional identifiers. Usually the groups are presented in the order in<br />
which they are encountered in the data file. This can be controlled by using the identifiers UP or DOWN. The<br />
following illustrates how these identifiers work with two BY variables, one numeric and one character. The first<br />
pair of columns is the order in which the initial case of each subgroup is found in the input file. The second pair<br />
of columns is the way the groups are organized if UP is used. The third pair of columns illustrates the DOWN<br />
order.<br />
Natural Order      UP Order       DOWN Order<br />
2  West            1  North       3  South<br />
1  North           1  West        3  East<br />
3  South           2  East        2  West<br />
2  East            2  North       2  South<br />
1  West            2  South       2  North<br />
2  North           2  West        2  East<br />
3  East            3  East        1  West<br />
2  South           3  South       1  North<br />
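The ordering shown in the table can be reproduced with a small Python model. This is only an illustration of the described behaviour, not P-STAT code: the function name `order_groups` is hypothetical, and the eight groups are the ones listed in the Natural Order columns.

```python
# Model of how UP and DOWN rearrange groups defined by two BY variables.
# The data are the (numeric, character) pairs from the Natural Order columns.
natural = [(2, "West"), (1, "North"), (3, "South"), (2, "East"),
           (1, "West"), (2, "North"), (3, "East"), (2, "South")]


def order_groups(groups, direction=None):
    """Return the groups in natural, UP (ascending) or DOWN (descending) order."""
    if direction is None:
        return list(groups)              # order of first appearance in the file
    if direction not in ("UP", "DOWN"):
        raise ValueError("direction must be None, 'UP' or 'DOWN'")
    # Tuples sort on the first BY variable, then on the second.
    return sorted(groups, reverse=(direction == "DOWN"))


up = order_groups(natural, "UP")
down = order_groups(natural, "DOWN")
```

Because both BY variables share one direction, DOWN is simply the reverse of UP, which is what the second and third column pairs show.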
FREQUENCIES is a SUBFILES identifier that causes the groups to be displayed according to the number of<br />
cases in the group. When FREQUENCIES is used, DOWN is assumed unless UP is specified. When DOWN is<br />
used the group with the largest number of cases is first. When UP is used that group is last.<br />
SUBFILES Myfile, BY Age Region, FREQUENCIES, UP $<br />
By default, character values are considered a match if the characters are identical, even if the case of the<br />
characters is different. The identifier EXACT changes this. When EXACT is used, a value is considered part of a new<br />
group unless all the characters are exactly the same in every way. South is different from SOUTH, which is different<br />
from south, and so on.<br />
The final identifier is the GROUPS identifier, which is followed by the name of a file of group definitions.<br />
Usually this file is generated for you and stays behind the scenes. Figure 12.16 shows the commands that are actually<br />
executed when the Sales macro with SUBFILES is run.<br />
There are two commands that do the work in a SUBFILES loop. LOCATE.GROUPS reads through the input<br />
file and determines how many groups there are and how many cases are in each group. It also notes where the first<br />
and last cases in each group are located in the file. A GROUPS file from the LOCATE.GROUPS command with<br />
a single BY variable with just 2 values might look like:<br />
First   Last    Number               Compare<br />
case    case    of cases   Region    mode<br />
1       22      15         West      not exact<br />
6       24      9          East      not exact<br />
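The bookkeeping that LOCATE.GROUPS is described as doing can be sketched in Python. This is a model under stated assumptions, not the P-STAT implementation: `locate_groups` is a hypothetical name, only a single character BY variable is handled, and the 24-case sample file is invented so that it reproduces the two rows of the table above.

```python
def locate_groups(values, exact=False):
    """Scan BY values in file order and return one record per group.

    Case numbers are 1-based. Unless exact is True, values that differ
    only in upper/lower case fall into the same group.
    """
    groups = {}                                  # keeps order of first appearance
    for case_no, value in enumerate(values, start=1):
        key = value if exact else value.lower()
        if key not in groups:
            groups[key] = {"first": case_no, "last": case_no,
                           "cases": 0, "value": value}
        groups[key]["last"] = case_no            # latest case seen for this group
        groups[key]["cases"] += 1
    return list(groups.values())


# Invented sample: East occupies nine of the 24 cases, West the other fifteen.
east_cases = {6, 8, 10, 12, 14, 16, 18, 23, 24}
region = ["East" if n in east_cases else "West" for n in range(1, 25)]

groups_file = locate_groups(region)
```

With this sample, `groups_file` holds West (first case 1, last case 22, 15 cases) followed by East (first case 6, last case 24, 9 cases), in the order the groups are first met.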
Figure 12.16 The SUBFILE Commands<br />
___________________________________________________________________________<br />
SUBFILES Salesfeb, BY Region $<br />
LOCATE.GROUPS Salesfeb, by Region,<br />
verbosity 1, groups WORK0032 $<br />
SUBNEXT Salesfeb [ cases 1111 to 9999],<br />
groups WORK0032, out subfile $<br />
TITLES 'Report by #Region for feb 2010' $<br />
SORT SUBFILE, BY Department, OUT WORK $<br />
LIST WORK, BY Department, TITLES, TOTALS$<br />
Report by West for feb 2010<br />
-- Department: Clothing --
( Rest of report for West follows )<br />
ENDSUBFILES $<br />
SUBNEXT Salesfeb [ cases 1111 to 9999],<br />
groups WORK0032, out subfile $<br />
TITLES 'Report by #Region for feb 2010' $<br />
SORT SUBFILE, BY Department, OUT WORK $<br />
LIST WORK, BY Department, TITLES, TOTALS$<br />
Report by East for feb 2010<br />
-- Department: Clothing --<br />
( Rest of report for East follows )<br />
ENDSUBFILES $<br />
MACDONE$<br />
___________________________________________________________________________<br />
12.27 SUBFILES Looping<br />
The second command, SUBNEXT, controls the looping. It keeps track of the current group and, using the<br />
GROUPS file from the LOCATE.GROUPS command, creates a subset of the original data file which contains<br />
just the members of the current group. The SUBNEXT command which appears to be:<br />
SUBNEXT SalesFeb [ CASES 1111 TO 9999 ] is executed as if it were<br />
SUBNEXT SalesFeb [ CASES 1 TO 22 ] for the first group and<br />
SUBNEXT SalesFeb [ CASES 6 TO 24 ] for the second group.<br />
This enables the SUBNEXT command to work very efficiently, especially if the file is already partially or fully<br />
sorted.<br />
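Why the first and last case numbers help can be seen in a short Python model. It is a sketch only: `subnext` is a hypothetical stand-in for the internal command, the group records are plain dictionaries, and the 24-case file is invented to match the case ranges quoted above.

```python
def subnext(rows, by, group):
    """Return the cases that belong to one group.

    Only the cases from group["first"] to group["last"] (1-based, inclusive)
    are examined; cases of other groups inside that range are skipped.
    """
    window = rows[group["first"] - 1:group["last"]]
    wanted = group["value"].lower()          # models the "not exact" compare mode
    return [row for row in window if row[by].lower() == wanted]


# Invented 24-case file: East holds cases 6, 8, ..., 18, 23 and 24.
east_cases = {6, 8, 10, 12, 14, 16, 18, 23, 24}
salesfeb = [{"Case": n, "Region": "East" if n in east_cases else "West"}
            for n in range(1, 25)]

west = subnext(salesfeb, "Region", {"first": 1, "last": 22, "value": "West"})
east = subnext(salesfeb, "Region", {"first": 6, "last": 24, "value": "East"})
```

The better sorted the file is, the narrower each first-to-last window becomes, so fewer cases outside the group have to be read and discarded.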
The output file from SUBNEXT is always written to a file with the name "SUBFILE". This explains why<br />
the input to the SORT in Figure 12.16 is file "subfile". It is not a magic name out of nowhere; it is an actual temporary<br />
file that is created to contain the current subgroup.<br />
The SUBNEXT command is internal to SUBFILES and cannot be executed by a user. The LOCATE.GROUPS<br />
command, on the other hand, can be executed at any time during a run and provides an easy way<br />
to determine the number of cases in the subgroups of a file. You can run the LOCATE.GROUPS command before<br />
a SUBFILES loop. The final identifier to the SUBFILES command is the GROUPS identifier which, if used, requires<br />
the name of a GROUPS output file from a previous LOCATE.GROUPS command.<br />
Because the GROUPS file is a P-STAT system file which can itself be modified with PPL, there is yet further<br />
control over the groups that are processed. For example:<br />
MACRO Grouper $<br />
LOCATE.GROUPS Myfile, BY County State, GROUPS MyGroups $<br />
SUBFILES Myfile, BY County State,<br />
GROUPS MyGroups [ if Number.of.cases LT 20, EXCLUDE ] $<br />
This will cause all the small groups to be omitted from the rest of the SUBFILES loop.<br />
The SUBFILES identifiers UP and DOWN apply to all the BY variables. If the order that you want is UP on<br />
one variable and DOWN on another, that can be accomplished by using the LOCATE.GROUPS command followed<br />
by a SORT command.<br />
LOCATE.GROUPS Myfile, BY Sales State, GROUPS Mygroup $
SORT Mygroup, BY Sales (D) State (U), OUT Mygroup $<br />
SUBFILES Myfile, GROUPS Mygroup $<br />
If you use LOCATE.GROUPS to create your own GROUPS file, you may use any of the SUBFILES identifiers<br />
UP, DOWN, FREQUENCIES, and EXACT in the LOCATE.GROUPS command. However, if you provide<br />
your own GROUPS file to the SUBFILES command, you cannot use BY, UP, DOWN, FREQUENCIES or EXACT<br />
in that SUBFILES command.<br />
If the LOCATE.GROUPS command has PPL which deletes cases, the GROUPS file no longer describes the<br />
original input file. If you then use SUBFILES with the new GROUPS file and the original input file, the cases<br />
selected will not be the correct cases. The solution is easy. Add the OUT identifier to the LOCATE.GROUPS<br />
command to produce a file that corresponds to the GROUPS file.<br />
LOCATE.GROUPS Myfile [IF Department LT 10, EXCLUDE],<br />
BY Sales State, GROUPS Mygroup, OUT Temp $<br />
SORT Mygroup, BY Sales (D) State (U), OUT Mygroup $<br />
SUBFILES Temp, GROUPS Mygroup $<br />
12.28 SUBFILES System Variables<br />
There are three system variables that are set by the SUBFILES command.<br />
1. .SUBFILEPASS. counts the number of times through the subfile loop.<br />
2. .SUBFILEMAX. is the total number of iterations to be done. This is the same as the total number of<br />
groups.<br />
3. .SUBFILECASES. is the number of cases in the current group.<br />
These variables can be used to provide different paths depending on their values. The following is a simplistic<br />
macro which uses all three variables:<br />
MACRO Counter $<br />
GEN #Big $<br />
SUBFILES Myfile, BY Region $<br />
IF .SUBFILEPASS. EQ 1, SET #BIG = 0, PUT $<br />
IF .SUBFILECASES. GT 260, INCREASE #BIG $<br />
IF .SUBFILEPASS. EQ .SUBFILEMAX.<br />
PUT #BIG >$<br />
ENDSUBFILES $<br />
ENDMACRO $
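The logic of the Counter macro can be followed in a Python model. This is a sketch, not P-STAT: the local names `subfilepass`, `subfilemax` and `subfilecases` merely stand in for the three system variables, and the function takes a list of group sizes instead of a P-STAT file.

```python
def count_big_groups(group_sizes, threshold=260):
    """Count the groups with more than `threshold` cases, one pass per group."""
    big = 0
    subfilemax = len(group_sizes)            # total number of groups
    for subfilepass, subfilecases in enumerate(group_sizes, start=1):
        if subfilepass == 1:
            big = 0                          # reset the counter on the first pass
        if subfilecases > threshold:
            big += 1                         # the INCREASE step
        if subfilepass == subfilemax:
            return big                       # report only on the final pass
    return big                               # reached only when there are no groups


big_groups = count_big_groups([300, 120, 261, 260])
```

Testing the pass number against the maximum is what lets a single loop body do setup on the first group and reporting on the last one.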
MACRO<br />
SUMMARY<br />
The MACRO command provides a name for a collection of P-STAT text. There are two types of macros.<br />
Block macros contain one or more P-STAT commands. They are executed with the RUN command.<br />
Instream macros contain pieces of commands, programming language (PPL), subcommands or data.<br />
A block macro is a named collection of P-STAT commands and data records. The MACRO command<br />
supplies a name for the macro and defines any macro arguments. The arguments can be either keyword<br />
or positional. It is followed by one or more P-STAT commands, subcommands and data records. A macro<br />
is neither checked for syntax nor executed when it is defined.<br />
The RUN command executes the entire series of commands that comprise the macro. The RUN command<br />
passes the true values for each of the macro arguments. If the arguments are keywords, substitution<br />
is done whenever the keyword preceded by an ampersand (&month) is found in the macro text. When<br />
the arguments are positional, substitution is done for &1, &2, etc.<br />
File names in macros, prefaced with “MACFILE.”, are temporary files that disappear after the macro<br />
finishes.<br />
A set of macro definitions can be created and modified in a simple ASCII file using a text editor. The<br />
macros are then made available to a P-STAT run by doing a TRANSFER to that file.<br />
A macro appears in the P-STAT editor as a single command. Its commands are stored as data records to<br />
the macro command. A macro can be edited just as any other command is edited. It must then be reexecuted<br />
(X) from within the editor for the changes to take effect.<br />
Macros support both keyword and positional arguments. Default values can be provided. If defaults are<br />
not provided in the macro definition, values must be supplied when the macro is used.<br />
MACRO rrr ( file, vars) $<br />
CORRELATE &file [ KEEP &vars ], OUT work1$<br />
LIST work1 $<br />
ENDMACRO$<br />
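The argument substitution applied to a macro such as rrr can be modelled in a few lines of Python. This is a sketch under assumptions, not the P-STAT implementation: `substitute` is a hypothetical helper, argument names are matched without regard to case, and an ampersand followed by an unknown name is left untouched.

```python
import re


def substitute(text, args):
    """Replace &name (keyword) or &1, &2, ... (positional) with supplied values."""
    lookup = {str(name).lower(): value for name, value in args.items()}

    def replace(match):
        name = match.group(1).lower()
        return lookup.get(name, match.group(0))   # unknown names stay as written

    return re.sub(r"&(\w+)", replace, text)


keyword_line = substitute("CORRELATE &file [ KEEP &vars ], OUT work1$",
                          {"file": "data1", "vars": "age income education"})
positional_line = substitute("LIST &1 [ KEEP &2 ] $", {1: "data1", 2: "age"})
```

Here `keyword_line` becomes `CORRELATE data1 [ KEEP age income education ], OUT work1$`, which corresponds to running rrr with the arguments `( data1, age income education )`.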
Instream macros are executed by providing the name preceded by !! (two exclamation points).<br />
MACRO survey.def $<br />
layout question <strong>to</strong>tals labels body missing summary,<br />
places means 3, places percents 2,<br />
row.<strong>to</strong>tals on right,<br />
ENDMACRO $<br />
SURVEY PsFile;<br />
!!survey.def<br />
BANNER Age Education, STUB Q1 TO Q43;<br />
$<br />
MACRO Sales ( Month, Region ) ( jan, east ) $<br />
MACRO Sales ( 2 ) ( jan, '' ) $<br />
RUN rrr ( data1, age income education ) $
Required:<br />
MACRO name $<br />
Optional Identifier:<br />
PAD nn<br />
This specifies the default padding for instream records as they are inserted.<br />
ENDMACRO<br />
ENDMACRO $ ends the macro definition.<br />
RUN<br />
RUN SALARY $<br />
RUN SALES ( Sept ) $<br />
The RUN command causes a block macro to be executed. Argument substitution is supported.<br />
FULL.MACRO.ARGS<br />
All arguments must be supplied when a macro is called. An argument can be a replacement value, a null<br />
value, or a comma if default values are available. This is the default.<br />
FULL.MACRO.ARGS OFF<br />
The commas for trailing arguments need not be supplied if defaults are available.<br />
COUNT.MACROS<br />
COUNT.MACROS simply reports how many macros have been read, how many had errors, and how<br />
many are usable. Since TRANSFER only reports each macro activation when verbosity is 4, using<br />
COUNT.MACROS after a transfer to a macro library gives a sense of what went on.<br />
SHOW.MACROS<br />
Optional:<br />
NAMES<br />
SHOW.MACROS can be used to display the currently activated macros. This prints the entire contents<br />
of the activated macros.<br />
SHOW.MACROS, NAMES $ can be used to list the names of the currently activated macros.<br />
FILE “fn”<br />
Name for an external file where the SHOW.MACROS command is to put its results.<br />
SHOW.MACROS, FILE “MyMacros” $<br />
MACRO.PAD nn<br />
is a command that specifies the padding default for macros activated subsequently. The value nn can be zero (which<br />
sets a no-pad status) or some larger integer, like 1 or 80.<br />
SUBFILES<br />
Required:<br />
begins a SUBFILES loop. The SUBFILES command can only be used within a macro.<br />
SUBFILES Myfile, BY County State, FREQUENCIES $ or<br />
SUBFILES Myfile, GROUPS Mygroups $<br />
SUBFILES fn<br />
provides the name of the P-STAT system file.<br />
BY vn vn<br />
provides the names of the BY variables. Up to 15 BY variables may be cited. A SUBFILES loop is done<br />
for each subgroup that is defined by the different values of the BY variables. The groups are usually processed<br />
in the order in which they occur in the input file.<br />
Optional Identifiers:<br />
DOWN<br />
specifies that the groups are to be organized in descending order of the BY group values or, if FREQUENCIES<br />
is used, by descending size.<br />
EXACT<br />
specifies that character variables must match not only in their spelling but also in the case of the characters<br />
to be considered as members of the same group.<br />
FREQUENCIES<br />
specifies that the groups are to be ordered by their frequencies. UP and DOWN can be used to control<br />
whether the largest or smallest group comes first.<br />
UP<br />
specifies that the groups are to be organized in ascending order of the BY group values or, if FREQUENCIES<br />
is also used, by ascending size.<br />
GROUPS fn<br />
provides the name of a file that was created by a previous LOCATE.GROUPS command. If GROUPS<br />
is used, none of the other identifiers can be used. The groups file contains all the relevant information.<br />
Subfiles System Variables<br />
.SUBFILEPASS.<br />
counts the number of times through the subfile loop.<br />
.SUBFILEMAX.<br />
the total number of iterations to be done. This is the same as the total number of groups.<br />
.SUBFILECASES.<br />
the number of cases in the current group.<br />
ENDSUBFILES<br />
ends a SUBFILES loop<br />
LOCATE.GROUPS<br />
Required:<br />
LOCATE.GROUPS reads a P-STAT system file and counts the number of cases in each of the subgroups<br />
that are defined by the BY variables. If the PPL deletes any of the cases, the OUT file should also be<br />
created and used with the GROUPS file in any subsequent SUBFILES commands.<br />
LOCATE.GROUPS Myfile [ IF Age LT 20, EXCLUDE ], OUT Myfile2,<br />
GROUPS MyGroup, FREQUENCIES $<br />
LOCATE.GROUPS fn<br />
provides the name of the P-STAT system file.<br />
BY vn vn<br />
provides the names of the BY variables. Up to 15 BY variables may be cited. A SUBFILES loop is done<br />
for each subgroup that is defined by the different values of the BY variables. The groups are usually processed<br />
in the order in which they occur in the input file.<br />
Optional Identifiers:<br />
DOWN<br />
specifies that the groups are to be organized in descending order of the BY group values or, if FREQUENCIES<br />
is used, by descending size.<br />
EXACT<br />
specifies that character variables must match not only in their spelling but also in the case of the characters<br />
to be considered as members of the same group.<br />
FREQUENCIES<br />
specifies that the groups are to be ordered by their frequencies. UP and DOWN can be used to control<br />
whether the largest or smallest group comes first.<br />
GROUPS fn<br />
provides the name for an output file containing information about each subgroup. Included are the frequencies<br />
for each subgroup and the locations of the first and last cases of the subgroup.<br />
UP<br />
specifies that the groups are to be organized in ascending order of the BY group values or, if FREQUENCIES<br />
is used, by ascending size.<br />
OUT fn<br />
provides the name for an output file which is the same as the input file after the PPL, if any, has been<br />
processed.<br />
INQUIRE.EXTERNAL<br />
Required:<br />
INQUIRE.EXTERNAL 'cs'<br />
provides the name of an external file in quotes. The results are returned in the system variable .XINQUIRE.<br />
which is set to one if the file is found and is zero if the file is not found.<br />
INQUIRE<br />
Required:<br />
INQUIRE fn<br />
provides the name of a P-STAT system file. The results are returned in the system variable .INQUIRE.<br />
which is set to one if the file is found and is zero if the file is not found.<br />
DIALOG<br />
DIALOG #Mon<br />
'-------------------------------------------------'<br />
'Enter the three letter abbreviation for the month'<br />
HELP 'Expected abbreviations include'<br />
'jan feb mar apr may jun jul aug sep oct nov dec' $<br />
The DIALOG command has a scratch variable and some number of lines of text enclosed in quotes. The<br />
scratch variable is required only if a reply is expected. Each line of text is displayed on a separate line<br />
on the terminal. The lines of text can contain scratch variables. If so, their current value is displayed.<br />
Optional HELP text is also part of the DIALOG command. This is not displayed unless the user requests<br />
it by entering either "H" or 'HELP' in reply to the prompt. The keyword 'HELP' separates the normal<br />
DIALOG text from the HELP text.<br />
There are two mechanisms for examining a user reply. The first is the user reply which is stored in the<br />
DIALOG scratch variable. The second is a numeric system variable .RESPONSE. which contains a code<br />
indicating the type of the reply. .RESPONSE. is set each time DIALOG is executed. .RESPONSE. values<br />
are:<br />
negative: no response, or an invalid response:<br />
-2 = entirely blank<br />
-4 = H or HELP, but the dialog had no help text<br />
-6 = 'abc' for a numeric scratch variable, or such<br />
-8 = a scratch variable was not supplied<br />
-9 = in batch mode<br />
zero: the response was Q or QUIT<br />
positive: a valid response:<br />
1 = integer, like 1990<br />
2 = non-integer, like 3.1416<br />
11 = Y or YES<br />
12 = N or NO<br />
14 = character response other than yes/no/quit<br />
that is a legal P-STAT name or label<br />
16 = other character response<br />
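The sign convention of the .RESPONSE. codes can be summarised with a small Python lookup. This is only an illustration: the table restates the codes listed above, and the names `RESPONSE_CODES` and `response_kind` are hypothetical, not part of P-STAT.

```python
# The .RESPONSE. codes listed above, keyed by their numeric value.
RESPONSE_CODES = {
    -2: "entirely blank",
    -4: "H or HELP, but the dialog had no help text",
    -6: "character reply for a numeric scratch variable, or such",
    -8: "a scratch variable was not supplied",
    -9: "in batch mode",
    0: "Q or QUIT",
    1: "integer",
    2: "non-integer",
    11: "Y or YES",
    12: "N or NO",
    14: "character response that is a legal P-STAT name or label",
    16: "other character response",
}


def response_kind(code):
    """Classify a code by its sign: invalid (negative), quit (zero) or valid."""
    if code not in RESPONSE_CODES:
        raise ValueError(f"unknown .RESPONSE. code: {code}")
    if code < 0:
        return "invalid"
    if code == 0:
        return "quit"
    return "valid"


kinds = [response_kind(code) for code in (-2, 0, 11)]
```

A macro that only needs to know whether to prompt again, stop, or continue can therefore test the sign of .RESPONSE. and look at the exact code only when the type of reply matters.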
Index<br />
Symbols<br />
^ MATCHES meta-character 9.21<br />
? MATCHES meta-character 9.21<br />
? variable name wildcard 2.7, 2.20<br />
_ MATCHES meta-character 9.21<br />
- MATCHES meta-character 9.21<br />
- PPL numeric operator 2.9, 2.25<br />
* MATCHES meta-character 9.21<br />
* PPL numeric operator 2.9, 2.25<br />
** PPL numeric operator 2.9, 2.25<br />
*/ comment ending 2.20, 3.20<br />
/ PPL numeric operator 2.9, 2.25<br />
/* comment beginning 2.20, 3.20<br />
// concatenate 9.4, 9.31<br />
/// squeeze concatenate 9.5, 9.31<br />
\\ MATCHES meta-character 9.21<br />
& MACRO substitution 12.4<br />
&& concatenation of character constants 9.23, 9.31<br />
# MATCHES meta-character 9.21<br />
# scratch variables 8.3<br />
+ MATCHES meta-character 9.21<br />
+ PPL numeric operator 2.9, 2.25<br />
< > MATCHES meta-character 9.21<br />
| MATCHES meta-character 9.21<br />
$ MATCHES meta-character 9.21<br />
0 + MATCHES meta-character 9.21<br />
0 1 MATCHES meta-character 9.21<br />
1 + MATCHES meta-character 9.21<br />
1 1 MATCHES meta-character 9.21<br />
System Variables<br />
.ALL. 3.8, 3.11, 11.14, 11.34<br />
.CDATE. 6.26<br />
.CHARACTER. 2.6, 6.17, 6.24<br />
.COLLECTIONS. 8.27, 8.30<br />
.COLLECTMAX. 8.27, 8.30<br />
.COLLECTMIN. 8.27, 8.30<br />
.COLLECTSIZE. 8.23, 8.27, 8.30<br />
.COLLECTSUM. 8.27, 8.30<br />
.CTIME. 6.26<br />
.DATE. 6.19, 6.24<br />
.e. 6.14, 6.24<br />
.FILE. 6.18, 6.24, 8.2<br />
.G. 2.12, 6.14, 6.24<br />
.HERE. 3.6, 6.16, 6.24<br />
.INQUIRE. 12.23, 12.31<br />
.M. 2.12, 6.15, 6.24<br />
system variable 2.24<br />
.M1., .M2., .M3. 6.15, 6.25<br />
.N. 3.6, 6.16, 6.25<br />
.NDATE. 6.19, 6.26<br />
.NEW. 2.6, 6.5, 6.25<br />
.NTIME. 6.19, 6.26<br />
.NUMERIC. 2.6, 6.17, 6.25<br />
.NV. 6.15, 6.25<br />
.ON. 2.3, 6.25<br />
.OTHERS. 2.6, 6.25<br />
.PAGE. 6.19, 6.25<br />
.PI. 6.14, 6.25<br />
.PUT. 3.9, 6.18, 6.25<br />
with system variables 8.11<br />
.RDATE. 6.26<br />
.RESPONSE. 12.21, 12.31<br />
.RTIME. 6.26<br />
.SUBFILECASES. 12.26<br />
.SUBFILEMAX. 12.26<br />
.SUBFILEPASS. 12.26<br />
.TIME. 6.19, 6.25<br />
.USED. 6.16, 6.26<br />
.XDATE. 6.26<br />
.XINQUIRE. 12.22, 12.31<br />
.XTIME. 6.26<br />
( ) MATCHES meta-character 9.21<br />
[ ] MATCHES meta-character 9.21<br />
PUT and TEXTWRITER Controls<br />
@<br />
in PUT and PUTL 3.9, 3.19<br />
in TEXTWRITER command 11.9, 11.33<br />
@ MATCHES meta-character 9.21<br />
@BEFORE<br />
in PUT and PUTL 3.19<br />
in TEXTWRITER command 11.10, 11.33<br />
@COMMAS<br />
in PUT and PUTL 3.19<br />
in TEXTWRITER command 11.11, 11.34<br />
@EQUAL<br />
in PUTL 3.11, 3.19<br />
in TEXTWRITER command 11.13, 11.34<br />
@INDENT<br />
in TEXTWRITER command 11.10, 11.34<br />
@JUST<br />
in TEXTWRITER command 11.10, 11.34<br />
@LABEL<br />
in PUT and PUTL 3.19<br />
in TEXTWRITER command 11.14<br />
@MINUS<br />
in PUT and PUTL 3.19<br />
in TEXTWRITER command 11.9, 11.35<br />
@MISS<br />
in PUT and PUTL 3.19<br />
in TEXTWRITER command 11.14, 11.34<br />
@NEXT<br />
in PUT and PUTL 3.11, 3.13, 3.19<br />
in TEXTWRITER command 11.10, 11.35<br />
@PAGE<br />
in PUT and PUTL 3.13<br />
in TEXTWRITER command 11.10, 11.35<br />
@PARA<br />
in PUT and PUTL 3.19<br />
in TEXTWRITER command 11.10, 11.35<br />
@PLACES<br />
in PUT and PUTL 3.19<br />
in TEXTWRITER command 11.12, 11.35<br />
@PLUS<br />
in PUT and PUTL 3.19<br />
in TEXTWRITER command 11.8, 11.9, 11.35<br />
@SKIP<br />
in PUT and PUTL 3.13, 3.19<br />
in TEXTWRITER command 11.8, 11.35<br />
@SPREAD<br />
in TEXTWRITER command 11.13, 11.35<br />
@TRIM<br />
in PUT and PUTL 3.19<br />
in TEXTWRITER command 11.10, 11.35<br />
@WIDTH<br />
in TEXTWRITER command 11.10, 11.36<br />
A<br />
ABS<br />
PPL function 6.2, 6.20<br />
Absolute value function 6.2<br />
ACOS<br />
PPL function 6.3, 6.20<br />
Add dates and times 10.21<br />
Add variables<br />
see GENERATE<br />
Addition operator + 2.9<br />
MATCHES meta-character 9.21<br />
ALL<br />
PPL operator 2.16, 2.23<br />
AMONG<br />
PPL operator 2.15, 2.23, 9.31<br />
AND<br />
PPL operator 2.13, 2.24<br />
ANY<br />
PPL operator 2.16, 2.24<br />
Arc cosine function 6.3<br />
Arc sine function 6.3<br />
Arc tangent function 6.3<br />
Arguments<br />
in a macro 12.3<br />
ARRAY Commands<br />
DEFINE.ARRAY 8.7<br />
DROP.ARRAY 8.7<br />
SHOW.ARRAYS 8.7<br />
summary 8.28<br />
Arrays<br />
multi-dimensional user-defined 8.1<br />
user defined 8.7<br />
ASIN<br />
PPL function 6.3, 6.21<br />
Asterisk<br />
MATCHES meta-character 9.21<br />
multiplication operator 2.9<br />
Asterisk, double<br />
exponentiation operator 2.9<br />
ATAN<br />
PPL function 6.3, 6.21<br />
B<br />
Backslash<br />
MATCHES meta-character 9.21<br />
Bernoulli distribution 7.4<br />
Binary random number 7.1<br />
Binomial distribution 7.4<br />
inverse 7.5<br />
BLANK<br />
PPL function 9.10, 9.24<br />
BLANKS<br />
in TEXTWRITER command 11.7<br />
Brackets<br />
MATCHES meta-character 9.21<br />
BRANCH<br />
conditional execution 12.18<br />
BY<br />
in COLLECT function 8.20<br />
in LOCATE.GROUPS command 12.30<br />
in SUBFILES command 12.23, 12.29<br />
C<br />
C.TRANSPOSE 2.20<br />
CAPS<br />
PPL function 9.7, 9.25<br />
CARRY<br />
in COLLECT function 8.20<br />
in SPLIT function 8.13<br />
CASE<br />
in TEXTWRITER command 11.6<br />
CASES<br />
PPL instruction 2.1, 2.3, 2.21<br />
Ceiling function 6.2<br />
CENTER<br />
PPL function 9.25<br />
CHANGE<br />
PPL function 9.10, 9.25<br />
CHARACTER<br />
PPL function 9.13, 9.25<br />
Character constants<br />
concatenation with && 9.23, 9.31<br />
CHAREX<br />
PPL function 9.14, 9.26<br />
CHECK 1.3, 1.6<br />
Chi-square<br />
distribution 7.4<br />
inverse 7.5<br />
CLAG<br />
character function 6.8<br />
PPL function 9.23<br />
COLLECT<br />
PPL function 8.29<br />
BY option 8.20<br />
CARRY option 8.21<br />
COLLECT counter 8.20<br />
complex usage 8.22<br />
example 9.17<br />
INDEX option 8.21<br />
SORT option 8.21<br />
COMBINATIONS<br />
PPL function 6.11, 6.22<br />
Comments 3.20<br />
in PPL clauses 11.6<br />
within or between commands 3.14<br />
COMPARE 1.3, 1.6<br />
P-STAT system files 1.6<br />
COMPRESS<br />
PPL function 9.11, 9.26<br />
Concatenation<br />
of files on-the-fly 3.4<br />
operator // 9.4<br />
operator /// 9.5<br />
Conditional execution<br />
BRANCH 12.18<br />
CONTAINS<br />
PPL operator 9.4, 9.32<br />
Control words<br />
in TEXTWRITER command 11.8<br />
COS<br />
PPL function 6.3, 6.20<br />
Cosine function 6.3<br />
COUNT.GOOD<br />
PPL function 6.7, 6.23, 9.3, 9.30<br />
COUNT.MACROS 12.11, 12.28<br />
CREATE<br />
in SPLIT function 8.14<br />
CURRENT<br />
PPL instruction 1.4, 1.6<br />
CURRENT.DATE function 10.4, 10.20<br />
CVAL<br />
PPL function 9.14, 9.26<br />
CYCLE<br />
in SPLIT function 8.17<br />
D<br />
Data<br />
cleaning 3.8, 8.11<br />
DATE.LANGUAGE 10.14, 10.23<br />
DATE.ORDER 10.14, 10.24<br />
Dates 10.1–10.13<br />
adding 10.9, 10.21<br />
changing 10.11, 10.23<br />
difference between 10.12, 10.23<br />
extracting 10.10, 10.22<br />
logical opera<strong>to</strong>rs 10.16, 10.24<br />
simple functions 10.3<br />
DAY.MONTH.YEAR 10.3<br />
DAY.YEAR.MONTH 10.3<br />
MONTH.YEAR.DAY 10.4<br />
YEAR.DAY.MONTH 10.4<br />
YEAR.MONTH.DAY 10.4<br />
subtracting 10.9, 10.21<br />
DAY.WITHIN.WEEK function 10.7, 10.21<br />
DAY.WITHIN.YEAR function 10.7, 10.21<br />
DAYS<br />
<strong>PPL</strong> function 10.20<br />
Decimal places function 6.10<br />
DECREASE<br />
<strong>PPL</strong> instruction 2.8, 2.21<br />
DEFINE.ARRAY 8.7, 8.28<br />
DELETE<br />
<strong>PPL</strong> instruction 2.11, 2.21<br />
DES<br />
in MODIFY command 3.2<br />
DIALOG 12.19, 12.31<br />
DIF<br />
<strong>PPL</strong> function 6.8, 6.22<br />
DIF function 6.9<br />
Difference function 6.8<br />
Digit extraction function 6.10<br />
Distribution functions 7.4<br />
inverse 7.5<br />
Division opera<strong>to</strong>r / 2.9<br />
DO loops 5.1–5.11<br />
<strong>PPL</strong> instruction 5.22<br />
Double slash<br />
<strong>PPL</strong> opera<strong>to</strong>r 9.4<br />
DOWN<br />
in LOCATE.GROUPS command 12.30<br />
in SUBFILES command 12.23, 12.29<br />
DROP<br />
<strong>PPL</strong> instruction 2.1, 2.3, 2.21<br />
DROP.ARRAY 8.7, 8.28<br />
DROP.P.VECTOR 8.29<br />
Dummy variables<br />
creating 6.4, 6.15<br />
recoding in<strong>to</strong> one variable 6.5<br />
E<br />
ECHO 12.14<br />
Econometrics<br />
LAG, DIF functions 6.8<br />
Enclosures<br />
in MATCHES opera<strong>to</strong>r 9.21<br />
ENDDO<br />
<strong>PPL</strong> instruction 5.23<br />
ENDIF<br />
<strong>PPL</strong> instruction 5.25<br />
ENDMACRO 12.1, 12.11, 12.28<br />
ENDSUBFILES 12.23<br />
summary 12.30<br />
EQ<br />
<strong>PPL</strong> opera<strong>to</strong>r 2.12, 2.23<br />
Escape characters<br />
in MATCHES opera<strong>to</strong>r 9.21<br />
Escape codes, passing 9.14<br />
Exact string comparisons 2.12<br />
EXITDO<br />
<strong>PPL</strong> instruction 5.22<br />
EXP<br />
<strong>PPL</strong> function 6.3, 6.20<br />
EXPAND<br />
<strong>PPL</strong> function 6.11<br />
Exponentiation function 6.3<br />
Exponentiation opera<strong>to</strong>r ** 2.9<br />
F<br />
F distribution 7.4<br />
inverse 7.5<br />
FACTORIAL<br />
<strong>PPL</strong> function 6.3, 6.20<br />
FILE<br />
in SHOW.MACROS command 12.11, 12.29<br />
FILE.IN 1.1<br />
FILES 12.15<br />
Filtering a file using <strong>PPL</strong> 2.11<br />
FIRST<br />
<strong>PPL</strong> function 6.23, 8.2, 8.10, 8.29<br />
FIRST.GOOD<br />
<strong>PPL</strong> function 6.7, 6.23, 9.3, 9.30<br />
FISCAL.QUARTER function 10.6, 10.21<br />
FISCAL.YEAR function 10.6, 10.21<br />
Floor function 6.2<br />
FOLD<br />
in LIST command 2.20<br />
FONT<br />
in TEXTWRITER command 11.21, 11.36<br />
FONT1-FONT9<br />
in TEXTWRITER command 11.21<br />
Fonts<br />
changing<br />
in TEXTWRITER 11.22<br />
FRAC<br />
<strong>PPL</strong> function 6.2, 6.20<br />
Fractional portion function 6.2<br />
FREQUENCIES<br />
in LOCATE.GROUPS command 12.30<br />
in SUBFILES command 12.24<br />
FULL.MACRO.ARGS 12.11, 12.28<br />
FUZZ<br />
command 7.8, 7.12<br />
Fuzzy arithmetic 7.6, 7.12<br />
G<br />
GENERATE<br />
in DO loop 5.13, 5.23<br />
<strong>PPL</strong> instruction 2.1, 2.9, 2.21, 9.1<br />
GOOD<br />
<strong>PPL</strong> opera<strong>to</strong>r 2.12, 2.24, 9.32<br />
GOTO<br />
<strong>PPL</strong> instruction 3.7, 3.17<br />
GROUPS<br />
in LOCATE.GROUPS command 12.25<br />
in SUBFILES command 12.25<br />
GT opera<strong>to</strong>r 2.12<br />
H<br />
HEX<br />
<strong>PPL</strong> function 7.7, 7.11<br />
I<br />
IF<br />
<strong>PPL</strong> instruction 2.1, 2.11, 2.17, 2.21, 3.7, 9.2<br />
IF-THEN-ELSE 5.14–5.18, 5.24<br />
INCREASE<br />
<strong>PPL</strong> instruction 2.8, 2.22<br />
INDEX<br />
in COLLECT function 8.21<br />
in SPLIT function 8.16<br />
INQUIRE<br />
determine existence of P-<strong>STAT</strong> system file 12.23, 12.31<br />
INQUIRE.EXTERNAL 12.22, 12.31<br />
INRANGE<br />
<strong>PPL</strong> opera<strong>to</strong>r 2.16, 2.24<br />
INT<br />
<strong>PPL</strong> function 6.2, 6.20<br />
Integer function 6.2<br />
interaction with IF 6.9<br />
INVBIN<br />
<strong>PPL</strong> function 7.5, 7.11<br />
INVCHI<br />
<strong>PPL</strong> function 7.5, 7.11<br />
Inverse probability functions 7.5<br />
binomial distribution 7.5<br />
chi-square distribution 7.5<br />
F distribution 7.5<br />
normal distribution 7.6<br />
Poisson distribution 7.6<br />
t distribution 7.6<br />
INVF<br />
<strong>PPL</strong> function 7.5, 7.11<br />
INVNORM<br />
<strong>PPL</strong> function 7.6, 7.11<br />
INVPOIS<br />
<strong>PPL</strong> function 7.6, 7.11<br />
INVT<br />
<strong>PPL</strong> function 7.6, 7.11<br />
IVAL<br />
<strong>PPL</strong> function 9.14, 9.27<br />
J<br />
JUSTIFY<br />
in TEXTWRITER command 11.7<br />
K<br />
KEEP<br />
<strong>PPL</strong> instruction 2.1, 2.3, 2.5, 2.22<br />
L<br />
LABELS<br />
in TEXTWRITER command 11.8, 11.32<br />
Labels<br />
for <strong>PPL</strong> statements 3.7<br />
LAG<br />
<strong>PPL</strong> function 6.8, 6.22<br />
LAG function 6.9<br />
Lagging function 6.8<br />
LANDSCAPE<br />
in TEXTWRITER command 11.21, 11.36<br />
LAST<br />
<strong>PPL</strong> function 6.23, 8.2, 8.10, 8.29<br />
LAST.GOOD<br />
<strong>PPL</strong> function 6.7, 6.23, 9.3, 9.30<br />
LE<br />
<strong>PPL</strong> opera<strong>to</strong>r 2.12<br />
LEADBLANK<br />
in TEXTWRITER command 11.7, 11.32<br />
LEFT<br />
<strong>PPL</strong> function 9.27<br />
LEFT.EDGE<br />
in TEXTWRITER command 11.36<br />
LENGTH<br />
<strong>PPL</strong> function 9.8, 9.27<br />
LIST<br />
identifiers<br />
FOLD 2.20<br />
MAX.PLACES 6.10<br />
MIN.PLACES 6.10<br />
LOC<br />
<strong>PPL</strong> function 6.3, 6.20<br />
LOCATE.GROUPS 12.24<br />
identifiers<br />
BY 12.30<br />
DOWN 12.30<br />
EXACT 12.30<br />
FREQUENCIES 12.30<br />
OUT 12.26, 12.31<br />
UP 12.31<br />
summary 12.30<br />
Location function 6.3<br />
LOG<br />
<strong>PPL</strong> function 6.3, 6.20<br />
LOG10<br />
<strong>PPL</strong> function 6.3, 6.20<br />
Logarithm functions 6.3<br />
Logical opera<strong>to</strong>rs 2.23<br />
date/time 10.16<br />
LOWER<br />
<strong>PPL</strong> function 9.7, 9.27<br />
LPAD<br />
<strong>PPL</strong> function 9.27<br />
LRPAD<br />
<strong>PPL</strong> function 9.28<br />
LRTRIM<br />
<strong>PPL</strong> function 9.12, 9.29<br />
LT<br />
<strong>PPL</strong> opera<strong>to</strong>r 2.12<br />
LTRIM<br />
<strong>PPL</strong> function 9.12, 9.29<br />
M<br />
MACRO 12.1, 12.11<br />
summary 12.27<br />
MACRO.PAD 12.11, 12.29<br />
Macros<br />
activating 12.2<br />
arguments 12.5<br />
default values 12.6<br />
keyword 12.4<br />
positional 12.4<br />
block 12.2<br />
executing 12.2<br />
calling o<strong>the</strong>r macros 12.7<br />
comments 12.3<br />
correcting in edi<strong>to</strong>r 12.11<br />
format of 12.1<br />
in stream 12.1<br />
in subcommands 12.8<br />
in <strong>the</strong> edi<strong>to</strong>r 12.11<br />
Scratch variable usage 12.14<br />
s<strong>to</strong>ring 12.2<br />
temporary files 12.15<br />
using RUN 12.14<br />
MAKE 1.1<br />
MAKE.CHARACTER 9.26<br />
MAKE.DATE function 10.4, 10.20<br />
MAKE.NUMERIC 9.27<br />
MARGIN<br />
in TEXTWRITER command 11.7, 11.32<br />
MASK<br />
in case, variable selection 2.6<br />
in SPLIT instruction 8.15<br />
Masks<br />
Complex for GENERATE 5.13<br />
for RENAME and GENERATE 5.24<br />
MATCHES<br />
<strong>PPL</strong> opera<strong>to</strong>r 9.18, 9.32<br />
meta-characters 9.20<br />
MAX<br />
<strong>PPL</strong> function 6.21<br />
MAX.GOOD<br />
<strong>PPL</strong> function 6.21<br />
MEAN<br />
<strong>PPL</strong> function 6.21<br />
MEAN.GOOD<br />
<strong>PPL</strong> function 6.21<br />
Meta-characters<br />
in MATCHES opera<strong>to</strong>r 9.20, 9.21<br />
MIN<br />
<strong>PPL</strong> function 6.21<br />
MIN.GOOD<br />
<strong>PPL</strong> function 6.21<br />
MISSING<br />
<strong>PPL</strong> opera<strong>to</strong>r 2.12, 2.24, 9.33<br />
MISSING1, MISSING2, MISSING3<br />
<strong>PPL</strong> opera<strong>to</strong>rs 2.13<br />
MOD<br />
<strong>PPL</strong> function 3.6, 6.9, 6.22<br />
MODIFY 1.3, 1.6, 3.2<br />
identifiers<br />
DES 3.2<br />
OUT 3.2, 3.16<br />
TEMPLATE 3.3, 3.16<br />
summary 3.16<br />
Modular function 6.9<br />
MONTH.CASE 10.15, 10.24<br />
MONTH.LENGTH 10.24<br />
MONTH.NAMES 10.15, 10.24<br />
MONTH.YEAR.DAY 10.3<br />
Multiplication opera<strong>to</strong>r * 2.9<br />
MATCHES meta-character 9.21<br />
N<br />
NAMES<br />
in SHOW.MACROS command 12.28<br />
NCOT<br />
<strong>PPL</strong> function 4.1, 4.14, 6.22<br />
NE<br />
<strong>PPL</strong> opera<strong>to</strong>r 2.12, 2.23<br />
NEAR<br />
<strong>PPL</strong> logical opera<strong>to</strong>r 7.8<br />
NEXTDO<br />
<strong>PPL</strong> instruction 5.23<br />
NO LEADBLANK<br />
in TEXTWRITER command 11.7<br />
NO SHOWPAGE<br />
in TEXTWRITER command 11.21<br />
NO SPREAD<br />
in TEXTWRITER command 11.7<br />
No-break character<br />
in TEXTWRITER command 11.2<br />
Normal distribution 7.4<br />
inverse 7.6<br />
Normal random number 7.1<br />
NOTAMONG<br />
<strong>PPL</strong> opera<strong>to</strong>r 2.15, 2.24, 9.33<br />
NOTNEAR<br />
<strong>PPL</strong> logical opera<strong>to</strong>r 7.8<br />
NTOKEN<br />
<strong>PPL</strong> function 9.10, 9.29<br />
NUMBER<br />
<strong>PPL</strong> function 9.13, 9.27<br />
NUMBER.E<br />
<strong>PPL</strong> function 9.13<br />
NUMBER.W<br />
<strong>PPL</strong> function 9.13<br />
NUMEX<br />
<strong>PPL</strong> function 6.10, 6.23<br />
O<br />
OR<br />
<strong>PPL</strong> opera<strong>to</strong>r 2.24<br />
OUT<br />
in LOCATE.GROUPS command 12.26, 12.31<br />
in MODIFY command 3.16<br />
in TEXTWRITER command 11.8, 11.32<br />
OUTRANGE<br />
<strong>PPL</strong> opera<strong>to</strong>r 2.16, 2.25<br />
P<br />
P vec<strong>to</strong>r 1.2, 8.5<br />
PAD<br />
<strong>PPL</strong> function 9.27<br />
Paren<strong>the</strong>ses<br />
MATCHES meta-character 9.21<br />
Permanent vec<strong>to</strong>r<br />
see P vec<strong>to</strong>r<br />
Phrases<br />
<strong>PPL</strong> 2.1<br />
PLACES<br />
<strong>PPL</strong> function 6.10, 6.23<br />
Poisson distribution 7.4, 7.6<br />
PORTRAIT<br />
in TEXTWRITER command 11.20, 11.36<br />
POSITION<br />
<strong>PPL</strong> function 9.8, 9.28<br />
Positional notation, variables 2.4, 2.8, 6.3<br />
POSTSCRIPT<br />
in TEXTWRITER command 11.20, 11.36<br />
POSTSCRIPT.SETUP 11.21, 11.28<br />
<strong>PPL</strong> 1.1, 2.1, 3.1, 4.1, 6.1, 8.1, 9.1<br />
case, variable selection 2.1<br />
character variables 9.1<br />
comments 3.20<br />
concatenation, on-<strong>the</strong>-fly 3.4<br />
Date and time summary 10.20<br />
DO loops 5.1–5.11, 5.22<br />
exact comparisons of characters 2.12<br />
generating variables 2.7<br />
IF tests 3.7<br />
introduction 1.1<br />
logical selection 2.11, 2.17<br />
modifying variables 2.7, 3.1<br />
order of numeric opera<strong>to</strong>rs 2.10<br />
permanent vec<strong>to</strong>r 8.5<br />
phrases with <strong>PPL</strong> clauses 2.1<br />
scratch variables 8.3<br />
size constraints 2.3<br />
standalone commands 1.3, 1.6, 3.12, 3.17<br />
summary 1.5, 2.20, 3.16, 4.14, 5.22, 6.20, 8.28, 9.24<br />
wildcard notation 2.6, 8.15, 8.24<br />
<strong>PPL</strong> command 1.6<br />
<strong>PPL</strong> Functions 2.10<br />
char and numeric<br />
COLLECT 8.19–8.27<br />
COUNT.GOOD 6.7, 9.3<br />
EXPAND 6.11<br />
FIRST 8.2, 8.10<br />
FIRST.GOOD 6.7, 9.3<br />
LAST 8.2, 8.10<br />
LAST.GOOD 6.7, 9.3<br />
RECODE 4.3<br />
SPLIT 8.12–8.19<br />
SPLIT * 8.24<br />
VARNAME 3.11<br />
XRECODE 4.5<br />
character 9.6<br />
BLANK 9.10<br />
CAPS 9.7<br />
CENTER 9.7<br />
CHANGE 9.10<br />
CHARACTER 9.14<br />
CHAREX 9.14<br />
CLAG 9.23<br />
COMPRESS 9.11<br />
CVAL 9.14<br />
IVAL 9.14<br />
LENGTH 9.8<br />
LOWER 9.7<br />
LPAD 9.12<br />
LRPAD 9.12<br />
LRTRIM 9.12<br />
LTOKEN 9.10<br />
LTRIM 9.12<br />
NTOKEN 9.10<br />
NUMBER 9.13<br />
NUMBER.E 9.13<br />
NUMBER.W 9.13<br />
PAD 9.12<br />
POSITION 9.8<br />
RIGHT 9.7<br />
RPAD 9.12<br />
RTOKEN 9.10<br />
RTRIM 9.12<br />
SIZE 9.8<br />
SUBSTRING 9.9<br />
TOKEN 9.9<br />
TRIM 9.12<br />
UPPER 9.7<br />
VARNAME 9.17<br />
VERIFY 9.8<br />
XBLANK 9.10<br />
XCHANGE 9.10<br />
XPOSITION 9.8<br />
complex, nested 9.16<br />
Date/time<br />
ADD dates and times 10.9, 10.21<br />
CHANGE dates and times 10.11, 10.22<br />
CURRENT.DATE 10.4<br />
DAY.MONTH.YEAR 10.3, 10.20<br />
DAY.WITHIN.WEEK 10.7, 10.21<br />
DAY.WITHIN.YEAR 10.7, 10.21<br />
DAY.YEAR.MONTH 10.3, 10.20<br />
DAYS 10.5<br />
DIF - compare dates and times 10.23<br />
Difference between dates/times 10.12<br />
EXTRACT dates and times 10.10, 10.22<br />
FISCAL.QUARTER 10.6, 10.21<br />
FISCAL.YEAR 10.6, 10.21<br />
MAKE.DATE 10.4, 10.20<br />
MONTH.DAY.YEAR 10.20<br />
MONTH.YEAR.DAY 10.3, 10.4, 10.20<br />
QUARTER 10.7, 10.21<br />
REFORMAT.DATE 10.5, 10.20<br />
SECONDS 10.6, 10.21<br />
SECONDS.MIDNIGHT 10.6, 10.21<br />
<strong>STAT</strong>US.DATE 10.5, 10.20<br />
SUBTRACT dates and times 10.9<br />
UNDO.DAYS 10.6, 10.21<br />
UNDO.SECONDS 10.6, 10.21<br />
WEEK.WITHIN.YEAR 10.7<br />
YEAR.DAY.MONTH 10.4, 10.20<br />
YEAR.MONTH.DAY 10.4, 10.20<br />
numeric<br />
ABS 6.2<br />
ACOS 6.3<br />
ASIN 6.3<br />
ATAN 6.3<br />
CEIL 6.2<br />
COMBINATIONS 6.11<br />
COS 6.3<br />
DIF 6.8<br />
EXP 6.3<br />
FACTORIAL 6.3<br />
FLOOR 6.2<br />
FRAC 6.2<br />
HEX 7.7, 7.11<br />
INT 6.2<br />
INVBIN 7.5<br />
INVCHI 7.5<br />
INVF 7.5<br />
INVNORM 7.6<br />
INVPOIS 7.6<br />
INVT 7.6<br />
LAG 6.8<br />
LOC 6.3<br />
LOG 6.3<br />
LOG10 6.3<br />
MOD 3.6, 6.9<br />
NCOT 4.1<br />
NUMEX 6.10<br />
PLACES 6.10<br />
PROBCHI 7.4<br />
PROBF 7.4<br />
PROBBIN 7.4<br />
PROBIT 7.6<br />
PROBNORM 7.4<br />
PROBPOIS 7.4<br />
PROBT 7.5<br />
RANBIN 7.1<br />
RANNORM 3.7, 7.1<br />
RANTABLE 7.1<br />
RANUNI 7.1<br />
ROUND 6.2<br />
SIN 6.3<br />
SQRT 6.3<br />
STEP.DOWN 7.7, 7.12<br />
STEP.UP 7.7, 7.11<br />
STEPS 7.7, 7.12<br />
TAN 6.3<br />
<strong>PPL</strong> Instructions<br />
CASES 2.1, 2.3<br />
using ranges, TO 2.4<br />
with MASK 2.6<br />
CURRENT 1.4, 1.6<br />
DECREASE 2.8<br />
DELETE 2.11<br />
DO loops 5.1–5.11, 5.22<br />
DROP 2.1, 2.3<br />
using ranges, TO 2.4<br />
with MASK 2.6<br />
with wildcard 2.6<br />
ENDDO 5.23<br />
EXITDO 5.22<br />
GENERATE 2.1, 2.8, 5.23, 9.1<br />
GOTO 3.7<br />
IF 2.1, 2.11, 2.17, 3.7, 9.2<br />
missing data 2.18<br />
T F M prefixes 2.18<br />
IF-THEN-ELSE 5.24<br />
INCREASE 2.8<br />
KEEP 2.1, 2.3, 2.5<br />
using ranges, TO 2.4<br />
with MASK 2.6<br />
with wildcard 2.6<br />
NEXTDO 5.23<br />
PREVIOUS 1.3, 1.7<br />
PUT 3.8, 3.9, 11.2<br />
PUTL 3.11, 11.2<br />
QUITCOMMAND 3.14<br />
QUITFILE 3.14<br />
QUITRUN 3.14<br />
RENAME 2.19, 2.23, 5.23<br />
REPEAT 3.6, 7.2<br />
RETAIN 2.11<br />
SET 2.1, 2.7, 9.2<br />
<strong>PPL</strong> Opera<strong>to</strong>rs<br />
character 9.3<br />
// concatenate 9.4<br />
/// squeeze concatenate 9.5<br />
&& concatenation of character constants 9.23<br />
CONTAINS 9.4<br />
MATCHES 9.18<br />
XAMONG 2.16<br />
XCONTAINS 9.4<br />
XEQ 2.12, 9.5<br />
XMATCHES 9.18<br />
XNOTAMONG 2.16<br />
logical 2.12<br />
ALL 2.16<br />
AMONG 2.15<br />
AND 2.13<br />
ANY 2.16<br />
DATE.EQ 10.16<br />
DATE.GE 10.16<br />
DATE.GT 10.16<br />
DATE.LE 10.16<br />
DATE.LT 10.16<br />
DATE.NE 10.16<br />
EQ 2.12, 7.7<br />
fuzzy 7.8<br />
GE 7.7<br />
GOOD 2.12<br />
GT 2.12, 7.7<br />
INRANGE 2.16<br />
LE 2.12, 7.7<br />
LT 2.12, 7.7<br />
MISSING 2.12<br />
NE 2.12, 7.7<br />
NEAR 7.8, 7.12<br />
NOTAMONG 2.15<br />
NOTNEAR 7.8, 7.12<br />
OR 2.13<br />
OUTRANGE 2.16<br />
numeric 2.9<br />
- 2.9<br />
* 2.9<br />
** 2.9<br />
/ 2.9<br />
+ 2.9<br />
<strong>PPL</strong> System Variables<br />
.ALL. 3.8, 3.11, 11.14, 11.34<br />
.CHARACTER. 2.6, 6.17<br />
.DATE. 6.19<br />
.e. 6.14<br />
.FILE. 6.18, 8.2<br />
.G. 2.12, 6.14<br />
.HERE. 3.6, 6.16<br />
.M. 2.12, 6.15<br />
.M1., .M2., .M3. 6.15<br />
.N. 3.6, 6.16<br />
.NDATE. 6.19<br />
.NEW. 2.6, 6.5<br />
.NUMERIC. 2.6, 6.17<br />
.NV. 6.15<br />
.ON. 2.3<br />
.OTHERS. 2.6, 6.15<br />
.PAGE. 6.19<br />
.PI. 6.14<br />
.PUT. 3.9, 6.18, 8.11<br />
.REPEAT. 6.19<br />
.RESPONSE. 12.21<br />
.SUBFILECASES. 12.26<br />
.SUBFILEMAX. 12.26<br />
.SUBFILEPASS. 12.26<br />
.TIME. 6.19<br />
.USED. 6.16<br />
.XINQUIRE. 12.22<br />
PREVIOUS<br />
<strong>PPL</strong> instruction 1.3, 1.7<br />
Probability functions 7.4<br />
PROBBIN<br />
<strong>PPL</strong> function 7.4, 7.10<br />
PROBCHI<br />
<strong>PPL</strong> function 7.4, 7.10<br />
PROBF<br />
<strong>PPL</strong> function 7.4, 7.10<br />
PROBIT<br />
<strong>PPL</strong> function 7.6<br />
Probit distribution 7.6<br />
PROBNORM<br />
<strong>PPL</strong> function 7.4, 7.10<br />
PROBPOIS<br />
<strong>PPL</strong> function 7.4, 7.10<br />
PROBT<br />
<strong>PPL</strong> function 7.5, 7.10<br />
PROCESS 1.3, 1.6, 3.9, 3.13<br />
summary 3.17<br />
P-<strong>STAT</strong> <strong>Programming</strong> <strong>Language</strong><br />
see <strong>PPL</strong><br />
P-<strong>STAT</strong> system file<br />
previous or current version 1.3<br />
PUT<br />
in TEXTWRITER command 11.2<br />
<strong>PPL</strong> instruction 3.8, 3.9, 3.19<br />
PUTL<br />
in TEXTWRITER command 11.2<br />
<strong>PPL</strong> instruction 3.11<br />
PUTL.CHARS<br />
in TEXTWRITER command 11.32<br />
Q<br />
QUARTER function 10.7, 10.21<br />
QUITCOMMAND<br />
<strong>PPL</strong> instruction 3.14, 3.18<br />
QUITFILE<br />
<strong>PPL</strong> instruction 3.14, 3.18<br />
QUITRUN<br />
<strong>PPL</strong> instruction 3.14, 3.18<br />
R<br />
RANBIN<br />
<strong>PPL</strong> function 7.1, 7.9<br />
Random<br />
assignment 7.3<br />
data generation 7.1<br />
number functions 7.1<br />
sampling 7.2<br />
with replacement 7.2<br />
RANNORM<br />
<strong>PPL</strong> function 3.7, 7.1, 7.9<br />
RANTABLE<br />
<strong>PPL</strong> function 7.1, 7.9<br />
RANUNI<br />
<strong>PPL</strong> function 7.1, 7.9<br />
RECODE<br />
arguments 4.6<br />
complex 4.6<br />
exact matches with XRECODE 4.12<br />
<strong>PPL</strong> function 4.3, 4.14, 6.24<br />
tests 4.6<br />
REFORMAT.DATE function 10.5, 10.20<br />
RENAME<br />
<strong>PPL</strong> instruction 2.22, 5.23<br />
variables 2.19, 2.23<br />
REPEAT<br />
<strong>PPL</strong> instruction 3.6, 3.18, 7.2<br />
Report writing 3.9<br />
RETAIN<br />
<strong>PPL</strong> instruction 2.11, 2.22<br />
RIGHT<br />
<strong>PPL</strong> function 9.7, 9.28<br />
ROUND<br />
<strong>PPL</strong> function 6.2, 6.21<br />
Rounding function 6.2<br />
RTOKEN<br />
<strong>PPL</strong> function 9.10, 9.29<br />
RTRIM<br />
<strong>PPL</strong> function 9.12<br />
RUN 12.2, 12.11, 12.28<br />
S<br />
Scratch variables 1.2, 3.12, 8.3<br />
in SUBFILES command 12.23<br />
SDEV<br />
<strong>PPL</strong> function 6.21<br />
SDEV.GOOD<br />
<strong>PPL</strong> function 6.22<br />
SECONDS function 10.6, 10.21<br />
SECONDS.MIDNIGHT function 10.6, 10.21<br />
SET<br />
<strong>PPL</strong> instruction 2.1, 2.7, 2.22, 9.2<br />
SHOW.ARRAYS 8.7, 8.28<br />
SHOW.MACROS 12.11, 12.28, 12.29<br />
identifiers<br />
FILE 12.11, 12.29<br />
NAMES 12.28<br />
SHOWPAGE<br />
in TEXTWRITER command 11.21, 11.37<br />
SIN<br />
<strong>PPL</strong> function 6.3, 6.21<br />
Sine function 6.3<br />
SIZE<br />
<strong>PPL</strong> function 9.8, 9.28<br />
Size constraints<br />
<strong>PPL</strong> modifications 2.3<br />
Slash<br />
division opera<strong>to</strong>r 2.9<br />
Slash, back<br />
in MATCHES opera<strong>to</strong>r 9.21<br />
Slash, double<br />
<strong>PPL</strong> opera<strong>to</strong>r 9.4<br />
Slash, triple<br />
<strong>PPL</strong> opera<strong>to</strong>r 9.5<br />
SORT<br />
in COLLECT function 8.21<br />
SPLIT<br />
example 9.18<br />
<strong>PPL</strong> function 8.30<br />
CARRY option 8.13<br />
CREATE option 8.14<br />
CYCLE option 8.17<br />
INDEX option 8.16<br />
SPLIT * 8.24<br />
STEP option 8.17<br />
USE option 8.14<br />
SPREAD<br />
in TEXTWRITER command 11.7, 11.32<br />
SPSS.IN 1.1<br />
SQRT<br />
<strong>PPL</strong> function 6.3, 6.21<br />
Square root function 6.3<br />
Standalone <strong>PPL</strong> commands 1.3, 1.6, 3.12, 3.17<br />
<strong>STAT</strong>US.DATE function 10.5, 10.20<br />
STEP<br />
in SPLIT function 8.17<br />
STEP.DOWN<br />
<strong>PPL</strong> function 7.7, 7.12<br />
STEP.UP<br />
<strong>PPL</strong> function 7.7, 7.11<br />
STEPS<br />
<strong>PPL</strong> function 7.7, 7.12<br />
STREAM<br />
in TEXTWRITER command 11.7, 11.32<br />
SUBFILES 12.23<br />
identifiers<br />
BY 12.23, 12.29<br />
DOWN 12.23, 12.29<br />
EXACT 12.29<br />
FREQUENCIES 12.24, 12.29<br />
GROUPS 12.25, 12.29<br />
UP 12.23, 12.29<br />
use of scratch variables 12.23<br />
SUBSTRING<br />
<strong>PPL</strong> function 9.9, 9.28<br />
SUBTRACT dates and times 10.21<br />
Subtraction opera<strong>to</strong>r - 2.9<br />
Subtraction sign<br />
in MATCHES opera<strong>to</strong>r 9.21<br />
SUM<br />
<strong>PPL</strong> function 6.22<br />
SUM.GOOD<br />
<strong>PPL</strong> function 6.22<br />
System files<br />
previous or current version 1.3<br />
template 3.3<br />
System variables 6.1<br />
.M. 2.24<br />
T<br />
t distribution 7.5<br />
inverse 7.6<br />
Tabled random number 7.1<br />
TAN<br />
<strong>PPL</strong> function 6.3, 6.21<br />
Tangent function 6.3<br />
TEMPLATE<br />
in MODIFY command 3.3, 3.16<br />
TEXTFILE.IN 1.1<br />
TEXTWRITER 1.3, 1.6, 11.1<br />
comments 11.6<br />
control words 11.8<br />
@ 11.9, 11.33<br />
@BEFORE 11.10, 11.33<br />
@BLACK 11.28<br />
@BLUE 11.28<br />
@BOTTOM 11.27<br />
@CINCH 11.24<br />
@CINCH.U 11.25<br />
@COMMAS 11.11, 11.34<br />
@DOWN 11.27<br />
@DRAW.BOX 11.26<br />
@DRAW.H 11.26<br />
@DRAW.U 11.26, 11.38<br />
@DRAW.V 11.26<br />
@EQUAL 11.13, 11.34<br />
@FLUSH 11.26<br />
@FONT1-@FONT9 11.22, 11.37<br />
@GREEN 11.28<br />
@INDENT 11.10, 11.34<br />
@JUST 11.10, 11.34<br />
@L.MARGIN 11.27<br />
@LABEL 11.14<br />
@LEADING 11.27<br />
@LINCH 11.25<br />
@LINCH.U 11.25<br />
@LINEWIDTH 11.27<br />
@MINUS 11.9, 11.35<br />
@MISS 11.14, 11.34<br />
@MOVETO 11.26<br />
@NEXT 11.10, 11.35<br />
@NOCOLOR 11.28<br />
@NOUNDERLINE 11.29<br />
@ORANGE 11.28<br />
@PAGE 11.10, 11.35<br />
@PARA 11.10, 11.35<br />
@PINCH 11.25<br />
@PINCH.CHAR 11.26<br />
@PINCH.U 11.26<br />
@PLACES 11.12, 11.35<br />
@PLUS 11.8, 11.9, 11.35<br />
@R.MARGIN 11.27<br />
@RED 11.28<br />
@RINCH 11.25<br />
@RINCH.U 11.25<br />
@SKIP 11.8, 11.35<br />
@SPREAD 11.13, 11.35<br />
@TOP 11.27<br />
@TRIM 11.10, 11.35<br />
@UNDERLINE 11.29<br />
@UP 11.27<br />
@VIOLET 11.28<br />
@WIDTH 11.10, 11.36<br />
@X1 11.26<br />
@X2 11.26<br />
@Y1 11.26<br />
@Y2 11.26<br />
@YELLOW 11.28<br />
identifiers<br />
BLANKS 11.7, 11.32<br />
BOTTOM.EDGE 11.21, 11.36<br />
CASE 11.6, 11.32<br />
FONT 11.21, 11.36<br />
FONT1-FONT9 11.21, 11.36<br />
JUSTIFY 11.7, 11.32<br />
LABELS 11.8, 11.32<br />
LANDSCAPE 11.21, 11.36<br />
LEADBLANK 11.7, 11.32<br />
LEFT.EDGE 11.21, 11.36<br />
MARGIN 11.7, 11.32<br />
NO LEADBLANK 11.7<br />
NO SHOWPAGE 11.21<br />
NO SPREAD 11.7<br />
OUT 11.8, 11.32<br />
PORTRAIT 11.20, 11.36<br />
POSTSCRIPT 11.20, 11.36<br />
PUTL.CHARS 11.32<br />
RIGHT.EDGE 11.21<br />
SHOWPAGE 11.21, 11.37<br />
SPREAD 11.7, 11.32<br />
STREAM 11.7, 11.32<br />
TOP.EDGE 11.21, 11.37<br />
WIDTH 11.8, 11.33, 11.36<br />
justification 11.2<br />
no-break character 11.2<br />
PUT instructions 11.2<br />
summary 11.31<br />
Time functions 10.1–10.13<br />
TITLES<br />
system variables, use of 6.26<br />
TOKEN<br />
<strong>PPL</strong> function 9.9, 9.10, 9.29<br />
TOP.EDGE<br />
in TEXTWRITER command 11.37<br />
TRIM<br />
<strong>PPL</strong> function 9.29<br />
Triple slash<br />
<strong>PPL</strong> opera<strong>to</strong>r 9.5<br />
U<br />
UNDO.DAYS function 10.6, 10.21<br />
UNDO.SECONDS function 10.6, 10.21<br />
Uniform random number 7.1<br />
UP<br />
in SUBFILES command 12.23<br />
UPPER<br />
<strong>PPL</strong> function 9.7, 9.30<br />
USE<br />
in SPLIT function 8.14<br />
using /* and */ 3.20<br />
V<br />
V vec<strong>to</strong>r 1.2<br />
Variables<br />
3 types 1.2<br />
accessing names 3.11<br />
across command (P Vec<strong>to</strong>r) 8.5<br />
across-case (scratch) 8.3<br />
generating names 5.8<br />
in P-<strong>STAT</strong> file 1.2, 1.5<br />
positional notation 2.4, 2.8, 6.3<br />
reordering 2.5<br />
scratch 1.2, 1.5<br />
system 1.2, 1.5<br />
VARNAME<br />
<strong>PPL</strong> function 9.17, 9.30<br />
Vec<strong>to</strong>rs<br />
dynamic 1.3, 1.5<br />
P 1.2, 1.5<br />
V 1.2, 1.5<br />
VERIFY<br />
<strong>PPL</strong> function 9.8, 9.30<br />
W<br />
WEEK.WITHIN.YEAR function 10.7<br />
WEEKDAY.CASE 10.15, 10.24<br />
WEEKDAY.LENGTH 10.24<br />
WEEKDAY.NAMES 10.15, 10.24<br />
Weighting<br />
integer 3.7<br />
WIDTH<br />
in TEXTWRITER command 11.8, 11.33, 11.36<br />
Wildcard<br />
in MATCHES meta-characters 9.19<br />
in <strong>PPL</strong> instructions 8.15, 8.24<br />
in SPLIT instruction 8.15<br />
in variable selection 2.6<br />
X<br />
XAMONG<br />
<strong>PPL</strong> opera<strong>to</strong>r 2.16, 2.24, 9.3, 9.32<br />
XBLANK<br />
<strong>PPL</strong> function 9.10, 9.24<br />
XCHANGE<br />
<strong>PPL</strong> function 9.10, 9.25<br />
XCONTAINS<br />
<strong>PPL</strong> opera<strong>to</strong>r 9.4, 9.32<br />
XEQ<br />
<strong>PPL</strong> opera<strong>to</strong>r 2.12, 2.23, 9.2, 9.5, 9.32<br />
XGE<br />
<strong>PPL</strong> opera<strong>to</strong>r 9.3<br />
XGT<br />
<strong>PPL</strong> opera<strong>to</strong>r 9.3<br />
XLE<br />
<strong>PPL</strong> opera<strong>to</strong>r 9.3<br />
XLT<br />
<strong>PPL</strong> opera<strong>to</strong>r 9.3<br />
XMATCHES<br />
<strong>PPL</strong> opera<strong>to</strong>r 9.18, 9.33<br />
XNE<br />
<strong>PPL</strong> opera<strong>to</strong>r 2.23, 9.3<br />
XNOTAMONG<br />
<strong>PPL</strong> opera<strong>to</strong>r 2.16, 2.24, 9.3, 9.33<br />
XPOSITION<br />
<strong>PPL</strong> function 9.8, 9.28<br />
XRECODE 4.12<br />
<strong>PPL</strong> function 4.5, 4.15